Computational Ensemble Gene Co-Expression Networks for the Analysis of Cancer Biomarkers

: Gene networks have become a powerful tool for the comprehensive examination of gene expression patterns. Thanks to these networks generated by means of inference algorithms, it is possible to study different biological processes and even identify new biomarkers for such diseases. These biomarkers are essential for the discovery of new treatments for genetic diseases such as cancer. In this work, we introduce an algorithm for genetic network inference based on an ensemble method that improves the robustness of the results by combining two main steps: first, the evaluation of the relationship between pairs of genes using three different co-expression measures, and, subsequently, a voting strategy. The utility of this approach was demonstrated by applying it to a human dataset encompassing breast and prostate cancer-associated stromal cells. Two gene networks were computed using microarray data, one for breast cancer and one for prostate cancer. The results obtained revealed, on the one hand, distinct stromal cell behaviors in breast and prostate cancer and, on the other hand, a list of potential biomarkers for both diseases. In the case of breast tumor, ST6GAL2 , RIPOR3 , COL5A1 , and DEPDC7 were found, and in the case of prostate tumor, the genes were GATA6-AS1 , ARFGEF3 , PRR15L , and APBA2 . These results demonstrate the usefulness of the ensemble method in the field of biomarker discovery.


Introduction
The incidence rates of breast and prostate cancer are among the highest among women and men, respectively.The number of breast cancer cases is expected to increase from 2.3 million in 2020 to over 3 million by 2040 [1], while prostate cancer cases will double from 1.4 million to 2.8 million during the same period [2].Despite improvements in therapeutic strategies, it remains essential to increase our understanding of the mechanisms underlying cancer progression and to identify new targets for cancer therapy, such as molecules in the tumor microenvironment (TME).The TME plays a key role in tumor growth, development, and progression and includes different cell types, including stromal cells.Stromal cells and their associated components, such as the extracellular matrix (ECM), interact with cancer cells to create a permissive microenvironment that supports cancer progression and may even serve as potential biomarkers for cancer research [3][4][5][6].However, it is essential to keep in mind that the tumor stromal is not a static entity but a dynamic and constantly changing one.Therefore, to better understand the functions and mechanisms by which stromal components and their interactions with tumor cells drive cancer initiation and progression, it is necessary to identify potential biomarkers.
For the discovery of diagnostic, prognostic, and predictive biomarkers for diseases such as cancer, as well as for testing the efficacy of potential therapeutic agents, sequencing technologies such as RNA-Seq, microarrays, and single-cell sequencing have become essential tools [7].These technologies allow comprehensive analysis of the genetic profile, gene expression, and gene-gene interactions when computational (in-silico) approaches are used to analyze these data.These computational approaches, such as gene co-expression networks (GCNs), can enhance this analysis using the data generated by sequencing technologies to predict and model gene-gene interactions.This can facilitate the understanding of biological processes and help identify biomarkers involved in different diseases [8].
GCNs depict the relationship between genes using a graph, in which the vertices represent the genes and the edges represent their relationship.These relationships exhibit similar patterns of expression.For a relationship to be deemed significant, the degree of relationship between each pair of genes must surpass a minimum threshold of required expression pattern similarity.Common methods for measuring the degree of co-expression between two genes include Pearson's, Spearman's, and Kendall's correlation coefficients [9].For the reconstruction of the GCN, mutual information and other methods have also been extensively employed [10].The development of hypotheses has been aided by GCN models validated through empirical methodologies.Thus, the dependability of GCNs is supported by the subsequent experimental confirmation of numerous predicted interactions [11].As a result, algorithms and computational techniques for GCN reconstruction have gained prominence within the bioinformatics community [12].
In this context, the traditional methods for reconstructing GCNs using one of these correlation coefficients frequently have the drawback that the inference of gene-gene interactions is wholly dependent on the results of the selected correlation coefficient.For instance, the weighted gene co-expression network analysis (WGCNA) method allows the construction of a network utilizing the Pearson correlation coefficient [13], whereas Spearman is used in another GCN reconstruction method [14].It is common knowledge that these correlation coefficients have certain limitations, such as the degree of dependence on the normalization of the data in the Pearson coefficient [15] or the inability of the Spearman coefficient to obtain monotonic relationships [16].The use of a single correlation coefficient in GCN reconstruction methodologies may therefore produce unreliable results, as significant relationships may exist even if one of these correlation coefficients has a value of zero.In order to overcome these limitations, the combination of multiple correlation coefficients to measure the degree of gene-gene interaction may help to improve the reliability of the results and provide more accurate biological information by overcoming the individual limitations of the coefficients [17].
This paper presents a study for the identification of candidate biomarkers for breast and prostate cancer using an ensemble co-expression network algorithm.The expression data used were obtained from stromal tissue within the respective tumor microenvironments.The algorithm applied to infer gene networks consists of an ensemble strategy combining three widely used correlation coefficients, in order to classify gene-gene interactions.As a result, a network was generated from stromal tissue surrounding invasive primary breast and prostate tumors and compared with another network generated from non-cancerous breast and prostate stromal tissues.Cross-analysis of these networks provided significant information on the biological functions affected in both situations, helping to identify potentially novel breast and prostate carcinoma biomarkers.
The main contributions of this work can be summarized as follows: • We propose a case study based on three different widely GCN-inferred algorithms.

•
The use of the ensemble strategy derives from a more reliable inferred GCN.

•
A case study of breast and prostate cancer is presented.

•
We propose several genes as potential biomarkers for both breast and prostate cancers.
The rest of the paper is organized as follows: The main related works are presented in Section 1.The datasets and the case study used are detailed in Section 2. The results related to the generation and analysis of the GCNs are described in Section 3, while the discussion of the work is presented in Section 4. Finally, the conclusions are given in Section 5.

Related Works
GCNs have emerged as a useful instrument for data analysis and the discovery of new biomarkers.For instance, despite a high computational cost as a result of its reliance on permutation tests to determine significance [18], the WGCNA method [13] is widely used, as highlighted in studies such as Tang et al. [19] in metastatic breast cancer, identifying key modules and core genes.In addition, Zhou et al. [20], Adhami et al. [21], and Ye [22] found biomarkers associated with breast cancer progression and subtypes.In prostate cancer, WGCNA was also used by the authors in Song et al. [23], Xu et al. [24], and Liu et al. [25] to identify key genes and biomarkers for diagnosis and prognosis.
Other works related to the identification of biomarkers for breast and prostate cancer have used other methods or networks.Protein-protein interaction (PPI) networks have been used to determine that the mitotic cell cycle and the epithelial-to-mesenchymal transition pathway are associated with breast cancer progression [26].The authors Hsu et al. [14] discovered that integration between gene co-expression network analysis (GCNA) and integrated microarray analysis [27] can yield more accurate diagnoses.The ARACNE network reconstruction method [28] has also been used to identify drug targets to advance the treatment of prostate cancer patients [29].
Inferring gene-gene relationships can be hampered by the reliance on works that employ a single network reconstruction method or a single correlation coefficient.One solution to this problem is the development of new ensemble strategies.In one work, for instance, the authors used a combination of methods to infer with high confidence connections between dynamically active factors [30].Their strategy employed the following methods: PLSR [31], similarity index [32], TIGRESS [33], random forest [34], ARACNE, CLR [35], and MRNET [36].The studies conducted by Hsu et al. [14] in breast cancer and Ferraz et al. [37] in prostate cancer demonstrated that the combination of methodologies and correlation measures improved the robustness of the results.In another work, the authors chose to implement an ensemble strategy at the level of co-expression measures, as opposed to the level of methods [17].The correlation between each deferentially expressed genegene interaction was determined using Pearson's, Kendall's, and Blomqvist's correlation coefficients.This ensemble strategy reconstructed two GCNs, one corresponding to the cancer state and the other to the normal state, which could be used as a control network for further gene analysis.The EnGNet approach [38] was another work that employed an ensemble strategy to generate gene co-expression networks.The co-expression network was generated using Spearman, Kendall, and NMI correlation measurement combinations and then optimized using a greedy approach.
In summary, the preceding works highlighted the importance of GCNs in breast and prostate cancer biomarker discovery.While the use of WGCNA in breast and prostate cancer research has yielded promising results, there has been a trend towards the diversification of network analysis methodologies.The use of alternative network types and correlation coefficients increases the robustness of results, ensuring that the identified biomarkers are not artifacts of a particular method or dataset.Indeed, relying on a single network reconstruction technique or correlation coefficient can lead to biased results and make it difficult to identify gene-gene relationships with precision.Ensemble strategies, which utilize a variety of correlation methods and measurements, have emerged as a viable response to this situation.These strategies ensure that connections between dynamically active factors are inferred with a high degree of confidence, thereby contributing to the more precise identification of potential biomarkers and cancer treatment targets.

Materials and Methods
This section describes the dataset used and outlines the subsequent stages of the analysis.Specifically, the microarray dataset came from research on breast and prostate cancer.Additionally, the workflow used to carry out the study (depicted in Figure 1) is also detailed in the following subsections, as follows: First, the employed dataset is described in Section 2.1, followed by the exploratory analysis performed on the expression data in Section 2.2.Subsequently, Section 2.3 provides an explanation of the differential expression analysis; the method used for GCN reconstruction and candidate GCN selection is described in Sections 2.4 and 2.5, respectively.Finally, in Section 2.6, the analysis of GCNs is conducted.Figure 1 depicts the entire pipeline, including all the previously mentioned steps.After the exploratory analysis, a differential expression analysis was conducted, followed by GCN generation via three different co-expression measures using an inference algorithm based on a major voting strategy.Validation and selection of the best candidate GCNs for both breast and prostate cancer were performed with a Thr score .Finally, an analysis including hub identification, clustering, and enrichment analysis was conducted.

Sample Dataset
The microarray dataset used in this study was published by Planche et al. [39].It was obtained from the National Centre of Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) database with the accession number GSE26910.The gene profiling data were obtained using the Affymetrix Human Genome U133 Plus 2.0 Array (Platform GPL570).The dataset contained 24 samples, including 6 samples of stroma surrounding invasive breast primary tumors, 6 matching samples of normal stromal breast tissues, 6 samples of stroma surrounding invasive prostate primary tumors, and 6 matching samples of normal stromal prostate tissues, for a total of 20,824 genes.The breast and prostate data were provided by the authors as raw CEL files and normalized with the RMA algorithm (quantile normalization at probe level data).

Data Preprocessing and Exploratory Analysis
A preliminary exploratory analysis was carried out with the aim of obtaining an initial and detailed understanding of the data to be analyzed.This prior exploratory analysis of the expression data obtained from the dataset aimed to identify the quality of the data, detect patterns and trends, as well as determine their distribution.To carry out this analysis, the R programming language was employed.Multidimensional scaling plots (MDS) were applied to further separate examples according to their features.The R plotMDS function was used from the Limma package [40].
The next step involved examining whether the data had been correctly normalized after applying the RMA algorithm.This is crucial to obtaining accurate and reliable results in gene expression analysis because it guarantees the comparability of gene expression values between samples.The R function boxplot was used for this purpose [41].

Differential Gene Expression Analysis
Differential expression analysis is a statistical technique that enables the investigation of normalized data, to identify quantitative alterations in the expression levels between different samples.The primary objective of this task was to identify the genes that show differential expression in our sample sets.This filter is crucial to the analysis of pathologically relevant genes.The samples were grouped into two categories for analysis, male and female, resulting in two sets of differentially expressed genes (DEGs).
The Limma [40] R package was also employed to execute this task, which involved several steps.First, a design matrix was created in our study to specify the contrasts of interest.Specifically, these contrasts were defined as the comparison of normal tissue against tissue derived from cancer patients, both male and female.A linear model was fitted to the expression values of each gene, and empirical Bayes moderation was carried out by borrowing information across all the genes to obtain more precise estimates of genewise variability [42].Subsequently, genes that exhibited statistically significant changes in their expression levels, significantly upregulated or downregulated (with FC > 1 and FC < −1), were detected.This involved determining a threshold for statistical significance based on an adjusted p-value, which was set to 5% for breast stromal samples and 10% for prostate stromal samples to obtain a list of DEGs of comparable size, as explained in Planche et al. [39].
After identifying the DEGs in both prostate and breast tissues, we generated four distinct datasets from the log2-transformed DEG data.These four datasets represented the gene expression profiles of normal prostate, prostate tumor, normal breast, and breast tumor samples.
From the DEGs generated for breast and prostate, respectively, four datasets were produced: normal breast, tumor breast, normal prostate, and tumor prostate, containing the DEGs with symbol correspondence for the Affy names, no duplicates, and the different samples for each condition.

Gene Co-Expression Network Reconstruction
In this section, the proposed method for ensemble GCNs is introduced.For the generation of GCNs, an algorithm was developed in Python.The algorithm used comprised two main steps: evaluation of the relationship between each gene pair based on three different co-expression measures, followed by a major voting strategy.As a result, the final co-expression network exhibited more reliable interactions and sparseness than other techniques that adopt single co-expression measurements.
In the first phase, the algorithm made an evaluation of the gene-gene relationship using three different evaluation measures.The co-expression measures used for this task were Pearson's, Spearman's, and Kendall's coefficients [43,44].This choice was motivated by the following observations: The Pearson coefficient is widely used to test dependencies between gene expression levels.It allows us to quantify the strength of the linear relationship between two variables, which in this case corresponds to the expression level of each gene that is being compared.The Spearman coefficient does not require assumptions of linearity in the relationship between variables.Thus, it can identify monotonic relationships between pairs of genes, meaning that as the expression level of one gene increases, so does the expression level of the other, but not necessarily at a constant rate.This property makes it particularly useful for capturing complex relationships between genes that are not strictly linearly correlated.Finally, the Kendall measure evaluates the degree of strength of monotonic relationships.However, it differs in that it is a non-parametric measure, which implies that it does not necessitate an assumption about the distribution of the genes expression levels being compared, making it more robust to outliers and non-normality.
The Python SciPy.statslibrary was used to evaluate the correlation values of each gene pair using the correlation coefficients contained in the stats subpackage [45].

•
Pearson's correlation coefficient: The values of Pearson's correlation coefficient range from −1 to 1, with a score of −1 indicating a perfectly negative correlation, a score of 1 indicating a perfectly positive correlation, and a score of 0 indicating no correlation between the expression levels of the two genes under consideration.
where ρ refers to Pearson's correlation value between gene x and gene y.

•
Spearman's correlation coefficient: The values of Spearman's correlation coefficient range from −1 to 1, as in the case of Pearson's.
where ρ refers to Spearman's correlation value, d 2 to the differences between the ranks of x and y and n to the number of observations.• Kendall's correlation coefficient: this coefficient ranges from −1 to 1 and represents a valuable parameter for detecting non-linear relationships among genes.
where τ refers to Kendall's correlation value, n c to the number of concordant pairs of observations and n d to the number of discordant pairs of observations.Upon calculation of the three correlation measurements for each pair of genes, a correlation threshold was established.
The final significance assessment was carried out through a voting system, wherein a relationship between a pair of genes was considered significant if two of the three correlation values exceeded a chosen threshold [38].Hence, a relationship was added to the final co-expression network if it was considered correct.
Multiple co-expression networks were constructed for each group of deferentially expressed genes using varying threshold values.A total of 20 networks were generated by applying threshold values ranging from 0.7 to 0.9.Specifically, we constructed five coexpression networks using gene expression data from stromal tissue surrounding invasive primary breast tumors, another five networks from normal breast stroma, five networks from gene expression data derived from stromal tissue surrounding invasive primary prostate tumors, and the last five networks from normal prostate stroma.

Candidate Gene Co-Expression Network Selection
This section outlines the process employed to select the most suitable GCNs for further analysis.To evaluate the GCNs from the different thresholds, we employed the Cytoscape GNC-app [46] to calculate the gene network coherence (GNC) value.This software leverages network representations of biological databases to evaluate the biological coherence of the selected GCNs.In this study, BioGrid (human) served as the reference network for our analysis because it is a repository of biomedical interaction data that have been meticulously curated [47].It is crucial to note that, during this evaluation, the co-expression networks were not explicitly filtered against the BioGrid network.
For this purpose, a mathematical formula was derived (obtaining a value called Thr score ).The Thr score provided the corresponding value, representing the percentage by which the GNC value decreased per deleted edge concerning the network generated with the preceding lower threshold.Upon reaching minimal values, this indicated that the removed edges made minimal contributions to the GNC value in comparison to the prior network.Consequently, it became feasible to select this network for further analysis.
Thr score = % decrease in GNC value number of eliminated edges concerning the prior network With the above, it is important to note that the threshold with the lowest Thr score value was the one selected for network generation.

Gene Co-Expression Network Analysis
The final GCNs, upon which the topological analysis was conducted, were derived through the juxtaposition of network values for GNC [48], nodes, and interactions.A unique cancer-specific network was derived by excluding shared interactions between normal and tumor samples and exclusively retaining interactions specific to the tumor context in breast and prostate cases.The final collection of GCNs for breast and prostate was carried out in the R programming language.
In order to examine the selected co-expression networks, topological and enrichment analyses were conducted.The topological analysis entailed the identification of hubs and clustering, followed by an enrichment analysis of the clusters generated within each coexpression network.In a GCN, hubs are highly connected nodes that have a large number of edges or connections with other nodes and may be considered key components of the co-expression network [49].
Hub genes were selected using Cytoscape software named CytoHubba according to the degree algorithm [50].This method calculates the degree of connectivity of each node within the co-expression network and ranks them according to this value.The top 20 hub genes were filtered.
Clustering evaluation was performed in order to identify densely connected nodes.The clustering method used was community clustering (GLay) [51], implemented through Cytoscape's ClusterMaker app [52].
The gene ontology (GO) [53,54] and KEGG [55] enrichment analyses carried out on the previously generated gene clusters were performed using the clusterProfiler [56] R package for comparing biological themes among gene clusters.

Results
The generation and analysis of GCNs implemented for this study involved a variety of stages that are described in detail in the following sections.Specifically, Section 3.1 illustrates the exploratory analysis conducted on the dataset used, while Section 3.2 identifies the DEGs.Section 3.3 uses the GNC metric to assess and compare the coherence of the results with other GCN methods and to determine the best threshold.The succeeding Section 3.4 undertakes network reconstruction and selection.Finally, the analysis of the candidate co-expression networks for stromal tissues corresponding to both breast and prostate in a tumor context is performed in Section 3.5.
The primary objective of this study was to identify novel biomarkers through the creation and comprehensive analysis of co-expression networks.To construct these coexpression networks, we employed an algorithm that assessed the associations between gene pairs using three distinct co-expression measures, subsequently employing a majority voting strategy.This case study was applied to a dataset encompassing stromal cells within the context of both breast and prostate cancer [39], with the intent of elucidating the contrasting behaviors observed in these two cancer types.

Exploratory Analysis
The microarray analyses used in this study were obtained from expression data of stromal cells in breast and prostate cancer.Twelve samples from women (6 controls and 6 cases with breast cancer) and 12 samples from men (6 controls and 6 cases with prostate cancer) were analyzed.
To ensure the quality of the data used for this analysis, an exploratory analysis was carried out using an MDS plot for data transformation into log2 (Figure 2) to validate any discrepancies between the gene expression profiles from breast and prostate samples.Each point corresponds to a sample, and the distance between the points reflects the similarity or difference between the samples; closer points are more similar, while distant points are more dissimilar.The MDS plot shows the resulting plot, where there is a clear distinction in the stromal expression profiles between breast and prostate, but the difference between normal and tumor samples is much greater in the breast group than in the prostate group.The close proximity between normal and tumor samples of prostate stromal cells suggests minimal differences between the two conditions.Therefore, we opted for a FDR threshold of 10%, in contrast to the breast cancer samples, where an FDR of 5% was chosen.This pattern of sample distribution aligns with those observed in previous studies that employed the same dataset [39].This suggests that breast and prostate tumors have a distinct stromal reaction to tumor invasion that could be used to classify the samples used for this study and that the overall stromal response in breast cancer is stronger than in prostate cancer.

Differential Gene Expression Analysis
The DEGs of the two groups consisting of normal prostate compared to tumor prostate (NP vs. TP) and normal breast compared to tumor breast (NB vs. TB) were identified.A total of 218 DEGs were identified for the prostate stromal cases, of which 89 were upregulated and 129 were downregulated.For the breast stromal cases, a total of 776 DEGs were identified, of which 325 were upregulated and 451 were downregulated (Table A1 and Figure 3).The total DEGs obtained for prostate samples were significantly lower than for breast samples.
Among the total number of DEGs identified, only those with the corresponding symbol were taken into account for this study.The approach to handling duplicate genes involved simplifying the dataset by selecting the maximum value among repeats.This method assigned greater importance to the probe with the highest intensity, ensuring more accurate capture of the dominant gene expression.Therefore, for the study of breast stromal tissue, 776 DEGs were used as a starting point, and for prostate stromal tissue, 218 DEGs.Although the analysis of differential expression pertains to distinct organs specific to each gender, we identified a common set of 17 DEGs shared between both types of cancer (Figure A1).Furthermore, a GO enrichment analysis was conducted on the genes, showing shared expression profiles between the two organs (Figure A1).

Performance Comparison between Different GCN Methods
The aim of this section is to evaluate the GCNs generated both by our case study and by other co-expression network methods used in other works, WGCNA [57,58] and EnGNet [38,59].
To initiate the generation of co-expression networks using WGCNA, pickSoftThreshold function was executed to identify the appropriate power value that maximized the scalefree topology criterion for constructing the adjacency matrix.The thresholds chosen for the algorithm comparative analysis were then applied to derive the definitive outcomes from the adjacency matrix.For the generation of co-expression networks using EnGNet, the CyEnGNet application [60] was utilized.The default Hub threshold value remained unchanged at 3, while the remaining parameters were adjusted based on the selected range for the comparative study.
For each of these methods, multiple co-expression networks were generated using correlation values between 0.7 and 0.90, with an increment of 0.05 for each co-expression network.To evaluate the biological coherence of these networks, the GNC metric was employed, utilizing the human BioGrid as a reference network.Subsequently, the mean GNC and the highest value were calculated for each tissue group in the study: prostate tissues (normal and tumor) and breast tissues (normal and tumor).These coherence values are listed in Table 1.The maximal GNC value observed in normal stromal prostate tissue was 0.5.On the other hand, the tissue surrounding invasive primary prostate tumors exhibited a mean GNC value of 0.24, with a maximum value of 0.52.In contrast, using the WGCNA method, maximal values of 0.47 and 0.49 were observed, with mean GNC values of 0.26 in both cases.The EnGNet approach produced slightly lower values, with a maximum of 0.22 and a mean of 0.1.These results indicate that EnGNet produced notworks with lower coherence compared to WGCNA.
For breast tissues, mean and maximum GNC values of 0.33 and 0.52, respectively, were observed in the normal breast stroma and stromal tissue surrounding invasive primary breast tumors.Conversely, the WGCNA method yielded a mean of 0.13 and a maximum of 0.30 for normal stroma, and a mean of 0.12 and a maximum value of 0.26 when applied to tumor stroma.The EnGNet method produced mean and maximum values of 0.12 and 0.28, respectively.Despite this, for both normal and tumor breast tissues, mean values of 0.14 and maximum value of 0.35 were observed.However, the coherence of the co-expression networks generated by our study outperformed these two methods.
The results obtained with the proposed method exhibited consistent values across all four study cases.In contrast, WGCNA showed variations in values obtained for the normal or tumor prostate and normal or tumor breast datasets.Our method consistently produced more coherent co-expression networks, demonstrating robust performance regardless of the dataset used.
Based on this comparison and evaluation, it was determined that the co-expression networks produced by our method exhibited a higher degree of coherence compared to those generated by alternative GCN methods.In subsequent phases of the results, this enables us to select candidate co-expression networks that exhibit a greater degree of coherence compared to alternative methodologies of network reconstruction.

Gene Co-Expression Network Reconstruction and Candidate Co-Expression Network Selection
Four experimental conditions were generated for co-expression network reconstruction using the two sets of identified DEGs.The datasets were created for the stromal breast and stromal prostate tissue samples in both non-tumor and tumor contexts, with the DEGs identified for female and male samples.The algorithm that calculated the different correlation methods with different thresholds (from 0.7 to 0.90 increased by 0.05) was employed to generate the different GCNs.The results obtained from the analysis of the 20 GCNs generated by the ensemble of correlation methods (Spearman, Pearson, and Kendall) for the five selected threshold values and the results for the GNC value results are presented in Table A3.
As can be seen in the table, the best Thr score value was achieved using a threshold of 0.75 (marked in bold) for all cases.Therefore, for the rest of the study, the candidate networks were selected using this threshold.

Gene Co-Expression Network Analysis
Following the selection and refinement of the most appropriate stromal breast tumoral GCN and stromal prostate tumoral GCN, the next step involved conducting a series of analyses to gain insights into the biological mechanisms represented in the networks, with the aim of biomarker discovery.For this purpose, topological and enrichment analyses were performed.Topological analysis involved the identification of hubs and clustering, which was followed by an enrichment analysis of the clusters generated in each co-expression network.
Comparing the final co-expression networks chosen under the 0.75 threshold for both normal and tumor stromal breast samples, we found a total of 16,646 interactions in the normal co-expression network, of which 15,975 were interactions unique to this coexpression network; the tumor co-expression network was made up of 18,595 interactions, of which 17,924 were unique to this co-expression network.
The node degree distributions among the four reconstructed co-expression networks are detailed in Figure A2.Above 20 nodes with higher degree values were considered hubs in each co-expression network.These hubs are shown in Tables A6-A9.
Tables A6 and A7 display the top 20 most connected genes in each of the normal and tumor-specific breast co-expression networks.The average number of interactions among the hub genes selected for the breast normal stromal cell co-expression network was 98, while for the breast cancer stromal cell co-expression network, this was 107 degree.Of the 20 hubs selected for normal and tumor co-expression networks, no common genes were found; therefore, the hubs highlighted are network-specific.
Despite the lack of complete concurrence among the 40 hub genes identified independently in both co-expression network types, a substantial overlap of 36 hub genes existed within the shared interactions.
The majority of hub genes identified in both normal and tumor co-expression networks exhibited elevated values of closeness and eigenvector centrality.This indicates their proximity to all other genes in the co-expression network, facilitating efficient communication and connectivity with other crucial genes.
Comparing the final co-expression networks chosen under the 0.75 threshold for both normal and tumor stromal prostate samples, we found a total of 1678 interactions in the normal network, of which 1619 were interactions unique to this co-expression network; the tumor co-expression network was made up of 1352 interactions, of which 1333 were unique to this co-expression network.
Tables A8 and A9 display the top 20 most connected genes in each of the normal and tumor-specific networks.The average number of interactions among the hub genes selected for the normal stromal prostate co-expression network was 42, while for the prostate cancer stromal co-expression network, this was 30.
Among the 40 hub genes identified independently in both co-expression network types, half were found as part of the shared interactions between the normal and tumor graphs, and the other half were found without shared interactions in the normal graph and in the tumor graph.
The majority of hub genes identified in both normal and tumor co-expression networks exhibited elevated values of closeness and eigenvector centrality.This indicates their proximity to all other genes in the co-expression network, facilitating efficient communication and connectivity with other crucial genes.
The subsequent step in the topological analysis involved clustering the co-expression networks to identify groups of highly connected genes that may participate in similar biological processes.This method enabled the identification of functional modules and sub-networks, thus simplifying the subsequent enrichment analysis.Table A5 contains a summary of the clusters obtained for each breast and prostate tumor interaction network.
Two clusters with more than 10 genes were generated for stromal breast tumor (Figure 4), and four clusters with more than 10 genes were generated for the stromal prostate tumor GCN (Figure 5), which had a lower density in comparison.GO enrichment and KEGG analysis were performed on each of the clusters generated, allowing for the identification of functional genetic profiles within the co-expression network.This step enabled a better understanding of the biological mechanisms and pathways associated with the different clusters generated for each GCN.In the context of the clusters generated within breast stromal cells (Figure 4), two clusters are related due to the interaction of some of their genes, but the metabolic processes in which the genes of each cluster are involved are different.The green cluster contains all the hub genes found in this GCN and is associated with the regulation of cellular differentiation.The other brown cluster is associated with the extracellular matrix, pathways such as focal adhesion, and ECM-receptor interaction.The bridging genes, identified as the top genes with high betweenness centrality values, are distributed in all three clusters of the co-expression network.However, it is noteworthy that the green cluster and brown cluster have the highest presence of these genes.
For green cluster, pink cluster, and purple cluster related to prostate stromal cells, two clusters particularly stand out due to the distribution of hub genes (Figure 5).The green cluster emerged as the most noteworthy, harboring 13 out of the 20 hub genes with GO terms associated with prostate gland morphogenesis, and the pink cluster has seven hub genes and GO terms related to regulation of cell differentiation.Genes facilitating connections between distinct segments of the co-expression network, highlighted in red, were found in five out of the six clusters.Notably, the purple cluster stands out with the highest abundance of these connecting genes.The genes present in this cluster are related to the regulation of different metabolic pathways of cellular proliferation growth factor signaling.

Discussion
Tumors develop a unique TME that features different compositions of cancerous, noncancerous, stromal, and immune cells in each phase of cancer progression.The different cell subtypes of the TME interact with each other but also with components of the ECM surrounding the cells [61].The TME plays a crucial role in the progression of tumors, influencing key processes such as tumor growth, invasion, and metastasis.Additionally, the TME significantly impacts the immune response against the tumor, either by creating an immunosuppressive environment or by stimulating anti-tumor immune responses [4].
The objective of this study was to compare the stromal responses in breast and prostate cancer.Additionally, the study aimed to propose novel biomarkers related to the tumor microenvironment of both cancer types using an in silico approach following the pipeline illustrated in Figure 1.The following topics can be discussed based on the results obtained.

Differential Expression Analysis Revealed Significant Differences between Stromal Breast Tumor and Stromal Prostate Tumor
According to the results obtained in the exploratory analysis and the differential expression analysis performed on the expression data, there is a difference in the way the stromal reaction to invasion by breast and prostate tumors may impact tumor progression.
The MDS plot (see Figure 2) reveals substantial variation in the distances between the breast and prostate sample groups, particularly in terms of logFC.The large distances between the samples in the female group and the samples in the male group stand out.Additionally, the distance between normal and tumor samples in the female group is larger than that observed between normal and tumor samples in the prostate group.Furthermore, the differential expression analysis highlighted a significantly higher number of DEGs in stromal tissue surrounding invasive breast tumors (776 DEGs) compared to stromal tissue surrounding invasive prostate tumors (218 DEGs).These findings suggest a major role for stromal cells in the development of breast cancer over that in prostate cancer.This disparity may stem from inherent biological differences between male and female individuals, including sex-specific hormonal, genetic, and physiological factors that contribute to the pathogenesis and progression of breast and prostate cancers.
It is important to note that the heterogeneity within the breast tumor samples was higher in comparison with the prostate tumor samples.In the first case, the majority of the patients exhibited lymph node metastasis, indicating an advanced disease stage [39].Although both sample groups-breast tumor and prostate tumor samples-were obtained from primary tumor sites, the difference in disease stage could have influenced the number of DEGs identified within both tumor types.
Of the DEGs obtained from breast and prostate samples, most were organ-exclusive.They had only 17 genes in common (see Figure 3).Furthermore, a GO enrichment analysis of the 17 genes revealed common cellular annotations (Figure A1).The majority of these genes, which are shared between stromal cells in prostate and breast cancers, exhibited consistent regulatory patterns.However, there were exceptions, such as TGFB3 (Transforming growth factor beta-3 proprotein), PTGS1 (Prostaglandin G/H synthase 1), DUSP6 (Dual specificity protein phosphatase 6), SASH1 (SAM and SH3 domain-containing protein 1), and FIGN (Fidgetin), which demonstrated distinct regulations depending on the type of tumor (see Table A2), which could be linked to the progression of the tumor.These genes have been associated with growth factors, the gene regulation that promotes cell differentiation, cell migration, and microtubule regulation, respectively [62][63][64][65][66]. Nevertheless, these DEGs require further experimental validation in both types of cancer.

Exploratory Analysis in Stromal Breast and Prostate Co-Expression Networks
The differences observed in the MSD plots and after differential expression analysis were confirmed after modeling the co-expression networks obtained for each of the case studies: stromal normal breast and prostate and stromal tumor breast and prostate.
A comparative analysis between the normal stromal and stromal breast tumor coexpression networks shed light on those hub genes present in both co-expression networks but that did not share interactions.This discovery implies that the four genes excluded from shared interactions may exert a critical role in distinct contexts, contingent on their expression levels.Specifically, the SPP1 (Osteopontin) and ARSA (Arylsulfatase A) genes were discerned in the normal stromal breast network, while RIPOR3 (RIPOR family member 3) and LGALS1 (Galectin-1) were identified in the tumor stromal breast network.
Specifically, in the case of the hub genes present in the tumor breast (Table A7), 65% of them have previously been found to be biomarkers in breast cancer studies.SUN1 (SUN domain-containing protein 1) serves as a constituent of the Linker of Nucleoskeleton and Cytoskeleton complex, participating in the linkage between the nuclear lamina and the cytoskeleton [67].In a study conducted by Matsumoto et al., upregulation of this component was observed in human breast cancer tissues.The authors proposed that the expression of SUN1, along with other cytoskeleton complex genes, may play pivotal pathological roles in the progression of breast cancer [68].The LGALS1 gene was suggested as a biomarker by Jung et al. [69] in human breast carcinoma tissues, where an elevated expression of Galectin 1 was observed in all cancerous stromal tissues associated with breast cancer.The MAP7D1 (MAP7 domain-containing protein 1) gene was previously proposed as a biomarker by Wu et al. [70] in breast cancer.
Out of the 20 hub genes obtained from both the normal and tumor prostate networks, only the GATA6-AS1 gene is shared; the remaining hub genes are specific to each coexpression network.It is noteworthy that the hub genes derived from the normal network exhibited downregulation, with the exception of GATA6-AS1.In contrast, the hub genes identified in the tumor network displayed overexpression, except for GATA6-AS1 and APBA2 (Amyloid-beta A4 precursor protein-binding family A member 2).
Specifically, in the case of the hub genes present in the tumor prostate co-expression network (Table A9), 60% of them were previously found to be biomarkers in prostate cancer studies.HOXC6 (Homeobox protein Hox-C6), which was reported as a biomarker in another prostate cancer study [71], and GPR160, a G protein-coupled receptor, were overexpressed in prostate cancers, and their effect does not require the involvement of androgen receptors.There inhibition induces apoptosis in prostate cancer cells and therefore has significant effects on cancer cell proliferation and survival [72].

Gene Expression of Stromal Cells in Breast Tumors Is Related to Focal Adhesion Modifications, While That of Stromal Cells in Prostate Tumors Is Related to Organ Formation and Cell Differentiation
In the context of breast stromal cells, the central genes of the co-expression network were identified as part of one of the three clusters comprising the final co-expression network, as summarized in Figure 4.The green cluster was associated with the regulation of proteins present in the extracellular matrix, where stromal cells were embedded.Based on the ontogenetic study of the two main clusters obtained, it is suggested that the affected components are those directly involved in extracellular adhesion, possibly in response to a regulatory process facilitated by a collaborative interaction between stromal and tumor cells [73].Some genes present in the co-expression network contribute to microenvironment remodeling, including HPSE (Heparanase), which enhances cell adhesion to the extracellular matrix independently of its enzymatic activity.This induces Akt1/PKB phosphorylation through lipid rafts, consequently increasing cell motility and invasion [74,75].Another gene, RHOU (Rho-related GTP-binding protein RhoU) plays a role in the regulation of cell morphology and cytoskeletal organization [76].
In the case of the prostate stromal cells, hub genes of the co-expression network were found to form part of two of the six clusters of which the final co-expression network was composed, a summary of the co-expression network annotation is given in Figure 5.In contrast to breast cancer stromal cells, there were fewer differential genes expression difference between the normal and tumor prostate stromal tissue.This suggests that breast cancer stromal cells have a much greater role in tumor cell development than prostate stromal cells.The hub genes found within the main prostate network were distributed between the green cluster and pink cluster.Both clusters were related to organ maintenance, epithelial tissue differentiation, and morphology.A distinctive pattern is observed that reveals a close relationship with cell morphology and the process of mesenchymal development.Those present in both clusters appear to be intrinsically linked to the regulation of cell shape and structure, being themselves responsible for the stromal abnormality in prostate cancers rather than being mediated in response to the environment, as in the case of breast cancer.Some of the genes involved in this process are S100A4, a calcium-binding protein that plays a role in various cellular processes, including motility, angiogenesis, cell differentiation, apoptosis, and autophagy [77]; and HPN, which plays a role in cell growth and maintenance of cell morphology [78].These findings highlight the importance of stromal cell differentiation in the plasticity of prostate tumors.This modulation of stromal cell morphology is observed in histological sections.Some of the patterns include hypercellular stroma with scattered atypical but degenerative atypical cells mixed with benign prostatic glands, and hypercellular strychoma consisting of soft spindle-shaped stromal cells mixed with benign glands, and they may also contain atypical and degenerative stromal cells that may be associated with a variety of benign epithelial proliferations, including basal cell hyperplasia, adenosis, and sclerosing adenosis [79,80].
In summary, our network-generation method was able to generate the study coexpression networks from differentially expressed genes of stromal cells in the context of both breast and prostate cancer and find patterns of differences in the mode of stromal cell development relative to each case in particular.As mentioned earlier, a comprehensive literature review was conducted to investigate the association of the identified hub genes with stromal cells in breast cancer.From the study of the GCN, we can highlight the hub genes ST6GAL2, RIPOR3, COL5A1, and DEPDC7 as possible biomarkers in the context of stromal cells in breast tumors, as they have been characterized in other cancer cases.
There is no specific literature about ST6GAL2 (Beta-galactoside alpha-2,6-sialyltransferase 2) with breast cancer evidence, but this gene has been highlighted in follicular thyroid carcinoma [81].
While there is limited specific literature evidence about the role of the RIPOR3 gene in breast cancer, it has been studied in other cancer types.Notably, RIPOR3 has shown a disregulation at oral squamus cell carcinoma of the mobile tongue [82].At present, reports in the literature on the RIPOR3 gene are limited.
Another possible biomarker is COL5A1 (Collagen alpha-1(V) chain), and though there is no specific literature about this gene associated with breast tumor, it was associated with the progression of gastric cancer in humans [83].
Finally, DEPDC7 (DEP domain-containing protein 7) is proposed as a biomarker, but its function is poorly understood.Liao et al. investigated the disregulation of this gene in two hepatoma cell lines, as well as the cell proliferation, cell cycle progression, cell migration, and invasion of these cells, suggesting that DEPDC7 is a tumor suppress gene [84].
4.5.GATA6-AS1, ARFGEF3, PRR15L, and APBA2 Are Potential Biomarkers in the Prostate Tumor Microenvironment A comprehensive literature review was conducted to investigate the association of the identified hub genes with stromal cells in prostate cancer.From the study of the GCN, we can highlight the hub genes GATA6-AS1, ARFGEF3, PRR15L, APBA2, and LINC03026 as possible biomarkers in the context of stromal cells in prostate tumors, as they have been characterized in other cancer cases.
The GATA6-AS1 lncRNA has been linked to the expression of F-box proteins [85], which play an important role in the degradation of key proteins in cellular regulation and tumorigenesis [86].It has been studied in the context of lung, gastric, and ovarian cancer in association with the inhibition of signaling pathways and prevention of epithelialmesenchymal transition [87,88].The hub gene GATA6-AS1 was in the green cluster and had 26 interactions.Of these interactions, fifteen were hub genes.It should be noted that GATA6-AS1 is also one of the hub genes found in the results obtained for normal stromal breast and stands out in this dataset as the only one of the calculated hub genes that showed upregulation with respect to the rest.The genes with which they were related within the co-expression network were found both in the green cluter and pink cluster, so they participated in the regulation of both clusters.This gene interacts with the gene HOXC6, which was present in the pink cluster, which was previously reported as a biomarker in another prostate cancer study [71].
ARFGEF3 or BIG3 (Brefeldin A-inhibited guanine nucleotide-exchange protein 3).Kim et al. identified BIG3 as significantly overexpressed in the great majority of breast cancer cases and breast cancer cell lines [89].
The bibliographical information regarding PRR15L (Proline-rich protein 15-like protein) gene is scarce.We can highlight the study in Mizuguchi et al. [90], where PRR15L-RSPO2 fusion was identified, which expands the variations of RSPO fusions in colorectal neoplasms.This gene was present in the green cluster and interacted with 29 genes, some of which are closely related to studies on prostate cancer, for example, DKK1 (Dickkopf-related protein 1), whose protein negatively regulates the Wnt/β-catenin signalling pathway, with therapeutics targeting of it in clinical trials for cancer patients [91].
Finally, APBA2 (Amyloid-beta A4 precursor protein-binding family A member 2) was proposed as an epigenetic regulatory gene involved in multiple cancer-related pathways in breast cancer [92].This gene was present in the pink cluster and interacted with another characterized gene like FOXQ1 (Forkhead box protein Q1), which is a transcription factor that has been studied in several types of cancer, has been found to be upregulated or downregulated in different types of cancer, and has therefore already been suggested as a prognostic biomarker for several types of tumors [93].

Conclusions
In this work, we presented a computational study of GCNs to identify breast and prostate cancer biomarkers.To do so, we also introduced an ensemble method for the inference of GCNs.The method is based on three different correlation algorithms (Pearson, Kendall, and Spearman).Thus, only if two or more coefficients determine that the relationship is significant will it be established as valid.As a result, the final co-expression network obtained offered a higher level of reliability than if only a single coefficient had been used.
On the other hand, the main results of the study revealed different behaviors depending on the organ in which the cancer develops.It was found that breast cancer stromal cells are related to the maintenance of the extracellular matrix through modifications in the focal-adhesion that are part of the stromal tissue.On the other hand, prostate cancer stromal cells are related to cell differentiation, leading to abnormal development of stromal cells.Finally, potential biomarkers were suggested; in the case of breast tumor, ST6GAL2, RIPOR3, COL5A1, and DEPDC7 were found, and in the case of prostate tumor, the genes were GATA6-AS1, ARFGEF3, PRR15L, and APBA2.These results demonstrate the usefulness of the method in the field of biomarker discovery.
As future work, we are currently working on two main points in the context of this work: delving deeper into the study of prostate and breast cancer, and in another sense, improving the co-expression network inference method to achieve better results.To obtain a better understanding of these cancers, we are using additional datasets, such as RNA-Seqs and even single-cell, to verify the results obtained in the present study.On the other hand, we are also working to improve the inference algorithm from two points of view: (a) using other algorithms or correlation coefficients in the ensemble co-expression network generation step to enhance the results, and (b) allowing multi-input of data to improve the reliability of the conclusions obtained.

Figure 1 .
Figure 1.General overview of the pipeline that was implemented during the execution of this study.After the exploratory analysis, a differential expression analysis was conducted, followed by GCN generation via three different co-expression measures using an inference algorithm based on a major voting strategy.Validation and selection of the best candidate GCNs for both breast and prostate cancer were performed with a Thr score .Finally, an analysis including hub identification, clustering, and enrichment analysis was conducted.

Figure 2 .
Figure 2. MDS plot analysis for the log −2 transformed data from microarray analysis.The stromal tissue surrounding invasive primary breast tumors is in red; those corresponding to normal breast stroma are shown in blue; samples corresponding to stromal tissue around invasive primary prostate tumors are shown in purple; and samples of normal prostate stroma are shown in green.

Figure 3 .
Figure 3. Identification of the DEGs for stromal breast tumors (NB vs. TB, left) and stromal prostate tumors (NP vs. TP, right).The top section displays volcano plots representing gene data, where each dot represents a gene.Blue dots indicate downregulated DEGs, while red dots indicate upregulated DEGs.The bottom section presents a Venn diagram illustrating the common DEGs for each contrast without duplicates, revealing the identification of fourteen shared DEGs.DEG, differentially expressed gene; NB, normal breast; TB, tumor breast; NP, normal prostate; TP, tumor prostate.

Figure 4 .Figure 5 .
Figure 4.Cluster details in breast stromal tumor tissue.DEGs are highlighted in blue, hub genes are marked in yellow, and connector genes are highlighted in red.The lower section displays the GO terms influenced by the genes in the green cluster (left) and brown cluster (right).

Figure A1 .
Figure A1.Results obtained from GO term enrichment analysis for the common 17 DEGs expressed in breast and prostate stromal cancer.

Figure A2 .
Figure A2.Distribution of node's degree throughout the co-expression networks reconstructed from stromal normal breast (top left), stromal tumor breast (top right), stromal normal prostate (down left), and stromal tumor prostate (down right).The distribution trendline is shown in blue.Appendix A.2. Tables All tables presented below are referenced in the main text.

Table 1 .
Comparison of the GNC among the methods used in this study, WGCNA, and EnGNet.Bold text indicates the highest GNC values for each respective group.

Table A1 .
Comparison of DEGs between breast stromal (NB vs. TB) and prostate stromal (NP vs. TP) groups.

Table A2 .
Common differential genes and their regulation in breast stromal (NB vs. TB) and prostate stromal (NP vs. TP) groups.

Table A3 .
GNC analysis was performed on 20 co-expression networks generated for each dataset using various correlation ranges (0.7-0.9).Bold text highlights the selected GNCs for further analysis.

Table A4 .
Size of the GCNs retained for further analysis, including interactions specific to the stroma normal and tumor context in both breast and prostate.

Table A5 .
Clustering analysis summary for breast and prostate Tumor networks.

Table A6 .
The top 20 genes based on co-expression network centrality measures in the stromal breast normal GCN with a 0.75 threshold value without common interactions.

Table A7 .
The top 20 genes based on co-expression network centrality measures in the stromal breast tumor GCN with a 0.75 threshold value without common interactions.

Table A8 .
The top 20 genes based on co-expression network centrality measures in the stromal prostate normal GCN with a 0.75 threshold value without common interactions.

Table A9 .
The top 20 genes based on co-expression network centrality measures in the stromal prostate tumor GCN with a 0.75 threshold value without common interactions.