Reverse Engineering of Ewing Sarcoma Regulatory Network Uncovers PAX7 and RUNX3 as Master Regulators Associated with Good Prognosis

Simple Summary Ewing Sarcoma is a rare cancer that, when localized, has an overall five-year survival rate of 70%. Patients with metastasis have a five-year survival rate of 15 to 30%. Early analysis of patient prognosis can be crucial to provide adequate treatment and increase chances of survival. Besides, it is a disease with several gaps in our understanding, including regulation of genes and which transcription factors are master regulators. This work addresses these two topics by inferring gene regulatory networks that allow us to identify putative master regulators to predict patient prognosis. We were able to identify two sets of master regulators that can predict good and bad patient outcomes. Abstract Ewing Sarcoma (ES) is a rare malignant tumor occurring most frequently in adolescents and young adults. The ES hallmark is a chromosomal translocation between the chromosomes 11 and 22 that results in an aberrant transcription factor (TF) through the fusion of genes from the FET and ETS families, commonly EWSR1 and FLI1. The regulatory mechanisms behind the ES transcriptional alterations remain poorly understood. Here, we reconstruct the ES regulatory network using public available transcriptional data. Seven TFs were identified as potential MRs and clustered into two groups: one composed by PAX7 and RUNX3, and another composed by ARNT2, CREB3L1, GLI3, MEF2C, and PBX3. The MRs within each cluster act as reciprocal agonists regarding the regulation of shared genes, regulon activity, and implications in clinical outcome, while the clusters counteract each other. The regulons of all the seven MRs were differentially methylated. PAX7 and RUNX3 regulon activity were associated with good prognosis while ARNT2, CREB3L1, GLI3, and PBX3 were associated with bad prognosis. PAX7 and RUNX3 appear as highly expressed in ES biopsies and ES cell lines. This work contributes to the understanding of the ES regulome, identifying candidate MRs, analyzing their methilome and pointing to potential prognostic factors.


Introduction
Ewing Sarcoma (ES) is an aggressive bone and soft tissue sarcoma, occurring most frequently in adolescents and young adults, of which the hallmark is a chromosomal translocation between chromosomes 11 and 22, involving genes from the FET and the ETS transcription factor families. Bone is the most frequent site of origin, but about 15% to 20% of Ewing sarcoma emerges in bone-associated soft tissue [1][2][3]. EWSR1-FLI1 has been reported as the most frequent fusion in ES, coding for a chimeric protein that functions as an aberrant transcription factor [4]. The EWSR1-FLI1 chimeric protein is known to alter the chromatin leading to mistargeting, dysregulation of chromatin state, and eventually transcriptional impairing [5,6]. The impact of EWSR-FLI1 fusion on ES transcriptome is still poorly understood, and the regulatory mechanisms behind those transcriptional alterations have not yet been deeply investigated.
ES was first described as a presumed diffuse endothelioma of bone [7]. The key ES prognostic factor is the presence of detectable metastasis at diagnosis, and patients with bony metastases (with or without lung involvement) have a very poor prognosis, mostly when compared to patients with exclusively lung metastasis [8]. It is known that the chimeric protein alters the transcriptional pattern, and some of ES transcriptional abnormalities have been associated with epigenetic modifications induced by EWSR1-FLI1 [4]. There is no ES cell of origin identified so far, and several ES tissues of origin were proposed, including endothelial, vascular pericytes or smooth muscle, primitive vascular mesenchyme, pluripotential uncommitted mesenchyme, osteoblastic (based on collagen matrix synthesis patterns) and small cell osteosarcoma, and/or mesenchymal chondrosarcoma [2,9]. In recent decades, studies converged on two putative ES cells of origin: human mesenchymal stem cells (hMSC) and human neural crest cells (hNCC) [4]. There are also studies that point towards a neuro-mesenchymal stem cell phenotype [10].
Gene expression is regulated by the activity of regulatory molecules such as transcription factors. It is well described that a small number of transcription factors, which acts as master regulators (MRs), might handle the cell fate in different cellular models [11,12]. The reconstruction of regulatory networks based on transcriptional information has been successfully applied to different types of cancer, including breast cancer, pancreatic ductal adenocarcinoma, and neuroblastoma [13][14][15][16]. The regulatory network reconstruction aims to identify those transcription factors at the top of the transcriptional regulatory hierarchy, the MRs [17,18]. Identifying such MRs could help predict patient prognostic and pointing out putative biomarkers that could lead to better diagnosis and treatment protocols. Compared with other solid cancers, and cancers that have recurrent mutations, the number of recurrent somatic mutations in ES is limited [19][20][21][22]. Therefore, ES development is known as a transcription-related phenomenon, which points to regulatory analysis as a promising strategy in ES investigation. Here, we have inferred the ES regulatory network based on transcriptional data available in public databases. We also identified a set of seven putative transcription factors which acts as master regulators in ES and analyzed their methylation profile. Those master regulators clustered in two groups with antagonistic behavior between the groups regarding the regulation of shared genes, regulon activity, and implications in clinical outcome.

Data Acquisition and Processing
Data used to infer the ES regulatory networks were obtained from Gene Expression Omnibus (GEO) [23] (accession numbers GSE34620 and GSE63157). The dataset GSE34620 comprises expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 117 ES biopsies [24] and the dataset GSE63157 comprises expression data (Affymetrix Human Exon 1.0 ST Array) from 85 ES biopsies [25]. Data used to infer ES gene signature was also obtained from GEO (accession numbers GSE73610 and GSE67073) [26,27]. The dataset GSE73610 comprises expression data (RNA-Seq profiling Illumina Genome Analyzer IIx, Homo sapiens) from 3 ES cell lines and 2 hMSC cell lines and the dataset GSE67073 comprises expression data (RNA-Seq profiling Illumina HiSeq 2000, Homo sapiens) from four induced pluripotent stem cells (iPSC)-derived neural crest populations from familial dysautonomia patients and 2 iPSC-derived neural crest populations from healthy volunteers. Data used for survival analysis was obtained from GSE63157 (also used for Ewing Sarcoma regulatory network inference) and from GSE17618, both retrieved from GEO. The dataset GSE17618 comprises expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 44 ES biopsies and 11 ES cell line samples [27].
To assess the gene expression of the master regulators in different tissues and solid pediatric cancers, we used once again the dataset GSE34620 for ES, in addition to the following: GSE16476, comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 88 Neuroblastoma biopsies; GSE53224 comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 53 biopsies of Wilm's tumor; GSE75271 comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 55 biopsies of hepatoblastoma, GSE87437 comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 21 biopsies of high-grade osteosarcoma, GSE29684 comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 20 biopsies of retinoblastoma, GSE66533 comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 58 biopsies of rhabdomyosarcoma and GSE3526 comprising expression data (Affymetrix Human Genome U133 Plus 2.0 Array) from 353 samples of 65 different healthy tissues. The criterion chosen for this analysis was the most common solid pediatric tumors, microarray platform type, and the number of samples.
Microarray raw data (GSE34620 and GSE17618, Affymetrix Human Genome U133 Plus 2.0 Array) normalization (Robust Multichip Average, RMA method) and quality control was performed with the Affy Bioconductor/R package [28]. For GSE63157 (Affymetrix Human Exon 1.0 ST Array) normalization (Robust Multichip Average, RMA method) and quality control was performed with oligo Bioconductor/R package [29]. Annotation data for Affymetrix Human Genome U133 Plus 2.0 Array was obtained from hgu133plus2.db and for Affymetrix Human Exon 1.0 ST Array from huex10sttranscriptcluster.db R packages. RNA-seq data were processed according to the Tuxedo protocol. Briefly, we used fastqdump to convert SRA compressed files into FASTQ format files and fastqc to assess the quality of them. Then we run TopHat to align reads to the hg19 reference transcriptome. For the comparison of gene expression of the master regulators among solid pediatric tumors and healthy tissues, the datasets GSE34620, GSE16476, GSE53224, GSE75271, GSE87437, GSE29684, GSE66533, and GSE3526 were normalized together. Looking for samples on the same platform type was important here since different microarray platforms have a different set of probes.
In this work, we analyzed three datasets with ES biopsies. For GSE63157, data was from biopsies collected from patients on Children's Oncology Group (COG) and EuroEwing. Molecular analysis of COG and EuroEwing tumors was performed using RT-PCR for EWS-FLI1 and EWS-ERG fusions. Forty-six samples obtained from the COG Biorepository in Columbus, Ohio (Cooperative Human Tissue Network -CHTN) were prospectively acquired from patients on clinical trials INT-0154 (CCG-7942, POG-9354) and AEWS0031, the two most recent COG clinical protocols for patients with localized Ewing sarcoma (ES). An independent set of 39 tumor samples was obtained from the EuroEwing tumor biorepository in Muenster, Germany and were prospectively acquired from patients registered on EICESS 92 (European Intergroup Cooperative Ewing's Sarcoma Study) and EuroEwing 99.46 out of the 85 samples have information about the primary tumor site (Under three categories: other, extremity and pelvis) [25]. For GSE17618, the patient material for study was taken prior to any treatment in 29 cases, and in 15 cases, chemotherapeutics, radiation therapy, and/or surgical treatment was applied before material was collected [30]. In the case of GSE34620, samples were from the CIT program (Cartes d'Identité des Tumeurs research program) from the french Ligue Nationale Contre le Cancer. The published dataset article details that all cases included in the study showed a specific EWSR1-ETS fusion and the molecular diagnosis was performed in Institut Curie [24].

Regulatory Network Inference and Master Regulator Analysis
The regulatory network inferences and the master regulator analyses were performed using RTN (Reconstruction of Transcriptional Networks), a Bioconductor/R package that provides several tools for the reconstruction and analyses of transcriptional networks, such as the Algorithm for the Reconstruction of Accurate Cellular Networks (ARACNe), and the Master Regulator Analysis (MRA) algorithms [15]. Briefly, the RTN algorithm measures statistical dependence of gene expression data along with a previously defined set of transcription factors (TFs) to infer a TF-centric regulatory network. The Transcriptional Network Algorithm (TNI) computes the association between a transcription factor and each potential regulated gene, removing spurious associations by permutation analysis (BH adjusted p-value 0.05). We have inferred two ES regulatory networks by using two datasets containing ES expression data from 117 biopsies (GSE34620), used to infer Network 1, and 85 biopsies (GSE63157), used to infer Network 2, and a list of 1388 human transcription factors available in Fletcher2013b Bioconductor/R package [15]. The MRA looks for regulatory units (regulons) that are enriched for a gene signature (BH adjusted p-value 0.05), pointing out putative transcription factors that are relevant to the disease, known as Master Regulators (MRs). Here, we obtained two ES signatures: one by performing a differential expression analysis comparing hMSC (GSE73610 N = 2) against ES cell lines (GSE73610 N = 3) and another by performing a differential expression analysis comparing hNCC (GSE67073 N = 2) against the same ES cell line samples (GSE73610 N = 3). RNAseq samples were processed with the Tuxedo protocol, and Cuffdiff was used to quantify gene expression in each condition and test for differential expression. Genes with adjusted p-value lesser or equal than 0.05 were considered to be differentially expressed. We used both signatures to perform the MRA pipeline to network 1 and network 2 ( Figure A1).

Differential Methylation Analysis
The differential methylation analysis was performed comparing 15 samples from different ES cell lines with 9 samples from hMSC cell lines. These 9 samples from hMSC are divided between 6 samples from patients with ES and 3 from healthy donors. The complete study consists of 24 samples (GSE118872) [31]. We used the methylationArrayAnalysis package from Bioconductor [32], and followed their pipeline instructions. Data normalization was performed using the Genome Studio, the standard software provided by Illumina, through the function preprocessIllumina.
Quality control was performed looking for failed positions (this is defined as both the methylated and unmethylated channel reporting background signal levels), as recommended in the pipeline manual, and with a p-value threshold of 0.01. At the end, no sample was required to be filtered out. After that, probe-wise differential methylation analysis was performed (Bonferroni adjusted p-value < 0.01). We then used Fisher's Exact Test (Bonferroni adjusted p-value < 0.01) to check if the regulons of the putative master regulators inferred in the Master Regulator Analysis were significantly enriched with genes that were found to be differentially methylated in the probe-wise differential methylation analysis. This enrichment analysis was performed 2 times: (a) with regulons inferred in Network 1, and (b) with regulons inferred in Network 2.

Gene Ontology Functional Enrichment
To investigate the functional enrichment of genes regulated by each of the master regulators, we merged the regulons of each of the master regulators from Network 1 and 2. The Gene Ontology functional enrichment was performed using the clusterProfiler R Package aimed at the Biological Processes ontology [33]. The adjusted p-value < 0.05 (Benjamini-Hochberg correction) was considered to be significant.

Network Visualization
We used the RedeR Bioconductor/R package [34] to visualize two types of networks generated by the RTN package: association maps and tree-and-leaf representations. In as-sociation maps, nodes in the network are transcription factors, and the edge width between any two nodes is related to the number of genes mutually regulated by each pair of transcription factors, i.e., the edge width refers to the regulatory overlap between regulons. Tree-and-leaf representation is similar to the association map, with the exception that edge width is fixed and nodes are hierarchically organized throughout the network according to the overlap of regulated genes. The regulon clusters are formed according to the overlap of genes regulated by these transcription factors. Regulons with less than 15 genes are not represented in the networks, a default cut-off in the RTN package.

Master Regulators Activity
We used the RTN package to measure MR regulatory activity, i.e., the enrichment score of the regulatory activity of each regulon compared to the other regulons in the set for every patient. The RTN package measures for every patient how the expression of every gene deviates towards the average expression of that gene for all patients in the cohort and then applies the Two-Tailed Gene Set Enrichment Score analysis (GSEA2). Briefly, gene expression was first converted into z-score and, in each sample, all genes were sorted by the z-score and then used as the reference list in GSEA2. The TFs activity level was approximated by the Normalized Enrichment Score (NES) computed for its regulon. The results of the two analyses were plotted as a heatmap along with dendrograms.

Survival Analysis
From the ES datasets used in this study, only GSE63157 (N = 85) and GSE17618 (N = 44) had survival data. The survival R package was used to analyze patient overall survival in terms of the activity of the regulon of each putative master regulator. Samples were divided into two groups based on the median of the regulatory activity of each regulon. The regulons with activity values above the median were classified as "high regulon activity" and values below the median as "low regulon activity". The median was calculated for every regulon of each patient. The p-values for the comparison of the survival curves in the Kaplan-Meier estimator were calculated with the log-rank test.
For the GSE17618 cohort, with too few samples to perform network inference, we used the regulon structure of the cohort one (the cohort with the largest sample size used for network inference, N = 117), i.e., which genes were associated with each putative master regulator, but with the expression values of the GSE17618 cohort.

RT-qPCR Analysis
For the in vitro expression gene analysis of master regulators identified in silico, we used different representative cell lines of Ewing Sarcoma (RD-ES and SK-ES), Neuroblastoma (SH-SY5Y and SK-N-Be(2)), Hepatoblastoma (HepG2) and Medulloblastoma (Daoy and D283). The total RNA extraction was performed with SV Total RNA Isolation System kit (Promega, Madison, USA) according to manufacturer's instructions and quantified in Nanodrop (Thermo Fisher Scientific, Waltham, USA). The cDNA was obtained using the GoScript Reverse System (Promega) according to manufacturer's instructions. The qPCR of the master regulators cDNA (PAX7, RUNX3, ARNT2, CREB3L1, GLI3, MEF2C, PBX3) were amplified using PowerUp SYBR Green Master Mix (Thermo Fisher Scientific.) and the relative gene expression was analysed using 2 (−DDCt) method [35]. The RD-ES was used as control and ACTB was used as internal control. Statistical analysis was performed by one-way analysis of variance (ANOVA) followed by Bonferroni post-hoc test; p values under 0.05 were considered to indicate statistical significance.

Regulatory Network Inference and Master Regulator Analysis
We have inferred ES regulatory networks based on two cohorts (GSE34620 and GSE63157) containing transcriptional data from 117 patients (regulatory network 1) and 85 patients (regulatory network 2) (Figures 1 and A1). A gene signature is required to infer potential master regulators. A common strategy to obtain a gene signature for a given tumor is through differential expression analysis by comparing the cancer cells against the cell of origin. Since the cell of origin of ES is still a matter of debate, we have used two different cell types to infer ES signature: human mesenchymal stem cells (hMSC) and human neural crest cells (hNCC), the most accepted cells of origin in the literature [4]. The two signatures were used to perform the MRA for network 1 and network 2 ( Figure A1). Figure 1A,B show tree-and-leaf representations of both regulatory networks (for the regulatory networks with fully regulon names, see Figures A2 and A3). Each network is depicted based on regulon overlap, and the nodes represent the inferred regulons containing at least 15 genes. Networks 1 and 2 contain 645 and 568 regulons, respectively. The MRA performed in the network 1 identified 44 master regulators (Table A1) for the hMSC signature and 13 master regulators for the hNCC signature (Table A2). In the network 2, MRA identified 66 master regulators for the hMSC (Table A3) signature and 42 master regulators for the hNCC signature (Table A4). Among those four sets of inferred master regulators (MRs), seven are always present: ARNT2, CREB3L1, GLI3, MEF2C, PBX3, PAX7, and RUNX3 ( Figure 1A,B) and Figure A1). From that point, we chose to work with those seven master regulators composing the intersection of both networks, and both signatures ( Figure A1). In network 1, five master regulators (CREB3L1, GLI3, MEF2C, PBX3, and PAX7) out of the seven common to all MRA analyses are located close to each other at a small region of the network ( Figure 1A). In network 2, five of those master regulators (CREB3L1, GLI3, PBX3, RUNX3, and PAX7) are also located close to each other at a small region of the network ( Figure 1B).

Differential Methylation Analysis
As a further regulon validation, we investigated the regulon methylation profile in an independent dataset involving ES cell lines and hMSC cell lines. Figure 1C shows that the regulons of the seven identified master regulators (ARNT2, CREB3L1, GLI3, MEF2C, PBX3, PAX7, and RUNX3) have more genes differentially methylated as expected by change in both network 1 and network 2.

Biological Function of Master Regulators Associated Genes
To perform the functional enrichment of the genes regulated by the seven master regulators (MRs), we considered collectively the genes associated with each MR in both networks 1 and 2. According to Figure 2, PAX7 regulated genes are associated with glycoprotein metabolic process, proteoglycan metabolism, and protein deacetylation. There are no biological functions significantly enriched for RUNX3 regulated genes, while ARNT2 associated genes are involved with synapse and postsynapse organization, and axogenesis. The regulons CREB3L1, GLI3, PBX3, and MEF2C share biological functions, mainly functions involved with extracellular matrix dynamics. For example, the GO term extracellular matrix organization is enriched in those four regulons. GO terms extracellular structure organization, connectivity tissue development, and chondrocyte differentiation are enriched in regulons CREB3L1, GLI3, and MEF2C, while MRs PBX3 and MEF2C regulates genes involved with endothelial and epithelial cell migration.

Regulon Activity
The regulon activity is calculated based on the expression of genes into the regulon. In other words, a regulon is considered to be activated when the genes identified as positively regulated by the transcription factor (in this case, the master regulator) are upregulated in the GSEA, and the genes negatively regulated by the TF are down-regulated in the GSEA. On the contrary, the regulon is considered inhibited when negatively regulated genes are on the GSEA top, and positively regulated genes are on the GSEA tail. We assessed regulon activity in the different samples used for networks 1 and 2 inference. Figure 3A shows the heatmaps involving the activity of the seven MR for the 117 samples used to infer network 1 (GSE34620) and the 85 samples used to infer network 2 (GSE63157). In both heatmaps, it is possible to observe two clusters: one formed by regulons PAX7 and RUNX3 and another formed by regulons ARNT2, CREB3L1, GLI3, MEF2C, and PBX3. The regulatory activity of these two clusters is antagonistic to each other in both cohorts: in general, patients with high RUNX3 and PAX7 regulon activity have low ARNT2, CREB3L1, GLI3, MEF2C, and PBX3 regulon activity and vice-versa ( Figure 3A). Figure 3B shows two subnetworks extracted from the regulatory networks 1 and 2. Both subnetworks include only the seven regulons used for heatmap analyses ( Figure 3A). Similar to Figure 1, the network is depicted according to regulon overlapping (i.e., according to genes mutually regulated), but here the edges indicate whether each pair of transcription factors regulates the shared genes in the same direction (agonistic regulation) or in the opposite direction (antagonistic regulation). As seen in Figure 3B, it is possible to observe two groups of regulons. PAX7 and RUNX3 regulate shared genes in the same direction. Similarly, the group formed by ARNT2, CREB3L1, GLI3, MEF2C, and PBX3 regulates the shared genes in the same direction ( Figure 3A and Figures A4 and A5). However, transcription factors of different groups always regulate simultaneously regulated genes in the opposite direction ( Figure 3B). According to Figure 3, regulons PAX7 and RUNX3 are simultaneously activated or inhibited and act coordinately by regulating shared genes in the same direction. The same occurs among regulons ARNT2, CREB3L1, GLI3, MEF2C, and PBX3. Moreover, both groups seem to work as antagonists between each other. . Node size reflects the number of regulated genes into regulon, and edge width reflects the number of mutually regulated genes between each regulon. Nodes are colored according to the cluster they belong to. Dotted edges represent regulatory antagonism, while solid edges represent regulatory agonism.

Survival Analysis
To evaluate the impact of the seven regulon activity in ES outcome, we accessed survival data available for the 85 patients from the cohort used to infer ES regulatory network 2 (GSE63157). The samples into the cohort were classified according to the activity of each regulon. Figure 4 shows the seven Kaplan-Meier plots corresponding to each regulon where six of them were significantly related to patient outcome. Again, PAX7 and RUNX3 present similar behavior regarding patient outcome with both regulons associated with a good prognosis when activated. In contrast, regulons ARNT2, CREB3L1, GLI3, and PBX3 are associated with bad prognosis when activated. Unfortunately, there is no survival data available regarding patients from cohort 1 (GSE34620). To verify the implication of regulon activity in patient outcome, we access another ES cohort composed by 44 patients (GSE17618). We evaluated the activity of the seven regulons inferred for network 1 in each of 44 patients with data available in GSE17618. Figure 5A shows the heatmap clustering based on regulon activity. The heatmap presents the same pattern observed in Figure 3A. PAX7 and RUNX3 regulons cluster together, in contrast to regulons ARNT2, CREB3L1, GLI3, and PBX3. Additionally, both PAX7 and RUNX3 are significantly related to good prognosis when activated. Among the other regulons, only ARNT2 was significantly related to patient outcome, being associated with bad prognosis when activated.

Master Regulators Expression in Ewing Sarcoma
We evaluated the expression of ES master regulators significantly associated with patient outcome (i.e., PAX7, RUNX3, ARNT2, GLI3, PBX3, and CREB3L1) in ES samples as well as in samples of the most common solid pediatric tumors: neuroblastoma, Wilm's tumor, hepatoblastoma, osteosarcoma, retinoblastoma, and rhabdomyosarcoma [36]. We also evaluated the expression of the above master regulators in a set of samples (N = 353) from 65 different healthy tissues ( Figure 6). PAX7 and RUNX3 genes were highly expressed in ES samples when compared with the other evaluated pediatric tumors and normal tissues. RUNX3 was also highly expressed in osteosarcoma when compared with normal tissue, neuroblastoma, Wilm's tumor, hepatoblastoma, retinoblastoma, and rhabdomyosarcoma. However, RUNX3 expression in ES was significantly higher when compared with the expression in osteosarcoma. The expression of the other four master regulators (ARNT2, GLI3, PBX3, and CREB3L1) had no significant difference among the samples, except by CREB3L1 gene which was highly expressed in osteosarcoma samples (Pairwise Wilcox test with Bonferroni correction, p-value < 0.01). In sense to validate the transcriptome data, the transcript levels of ES master regulators genes were evaluated in representative cell lines of ES, neuroblastoma, hepatoblastoma and medulloblastoma ( Figure 6B). The expression of PAX7 and RUNX3 were similar between ES cell lines and were significantly higher in RD-ES cell line in comparison to other tumor cell lines (p < 0.01). ARNT2 and PBX3 expressions were lower in ES cell line RD-ES than neuroblastoma cell line SH-5Y5Y (p < 0.001). GLI3 was highly expressed in medulloblastoma cell line D283 compared to RD-ES (p < 0.01). CREB3L1 expression was higher in the ES cell line RD-ES compared to neuroblastoma, hepatoblastoma (p < 0.001) and D283 cell line (p < 0.01), whereas no significant difference was observed in Daoy cells.

Discussion
The primary goal of this study was to reconstruct the ES regulatory network to understand the ES regulatory properties and, particularly, to identify transcription factors potentially relevant to that cancer. However, the uncertainty regarding ES cell of origin hinders the identification of master regulators. To surpass that limitation, we inferred two signatures using the two most accepted ES cells of origin: hMSC and hNCC [4]. The number of inferred master regulators vary according to the used signatures, but the master regulators ARNT2, CREB3L1, GLI3, MEF2C, PBX3, PAX7, and RUNX3 were found using either signature in both networks. It would be naive to consider only those seven TFs as Ewing Sarcoma regulators. It is reasonable to assume that other TFs identified as master regulators only when using hMSC signature could be relevant to ES transcriptional regulation, especially if hMSC were the ES cell of origin. The same is true for the hNCC signature. As example, the transcription factors NR0B1 and NKX2-2 are well-known to be associated with Ewing Sarcoma [37]. NR0B1 have been associated with the tumor phenotype mediated by EWS/FLI chimera in cell lines [38]. NKX2-2, a homeodomain transcription factor involved in neuroendocrine/glial differentiation and a downstream target of EWSR1-FLI1, has been reported as an immunohistochemical marker for Ewing sarcoma [39]. We identified both NR0B1 and NKX2-2 as MRs in our analysis. NR0B1 was identified as MR in network 2 using both hMSC and hNCC signatures (Tables A3 and A4, and Figures A2 and A3), while NKX2-2 was identified in network 1 using hNCC signature (Tables A2 and Figure A2), and in network 2 using the two signatures (Tables A3 and A4, and Figure A3). Because they were not always present in the four master regulator analysis (i.e., networks 1 and 2, using both hMSC and hNCC signatures), they were left out of the further analysis. Another sensitive point is the Ewing Sarcoma cell of origin. Despite hMSC and hNCC being the most supported cells of origin by the literature [2,10,[40][41][42][43][44], we can not neglect the possibility of other cells of origin. However, our stringent strategy assures that the seven master regulators inferred here are involved in the ES transcriptional regulation.
According to our results, the regulons ARNT2, CREB3L1, GLI3, MEF2C, and PBX3 act coordinately: (i) they are collectively activated or collectively inhibited in the majority of the patients from the three cohorts evaluated here; (ii) those five regulons always regulate shared genes in the same direction; (iii) except for MEF2C, all of those regulons are significantly related with poor prognosis in cohort GSE63157 (N = 85) when activated. The same statement is true for PAX7 and RUNX3, except that they are both associated with good prognosis when activated. We also evaluated overall survival of a small cohort (GSE17618) with 44 patients with similar results: PAX7 and RUNX3 are significantly associated with good prognosis when activated, and ARNT2 is significantly associated with poor prognosis when activated. Interestingly, both clusters-the first formed by ARNT2, CREB3L1, GLI3, MEF2C, and PBX3, and the second formed by PAX7 and RUNX3act collectively as reciprocal antagonists to each other regarding activation, regulation of shared genes, and implication in patient outcome. A similar behavior can be observed in the biological functions performed by the regulons. For instance, CREB3L1, GLI3, PBX3, and MEF2C share biological functions involved with extracellular matrix dynamics, and all those regulons are agonist among each other and cluster together.
The results of our methylation analysis show that all the seven regulons discussed in the manuscript are differentially methylated in an independent set of ES samples, reinforcing our findings and highlighting the importance of epigenomic reprogramming in the tumour regulation, as demonstrated by Sheffield et al that pointed out epigenetic heterogeneity in genetically homogeneous developmental cancers [45]. However, it was not possible to associate the methylation experiment with the regulon activity since we do not have the same set of samples with both methylation and expression experiments.
It is not clear why some patients have the cluster composed of PAX7 and RUNX3 regulons activated, while the same cluster is inhibited in other patients, and further investigations are needed to elucidate it. However, the differential regulon activity among patients seems to be important to tumor prognostic since we showed it is associated with overall survival. Similar result was observed in breast cancer, where it was possible to stratify a patient cohort based on regulon activity [13].In the same work, the authors have shown that the pharmacological inhibition of estrogen receptor drastically suppresses the ESR1 regulon. The ESR1 regulon has either estrogen-induced and estrogen-repressed genes. Therefore, their function as regulator occurs oppositely when the regulon is repressed and when the regulon is activated [13]. It suggests that the MR influence can be through the activation of its targets or by target inhibition, and both activation and repression of an MR might modulate the cell fate.
In other studies, ARNT2, CREB3L1, GLI3, and PBX3 have been associated with tumor progression. The role of CREB3L1 in tumor phenotype is controversial: while some studies associated this TF with metastasis promotion, other studies suggest its role in metastasis inhibition [46,47]. ARNT2 was pointed out as a key TF in the control of glioblastoma cell aggressiveness by regulating the expression of TFs related to a tumorigenic/stem glioblastoma signature [48]. PBX3 is well described as associated with poor prognostic in several cancer types [49], and the same is correct for GLI3 [50,51]. However, the role of those TFs has not been previously reported in ES. Previous studies have suggested that high expression of CREB3L1, GLI3, MEF2C, and PBX3 induces invasion and metastasis by promoting epithelial-mesenchymal transition (EMT) [47,51,52], a process reported as critical to induce metastasis in ES [53]. ARTN2 is associated with hypoxia response and acts as a dimerization partner of hypoxia-inducible factor 1α (HIF1α), which acts is adaptive stress response as well as angiogenesis required for tumor growth and metastasis [54,55].
Functional enrichment analysis has shown ARNT2, CREB3L1, GLI3, and PBX3 regulons related to several processes associated with cell migration such as extracellular matrix organization, collagen metabolic process, glycosaminoglycan (GAG) process, epithelial cell migration, and regulation of angiogenesis. Cell migration is an essential process for regulating cancer invasion. An initial step in cancer metastasis is the migration of tumor cells through the extracellular matrix (ECM) and into the lymphatic or vascular systems [56]. Progression to metastasis is associated with some biomechanical particularities, such as the restructuring of the extracellular matrix, collagen organization, ECM environments rich in the GAG, angiogenesis process, and epithelial cell migration [57][58][59][60] all process regulated by the ARNT2, CREB3L1, GLI3, and PBX3 regulons according to our results. Therefore, the high activity of ARNT2, CREB3L1, GLI3, and PBX3 regulons could be related to ES aggressiveness through stress adaptation, angiogenesis, and mesenchymal-like phenotype induction [61].
Recent studies suggested that EWSR1-FLI1 chimera regulates the expression of PAX7 and RUNX3 [62,63]. According to the available evidence, PAX7 is required for neural crest formation and adult skeletal muscle progenitor development [64]. PAX7 is described to be expressed in ES, subsets of rhabdomyosarcoma, and rare synovial sarcomas [65,66].  [66]. However, its role in tumor biology is uncertain. Charville and collaborators suggested that EWSR1-FLI1 binds to the PAX7 promoter, being the chimeric protein required for PAX7 expression in ES [62]. The transcription factor RUNX3 has been described as a tumor suppressor [68]. Bledsoe and collaborators verified that RUNX3 is expressed in ES cell lines as well as in tumor biopsies. Additionally, RUNX3 inhibition alters the expression of a set of genes regulated by EWSR1-FLI1 in A673 cell line. The authors also observed that suppression of RUNX3 expression in A673 reduced cell growth [63]. On the other hand, several studies demonstrated RUNX3 inhibition in different cancers, such as colorectal cancer, glioma, melanoma, and breast cancer [69][70][71][72].
The opposite behavior of PAX7 and RUNX3 when compared with ARNT2, CREB3L1, GLI3, and PBX3 suggests that the first two TFs could act by antagonizing the last four TFs action. As mentioned before, ARNT2, CREB3L1, GLI3, and PBX3 are well described in several tumors, while PAX7 and RUNX3 seem to be more specific to ES. When taken together, our results allow us to hypothesize that PAX7 and RUNX3 activation in ES could help mitigate the damaging effect caused by ARNT2, CREB3L1, GLI3, and PBX3 activation. PAX7 is known to be expressed in ES, subsets of rhabdomyosarcoma, and rare cases of synovial sarcomas, though only in ES samples it was found positive in all evaluated cases [65]. Charville and collaborators suggest that significant expression of PAX7 is a unique feature of rhabdomyosarcoma and ES, and put forward PAX7 as a diagnostic marker for ES diagnosis [62]. Additionally, we also identified RUNX3 as highly expressed in ES, corroborating with the similar regulatory behavior shared by those two transcription factors in ES regulatory network. High expression of PAX7 and RUNX3 genes could help mitigate EMT promoted by the cluster formed by ARNT2, CREB3L1, GLI3, and PBX3, contributing to avoid metastasis and therefore an aggressive behavior that often leads to death. Even though there is no evidence that PAX7 and RUNX3 promote mesenchymal-epithelial-transition (MET), which is the opposite of EMT, they may be able to avoid metastasis only by avoiding EMT.

Conclusions
The regulatory network analysis sheds light on the Ewing Sarcoma regulatory behavior by identifying PAX7 and RUNX3 as promising master regulators for this cancer. Both regulons are agonists, are simultaneously activated or inhibited in patient samples, and are both associated with a good prognosis. The analysis evinces another cluster of regulon consisting of ARNT2, CREB3L1, GLI3, MEF2C, and PBX3, which counteracts PAX7 and RUNX3 in all the parameters mentioned above, suggesting that the last two regulons counteract the former five regulons.

Acknowledgments:
We would like to thank the institutions and projects that funded this research, the High Performance Processing Unit (NPAD) for the high-performance computing infrastructure required for some of the analyses performed in this work, as well as the Gene Expression Omnibus (GEO) of the National Center for Biotechnology Information (NCBI) for hosting the public datasets. We would also like to thank the authors of the studies that generated the public data used in this work.

Conflicts of Interest:
The authors declare that the research was conducted in the absence of any commercial or financial relationships that could be construed as a potential conflict of interest.

Abbreviations
The following abbreviations are used in this manuscript:

2GSEA
Two- Tailed  Appendix A Figure A1. Master regulator analysis flowchart. Human mesenchymal stem cells (hMSC) and human neural crest cells (hNCC) were used along with ES cell lines to obtain the ES signatures. One analysis using hMSC as control and another using hNCC as control. Gene expression data from GSE34620 (N = 117) and GSE63157 (N = 85) were used to infer the regulatory networks 1 and 2, respectively. The networks were interrogated with the two gene signatures (disease signature with hMSC as control and disease signature with hNCC as control), and a set of shared master regulators were identified: ARNT2, CREB3L1, GLI3, MEF2C, PAX7, PBX3, and RUNX3. The meta-PCNA signature was used to filter for transcription factors commonly associated with cancer development. However, none from the ES MRs were filtered.  T   TAF1B   TAF5L   TAL1   TARDBP   TBX2   TBX21   TBX4   TBX6   TCEAL1   TCF15   TCF21   TCF25   TCF4   TCF7   TCF7L1  TCF7L2   TCFL5   TEAD1   TEAD4   TEF   TFAM   TFAP2B   TFAP4   TFCP2   TFCP2L1   TFDP1   TFDP2   TFEB   TFEC   TGIF2   THRA   THRB   TLX2   TP53   TRERF1   TRIM22   TRIM28   TRPS1   TSC22D1   TSC22D3   TSC22D4   TSPY26P   TTC36   TULP4 Figure A2. Tree-and-leaf representation of network 1 (GSE34620). Nodes are representation of regulons, labeled by the transcription factor that regulates them and colored according to its classification as master regulator for the hMSC signature (light green), hNCC signature (light blue), both signatures (red) and not a master regulator (black). The edges are the result of a hierarchical analysis on regulon overlap, i.e., the more regulated genes two regulons have in common, the closer they are in the representation.  T   TAF1B   TAF5L   TAL1   TBR1   TBX10   TBX19   TBX2   TBX21   TBX4   TBX6   TCEAL1   TCF15   TCF25   TCF4   TCF7   TCF7L2   TCFL5   TEAD1   TEAD1   TEAD3   TEAD4   TEF   TFAP2B   TFAP4   TFCP2   TFCP2L1   TFDP1   TFDP3   TFE3   TFEB   TFEC   THRA   THRB   TLX2   TMOD4   TP53   TP73   TRERF1   TRERF1   TRIM22   TRIM29   TRPS1   TSC22D4   TSPY26P   TULP4 Figure A3. Tree-and-leaf representation of network 2 (GSE63157). Nodes are representation of regulons, labeled by the transcription factor that regulates them and colored according to its classification as master regulator for the hMSC signature (light green), hNCC signature (light blue), both signatures (red) and not a master regulator (black). The edges are the result of a hierarchical analysis on regulon overlap, i.e., the more regulated genes two regulons have in common, the closer they are in the representation.  Figure A4. Regulatory map (TF-centric regulatory network) of the seven common master regulators in network 1 (GSE34620). Gray squares are transcription factors (TFs) that are here inferred as mater regulators (MRs) and circles are genes regulated by at least one of the seven MRs. The edges indicate regulation and the color of the edge indicates the type of regulation (red for positive regulation, activation, and blue for negative regulation, inhibition). The circles follow the same color scheme, except that whenever a gene is positively regulated by one TF and negatively regulated by another TF, it is colored gray.  Figure A5. Regulatory map (TF-centric regulatory network) of the seven common master regulators in network 2 (GSE63157). Gray squares are transcription factors (TFs) that are here inferred as mater regulators (MRs) and circles are genes regulated by at least one of the seven MRs. The edges indicate regulation and the color of the edge indicates the type of regulation (red for positive regulation, activation, and blue for negative regulation, inhibition). The circles follow the same color scheme, except that whenever a gene is positively regulated by one TF and negatively regulated by another TF, it is colored gray.