Identification of an RNA-Binding-Protein-Based Prognostic Model for Ewing Sarcoma

Simple Summary Ewing sarcoma (ES) is an aggressive childhood tumor for which response to chemotherapy is central to long-term prognosis, but few prognostic markers have been identified. RNA-binding proteins (RBPs) are strong regulators of cell behavior, working, for example, through post-translational modifications of mRNA. In this study, we investigated whether patterns in the RBP levels were related to outcomes in ES patients. A total of three distinct patterns were recognized, and additional modelling suggested that 10 RPBs had predictive value, suggesting that this model could be used in a clinical setting to identify patients with a higher risk of mortality. Abstract RNA-binding proteins (RBPs) are important transcriptomic regulators and may be important in tumorigenesis. Here, we sought to investigate the clinical impact of RBPs for patients with Ewing sarcoma (ES). ES transcriptome signatures were characterized from four previously published cohorts and grouped into new training and validation cohorts. A total of three distinct subtypes were identified and compared for differences in patient prognosis and RBP signatures. Next, univariate Cox and Lasso regression models were used to identify hub prognosis-related RBPs and construct a prognostic risk model, and prediction capacity was assessed through time-dependent receiver operating characteristics (ROCs), Kaplan–Meier curves, and nomograms. Across the three RBP subtypes, 29 significant prognostic-associated RBP genes were identified, of which 10 were used to build and validate an RBP-associated prognostic risk model (RPRM) that had a stable predictive value and could be considered valuable for clinical risk-stratification of ES. A comparison with immunohistochemistry validation showed a significant association between overall survival and NSUN7 immunoreactivity, which was an independent favorable prognostic marker. The association of RBP signatures with ES clinical prognosis provides a strong rationale for further investigation into RBPs molecular mechanisms.


Introduction
Ewing sarcoma (ES) represents the second-most common primary bone malignancy affecting children and adolescents, with an incidence of 2.9 per million/year [1][2][3]. It is an aggressive tumor typically characterized by a fusion of the Ewing sarcoma breakpoint region 1 (EWSR1) with an erythroblast transformation specific (ETS) transcription factor gene, most frequently (>95%) the friend leukemia virus integration 1 (FLI1) gene [4,5]. Most types of ES harbors the t (11;22) (q24;q12) chromosomal translocation leading to the EWS-FLI1 fusion transcript. Genome-wide association studies identifying molecular features and genetic profiles suggest a generally low mutational burden for ES [6][7][8]. The STAG2 gene is the most frequently mutated (17%) compared to normal paired cases,

ES Data Processing
Both the ES gene expression microarray data and clinical data were derived from the NCBI Gene Expression Omnibus (GEO) database: GSE63155 (46 ES cases, follow-up time: median 1876 days, range 155-3987 days) [24], GSE63166 (39 ES cases, follow-up time: median 2085 days, range 286-4500 days) [24], GSE17679 (32 ES cases, follow-up time: median 1159 days, range 138-5766 days) [25], and GSE34620 (38 ES cases, follow-up time: median 1549 days, range 270-4017 days) [26]. The patient characteristics involved in this study are shown in Table 1. In these four datasets, Affymetrix HuEx1.0 (GPL5175) and Affymetrix HG-U133 Plus 2.0 (GPL570) platform data were merged into one integrated dataset. Batch effects were removed using the ComBat function of the R package surrogate variable analysis (sva) [27], and expression values were quantile-normalized across different samples. A total of 85patients from GSE63155 and GSE63166 were involved in the training cohort, while 70 from GSE17679 and GSE34620 were included in the validation cohort. We obtained a comprehensive RBP catalog consisting of 1542 genes and 318 transcriptional factors (TFs) for further analysis [19,28]. Unsupervised clustering was performed using the R package ConsesusClusterPlus to classify patients into distinct subtypes according to RBP expression.

Identification of Differentially Expressed RBPs
The R package linear models for microarray data (LIMMA) [29] was used to perform differentially expressed RBP (DERBP) analysis in the training cohort by comparing clusters of patient subtypes that were significantly associated with overall survival. A false discovery rate (FDR) of <0.05 or p-value of <0.05 was set as the cut-off criterion. A Venn diagram was used to determine the overlapping DERBPs between two sets of comparisons. Additionally, the DETFs and DEHallmarks were generated based on the same criteria.

Protein-Protein Interaction (PPI) Network Construction and Functional Enrichment Analyses
To further explore the potential molecular functions, the DERBPs were submitted into the Search Tool for the Retrieval of Interacting Genes (STRING) database for building the protein-protein interaction (PPI) network [30]. The PPIs with combined scores ≥ 0.7 were selected, after which the network was constructed and visualized using Cytoscape software (version 3.8.2) (National Institute of General Medical Sciences, Bethesda, MD, USA) [31]. Any genes with a connectivity of ≥1 (node/edge) were screened as hub genes for downstream analyses. The top 50 central nodes were ranked by The MCC (maximum clique centrality) algorithm of the CytoHubba plug-in [32] in Cytoscape. The functional enrichment analyses were performed using the ClueGO and CluePedia plug-ins in Cytoscape [33,34]. The main parameters of the constructing network with ClueGO were as follows: marker list, homo sapiens; ontologies/pathways, GO (biological process) and REACTOME pathways; GO tree interval, level = 6; GO term/pathway selection, min # genes = 6 and % genes = 6.000; and GO term/pathway network connectivity (kappa score) = 0.45. Only GO terms/pathways with p < 0.01 were selected.

Prognosis-Related RBPs and the Interaction with TFs
After combining the expression levels of the hub genes with the survival status and follow-up time, a univariate Cox proportional regression analysis was used to determine the hub genes related to overall survival in the training cohort. The association between the expression level of significant survival-related genes and DETFs was assessed by Pearson correlation analysis. A correlation coefficient of ≥0.4 and p-value of ≤0.05 were set as the cut-off criteria. The Cytoscape ClusterViz plug-in [35] was employed to identify biological network modules between RBPs and TFs.

Establishment and Validation of an RBP-Associated Prognostic Risk Model (RPRM)
The survival-related genes were subjected to penalized multivariate Cox proportional hazard survival modeling by an algorithm for variable selection based on L1-penalized Least Absolute Shrinkage and Selection Operator (LASSO) regression [36]. The process of prognostic model construction was repeated for the combination of parameter values for 1000 iterations. Subsequently, the resulting models were combined through crossvalidation. The risk score formula for each sample was calculated as follows: Risk score = β1 * Exp1 + β2 * Exp2 + βi * Expi where β represents the coefficient value, and Exp represents the gene expression level. Subsequently, A time-dependent receiver operating characteristic (ROC) curve was per- formed to evaluate the model performance by calculating the area under the curve (AUC). As a result, patients were classified into high-or low-risk groups according to the median risk score. The RBP-associated prognostic risk model (RPRM) was externally validated by the validation cohort (GSE17679 and GSE34620) to access its generalizability. Afterwards, we developed a nomogram-combing gene expression to improve risk stratification and quantify the risk assessment for survival probability at one year, three years, and five years in individual patients.

Assessment of Gene Expression Level and Prognostic Significance in RPRM
Our group had previously identified several pathways that were significantly associated with first-line treatment failure in ES [22]. Here, a single sample gene set enrichment analysis (ssGSEA) was used to define the estimated enrichment score of the gene signature for each sample using gene set variation analysis (GSVA) in R package [37]. Hallmark gene sets were downloaded from the Molecular Signatures Database v7.4. Gene expression levels in RPRM were assessed between dead and live patients. Significant genes were cross-validated by the validation cohort and our ES RNA-seq data.

Immunohistochemistry
We investigated the expression of NSUN7 in ES tumor tissue from patients (n = 24) who underwent standard treatment protocols (ISG/SSG III or EuroEwing99/2012 protocols) [38]. These were therapy-naive preoperative biopsies (n = 9), but if these were unavailable for analysis, viable tumor cells from a resection specimen (n = 11) or metastatic specimen (n = 4) were used.
Immunohistochemical staining for NSUN7 was done using a manual protocol with a polyclonal antibody (anti-NSUN7, Product #PA5-54257 from ThermoFisher Scientific, Waltham, MA, USA) at 1:50 dilution. Formalin-fixed paraffin embedded (FFPE) tissue was sectioned in 4 um thick sections and deparaffinized. Antigen retrieval was performed for 10 min in low pH (citrate). Slides were incubated with the primary antibody at 4 • C overnight, followed by incubation with a biotinylated secondary antibody and signal generation using the VECTASTAIN ® Elite ABC-HRP Kit (Vector Laboratories, Burlingame, CA, USA).
Two clinical pathologists evaluated the staining, and after agreeing on a scoring method, the slides were scored independently in a blinded fashion. The agreement was substantial (82.6%, Cohen's K 0.620). There was disagreement in four cases, but consensus was reached after a second review. The slides fit into three scoring categories: • Negative ("Most of the tumor cells were completely negative"): >75% completely negative, <25% with very weak cytoplasmic staining); • Weak ("Cases exhibited a diffuse weak staining with small areas with negative or stronger immunoreactivity"): >75% of tumor cells exhibited weak immunoreactivity); and • Moderate ("Cases showed stronger cytoplasmic immunoreactivity in most of the tumor cells, but smaller areas with either weaker or negative immunoreactivity were sometimes observed"): >75% of the tumor cells showed moderate cytoplasmic reactivity.
Since the number of patients was limited, the staining groups were compared in two different fashions: all groups (negative vs. weak vs. moderate) or simplified (negative vs. positive).

Statistical Analysis
Normally distributed variables were analyzed with an unpaired two-tailed Student's t-test, whereas a Mann-Whitney U test or a Wilcoxon rank-sum test was used for nonnormally distributed variables. The relationship between the parameters was evaluated using the Spearman rank correlation coefficient. A log-rank test Kaplan-Meier curve and a Cox proportional hazard regression were performed for survival analysis using the R package "survminer" and "survival". Additionally, L1-penalized LASSO regression and ROC curves were conducted using the R package "glmnet" and "pROC", respectively [39][40][41]. R version 4.0.3 (R Foundation for Statistical Computing, Vienna, Austria) was used to execute all statistical tests and plots.

Data Preprocessing of the ES Dataset
Data from 155 ES patients with complete follow-up and expression data from GSE61355, GSE63156, GSE34620, and GSE17679 were bioinformatically pretreated to remove batch effects. The 15,016 overlapping genes across the four GEO datasets are depicted in the Venn diagrams in Figure S1A. Data before and after normalization were carefully inspected by principal component analysis (PCA), suggesting the batch effect was successfully removed using ComBat ( Figure S1B,C). In addition, the gene expression profiles were visualized in box plots to show the impact on batch effect removal ( Figure S1D,E).

Identification of DERBPs in ES Patients and Transcriptional Subtypes
To investigate how RBPs affect the prognostic significance in ES, we compared the gene expression between dead and alive patients using the Wilcoxon rank-sum test (p < 0.05) in the training cohort. This analysis identified 22 DERBPs, and the consensus k-mean clustering on the 22 RBP signatures clearly divided ES patients into three main subtypes with clustering stability decreasing for k = 2 to 6 ( Figures 1A and S2). They were clustered into RS1 (n = 32), RS2 (n = 42), and RS3 (n = 11) with distinct prognostic differences. Interestingly, patients in Cluster 1 had the best clinical prognosis, whereas patients in Cluster 3 had the poorest (log-ranktest, p < 0.05) ( Figure 1B). Following this, a heatmap visualized the expression levels of 22 RBPs in three RBP subtypes with survival status, where most were RS3s dominated by down-regulated transcripts ( Figure 1C). Among 1542 RBPs quantified in 85 ES patients with two comparisons (RS1 vs. RS2, RS1 vs. RS3), 377 (24.4%) were significantly down-regulated, and 374 (24.3%) were significantly upregulated in RS3 compared to RS1; 150 (9.7%) were significantly down-regulated, and 137 (8.9%) were significantly up-regulated in RS2 compared to RS1 (Wilcoxon rank-sum test, FDR < 0.05; Figure 1D). A total of 90 overlapping up-regulated and 125 overlapping down-regulated DERBPs were identified by the two comparisons and selected for further analysis ( Figure 1E).

Functional Enrichment Analysis and PPI Network of DERBPs
Among the DERBPs, we wanted to identify the key components using co-variation analysis. Functional enrichment analysis was performed for the 218 overlapping DERBPs: 90 up-regulated, 125 down-regulated, and 3 with distinct expression profiles. Gene ontology (GO) biological process and REACTOME pathway analyses were applied to clarify the DERBP gene function. As shown in Table S1, most of the DERBPs were enriched in the GO categories of RNA metabolism (i.e., RNA processing, RNA metabolic process, and RNA catabolic process) and protein metabolism (i.e., translation, ribonucleoprotein complex assembly). In addition, enrichment analysis of the REACTOME pathways revealed that RNA metabolism and mRNA splicing and translation were mostly enriched for DERBPs (Table S2). The GO term/pathway network is shown in Figure 2A,B (p < 0.01). Next, we built a high-confidence PPI network that contained 171 nodes (genes) and 1234 edges, which were selected for further analysis. To identify the top 50 hub nodes, the cytoHubba plug-in was applied, and key genes were selected from the PPI network according to the MCC score ( Figure 2C). Detailed gene descriptions and connectivity degrees of the top 15 hub genes are shown in Table S3. EIF4A3 was identified as having the largest number of edges interacting with 54 other genes, followed by POLR2F and SRSF1. Overlap up-regulated and down-regulated DEREPs between RS1 versus RS2 and RS1 versus RS3.

Functional Enrichment Analysis and PPI Network of DERBPs
Among the DERBPs, we wanted to identify the key components using co-variation analysis. Functional enrichment analysis was performed for the 218 overlapping DERBPs: 90 up-regulated, 125 down-regulated, and 3 with distinct expression profiles. Gene ontology (GO) biological process and REACTOME pathway analyses were applied to clarify the DERBP gene function. As shown in Table S1, most of the DERBPs were enriched in the GO categories of RNA metabolism (i.e., RNA processing, RNA metabolic process, and RNA catabolic process) and protein metabolism (i.e., translation, ribonucleoprotein complex assembly). In addition, enrichment analysis of the REACTOME pathways revealed that RNA metabolism and mRNA splicing and translation were mostly enriched for DER-BPs (Table S2). The GO term/pathway network is shown in Figure 2A,B (p < 0.01). Next, we built a high-confidence PPI network that contained 171 nodes (genes) and 1234 edges, which were selected for further analysis. To identify the top 50 hub nodes, the cytoHubba plug-in was applied, and key genes were selected from the PPI network according to the MCC score ( Figure 2C). Detailed gene descriptions and connectivity degrees of the top 15 hub genes are shown in Table S3. EIF4A3 was identified as having the largest number of edges interacting with 54 other genes, followed by POLR2F and SRSF1.

Prognosis-Related RPBs and the Regulatory Network
To investigate the prognostic significance of these 171 RBPs involved in the PPI network, a univariate Cox regression analysis was performed from which 29 prognosticassociated hub RBP genes were obtained: 12 having favorable factors with a hazard ratio (HR) of <1, and 17 having risk factors (HR > 1) ( Figure 3A).
Next, we identified DETFs across the three RSs to explore the regulatory mechanisms of these RPBs. We found 144 TFs significantly expressed in RS3 compared to RS1, and 28 TFs significantly expressed in RS2 compared to RS1 (FDR < 0.05). The overlapping 18 DETFs in the two comparisons were then selected ( Figure 3B). A heatmap was constructed to indicate their expression levels in three subtypes with survival status ( Figure 3C). We consequently created a regulatory network based on these 18 TFs and our 29 prognosis-related RPBs. A correlation score of more than 0.4 and p < 0.01 were set as the cut-off values. The TFbased regulatory network topology was grouped into three clusters through the ClusterViz plug-in in Cytoscape. Notably, the schematic clearly illustrated the regulatory relationships among these TFs and RBPs ( Figure 3D, Table S4). . GO terms (biological process) grouped network (kappa score levels ≥ 0.45, p < 0.01). Ellipse represents the GO terms. The node size represents the significance of the term enrichment (p-value), and the colors represent different functional groups. (B). The REACTOME pathway grouped network (kappa score levels ≥ 0.45, p < 0.01). Ellipse represents the GO terms. The node size represents the significance of the term enrichment (p-value), and the colors represent different functional groups. (C). The MCC score of top 50 genes in the PPI network of DERBPs (combined score ≥ 0.7) is represented by a red to yellow gradient.

Prognosis-Related RPBs and the Regulatory Network
To investigate the prognostic significance of these 171 RBPs involved in the PPI network, a univariate Cox regression analysis was performed from which 29 prognostic-associated hub RBP genes were obtained: 12 having favorable factors with a hazard ratio (HR) of <1, and 17 having risk factors (HR > 1) ( Figure 3A).
Next, we identified DETFs across the three RSs to explore the regulatory mechanisms of these RPBs. We found 144 TFs significantly expressed in RS3 compared to RS1, and 28 TFs significantly expressed in RS2 compared to RS1 (FDR < 0.05). The overlapping 18 DETFs in the two comparisons were then selected ( Figure 3B). A heatmap was constructed to indicate their expression levels in three subtypes with survival status ( Figure 3C). We consequently created a regulatory network based on these 18 TFs and our 29 prognosisrelated RPBs. A correlation score of more than 0.4 and p < 0.01 were set as the cut-off values. The TF-based regulatory network topology was grouped into three clusters through the ClusterViz plug-in in Cytoscape. Notably, the schematic clearly illustrated the regulatory relationships among these TFs and RBPs ( Figure 3D, Table S4). . GO terms (biological process) grouped network (kappa score levels ≥ 0.45, p < 0.01). Ellipse represents the GO terms. The node size represents the significance of the term enrichment (p-value), and the colors represent different functional groups. (B). The REACTOME pathway grouped network (kappa score levels ≥ 0.45, p < 0.01). Ellipse represents the GO terms. The node size represents the significance of the term enrichment (p-value), and the colors represent different functional groups. (C). The MCC score of top 50 genes in the PPI network of DERBPs (combined score ≥ 0.7) is represented by a red to yellow gradient.

Construction and Validation of the RPBs-Associated Prognostic Risk Model (RPRM)
After identifying the 29 prognosis-related RBPs, 1000 iterations of LASSO-penalized multivariate modeling were constructed, which led to 10 features with a non-zero, coefficientbased-risk model called RPRM (Figure 3E,F). We then computed the RPRM for each patient based on the risk score formula. The ROC curve exhibited significant prognostic performances for the AUC (1-year, 3-year, and 5-year OS predictions were 0.960, 0.915, and 0.817 in the training cohort and 0.745, 0.700, and 0.792 in the validation cohort, respectively ( Figures 3G and 4D). The samples in both the training and validation cohorts were subsequently separated into high-and low-risk groups according to the median risk score. Assessments of the survival of the two groups by Kaplan-Meier estimates showed that high-risk patients had a significantly worse overall survival than the low-risk patients in both cohorts (p < 0.001) ( Figure 4A,E). Moreover, the distribution of risk score, survival time, and survival status of each patient is illustrated in Figure 4B. Among the 10 significant genes in the risk model, NSUN7, RPL15, ZCCHC6, DCP1B, and GPATCH8 were associated with a favorable prognosis, while DDX23, STRAP, PRDX1, RPL6, and DCAF13 were considered to be risk factors ( Figure 4C). Subsequently, a nomogram combining these genes was generated, and each patient was assigned a series of scores corresponding to all of the

Construction and Validation of the RPBs-Associated Prognostic Risk Model (RPRM)
After identifying the 29 prognosis-related RBPs, 1000 iterations of LASSO-penalized multivariate modeling were constructed, which led to 10 features with a non-zero, coefficient-based-risk model called RPRM (Figure 3E,F). We then computed the RPRM for each patient based on the risk score formula. The ROC curve exhibited significant prognostic performances for the AUC (1-year, 3-year, and 5-year OS predictions were 0.960, 0.915, and 0.817 in the training cohort and 0.745, 0.700, and 0.792 in the validation cohort, respectively ( Figures 3G and 4D). The samples in both the training and validation cohorts were subsequently separated into high-and low-risk groups according to the median risk score. Assessments of the survival of the two groups by Kaplan-Meier estimates showed that high-risk patients had a significantly worse overall survival than the low-risk patients in both cohorts (p < 0.001) ( Figure 4A,E). Moreover, the distribution of risk score, survival Since our group previously revealed that a series of pathways (e.g., apoptotic process, PI3K pathway, RNA splicing, rRNA metabolic process, glycolysis) were significantly associated with first-line treatment failure in ES patients, we performed an ssGSEA analysis in 50 hallmark gene sets to determine the relationship between pathway-enrichment and risk scores (Table S5). As shown in Figure 4G, 23 hallmark gene sets were significantly enriched between high-and low-risk patients, in which only protein secretion and bile-acid metabolism was positively associated with better prognosis. A total of ten overrepresenting gene sets are illustrated in Figure S3A (−log 10 (FDR) > 3). Notably, positive correlations between the risk and ssGSEA scores in high-risk patients were shown in 16 gene sets (p < 0.05) ( Figure 4H, Table S6), where the reactive oxygen species pathway, increased UV response, and glycolysis represent the top three highest correlation coefficients (p < 0.001; r = 0.67, 0.61, and 0.60, respectively) ( Figure S3B-D).
Cancers 2021, 13, x FOR PEER REVIEW 9 of 15 time, and survival status of each patient is illustrated in Figure 4B. Among the 10 significant genes in the risk model, NSUN7, RPL15, ZCCHC6, DCP1B, and GPATCH8 were associated with a favorable prognosis, while DDX23, STRAP, PRDX1, RPL6, and DCAF13 were considered to be risk factors ( Figure 4C). Subsequently, a nomogram combining these genes was generated, and each patient was assigned a series of scores corresponding to all of the involved genes. Then the 1-, 3-, and 5-year survival probabilities were projected to the final sum of the scores ( Figure 4E). Since our group previously revealed that a series of pathways (e.g., apoptotic process, PI3K pathway, RNA splicing, rRNA metabolic process, glycolysis) were significantly associated with first-line treatment failure in ES patients, we performed an ssGSEA analysis in 50 hallmark gene sets to determine the relationship between pathway-enrichment and risk scores (Table S5). As shown in Figure 4G, 23 hallmark gene sets were significantly enriched between high-and low-risk patients, in which only protein secretion and bile- . Kaplan-Meier curves show overall survival in validation groups between high-and low-risk patients (p < 0.001). (F). Nomogram for predicting 1-, 3-, and 5-year survival probability of ES patients in the training group. The total score of these genes for each patient is on the total points axis, which corresponds to the survival probabilities plotted on the three axes below. (G). Abundances of 23 DEHallmarks between high-and low-risk patients in the three RSs. (H). The correlation between LDA score and enrichment scores of 23 DEHallmarks. * p < 0.05; ** p < 0.01, *** p < 0.001.

Validation of the Prognostic Value and Expression of the RBPs Involved in RPRM
To further assess the independent prognostic value of each key RBPs in ES, the Kaplan-Meier estimates were used to evaluate the relationship between these key RBPs and OS. As a result, six gene expressions (DCP18, DDX23, GRATCH8, NSUN7, RPL6, and ZCCHC6) were identified as being significantly associated with OS in the training cohort: high (≥median) and low (<median). A total of four of these were beneficial to prognosis, and two were associated with poor outcomes (p < 0.05) (Figures 5A and S5). The Kaplan-Meier plots of the other remaining four genes are shown in Figure S4 (p > 0.05). Additionally, we found that the expression level of DDX23, RPL6, and NSUN7 were significantly correlated with survival status (Wilcoxon rank-sum test, p < 0.05, Figure 5B), but only NSUN7 could be verified (log-rank test, Wilcoxon rank-sum test, p < 0.05, Figure 5C  The immunoreactivity of NSUN7 was scored as negative (n = 8), weak (n = 6) or moderate (n = 10) in 24 patients (Table S7). The proportion of negative cases was higher in pre-treated tissue (7/15) than in therapy-naïve (1/9) biopsies, which may indicate that NSUN7-negative cells are more resistant to treatment. In the therapy-naive biopsies, 5/9 cases were classified as good responders according to the treatment protocol (<10% viable tumor cells after treatment), and in the post-therapy specimens, only 2/11 were good responders. We found a significant association between NSUN7 immunoreactivity and overall survival (Fisher's exact test, p = 0.0069, Figure 6A) and that NSUN7 negative cases had a shorter overall survival compared to positive ones (log-rank test, p = 0.044, Figure 6B). From the preoperative biopsies, only a single patient died (scored as negative for NSUN7). The prognostic association across three comparisons (negative vs. weak vs. moderate) are shown in Figure S6 and Figure 6C (chi-square test, p = 0.0084; log-rank test, p = 0.101).

Discussion
Our group had previously reported an association between RBP expression and treatment resistance in Ewing sarcoma 22 . Here, we sought to evaluate the putative relationship between RBPs and patient outcome in four previously published transcriptome datasets. We identified three RBP-related transcriptional subtypes that were significantly associated with overall patient survival. We constructed and validated a 10-gene prognostic risk model that showed good clinical applicability as a prognostic marker for ES patients. Finally, one individual transcript in the model (NSUN7) was validated on a protein level in a separate cohort.
It is well known that RBPs are critical for various biological processes. Perhaps not surprisingly, we identified 171 prognostic-associated RBPs that were mostly enriched in RNA (mRNA, ncRNA, rRNA) metabolism, splicing, and translation, and these were similar to our previous findings from the RNA-sequencing of a single Ewing sarcoma cohort. Differential expression between the three RBP-related subtypes also identified 18 transcription factors that significantly co-varied with the 171 RBPs, potentially identifying the regulatory network that define the three RBP subtypes. While many of the identified transcription factors were described as central to cancer development (including SMARCB1, These results strongly suggest that NSUN7 may be an important biomarker for ES therapy resistance and that the absence of immunoreactivity may be a prognostic marker.

Discussion
Our group had previously reported an association between RBP expression and treatment resistance in Ewing sarcoma [22]. Here, we sought to evaluate the putative relationship between RBPs and patient outcome in four previously published transcriptome datasets. We identified three RBP-related transcriptional subtypes that were significantly associated with overall patient survival. We constructed and validated a 10-gene prognostic risk model that showed good clinical applicability as a prognostic marker for ES patients. Finally, one individual transcript in the model (NSUN7) was validated on a protein level in a separate cohort.
It is well known that RBPs are critical for various biological processes. Perhaps not surprisingly, we identified 171 prognostic-associated RBPs that were mostly enriched in RNA (mRNA, ncRNA, rRNA) metabolism, splicing, and translation, and these were similar to our previous findings from the RNA-sequencing of a single Ewing sarcoma cohort. Differential expression between the three RBP-related subtypes also identified 18 transcription factors that significantly co-varied with the 171 RBPs, potentially identifying the regulatory network that define the three RBP subtypes. While many of the identified transcription factors were described as central to cancer development (including SMARCB1, MAZ, EZH1 and H2AFX), none had an established role in ES, likely due to the limited number of functional studies of this tumor type.
Current chemotherapy protocols can cure a significant proportion of ES patients, but progression under chemotherapy is likely to drive resistance mechanisms. It may therefore be valuable to identify patients with a high likelihood of limited treatment response to offer alternative therapies in clinical trials. Since our 10-gene prognostic model was based on patients with current treatment regimes, our model could be used to identify patients who are at a high risk of death and who derive little benefit from current chemotherapy protocols.
When considering each gene in the model, only six were significantly associated with patient outcomes in the training cohort. The expression levels of three (DDX23, RPL6, and NSUN7) were related to survival status, but only NSUN7 was validated as an individual marker, which may be related to the semi-quantitative nature of immunohistochemistry protocols. While this may sound disheartening, it should be interpreted as meaning that multiple gene analyses are required to create a functional prognostic panel for ES. For comparison, validated and implemented prognostic panels for breast cancer contaiñ 50 genes [42].
Consistent with our previous findings, we also found that cancer-related pathways were mostly activated in the ES samples with high-risk scores. Of these, high expression of hypoxia-inducible factor 1 α (HIF1α) emerged as an novel hallmark pathway, which was described to be correlated with a hypoxic tumor microenvironment in ES [43]. These findings are in line with the well-known prognostic value of serum lactate dehydrogenase (LDH) in ES patients [44].

Conclusions
In summary, this study identified three distinct RBPs signatures in ES. Through DERBP stratification, an RPRM risk model was constructed and validated to be a potential prognostic factor for ES patients. Major limitations of this study should be noted, including a relatively small sample size (155 patients) and the lack of detailed clinical characteristics for many of the cases. However, our results provide a strong rationale for the continued investigation of RBPs since they seem central to chemotherapy resistance and ES patient outcomes.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10 .3390/cancers13153736/s1, Figure S1. GEO data processing and normalization (A). A Venn diagram illustrates the overlap of the genes across the four involved GEO datasets of ES. (B,C). The PCA plots show the overall GEO data profiles of GSE17679, GSE34620, GSE61355, and GSE61366 before and after removing batch effect, respectively. (D,E). The box plots show the expression profiles of GSE17679, GSE34620, GSE61355, and GSE61366 before and after removing batch effect, respectively. Figure S2. Consensus matrices of identified clusters (k = 2, 4-6); Figure S3. (A). The distribution of significant gene sets enriched in high-risk patients at cutoff with −log10 (FDR) > 3. (B-D). Violin plots show the correlation between risk score and enrichment scores of reactive oxygen species pathway, UV response up, and glycolysis, respectively. Figure S4. Kaplan-Meier curves show the overall survival in training group with high-risk and low-risk subgroups by DCF13, PRDX1, RPL15, and STRAP, based on the median value of these gene expression levels, respectively. (p < 0.05). Figure S5. (A). Kaplan-Meier curves show the overall survival in training group with high-risk and low-risk subgroups by DCP1B, GPATCH8, and ZCCHC6, based on the median value of these gene expression levels, respectively. (p < 0.05). (B). The expression levels of DCP1B, GPATCH8, and ZCCHC6 in training group by survival status. The Wilcoxon rank sum test was used to compare the differences between groups. * p < 0.05. Figure S6.   Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.

Conflicts of Interest:
The authors declare no conflict of interest.