OSmfs: An Online Interactive Tool to Evaluate Prognostic Markers for Myxofibrosarcoma

Myxofibrosarcoma is a complex genetic disease with a poor prognosis. However, effective biomarkers that predict poor prognosis in myxofibrosarcoma remain to be determined. Herein, using gene expression profiling data and clinical follow-up data from three independent cohorts comprising a total of 128 myxofibrosarcoma samples from The Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) databases, we constructed an easy-to-use web tool, named Online consensus Survival analysis for Myxofibrosarcoma (OSmfs), to analyze the prognostic value of individual genes. By querying the database, users can generate a Kaplan–Meier plot with a log-rank test and hazard ratio (HR) to assess prognosis-related genes or to discover novel myxofibrosarcoma prognostic biomarkers. The effectiveness and usability of OSmfs were validated using genes previously reported to predict the prognosis of myxofibrosarcoma patients. Furthermore, using the Cox analysis data and transcriptome data underlying OSmfs, seven genes were selected as potential prognostic biomarkers through overlap and ROC analysis. In conclusion, OSmfs is a promising web tool for evaluating the prognostic potency and reliability of genes in myxofibrosarcoma, and it may contribute substantially to the discovery of novel prognostic biomarkers and therapeutic targets for myxofibrosarcoma.


Introduction
Based on the morphologic and clinicopathologic classification criteria established in 2002 and redefined in 2013, the World Health Organization (WHO) renamed the myxoid variant of malignant fibrous histiocytoma (MFH) with a predominant myxoid component (>50%) as myxofibrosarcoma (MFS) [1,2]. MFS is a common adult sarcoma characterized by curvilinear vessels within a variably myxoid stroma and multinodular growth of spindle to polygonal cells [3,4]. MFS tumors harbor highly complex karyotypes, often sharing the aberrations observed in leiomyosarcoma (LMS) and undifferentiated pleomorphic sarcoma (UPS) [5]. The overall recurrence rate of MFS is as high as 50-60% [6]. Prognostic biomarkers therefore need to be explored and developed to guide the clinical management of MFS.
Three independent gene expression datasets with clinical follow-up information for MFS were collected from GEO and TCGA by searching with the keywords "Myxofibrosarcoma" and "survival" or "prognosis". In total, 128 unique MFS cases were included in OSmfs; the clinical characteristics of each dataset are summarized in Table 1. The TCGA, GSE71118 [17] and GSE72545 [18] datasets include 25, 39 and 64 MFS samples, with 7, 10 and 21 death events, respectively. Only GSE71118 contains metastatic MFS samples (n = 10). The TCGA MFS data were generated by RNA sequencing, whereas GSE71118 and GSE72545 are microarray datasets, so data from different datasets may carry batch effects. In the combined analysis, each cohort is therefore divided separately into subgroups (high vs. low expression of the input gene), and these subgroups are then pooled for survival analysis.
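The per-cohort split can be sketched in Python as follows. This is a minimal illustration, not OSmfs code: the function name `split_within_cohort` is hypothetical, and the sketch assumes a median cutoff (the "Upper 50% vs. Lower 50%" option), whereas the tool also offers other cutoffs. It shows why the split is done per cohort: each cutoff is computed from that cohort's own values, so RNA-seq and microarray expression scales never have to be compared directly.

```python
from statistics import median


def split_within_cohort(expression):
    """Assign high/low groups per cohort, then pool the labelled samples.

    expression: dict mapping cohort name -> {sample_id: expression value}.
    Sample IDs are assumed to be unique across cohorts.
    Returns {sample_id: (cohort, "high" or "low")}.
    """
    pooled = {}
    for cohort, samples in expression.items():
        # The cutoff comes from this cohort only, so differing platform scales
        # (RNA-seq vs. microarray) are never mixed when choosing it.
        cutoff = median(samples.values())
        for sample_id, value in samples.items():
            # Values equal to the cutoff go to the low group.
            pooled[sample_id] = (cohort, "high" if value > cutoff else "low")
    return pooled


# Hypothetical toy values on two very different scales.
expression = {
    "TCGA": {"t1": 10.0, "t2": 20.0, "t3": 30.0, "t4": 40.0},
    "GSE72545": {"g1": 6.1, "g2": 6.3, "g3": 7.0, "g4": 7.2},
}
groups = split_within_cohort(expression)
```

With a single global median (8.6 for these eight values), every microarray sample would fall into the low group; the per-cohort split instead places t3, t4, g3 and g4 in the high group before pooling.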

Design of OSmfs
The method used to develop OSmfs has been described previously [15]. In short, the dynamic web interfaces are developed in HTML 5.0 and hosted by Tomcat on a Windows server. Server-side scripts written in Java control the output of the analysis results, and the R package "Rserve" acts as middleware connecting R and Java. SQL Server is used to store and integrate gene expression profiles and clinical data. The web server is ready to use out of the box: when a user inputs an official gene symbol, the prognostic analysis is performed with the R package "survival", which generates Kaplan-Meier (KM) curves with the hazard ratio (HR) and its 95% confidence interval (CI) and calculates the log-rank p-value (Figure 1A). OSmfs is freely accessible at http://bioinfo.henu.edu.cn/MFS/MFSList.jsp.
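OSmfs performs these calculations with the R package "survival". To show what a KM curve and a log-rank p-value involve, the following is a minimal, dependency-free Python sketch of the two standard computations. It is not the OSmfs implementation, the function names are hypothetical, and it omits the HR, which comes from a Cox model rather than from the KM estimate.

```python
import math


def kaplan_meier(times, events):
    """Kaplan-Meier estimate; events: 1 = death, 0 = censored.

    Returns [(t, S(t))] at each distinct time with at least one death.
    """
    data = sorted(zip(times, events))
    at_risk, surv, curve, i = len(data), 1.0, [], 0
    while i < len(data):
        t, deaths, leaving = data[i][0], 0, 0
        while i < len(data) and data[i][0] == t:  # group tied times
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths:
            surv *= 1.0 - deaths / at_risk
            curve.append((t, surv))
        at_risk -= leaving  # deaths and censored cases both leave the risk set
    return curve


def logrank_test(times, events, groups):
    """Two-group log-rank test; groups holds 0/1 labels. Returns (chi2, p)."""
    event_times = sorted({t for t, e in zip(times, events) if e})
    o_minus_e, variance = 0.0, 0.0
    for t in event_times:
        n = sum(1 for x in times if x >= t)
        n1 = sum(1 for x, g in zip(times, groups) if x >= t and g == 1)
        d = sum(1 for x, e in zip(times, events) if x == t and e)
        d1 = sum(1 for x, e, g in zip(times, events, groups) if x == t and e and g == 1)
        o_minus_e += d1 - d * n1 / n  # observed minus expected deaths in group 1
        if n > 1:
            # Hypergeometric variance of d1 at this event time.
            variance += d * (n1 / n) * (1 - n1 / n) * (n - d) / (n - 1)
    if variance == 0:
        raise ValueError("both groups must contribute to the risk sets")
    chi2 = o_minus_e ** 2 / variance
    p = math.erfc(math.sqrt(chi2 / 2))  # upper tail of chi-square with 1 df
    return chi2, p
```

For example, `logrank_test([1, 2, 3, 4, 5, 6, 7, 8], [1] * 8, [1, 1, 1, 1, 0, 0, 0, 0])` compares a group whose members all die first with a group whose members all die later, giving a chi-square statistic of about 7.3 and p < 0.05. The quadratic loops are adequate for cohorts of this size (at most a few dozen patients per dataset).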

Venny Analysis
Using Cox analysis results computed from the transcriptional profiles and clinical information in TCGA, GSE71118 and GSE72545, we selected genes with prognostic significance (p < 0.05) for MFS in each of the three datasets and then compared the three gene lists with Venny 2.0.2 to find the prognostic genes common to all three cohorts.
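Venny 2.0.2 is an interactive web tool; the same three-way overlap can be expressed as a set intersection. In the minimal Python sketch below, the function name and the input layout (a per-cohort map from gene to (HR, p)) are hypothetical. The optional `adverse_only` filter (HR > 1) assumes, following the Results, that genes associated with adverse prognosis are of interest; with `adverse_only=False` it reproduces the plain p < 0.05 overlap described above. The gene names and values are placeholders, not results.

```python
def common_prognostic_genes(cox_results, alpha=0.05, adverse_only=True):
    """Genes significant (p < alpha) in every cohort.

    cox_results: {cohort: {gene: (hazard_ratio, p_value)}}
    adverse_only: additionally require HR > 1 (higher expression, worse survival).
    """
    hit_sets = [
        {gene for gene, (hr, p) in genes.items()
         if p < alpha and (hr > 1 or not adverse_only)}
        for genes in cox_results.values()
    ]
    # A gene must pass the filter in all cohorts to be kept.
    return set.intersection(*hit_sets) if hit_sets else set()


cox_results = {
    "TCGA": {"GENE_A": (2.1, 0.010), "GENE_B": (0.5, 0.020), "GENE_C": (1.8, 0.200)},
    "GSE71118": {"GENE_A": (3.0, 0.030), "GENE_B": (0.4, 0.010), "GENE_C": (2.0, 0.040)},
    "GSE72545": {"GENE_A": (1.5, 0.040), "GENE_B": (0.6, 0.030), "GENE_C": (1.9, 0.010)},
}
print(sorted(common_prognostic_genes(cox_results)))  # ['GENE_A']
```

GENE_B is significant everywhere but protective (HR < 1), and GENE_C misses significance in one cohort, so only GENE_A survives the adverse-only overlap.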

Receiver Operating Characteristic (ROC) Analysis
Using the overall survival (OS) event status (0: alive; 1: dead), we divided the expression values of each gene into two groups (alive and dead patients). The area under the ROC curve (AUC) represents the capacity of a gene to discriminate between the alive and dead states in overall survival. An AUC of 0.5 represents a test with no discriminating ability, whereas an AUC of 1.0 represents a test with perfect discrimination.
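The source does not name the software used for the ROC analysis. As a minimal sketch under that caveat, the AUC for one gene can be computed directly from its two groups of expression values, using the standard identity that the AUC equals the probability that a randomly chosen dead patient has higher expression than a randomly chosen alive patient (the Mann-Whitney formulation), with ties counted as one half. The function name is hypothetical.

```python
def auc_from_groups(expression, status):
    """AUC of `expression` for separating dead (status 1) from alive (status 0).

    Higher expression is treated as predicting death; ties count as 0.5.
    """
    dead = [x for x, s in zip(expression, status) if s == 1]
    alive = [x for x, s in zip(expression, status) if s == 0]
    if not dead or not alive:
        raise ValueError("need at least one dead and one alive patient")
    score = 0.0
    for d in dead:
        for a in alive:
            score += 1.0 if d > a else 0.5 if d == a else 0.0
    return score / (len(dead) * len(alive))


print(auc_from_groups([1.0, 2.0, 3.0, 4.0], [0, 1, 0, 1]))  # 0.75
```

An AUC of 0.5 means dead and alive patients are ranked no better than chance; 1.0 means every dead patient has higher expression than every alive patient.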

Application of OSmfs
OSmfs is a web tool that assesses the prognostic value of genes in MFS. To use OSmfs, users input an official gene symbol or a gene signature (one gene per line) and then specify three items. Under "Data Source", users choose "TCGA", "GSE71118", "GSE72545" or "Combined" (the combination of the three datasets). Under "Survival", users choose OS, disease-free interval (DFI), progression-free interval (PFI), progression-free survival (PFS), disease-specific survival (DSS) or metastasis-free survival. Under "Split patients by", users select one cutoff: Upper 25%, Upper 30%, Upper 50%, Upper 25% vs. Lower 25%, Upper 30% vs. Lower 30%, Upper 50% vs. Lower 50%, Lower 25%, Lower 30%, Lower 50%, Trichotomy or Quartile. For clinical follow-up information, when TCGA or GSE72545 is selected as the data source, three clinical factors can be chosen: Gender (All, Male, Female), Tumor depth (All, Deep, Superficial) and Age (any range) (Figure 1B). When GSE71118 is selected, the clinical factor Metastasis (Metastasis, No, All) can be selected to perform multivariate analysis. The results are illustrated as Kaplan-Meier curves with HR (95% CI) and log-rank p-value, which allow users to evaluate the validity and reliability of prognostic biomarker candidates.
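The "Split patients by" options can be illustrated with a rank-based grouping. The Python sketch below rests on assumptions about their semantics and is not OSmfs code: it reads "Upper X%" as the top X% versus all remaining patients, "Lower X%" as the bottom X% versus the rest, "Upper X% vs. Lower X%" as the top X% versus the bottom X% with the middle excluded, and "Trichotomy" and "Quartile" as three and four rank-ordered groups. The function name and mode strings are hypothetical.

```python
def split_patients(values, mode, frac=0.5):
    """Return one group label per patient (None = excluded from the comparison).

    mode: "upper", "lower", "upper_vs_lower", "trichotomy" or "quartile".
    Groups are formed by rank, so tied expression values are split arbitrarily.
    """
    n = len(values)
    if mode == "upper_vs_lower" and not 0 < frac <= 0.5:
        raise ValueError("upper and lower slices must not overlap")
    order = sorted(range(n), key=lambda i: values[i])  # lowest expression first
    rank = [0] * n
    for pos, idx in enumerate(order):
        rank[idx] = pos  # 0 = lowest, n - 1 = highest
    k = round(n * frac)  # slice size; Python rounds .5 to the nearest even integer
    labels = []
    for r in rank:
        if mode == "upper":
            labels.append("high" if r >= n - k else "rest")
        elif mode == "lower":
            labels.append("low" if r < k else "rest")
        elif mode == "upper_vs_lower":
            labels.append("high" if r >= n - k else "low" if r < k else None)
        elif mode == "trichotomy":
            labels.append(f"T{r * 3 // n + 1}")
        elif mode == "quartile":
            labels.append(f"Q{r * 4 // n + 1}")
        else:
            raise ValueError(f"unknown mode: {mode}")
    return labels


values = [5.0, 1.0, 3.0, 8.0, 2.0, 7.0, 4.0, 6.0]
print(split_patients(values, "upper_vs_lower", frac=0.25))
# [None, 'low', None, 'high', 'low', 'high', None, None]
```

Excluding the middle patients in the "vs." options sharpens the contrast between groups at the cost of fewer patients per curve, which matters for cohorts of 25 to 64 samples.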

Validation of Prior MFS Biomarkers in OSmfs
To assess the validity and reliability of OSmfs, we searched PubMed and collected 12 genes reported as poor-prognosis biomarkers at either the mRNA or protein level in MFS patients. OSmfs analysis confirmed that overexpression of 7 of these 12 genes, namely Integrin Subunit α 10 (ITGA10) [18], CD109 [6], Cyclin Dependent Kinase 6 (CDK6) [19], Cyclin Dependent Kinase Inhibitor 2A (CDKN2A) [20], MET [21], Cyclin D1 (CCND1) [20] and EZR [22], predicts adverse survival for MFS patients (Table 2). However, the other 5 genes, AMACR [8], SKP2 [9], KRAS Proto-Oncogene, GTPase (KRAS) [23], Epidermal Growth Factor Receptor (EGFR) [24] and Argininosuccinate Synthase 1 (ASS1) [25], were not identified as prognosticators in MFS (Table 2), possibly because the size or clinical information of the datasets used in previous studies differs from that of the datasets adopted in OSmfs. In addition, OSmfs uses mRNA expression data, whereas earlier reports based their conclusions on mRNA- or protein-level data. Hence, in the absence of other prognostic tools for MFS, OSmfs may be an efficient web tool for evaluating the prognostic value of genes and exploring new biomarkers in MFS patients.

Identification of Potentially Novel Prognostic Biomarkers in MFS
To identify potentially novel prognostic biomarkers in MFS patients, Cox analyses of patient survival were performed with the data derived from TCGA, GSE71118 and GSE72545 listed in Table 1. Overlapping the genes significantly associated with adverse prognosis showed that 7 genes were associated with poor prognosis of MFS in all three cohorts (Figure 2; Table 3). Specifically, Lysophospholipase 1 (LYPLA1) overexpression correlated with worse overall survival in TCGA (p = 0.0223, HR = 5.8292; Figure 3A), with adverse metastasis-free survival in GSE71118 (p = 0.0108, HR = 5.1703; Figure 3B), with worse overall survival in GSE72545 (p = 0.0067, HR = 3.3368; Figure 3C), and with worse overall survival in the combined analysis (p = 0.0004, HR = 3.8115; Figure 3D). The role of the other 6 genes, DBF4B, MMP13, PLK1, TMEM158, WNT5B and RUNX2, as poor prognostic markers was also reflected in the Kaplan-Meier curves in Figure 4A-F. ROC analysis yielded AUC scores greater than 0.5 for all 7 genes (Figure 6), further indicating that they may have the capacity to serve as prognostic markers. Finally, a PubMed search found no report associating any of the 7 genes with the prognosis of MFS, suggesting that these seven genes may be novel potential prognostic markers in MFS.
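Hazard ratios such as those above come from Cox proportional hazards regression, which OSmfs runs with the R package "survival". The following Python sketch is a minimal, hypothetical re-implementation for a single covariate (for example, a 0/1 high-expression indicator), fitted by Newton-Raphson on the Cox partial likelihood. It is an illustration, not the OSmfs code: it uses the Breslow approximation for tied event times (R's `coxph` defaults to the Efron approximation, so results can differ slightly when times are tied), it reports a Wald p-value rather than the log-rank p-value shown on the KM plots, and it has no safeguard against perfectly separated groups, for which the estimate diverges.

```python
import math


def _score_and_information(times, events, x, beta):
    """First derivative and observed information of the log partial likelihood."""
    score = information = 0.0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # censored cases contribute only through the risk sets
        # Breslow approximation: tied events all share the same full risk set.
        risk = [j for j in range(n) if times[j] >= times[i]]
        w = [math.exp(beta * x[j]) for j in risk]
        s0 = sum(w)
        s1 = sum(wj * x[j] for wj, j in zip(w, risk))
        s2 = sum(wj * x[j] * x[j] for wj, j in zip(w, risk))
        score += x[i] - s1 / s0
        information += s2 / s0 - (s1 / s0) ** 2
    return score, information


def cox_univariate(times, events, x, max_iter=50, tol=1e-9):
    """Return (hazard_ratio, wald_p) for one covariate."""
    beta = 0.0
    for _ in range(max_iter):
        score, information = _score_and_information(times, events, x, beta)
        step = score / information  # Newton-Raphson update
        beta += step
        if abs(step) < tol:
            break
    _, information = _score_and_information(times, events, x, beta)
    se = 1.0 / math.sqrt(information)
    p = math.erfc(abs(beta / se) / math.sqrt(2))  # two-sided normal p-value
    return math.exp(beta), p


# Hypothetical toy cohort: x = 1 marks the high-expression group.
times = [1, 2, 3, 4, 5, 6, 7, 8]
events = [1, 1, 1, 1, 1, 1, 1, 1]
high = [1, 0, 1, 1, 0, 1, 0, 0]
hr, p = cox_univariate(times, events, high)
```

In this toy cohort the high-expression patients tend to die earlier, so the estimated HR is above 1 (roughly 3), but with only eight patients the Wald p-value stays well above 0.05. Recoding the indicator the other way round gives the reciprocal HR and the same p-value.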

Discussion
OSmfs, the first web tool to assess the prognostic potency and reliability of genes in MFS, may be a useful resource for the research community and may contribute to the discovery of MFS prognostic biomarkers, thereby assisting physicians and medical oncologists; however, further validation is needed before it can serve as a more exact diagnostic reference. To investigate the credibility and specificity of OSmfs, we collected 12 genes previously reported to be associated with worse survival in MFS patients. Of these markers, the prognostic significance of 7 genes in OSmfs was consistent with previous reports, whereas the other 5 genes were not identified as prognosticators in MFS, possibly because of differences in the size or clinical information between the datasets used in previous studies and those employed in OSmfs. Moreover, OSmfs uses mRNA expression data, whereas the prognostic roles reported previously were based on protein- or mRNA-level data. In addition, the 2013 WHO reclassification of MFS, which changed the scope of cases designated as MFS, may be another reason why not all previously identified biomarker candidates could be verified in our web tool. Interestingly, 3 of the 5 biomarkers not verified in our tool were identified as prognostic biomarkers before 2013. Notably, new potential prognostic biomarkers for MFS can also be explored using the Cox regression data underlying OSmfs; LYPLA1, DBF4B, MMP13, PLK1, TMEM158, WNT5B and RUNX2 were detected here as potential biomarkers of poor prognosis in MFS patients. LYPLA1 acts as a homodimer, exhibits both depalmitoylating and lysophospholipase activities, and plays a tumor-promoting role in non-small cell lung cancer cells [26]. DBF4B encodes a regulatory subunit of CDC7, a serine-threonine kinase that links cell cycle regulation to genome duplication.
DBF4B-FL is required for colon cancer cell proliferation and the maintenance of genomic stability [27]. MMP13 is involved in the breakdown of the extracellular matrix in normal physiological processes and in disease processes [28], and it plays an important role in the prognosis of various cancers, such as gastric cancer [29], colorectal cancer [30] and oral squamous cell carcinoma [31]. PLK1 plays an important role in the initiation, maintenance and completion of mitosis; its dysfunction may promote cancerous transformation and drive tumor progression, and PLK1 overexpression has been associated with poor prognosis in a variety of cancers [32]. TMEM158 facilitates the progression of several carcinomas, such as pancreatic cancer [33]. WNT5B has been implicated in oncogenesis and in developmental processes, including the regulation of cell fate and patterning during embryogenesis, and a previous report indicated that WNT5B could serve as a prognostic biomarker in hepatocellular carcinoma [34]. RUNX2, a transcription factor, is essential for osteoblast differentiation and bone development and acts in a much wider range of tissues [35]; it can promote breast cancer bone metastasis by increasing integrin α5-mediated colonization [36].
Receiver operating characteristic (ROC) curve analysis is used extensively in medicine to assess diagnostic accuracy. ROC analysis is commonly used to characterize the accuracy of medical imaging techniques, non-imaging diagnostic tests and prediction/risk scores in settings involving screening, prognosis, diagnosis, staging and treatment [37]. The area under the ROC curve (AUC) is a global measure of the ability of a test to discriminate whether a specific condition is present [38]. ROC curves can also reflect the prognostic ability of markers, with the AUC representing their discriminative capacity [39]. ROC analysis therefore provides information complementary to the Cox regression analysis of the novel candidate markers. In this study, the AUC scores of all 7 genes were greater than 0.5, indicating that each has some capacity to discriminate between the alive and dead states of overall survival in MFS.

Conclusions
In conclusion, OSmfs is a promising tool for evaluating the prognostic value of a single gene or a gene signature in MFS. The tool has several limitations: the datasets used to build it are small, they come from different platforms, and the MFS diagnosis cannot be fully guaranteed for every included sample. According to the 2019 cancer statistics, the incidence of sarcomas is low [40]. MFS is a distinct subtype of soft tissue sarcoma, and only approximately 5% of soft tissue sarcomas are diagnosed as MFS [1]. Consequently, little public data on MFS are available. To collect as much data as possible, we selected data from different platforms, including TCGA and GEO, which may be one reason why the diagnosis cannot be fully guaranteed for the included samples. OSmfs will be gradually improved and updated to incorporate new MFS patient data from TCGA or GEO according to the criteria defined by the WHO in 2013.