1. Introduction
Osteosarcoma is the most common primary malignancy of bone and, like many other sarcomas, originates in mesenchymal tissue. It shows a typical bimodal age distribution, occurring mainly in children and young adults and again in the elderly [1]. The incidence of osteosarcoma is 4.4 per million among those aged 0–24 years; in the second age peak the disease is usually secondary, arising in association with Paget’s disease or other bone lesions [2]. Surgery is the primary therapy for osteosarcoma, but survival with surgery alone remains disappointing, at around 15–17% [3]. At present, osteosarcoma is treated with a combination of chemotherapy and surgery, which can cure about 70% of patients. For patients with localized tumors, the response to chemotherapy is currently the best predictor of prognosis [4]. However, for patients with recurrent or metastatic disease, overall survival is still poor and has remained at 20% for the last 30 years [5]. Notwithstanding advances in surgical techniques, targeted therapy, and tumor immunity, the complications of limb-salvage surgery, such as infection and inconvenience, together with the low survival rates, make it urgent to develop prediction methods that can help improve the survival of osteosarcoma patients [2]. Prognostic biomarkers can provide information about the probable outcome of a cancer in terms of disease progression, recurrence, or death [6]. Such information could considerably help with patient stratification, treatment management, and monitoring of disease status in clinical practice, for example by enabling personalized therapeutic schedules for osteosarcoma patients [7]. Suitable prognostic prediction would therefore be very helpful for the treatment of osteosarcoma.
Pseudogenes are homologues of functional genes and are also known as ‘gene fossils’ or ‘junk genes’. They belong to a subclass of long non-coding RNAs [8]. They cannot express functional proteins because of various mutations in their coding sequences, including deletions, insertions, and frameshift mutations, which often lead to premature termination codons [9]. However, pseudogenes can still exert numerous regulatory functions, for example by being transcribed into small interfering RNAs (siRNAs) [10], acting as competitive endogenous RNAs (ceRNAs) [11], producing antisense transcripts [12], and sequestering miRNAs [13]. Studies have shown that aberrant expression of pseudogenes is involved in many diseases, including cancer [14]. Some cancer-related pseudogenes can regulate the expression of their corresponding coding genes, such as KRAS and KRASP1, by sequestering the interacting miRNAs [13]. In addition, aberrant expression of the pseudogenes of transcription factors that are critical for maintaining embryonic stem cell pluripotency (e.g., NANOG and NANOGP1, OCT4 and POU5F1P1) has been found in cancer [15]. Some cancer-related pseudogenes can be used as prognostic biomarkers. In hepatocellular carcinoma, high expression of the pseudogene RP11-564D11.3 is associated with poor prognosis [16], and Ganapathi et al. found that the pseudogene SLC6A10P can serve as a predictive marker for recurrence in high-grade ovarian cancer [17]. However, owing to a lack of attention and the limited number of samples, the potential of pseudogenes as prognostic biomarkers in osteosarcoma has not been studied.
Recent developments in bioinformatics and the availability of large-scale RNA-seq transcriptome data for multiple cancers, together with clinical follow-up data, provide better ways to explore diagnostic and prognostic biomarkers, allowing a better understanding of cancer mechanisms and improvement of patient outcomes [18]. However, most studies that aimed to construct a diagnostic or prognostic signature have focused on genes, lncRNAs, miRNAs, DNA methylation, or alternative splicing [18,19,20,21,22]. The potential of pseudogenes as biomarkers has been neglected in osteosarcoma, even though aberrant pseudogene expression has been linked to multiple pathological processes in cancer and pseudogenes have shown promise as biomarkers in other cancer types [23].
In our study, we applied machine learning analysis, including univariate Cox regression, LASSO Cox regression, and multivariate Cox regression, to construct a pseudogene-based signature for predicting the prognosis of osteosarcoma patients. First, we identified survival-related pseudogenes by univariate Cox regression. Next, we narrowed down the significantly prognosis-related pseudogenes by LASSO and multivariate Cox regression, from which we constructed a four-pseudogene prognostic signature. We then assessed the clinical utility of this prognostic model and explored its potential functions. Our findings provide new insights into predicting and evaluating the clinical outcome of osteosarcoma patients.
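As an illustration of how a signature of this kind is typically applied once the multivariate Cox model has been fitted, the following minimal sketch computes a risk score as a coefficient-weighted sum of expression values and splits patients at the cohort median. It is a sketch under stated assumptions, not the exact procedure of this study: the coefficient values, the placeholder names `pseudogene_B` to `pseudogene_D`, the toy expression values, and the median cutoff are all hypothetical. Only the signs follow the text (RP4-706A16.3 as a risk factor, the other three as protective factors).

```python
from statistics import median

# Hypothetical multivariate Cox coefficients (illustrative values only).
# Positive coefficient = risk factor, negative coefficient = protective factor.
COEFFICIENTS = {
    "RP4-706A16.3": 0.8,   # reported as a risk factor; the value is made up
    "pseudogene_B": -0.5,  # placeholder name, assumed protective
    "pseudogene_C": -0.3,  # placeholder name, assumed protective
    "pseudogene_D": -0.6,  # placeholder name, assumed protective
}


def risk_score(expression):
    """Return the linear predictor: sum of coefficient * expression."""
    return sum(coef * expression[gene] for gene, coef in COEFFICIENTS.items())


def stratify(scores):
    """Label a patient 'high' if the score exceeds the cohort median, else 'low'."""
    cutoff = median(scores.values())
    return {pid: ("high" if s > cutoff else "low") for pid, s in scores.items()}


# Toy expression values for four hypothetical patients.
patients = {
    "P1": {"RP4-706A16.3": 2.0, "pseudogene_B": 0.5, "pseudogene_C": 1.0, "pseudogene_D": 0.5},
    "P2": {"RP4-706A16.3": 0.5, "pseudogene_B": 2.0, "pseudogene_C": 1.0, "pseudogene_D": 1.5},
    "P3": {"RP4-706A16.3": 1.0, "pseudogene_B": 1.0, "pseudogene_C": 1.0, "pseudogene_D": 1.0},
    "P4": {"RP4-706A16.3": 3.0, "pseudogene_B": 0.0, "pseudogene_C": 0.0, "pseudogene_D": 0.0},
}

scores = {pid: risk_score(expr) for pid, expr in patients.items()}
groups = stratify(scores)
```

In this toy cohort, patients with high expression of the risk pseudogene and low expression of the protective ones end up in the high-risk group; with real data the coefficients would come from the fitted Cox model rather than being set by hand.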
4. Discussion
With the introduction of chemotherapy in the 1970s, the treatment of osteosarcoma made great progress. However, the survival rate of metastatic and relapsed cases remains unsatisfactory, and the poor prognosis of such patients is the major problem in osteosarcoma [36]. Thus, identifying novel biomarkers that predict patient outcome might help to personalize therapy and improve prognosis. Growing evidence supports a role for pseudogenes in the oncogenesis and progression of different cancers [23,37,38], and a few studies have recognized the importance of pseudogenes in osteosarcoma [39,40]. High-throughput RNA-seq has paved the way for the discovery of various biomarkers for the diagnosis and prognosis of many cancers, including osteosarcoma [18,41,42]. In this study, we systematically analyzed the potential of pseudogenes as prognostic predictors and provided the first evidence of survival-related pseudogenes in osteosarcoma. We made several important findings. First, we identified 125 survival-related pseudogenes using univariate Cox analysis; most of them (91/125) are risk factors, which may play an oncogenic role. Second, we identified a four-pseudogene signature and established a scoring system that was significantly associated with the OS of osteosarcoma patients. This signature stratified patients into low- and high-risk groups and predicted OS with high sensitivity and specificity. Of the four pseudogenes, one (RP4-706A16.3) is a risk factor and the other three are protective factors. None of the four has been reported before, suggesting that these pseudogenes are newly identified and deserve more attention. Third, to validate the applicability of the signature in different patients and extend it to various subgroups, we performed K–M survival analysis and ROC curve analysis in different subgroups. We found that the signature was independent of other potential predictors, including age, gender, and metastatic status, and that its performance in predicting survival was satisfactory. We did not analyze the important clinical feature of stage because the stage information was incomplete; further studies and data are needed to uncover the role of stage. We visualized the pseudogene signature together with the other clinical information in a nomogram to simplify the use of this signature in clinical practice. Fourth, to further understand the biological function and explore the underlying oncogenic mechanism of the four pseudogenes, we performed co-expression analysis. The results showed that the four pseudogenes were involved in multiple biological processes and pathways, including malignant phenotype, immunity, and DNA/RNA editing, which might underlie osteosarcoma progression. Finally, we compared the gene signature and the pseudogene signature by ROC curve analysis and found that the pseudogene signature performed slightly better than the gene signature. Their AUCs were very close, indicating that the two signatures may have similar performance. This result may be due to the patient sample size, and a larger cohort is needed to verify this finding.
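To make the AUC comparison concrete, the sketch below computes the area under the ROC curve for two sets of risk scores against a binary outcome, using the pairwise (Mann-Whitney) definition of AUC. This is a simplified illustration, not the analysis performed in this study: survival ROC analysis is usually time-dependent and accounts for censoring, whereas this example assumes a fixed binary label (1 = event, 0 = no event). All scores and labels are invented toy values, and the function name `auc` is an assumption introduced for the example.

```python
def auc(scores, labels):
    """AUC as the probability that a random event case outranks a random non-event case.

    Ties between an event and a non-event score count as half a win.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one event and one non-event")
    wins = sum(
        1.0 if p > n else 0.5 if p == n else 0.0
        for p in pos
        for n in neg
    )
    return wins / (len(pos) * len(neg))


# Toy data: 1 = event observed, 0 = no event (hypothetical labels).
labels = [1, 1, 1, 0, 0, 0]
pseudogene_scores = [0.9, 0.8, 0.4, 0.5, 0.3, 0.1]  # hypothetical risk scores
gene_scores = [0.9, 0.6, 0.4, 0.7, 0.3, 0.1]        # hypothetical risk scores

auc_pseudogene = auc(pseudogene_scores, labels)  # 8 of 9 pairs ranked correctly
auc_gene = auc(gene_scores, labels)              # 7 of 9 pairs ranked correctly
```

With only nine event/non-event pairs, a single reordered pair changes the AUC by about 0.11, which illustrates why a small difference between two signatures in a small cohort should be interpreted with caution.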
This study has some limitations that cannot be ignored. First, it relied mainly on data mining and data analysis, and the results were not validated experimentally; further experiments are needed to verify the findings. Second, the available data were limited: we could obtain only one osteosarcoma dataset that contained both patient RNA-seq data and clinical follow-up information. Additional datasets that meet these requirements would allow further validation of our results. In addition, no other study has yet exploited a pseudogene signature for osteosarcoma, so we cannot validate our results against an independent study. Third, when constructing a pseudogene signature for prognosis, the practical application of such a model must be taken into consideration. Since different methods of detecting pseudogenes might lead to different results, the procedures for detection, quantification, and determination of the transcriptional activity of pseudogenes must be standardized [43]. In summary, the four newly identified prognosis-related pseudogenes deserve more attention, and the next step of our research is to validate our results experimentally. We hope that these results will inspire other researchers to pursue further study.