Transfer-RNA-Derived Fragments Are Potential Prognostic Factors in Patients with Squamous Cell Carcinoma of the Head and Neck

Transfer-RNA-derived fragments (tRFs) are a class of small non-coding RNAs that are functionally different from their parental transfer RNAs (tRNAs). tRFs can regulate gene expression by several mechanisms, and are involved in a variety of pathological processes. Here, we aimed at understanding the composition and abundance of tRFs in squamous cell carcinoma of the head and neck (SCCHN), and evaluated the potential of tRFs as prognostic markers in this cancer type. We obtained tRF expression data from The Cancer Genome Atlas (TCGA) HNSC cohort (523 patients) using MINTbase v2.0, and correlated to available TCGA clinical data. RNA-binding proteins were predicted according to the calculated Position Weight Matrix (PWM) score from the RNA-Binding Protein DataBase (RBPDB). A total of 10,158 tRFs were retrieved and a high diversity in expression levels was seen. Fifteen tRFs were found to be significantly associated with overall survival (Kaplan-Meier survival analysis, log rank test p-value < 0.01). The top prognostic marker, tRF-20-S998LO9D (p < 0.001), was further measured in tumor and tumor-free samples from 16 patients with squamous cell carcinoma of the oral tongue and 12 healthy controls, and was significantly upregulated in tumor compared to matched tumor-free tongue (p < 0.001). Results suggest that tRFs are useful prognostic markers in SCCHN.


Prediction of tRF Binding Proteins
tRFs can interact with RNA-binding proteins (RPBs) and regulate their function [23,35,36]. To estimate the role of tRFs in SCCHN, we used the RNA-Binding Protein DataBase (RBPDB) [37] to predict tRF binding proteins. The submitted tRF sequences were scanned with a set of Position Weight Matrix (PWM) in the database. Every site that scored at least 80% of the maximum possible score for that matrix was returned. The predicted Homo sapiens proteins were considered potential tRF binding proteins.

Relationship between tRF Level and Clinical Features
The top survival-associated tRF was selected to explore the relationship between gene expression and clinical features. Associations between categorized clinical variables and categorized tRF levels were determined by Chi-Square test. The nonparametric Mann-Whitney U test was used to study the difference between two groups of continuous variables. In multivariate Cox regression analysis, patient age at diagnosis, gender, HPV status and clinical stage were considered as covariates. Statistical tests were conducted in IBM SPSS Statistics 26 (IBM Corp., Armonk, NY, USA). A two-sided p-value < 0.05 was considered statistically significant.

Quantification of tRF Levels Using Reverse Transcription Quantitative PCR (RT-qPCR)
For the top survival-associated tRF, RT-qPCR was performed to measure expression levels in our clinical samples: 12 healthy volunteers and 16 patients with squamous cell carcinoma of the oral tongue (SCCOT) ( Table S1). The healthy volunteers and patients were not matched for gender and age, as the number of healthy individuals willing to donate tongue tissue was limited. As described previously [38,39], for healthy volunteers not exposed to classical oral cancer risk factors (smoking and alcohol), biopsies were taken from the lateral border of the tongue. For patients with SCCOT, biopsies were taken from the tumor, as well as clinically normal tongue contralateral to the tumor. The study was approved by the Regional Ethics Review Board, Umeå, Sweden (Dnr 08-003 M) and performed in accordance with the Declaration of Helsinki. Written informed consent was obtained from all participants.
TaqMan™ small RNA assays were selected for tRF quantification (Thermo Fisher Scientific, Waltham, MA, USA). tRF-specific reverse transcription primers and PCR primers were designed and produced by the company and the U6 TaqMan™ Small RNA Control was used as endogenous control. According to the manufacturer's protocol, tRF-specific cDNA was synthesized from 7.5 ng of total RNA. RT-qPCR was performed using the QuantStudio 6 Flex real-time PCR system with TaqMan™ Fast Advanced Master Mix (Thermo Fisher Scientific). The 2 −∆∆CT method was applied to calculate relative gene expression levels. Using SPSS Statistics 26, nonparametric Mann-Whitney U test or Wilcoxon signed-rank test were carried out to compare the difference between healthy controls and tumor-free samples, or between matched SCCOT and tumor-free samples, respectively.

Reported tRFs in the TCGA HNSC Cohort
TCGA has studied 523 patients with SCCHN using short RNA sequencing, including 525 tumor samples (523 primary and 2 metastatic) and 44 normal controls. In the MINTbase v2.0 database, expression of 10,158 tRFs were recorded for this cohort, representing 43% of all tRFs reported from all 32 TCGA cancer cohorts. Expression values of 2436 tRFs were reported only in single samples, 3194 in 2 to 9 samples, 2701 tRFs in 10 to 99 samples and 1827 tRFs in 100 to 569 samples. Levels of tRFs ranged from 1.002 RPM to 61,097.254 RPM. The maximum expression level was lower than 10 RPM for 7473 of 10,158 tRFs (74%).

Prognosis-Associated tRFs in SCCHN
To identify prognosis-associated factors, the 1827 tRFs reported in at least 100 samples were investigated. Among these, 53% belonged to the i-tRF class, 25% to the 5′-tRFs and 21% to the 3′-tRF. There were only nine 5′-tRNA halves, all of which were derived from mitochondrial tRNAs. Patients were classified into high-and low-level groups according to the median tRF level of tumor samples. Kaplan-Meier overall survival analysis using the Survminer R packages revealed that significant differences between high-and low-level groups were seen for 15 tRFs (log-rank test, p < 0.01, Table  1). Detailed information for the 15 tRFs, which are mapped to 59 genomic locations, is listed in Table  S2.

tRF-Interacting Proteins
The 15 tRFs with significant impact on survival were scanned through the RNA-Binding Protein DataBase (RBPDB). As shown in Table 1

Prognosis-Associated tRFs in SCCHN
To identify prognosis-associated factors, the 1827 tRFs reported in at least 100 samples were investigated. Among these, 53% belonged to the i-tRF class, 25% to the 5 -tRFs and 21% to the 3 -tRF. There were only nine 5 -tRNA halves, all of which were derived from mitochondrial tRNAs. Patients were classified into high-and low-level groups according to the median tRF level of tumor samples. Kaplan-Meier overall survival analysis using the Survminer R packages revealed that significant differences between high-and low-level groups were seen for 15 tRFs (log-rank test, p < 0.01, Table 1). Detailed information for the 15 tRFs, which are mapped to 59 genomic locations, is listed in Table S2.

The Association between tRF-20-S998LO9D and Clinical Factors
As shown in Table 1, according to the p-value of the Kaplan-Meier log rank test, tRF-20-S998LO9D, tRF-16-I8W47WB and tRF-16-884U1DD are the top three tRFs associated with patient survival (p = 0.003). Compared to the other two tRFs, expression tRF-20-S998LO9D, hereafter referred to as tRF-20, was reported in more tumor samples (443 primary SCCHN tumors, 1 metastatic sample and 16 normal controls). Therefore, we considered tRF-20 as the top-ranked tRF associated with patient survival. tRF-20 is classified as a 5 -tRF and exclusively derived from chromosome 1 tRNA86 ArgTCT . According to RBPDB prediction, this fragment binds to the translation initiation factor eIF4B. Based on these findings, we decided to further explore the relationship between tRF-20 and clinical features. Patients were divided into three groups based on log 2 -transformed RPM levels (less than 2, 2 to 4, higher than 4), as shown in Table S3. The Chi-square test indicated that there was no significant correlation between tRF-20 levels and gender, age, clinical stage or HPV status (p > 0.05). When comparing the 443 primary tumors to the 16 normal controls, significantly higher tRF-20 levels were seen in tumor samples (p < 0.001) (Figure 2A). We also investigated tRF-20 expression in different subtypes of SCCHN. As shown in Table S4, the top two tumor subsites demonstrating tRF-20 expression were tongue (n = 113) and larynx (n = 99). No significant difference in tRF-20 levels was seen between tongue and larynx cancer (p = 0.896). Expression of tRF-20 was found in 64 tumors in the mixed group of "overlapping lesion of lip, oral cavity and pharynx", with significantly higher levels compared to both tongue (p = 0.003) and laryngeal (p = 0.001) SCC. Kaplan-Meier survival plot indicated that patients with high tRF-20 have poorer overall survival than patients with low levels of tRF-20 ( Figure 2B). We further studied the impact of tRF-20 using multivariate Cox-regression analysis. Considering gender, age at diagnosis, HPV infection and clinical stage as co-variates, tRF-20 remained significantly associated with overall survival (p = 0.002, HR = 1.624, 95% CI = 1.213 to 2.219). ROC curve revealed that the specificity of tRF-20 in predicting patient overall survival is 0.614, and the sensitivity is 0.643 (AUC = 0.669, p < 0.001, Figure S1).

Investigation of tRF-20 in our Clinical Samples
The differential expression of tRF-20 was validated in a small cohort of patients with SCCOT. In agreement with the TCGA SCCHN cohort, tRF-20 was significantly upregulated in SCCOT samples compared to the matched tumor-free tongue tissue (p < 0.001, Figure 2C). Comparing tumor-free samples to healthy controls, no significant difference was found (p = 0.963, Figure 2C). Kaplan-Meier survival plot indicated that patients with high tRF-20 have poorer overall survival than patients with low levels of tRF-20 ( Figure 2B). We further studied the impact of tRF-20 using multivariate Cox-regression analysis. Considering gender, age at diagnosis, HPV infection and clinical stage as co-variates, tRF-20 remained significantly associated with overall survival (p = 0.002, HR = 1.624, 95% CI = 1.213 to 2.219). ROC curve revealed that the specificity of tRF-20 in predicting patient overall survival is 0.614, and the sensitivity is 0.643 (AUC = 0.669, p < 0.001, Figure S1).

Investigation of tRF-20 in Our Clinical Samples
The differential expression of tRF-20 was validated in a small cohort of patients with SCCOT. In agreement with the TCGA SCCHN cohort, tRF-20 was significantly upregulated in SCCOT samples compared to the matched tumor-free tongue tissue (p < 0.001, Figure 2C). Comparing tumor-free samples to healthy controls, no significant difference was found (p = 0.963, Figure 2C).

Discussion
Although several studies have described dysregulation of tRFs in cancer, their impact on patient survival remains poorly investigated. Here, we investigated the composition and abundance of tRFs in SCCHN, as well as their prognostic potential.
The MINTbase v2.0 is a web-accessible repository providing the most comprehensive tRF expression data in 32 TCGA cancer types, including SCCHN. It provides users with access to a large collection of tRFs, which were identified through a deterministic and exhaustive tRF mining pipeline [12]. However, so far, the MINTbase v2.0 only comprises mature tRNA derived fragments with an abundance of ≥1.0 RPM, meaning that if expression was not recorded, levels could be either 0 or just under the threshold. Due to the overall low read-count of tRFs and the low number of normal controls available, this introduces a problem when calculating differential expression between tumor and normal samples. To avoid this problem, we aimed at identifying tRFs related to prognosis in this study Among 10,158 detected tRFs in SCCHN, only 1827 were identified in at least 100 samples. Together with the large range in tRF levels among samples, it is obvious that tRF expression is highly diverse and probably varies between cell types and cellular states [7,36]. Nevertheless, according to Kaplan-Meier analysis, 15 tRFs were potential prognostic markers for patients with SCCHN.
Previously, differential expression of multiple 5 tRNA halves had been shown in serum and/or tumor tissue from SCCHN patients as compared to control counterparts, indicating the potential of 5 tRNA halves as diagnostic biomarkers for SCCHN [31,32]. However, in this study, among 10,158 MINTbase recorded tRFs for SCCHN, there were only 23 tRFs belonging to the class of 5 tRNA halves. Furthermore, no prognostic value of 5 tRNA halves for SCCHN was seen. Due to the technical limitation of TCGA data using short RNA sequencing, only fragments shorter than 30 nt could be reliably identified. Therefore, the contribution of long tRFs, such as tRNA halves that are Genes 2020, 11, 1344 7 of 10 produced under stress and involved in translation regulation [35,40,41], is likely to be underestimated in this study. The specificity and sensitivity of tRF-20 are not sufficient for predicting patient survival. Still, we could demonstrate a prognostic potential of tRF-20, and the performance of tRF-20 could be improved when a multiple biomarker panel is used. In clinical practice classical prognostic factors such as anatomic location, tumor size (T stage), nodal involvement (N stage), distant metastases (M stage), tumor grading, treatment and resection margins are widely used [29,42]. Of the many new prognostic biomarkers human papillomavirus status is the most robust prognostic factor for survival among patients with oropharyngeal SCC [29,43,44]. However, sensitivity and specificity of these markers have not been reported. There is thus an unmet need for statistical evaluation of markers with respect to sensitivity and specificity.
tRFs can bind to RNA or protein and affect mRNA translation [10]. To estimate the functions of the prognostic tRFs we found, the RBPDB database was used to predict RNA interacting proteins. Without validation, these results could be considered non-significant. However, we show a common feature of the predicted proteins being related to RNA splicing, except for eIF4B (eukaryotic translation initiation factor 4B). We found that eIF4B was predicted to interact with three tRFs. Previous studies have shown that eIF4B controls cell survival and proliferation and is regulated by oncogenic signaling pathways [45,46]. eIF4B is a cofactor of the eukaryotic initiation factor 4A (eIF4A), a subunit of the heterotrimeric cap-binding complex eIF4F. It has been reported that 5 -tRNA halves can bind directly or indirectly to eIF4G/eIF4A and reduce the rate of translation initiation [35]. However, interactions between 5 -tRFs and translation initiation factors have not yet been identified, even though some 5 -tRFs inhibit protein translation [47]. Under certain stress conditions, a 26 nt long 5 -tRF produced from valine-tRNA competed with mRNA for ribosome binding, resulting in global translation attenuation [2].
For the rest of the predicted tRF binding proteins, YBX1 (Y-Box Binding Protein 1) is the only one for which an interaction with specific tRFs has been previously shown. Goodarzi et al. reported that binding of hypoxic stress induced tRFs to this oncogenic RNA-binding protein led to destabilization of pro-oncogenic transcripts and thereafter suppresses the development of breast cancer metastasis [23]. RBMX (RNA Binding Motif Protein X-Linked), NONO (Non-POU Domain Containing Octamer Binding) and FUS (FUS RNA Binding Protein) are the three RNA-binding proteins that have been demonstrated to be potential cancer drivers [48]. The role of FUS as a tumor suppressor was seen in bladder urothelial carcinoma and SCCHN [48].
Due to limited and uneven sample size and incomplete clinical information (e.g., primary treatment) of the TCGA-HNSC cohort, the REMARK (Reporting Recommendations for Tumour Marker Prognostic Studies) guidelines [49] could not be completely followed in this study. The prognostic value of tRFs should therefore be further investigated in another cohort with sufficient patients and clinical data. According to TCGA data, significantly higher tRF-20 levels were seen in 443 tumor samples compared to 16 normal controls. As the number of tumor samples and normal controls is highly unbalanced, variability of tRF expression in the normal tissue could not be fully addressed. Nevertheless, using a small number of clinical samples, we successfully quantified the level of tRF-20 using RT-qPCR, and demonstrated that tRF-20 was significantly up-regulated in SCCOT compared to matched tumor-free tongue. Using our clinical samples, we found that there was no significant difference in tRF-20 expression between healthy controls and tumor-free samples. However, this result was limited due to the fact that the 12 healthy controls and 16 patients were not gender-and age-matched. TCGA data showed that the top two tumor subsites demonstrating tRF-20 expression was tongue and larynx. Due to the lack of laryngeal samples in our biobank, we could not verify expression of tRFs in this tumor subtype. Finally, as the number of our patients with tumors at different stages was small, no analysis of the impact of tRF-20 expression on patient survival was performed.

Conclusions
Analysis of TCGA data showed that tRFs act as prognostic markers in several different sub-sites of SCCHN. The top prognostic marker, tRF-20, emerged as a promising clinical biomarker and its upregulation in tumor was demonstrated in an independent group of patients with SCCOT.