miRNA Expression Signatures of Therapy Response in Squamous Cell Carcinomas

Simple Summary miRNAs play role in various diseases and can also modulate therapy response. Our aim was to identify predictive miRNAs in platinum treated squamous cell carcinomas (SCC). Using a set of 266 squamous cancer samples we uncovered 16, 103, and 9 miRNAs correlated to chemotherapy response in the cervical, head and neck, and lung squamous cell carcinomas, respectively. By employing a logistic regression model, a signature comprising a set of six miRNAs was established capable to predict chemotherapy response with an AUC of 0.897. Our results show common molecular features of SCC tumors and pinpoint the most important miRNAs related to treatment outcome. Abstract Introduction: Squamous cell carcinomas (SCC) are a major subgroup of malignant tumors with a platinum-based first-line systematic chemotherapy. miRNAs play a role in various diseases and modulate therapy response as well. The aim of this study was to identify predictive miRNAs in platinum-treated SCCs. Methods: miRNA expression data of platinum-treated head and neck (HNSC), cervical (CESC) and lung (LUSC) cancer were collected from the TCGA repositories. Treatment response was defined based on presence or absence of disease progression at 18 months. Responder and nonresponder cohorts were compared using Mann–Whitney and Receiver Operating Characteristic tests. Logistic regression was developed to establish a predictive miRNA signature. Significance was set at FDR < 5%. Results: The integrated database includes 266 SCC patient samples with platinum-based therapy and available follow-up. We uncovered 16, 103, and 9 miRNAs correlated to chemotherapy response in the CESC, HNSC, and LUSC cohorts, respectively. Eight miRNAs overlapped between the CESC and HNSC subgroups, and three miRNAs overlapped between the LUSC and HNSC subgroups. We established a logistic regression model in HNSC and CESC which included six miRNAs: hsa-miR-5586 (Exp (B): 2.94, p = 0.001), hsa-miR-632 (Exp (B): 10.75, p = 0.002), hsa-miR-2355 (Exp (B): 0.48, p = 0.004), hsa-miR-642a (Exp (B): 2.22, p = 0.01), hsa-miR-101-2 (Exp (B): 0.39, p = 0.013) and hsa-miR-6728 (Exp (B): 0.21, p = 0.016). The model using these miRNAs was able to predict chemotherapy resistance with an AUC of 0.897. Conclusions: We performed an analysis of RNA-seq data of squamous cell carcinomas samples and identified significant miRNAs correlated to the response against platinum-based therapy in cervical, head and neck, and lung tumors.


Introduction
The vast majority of malignant tumors have epithelial origin. These tumors are termed "carcinomas" and can be further divided into adenocarcinoma and squamous cell carcinoma. The most affected anatomical sites where squamous cell carcinoma (SCC) can

Results
Overall, 1309 (CESC: 307, HNSC: 524, LUSC: 478) patients with squamous cell carcinomas were identified in the GDC data portal. From these patients, we excluded those with insufficient clinical data or without a platinum-based (cisplatin or carboplatin) chemotherapy. Finally, altogether 266 patients were retained for the analysis: 94 patients in the CESC subgroup (16 nonresponder and 78 responder), 105 patients in the HNSC subgroup (34 nonresponder and 71 responder) and 67 patients in the LUSC subgroup (16 nonresponder and 51 responder). Aggregate clinical characteristics of the samples are presented in Table 1.
The expression dataset contains information for 1881 miRNAs. From these, we had retained 599 miRNAs with non-zero expression in the more than 50% of the analyzed samples.

miRNA Expression Comparison between Tumor Types
Based on the log2 expression fold change between nonresponder and responder cohorts the samples were divided into two clusters. CESC and HNSC formed one cluster and LUSC samples were separated into a different node. The similarity score was 0.31 between CESC and HNSC and 0.26 between LUSC and HNSC samples ( Figure 2). The correlation between HNSC and LUSC was much lower (0.13).

Logistic Regression Based Classification Model
We combined the HNSC and CESC samples to have a sufficient sample number for the development of a classification model capable of predicting chemotherapy response. The selection of HNCS and CESC was based on their higher similarity scores (due to the differences between HNSC and LUSC the overall model did not improve when including LUSC samples as well). The samples were randomly divided into a training (70%) set and a test set (30%) multiple times. Thirty differentially expressed miRNA were identified in the HNSC and CESC merged samples, with an AUC over 0.66 and an FDR below 0.05.
In the second step a stepwise logistic regression with forward selection was performed to estimate the predictive power. After this second feature selection we retained nine miRNAs with the strongest link to treatment response. The accuracy of the model was 0.88 (95% CI: 0.81-0.93), the sensitivity was 0.69, and the specificity was 0.94 in the training set. In the test set the accuracy was 0.81 (95% CI: 0.69-0.90), the sensitivity was 0.60, and the specificity was 0.89. Overall, the 10-fold cross-validated model accuracy was 0.85 (95% CI: 0.79-0.90), the sensitivity was 0.60, and the specificity was 0.94.

Logistic Regression Based Classification Model
We combined the HNSC and CESC samples to have a sufficient sample number for the development of a classification model capable of predicting chemotherapy response. The selection of HNCS and CESC was based on their higher similarity scores (due to the differences between HNSC and LUSC the overall model did not improve when including LUSC samples as well). The samples were randomly divided into a training (70%) set and a test set (30%) multiple times. Thirty differentially expressed miRNA were identified in the HNSC and CESC merged samples, with an AUC over 0.66 and an FDR below 0.05.
In the second step a stepwise logistic regression with forward selection was performed to estimate the predictive power. After this second feature selection we retained nine miRNAs with the strongest link to treatment response. The accuracy of the model was 0.88 (95% CI: 0.81-0.93), the sensitivity was 0.69, and the specificity was 0.94 in the training set. In the test set the accuracy was 0.81 (95% CI: 0.69-0.90), the sensitivity was 0.60, and the specificity was 0.89. Overall, the 10-fold cross-validated model accuracy was 0.85 (95% CI: 0.79-0.90), the sensitivity was 0.60, and the specificity was 0.94.

Target Gene Prediction
Since some miRNAs were over-expressed and other under-expressed, the overall effect of the combination of these could neutralize each other. For this reason we investigated the over-and under-expressed miRNAs separately. The analysis of miRNAs overexpressed in nonresponder group revealed that hsa-mir-2355 and hsa-mir-6728 have two common predicted target genes which are repeated in different KEGG pathways, ENTPD5 and NT5C3A. Both genes take parts in purine (hsa00230, p = 6.47 × 10 −10 ) and pyrimidine metabolism (hsa00240, p = 5.20 × 10 −8 ) and NTC3A has a role in nicotinate and nicotinamide metabolism (hsa00760) and other metabolic pathways (hsa01100) as well. The underexpressed miRNAs (hsa-mir-5586, hsa-mir-642a, hsa-mir-101, hsa-mir-632) have nine common predicted target genes in four KEGG pathways. Most of the predicted genes are overlapped between pathways ( Table 6) Table 6. The target genes of miRNAs with lower expression in the nonresponder cohort (hsa-mir-5586, hsa-mir-642a, hsa-mir-101, hsa-mir-632) predicted by DIANA-mirPath.

Discussion
In our study our aim was to explore the miRNAome of platinum-treated head and neck, cervical and lung squamous cell carcinoma samples. First, we determined the most significant therapy-response related miRNAs in each SCC cohort. In CESC the most significant gene was hsa-miR-342. Hsa-miR-342 is known to be downregulated in cervical cancer tissues and cell lines and suppressed proliferation, growth, invasion and migration in human cervical cells [18]. In the HNSC samples hsa-miR-326 had the best discriminatory ability between responder and nonresponder cohorts. hsa-miR-326 was previously characterized as a tumor suppressor in breast [19] and colorectal cancer [20]. In our analysis responder samples had significantly elevated expression of hsa-miR-326 compared to nonresponder samples. Finally, in the LUSC subgroup hsa-miR-130a was the most significant miRNA. Elevated expression of hsa-miR-130a increased resistance to platinum chemotherapy [21]. Our results are in line with these previous findings, as we also observed higher expression in the nonresponding cohort.
We examined the overlapping genes, e.g., genes significant in multiple cohorts. In HNSC and CESC groups we uncovered 8 common miRNAs with significant discriminatory ability between responder and nonresponder cohorts, whereas we found only three common significant miRNAs in the LUSC and HNSC samples. A potential explanation for the higher number of overlapping miRNAs in the HNSC and CESC group can be that HPV infection has a role in the etiology of both tumor types. We have found 4 significant miRNAs (hsa-miR-16, hsa-miR-145, hsa-miR-199b) which have been previously reported as HPV core miRNAs in HNSC and CESC clinical samples [22].
The eight overlapping miRNAs between HNSC and CESC groups were previously discussed as genes related to cancer. High plasma level of hsa-mir-142 were reported as a HPV-independent prognostic marker of combined radio-chemotherapy in HNSC patients [23], and its expression correlated with the inhibition of cell proliferation and chemoresistance in ovarian cancer cell lines [24]. Induced overexpression of hsa-mir-150 inhibited cell invasion and metastasis in ovarian cancer [25]. Inhibited proliferation and enhanced therapeutic effect of cisplatin was reported in osteosarcoma for hsa-mir-16 [26]. hsa-mir-181 was overexpressed in cisplatin resistant NSCLC cells [27], and correlated with worse survival in oral squamous cell carcinoma patients [28]. Finally, higher expression of hsa-mir-378 reversed chemoresistance to cisplatin in NSCLC cells [29]. In our analysis, all of these miRNAs except for hsa-mir-181b-1 were overexpressed in responders.
The three significant miRNAs shared between LUSC and HNSC (hsa-miR-130a, hsa-miR-15a, hsa-miR-877) are all upregulated in nonresponder samples. hsa-miR-130a was upregulated in esophageal squamous carcinoma cell lines [30] and in clear cell carcinoma of ovary patients and the overexpression of the miRNA was a biomarker for disease recurrence [31]. hsa-miR-130a was also elevated in cisplatin resistant ovarian cell lines [32]. hsa-miR-15 was the first miRNA linked to tumor suppression in cancer [33]. As a contrary, it has been reported that overexpression of hsa-miR-15 predicts poor disease-free survival and overall survival in colorectal cancer patients [34], and it promotes neuroblastoma migration [35]. hsa-miR-877 was reported as oncogene in gastric cancer [36], but another study found that it has a tumor suppressor role in prostate cancer [37]. Our result suggests that the higher expression of hsa-miR-130a, hsa-miR-15a, and hsa-miR-877 are all biomarkers of resistance.
In the last step we determined a miRNA signature capable of predicting the outcome of platinum-based chemotherapy in HNSC and CESC samples. Our model reached a high accuracy in the 10-fold cross-validation of 0.854. The final model includes six miRNAs with the strongest effect-the most significant of these was hsa-miR-5586, which was studied in diffuse large B-cell lymphoma patients and the elevated expression was associated with better outcome [38]. In our study the expression of mir-5586 was elevated in responder samples both in HNSC and CESC subgroups. The second most important variable in our model was hsa-miR-632 and responders had higher expression. Earlier, its upregulation was associated with better survival in colorectal cancer [39] and its downregulation had an inhibitory effect on hepatocellular carcinoma cell proliferation and invasion [40]. The expression of hsa-miR-2355 was higher in nonresponder samples of HNSC and CESC subgroups. In esophageal squamous carcinoma cell line Zhang et al. reported that overexpressed hsa-miR-2355 promoted cell proliferation and invasion [41].
In the last step target gene prediction was performed with DIANA-Tool for six miR-NAs significant in the logistic regression model. For the hsa-mir-2355 and the hsa-mir-6728 we determined ENTPD5 and NT5C3A genes as potential targets. The ENTPDT gene was examined in numerous studies and association between higher expression of ENTPDT and chemotherapy response were reported in colorectal cancer cells [42] and prostate cancer [43]. In lung cancer cells decreased apoptosis rate was observed after knockdown of ENTPDT [44]. Interestingly, the most significant pathway related to genes targeted by the four miRNAs with lower expression (hsa-mir-5586, hsa-mir-642a, hsa-mir-101, hsa-mir-632) was the circadian entrainment pathway with six potential target genes. This pathway was demonstrated to be associated with platinum treatment resistance in cancer [45].
There is an important limitation of our study: we had only a restricted number of samples available for this analysis. Not only is the total number of SCC samples published low, but an important further restriction was the administration of a platinum-based protocol. Although we were able to validate the result of our final model using separate test/training cohorts in a ten-fold cross-validation, for the generalization of our results it will be necessary to repeat the analysis in the future using more clinical samples.

Database Construction
The level 3 miRNA expression data and the clinical information of samples were downloaded from GDC data portal [46]. All samples were primary tumors, cervical squamous cell carcinoma (CESC), head and neck squamous cell carcinoma (HNSC) or lung squamous cell carcinomas (LUSC). We retained only samples which were treated with cisplatin or carboplatin and had clinical information with survival times available ( Figure 4A).

Determination of miRNAs with Best Discriminatory Ability by Tumor Types
For the analysis we utilized the log2 transformed reads per million mapped reads and filtered the datasets to include only miRNA expressed at a non-zero level in at least 50% of the samples. Based on the presence or absence of disease progression at 18 months after surgery we splitted patients into two cohorts. Patients who had any disease progression or death within 18 months were categorized as nonresponders and patients without a disease progression were considered as responders.
We compared nonresponder and responder cohorts across all genes using Mann-Whitney U-test and Receiver Operating Characteristic test in the R statistical environment [47] using Bioconductor libraries [48]. To avoid false discovery, we calculated false discovery rate (FDR) with the qvalue R package.

Similarity Detection between Tumor Types
To uncover similarities among tumor types, the log2 transformed fold change values of CESC, LUSC, and HNSC samples were converted into a ranked matrix. To obtain a ranked matrix and clustering we used the rankerGUI website [49]. The matrix was processed further to obtain a similarity score between tumor types and clustered them based on these similarity scores.

Model Building
As an initial feature selection step, we compared nonresponder and responder samples again with Mann-Whitney U-test and Receiver Operating Characteristics using a combined cohort comprising all HNSC and CESC samples. We retained only miRNAs if the p-value in the Mann Whitney test was below 0.01 and the FDR was below 5%, and the area under the curve in the Receiver Operating Characteristic (ROC) analysis was greater than 0.65. In the next step we performed a stepwise logistic regression model with forward selection as a second feature selection step to identify potential miRNA predictors of cisplatin/carboplatin sensitivity. The dependent variable was the responder status at 18 months, and the independent variables were the selected miRNAs from the previously selected miRNAs. In the model building process, the samples were randomly divided into a training (70%) and a test set (30%). Classification was based only on the training set. To avoid bias related to sample splitting we also performed a 10-fold cross-validation. To evaluate the overall performance of the logistic regression model we performed Receiver Operating Characteristic analysis to determine the area under curve (AUC). For the analysis the caret R package was used ( Figure 4B).

KEGG Target Gene Prediction
For miRNAs significant in the logistic regression model we performed KEGG target prediction with the online DIANA-mirPath v3.0 by employing the microT-CDS algorithm

Determination of miRNAs with Best Discriminatory Ability by Tumor Types
For the analysis we utilized the log2 transformed reads per million mapped reads and filtered the datasets to include only miRNA expressed at a non-zero level in at least 50% of the samples. Based on the presence or absence of disease progression at 18 months after surgery we splitted patients into two cohorts. Patients who had any disease progression or death within 18 months were categorized as nonresponders and patients without a disease progression were considered as responders.
We compared nonresponder and responder cohorts across all genes using Mann-Whitney U-test and Receiver Operating Characteristic test in the R statistical environment [47] using Bioconductor libraries [48]. To avoid false discovery, we calculated false discovery rate (FDR) with the qvalue R package.

Similarity Detection between Tumor Types
To uncover similarities among tumor types, the log2 transformed fold change values of CESC, LUSC, and HNSC samples were converted into a ranked matrix. To obtain a ranked matrix and clustering we used the rankerGUI website [49]. The matrix was processed further to obtain a similarity score between tumor types and clustered them based on these similarity scores.

Model Building
As an initial feature selection step, we compared nonresponder and responder samples again with Mann-Whitney U-test and Receiver Operating Characteristics using a combined cohort comprising all HNSC and CESC samples. We retained only miRNAs if the p-value in the Mann Whitney test was below 0.01 and the FDR was below 5%, and the area under the curve in the Receiver Operating Characteristic (ROC) analysis was greater than 0.65. In the next step we performed a stepwise logistic regression model with forward selection as a second feature selection step to identify potential miRNA predictors of cisplatin/carboplatin sensitivity. The dependent variable was the responder status at 18 months, and the independent variables were the selected miRNAs from the previously selected miRNAs. In the model building process, the samples were randomly divided into a training (70%) and a test set (30%). Classification was based only on the training set. To avoid bias related to sample splitting we also performed a 10-fold cross-validation. To evaluate the overall performance of the logistic regression model we performed Receiver Operating Characteristic analysis to determine the area under curve (AUC). For the analysis the caret R package was used ( Figure 4B).

KEGG Target Gene Prediction
For miRNAs significant in the logistic regression model we performed KEGG target prediction with the online DIANA-mirPath v3.0 by employing the microT-CDS algorithm [50]. For the pathway designation Fischer's exact test was applied with a threshold of p ≤ 0.05 and a false discovery rate cutoff of 5% (FDR).

Conclusions
In summary, we performed an analysis of RNA-seq data of 266 squamous cell carcinomas samples and identified significant miRNAs correlated to the response against platinum-based therapy in cervical, head and neck, and lung tumors. We constructed a predictive model for HNSC and CESC with six significant miRNAs capable of predicting response to platinum-based treatment. Our results support the hypothesis of common molecular features of SCC tumors and pinpoint the most important miRNAs related to treatment outcome.

Data Availability Statement:
The data used for the analyses were obtained from GDC data portal ( https://portal.gdc.cancer.gov/) on 15/10/2020.