SRY-Related Transcription Factors in Head and Neck Squamous Cell Carcinomas: In Silico Based Analysis

Head and neck squamous cell carcinoma (HNSCC) is the sixth leading cancer and the fifth cause of cancer-related deaths worldwide with a poor 5-year survival. SOX family genes play a role in the processes involved in cancer development such as epithelial–mesenchymal transition (EMT), the maintenance of cancer stem cells (CSCs) and the regulation of drug resistance. We analyzed the expression of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes in HNSCC patients using the Cancer Genome Atlas (TCGA) and Gene Expression Omnibus (GEO) datasets, to assess their biological role and their potential utility as biomarkers. We demonstrated statistically significant differences in expression between normal and primary tumor tissues for SOX6, SOX8, SOX21 and SOX30 genes and pointed to SOX6 as the one that met the independent diagnostic markers criteria. SOX21 or SRY alone, or the panel of six SRY-related genes, could be used to estimate patient survival. SRY-related genes are positively correlated with immunological processes, as well as with keratinization and formation of the cornified envelope, and negatively correlated with DNA repair and response to stress. Moreover, except SRY, all analyzed genes were associated with a different tumor composition and immunological profiles. Based on validation results, the expression of SOX30 is higher in HPV(+) patients and is associated with patients’ survival. SRY-related transcription factors have vast importance in HNSCC biology. SOX30 seems to be a potential biomarker of HPV infection and could be used as a prognostic marker, but further research is required to fully understand the role of SOX family genes in HNSCC.


Introduction
Head and neck squamous cell carcinoma (HNSCC) is localized in the oral cavity, oropharynx, hypopharynx and nasopharynx [1][2][3].The most important risk factors are tobacco use, alcohol consumption and human papillomavirus (HPV) infection [3,4].HNSCC is the sixth leading cancer and the fifth cause of cancer-related deaths worldwide [3,4], with 600,000 new cases every year.HNSCC has a poor 5-year survival (40-60%), causing 2% of all deaths in men (data for larynx cancers) in the Greater Poland Region [1,2].Treatment is based on surgery, radiation and chemotherapy [1].HNSCC can be divided into two main groups, HPV-positive and HPV-negative, and such a status constitutes a significant prognostic factor.HPV(+) cancers are mainly caused by oral sex and mainly located in the oropharynx and HPV(−) cancers are mostly caused by smoking and excessive alcohol use and they have different molecular features [4][5][6][7].
Based on gene expression profiles, the four subtypes of HNSCC have been distinguished: basal (31%), mesenchymal (27%), atypical (24%) and classical (18% of cases).They differ clinically, histologically and molecularly.Genetic heterogeneity results in the activation of protein-coding oncogenes, e.g., EGFR and PIK3CA, and a loss of function in tumor-suppressor genes, e.g., p53 and p16 [4].In spite of the identification of the genes involved in the development and prognosis of HNSCC [8,9], no dramatic changes in recovery rates have been observed.However, these protein-coding transcripts constitute only about 2% of the human genome, and most of the genome consists of the non-coding genes which are responsible for regulation of cellular processes [10][11][12].Among them, we can distinguish long non-coding RNAs (lncRNAs) which are about 200 nucleotide-length transcripts constituting more than 80% of the eukaryotic transcriptome [11].Many lncRNAs are dysregulated in some cancers and can be used as biomarkers [13][14][15][16].lncRNAs participate in stem cell differentiation by regulating the pluripotency and are involved in tumorigenesis as tumor oncogenes or suppressors regulating cancer-related signaling pathways [17].
The SOX family is the SRY-related transcription factor family and includes 30 members [18].These proteins are involved in several growth and development processes, e.g., embryonic development, sex determination, stem cell formation and angiogenesis [19].SOX genes also have a role in cancer development including important cancer-related processes such as epithelial-mesenchymal transition (EMT) and the maintenance of cancer stem cells (CSCs) population, as well as the regulation of drug resistance [20].Moreover, long non-coding RNAs associated with the SOX gene, such as SOX2 overlapping transcript (SOX2-OT), are dysregulated in various tumors and function as oncogenes [21].The role of SRY-related transcription factors and their non-codingrelated transcripts is not fully understood in the case of HNSCC.In spite of many different strategies for improving the diagnosis [22][23][24] and chemo- [25] and/or radiotherapy treatment of HNSCC patients, an improvement in patients' outcomes is still challenging [26][27][28].Probably, treatment personalization based on a molecular transcription profile including both coding and non-coding genes could help in overcoming the high mortality rate.
In this study, the selected SRY-related transcription factors, SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY, were analyzed, and their biological role and their potential utility as biomarkers for HNSCC were tested.

TGCA Data
The TCGA expression and clinical data of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY were downloaded from cBioPortal and from the Santa Cruz University of California dataset (Head and Neck Squamous Cell Carcinoma, TCGA, dataset: gene expression RNAseq-IlluminaHiSeq pancan-normalized; RNA expression pan-cancer-normalized log2(norm_count + 1); accessed on 1 November 2021 from https://xenabrowser.net/) and from the UALCAN databases (http://ualcan.path.uab.edu;accessed on 1 November 2021) [29,30].All data are available online; access is unrestricted and does not require patient's consent or other permissions.The use of the data does not violate the rights of any person or any institution.

Gene Analysis
Association between SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY expression levels was analyzed using the UALCAN database.Receiver operating characteristic (ROC) analysis with an area under the curve (AUC) estimation in a group of 43 adjacent-matched healthy and neoplastic tissues was applied to determine the utility of diagnostic markers for distinguishing healthy from cancer samples.Next, a correlation between expression levels of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY was obtained using the Spearman test (positive correlation: r > 0.3; negative correlation: r < −0.3) and a heatmap was created using Morpheus Software (https://software.broadinstitute.org/morpheus;accessed on 1 March 2021).Next, expression levels of SRY-related genes were checked depending on localizations in the oral cavity (n = 346), pharynx (n = 88) and larynx (n = 130) according to National Institutes of Health (NIH) classification.

Association of SRY-Related Transcripts in Context to Stromal and Immune Scores and Immune Profile
Tumor composition scores for this analysis were acquired from the ESTIMATE (https: //bioinformatics.mdanderson.org/public-software/estimate/;accessed on 2 January 2021) database.Data from ESTIMATE contains information about the presence of stroma in tumor tissues (stromal score), representation inflation of immune cells in tumor tissues (immune score) and tumor purity (estimate score).We separated each gene expression group in two, low and high relative to average in all samples, as described previously [30][31][32][33][34].

Validation of the Results
We used the Gene Expression Omnibus (GEO) data repository, with GSE65858 [35] and GSE3292 [36] datasets to validate our results for SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes.The expression levels of SRY-related transcripts were compared between HPV(−) and HPV(+), types of HPV and between an active infection and inactive infection.Next, we evaluated the differences between the expression levels of SRY-related transcripts and various clinicopathological parameters such as age, gender, smoking history, alcohol consumption, disease stage, T stage, N stage, cancer molecular clusters types and tumor localization.Next, we analyzed patients' progression-free interval (PFI) and overall survival (OS) in all patients, only in the HPV(+) and only in the HPV(−) groups of patients, using the mean of expression in analyzed groups as a cut-off (low vs. high).The validation of the results was performed similarly to described previously [30].The statistical analyses were performed as described in point 2.6.

Statistical Analysis
Statistical analyses were performed using GraphPad Prism 5/8 (GraphPad, San Diego, CA, USA).The Shapiro-Wilk normality test and Mann-Whitney U test were used for measuring SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY levels (depending on clinical parameters).The expression in each of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes was compared depending on the location of the tumor using one-way ANOVA, obtained using the Tukey test.All TCGA data are presented as a median with SEM (standard error of the mean).Disease-free survival (PFS) and OS survival rates and statistics for data from the TCGA were generated using GEPIA2 tool.PFI and OS for data from GEO datasets were calculated using the log-rank (Mantel-Cox) test and the Gehan-Breslow-Wilcoxon test.In all analyses, p < 0.05 was used to determine statistical significance, as described previously [30][31][32][33][34].

SOX Family Gene Expression Levels Distinguish between Healthy and Cancer Tissues
We observed statistically significant differences in the expression of SOX6, SOX8, SOX21 and SOX30 genes in healthy tissue vs. primary tumor samples.In all cases, except for SOX30, the expression of these genes was lower in cancer tissues compared to the healthy ones.No differences between tumor and normal tissues were indicated for SOX2-OT and SRY genes (p > 0.05), Figure 1A.
genes was compared depending on the location of the tumor using one-way ANOVA, obtained using the Tukey test.All TCGA data are presented as a median with SEM (standard error of the mean).Disease-free survival (PFS) and OS survival rates and statistics for data from the TCGA were generated using GEPIA2 tool.PFI and OS for data from GEO datasets were calculated using the log-rank (Mantel-Cox) test and the Gehan-Breslow-Wilcoxon test.In all analyses, p < 0.05 was used to determine statistical significance, as described previously [30][31][32][33][34].

SOX Family Gene Expression Levels Distinguish between Healthy and Cancer Tissues
We observed statistically significant differences in the expression of SOX6, SOX8, SOX21 and SOX30 genes in healthy tissue vs. primary tumor samples.In all cases, except for SOX30, the expression of these genes was lower in cancer tissues compared to the healthy ones.No differences between tumor and normal tissues were indicated for SOX2-OT and SRY genes (p > 0.05), Figure 1A.Next, receiver operating characteristic curve (ROC) analyses for SOX2-OT, SOX6, SOX8, SOX30 and SRY were performed to describe the potency of SRY-related genes to distinguish between healthy and cancerous samples.We showed that the analysis of the expression levels of SOX6, SOX8 and SRY correlated with the AUC (area under the curve) estimated to 0.6360, 0.7588 and 0.7188, respectively, p < 0.05.SOX2-OT, SOX21 and SOX30 did not meet the criteria of the diagnostic markers (p > 0.05, AUC < 0.6), Figure 1B.
To analyze the correlation between all SRY-related genes in HNSCC samples, the Spearman test was used.The results showed that SOX2-OT expression was positively correlated with SOX21 (R = 0.6153, p < 0.0001), and also, SOX6 expression was positively correlated with SOX21 (R = 0.3130, p < 0.0001).No correlations (p > 0.05) between SOX30 and SOX6, SOX30 and SRY or SRY and SOX2-OT were indicated, Figure 1C.

Expression Levels of SRY-Related Transcription Factors Depend on Tumor Localization
In our analyses, we divided the patients into three groups, oral cavity (n = 346), pharynx (n = 88) and larynx (n = 130), depending on the tumor localization based on the National Institutes of Health (NIH) classification.The gene expression in each group was compared for SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes.For the oral vs. pharynx comparison, there were significant differences between expressions in all genes except SOX6 (p > 0.05).There were no significant differences between the pharynx and larynx in all genes other than SOX2-OT, SOX30 and SRY.Between the oral cavity and larynx, there were significant differences in SOX2-OT, SOX8 and SOX21, Figure 2.
Next, receiver operating characteristic curve (ROC) analyses for SOX2-OT, SOX6, SOX8, SOX30 and SRY were performed to describe the potency of SRY-related genes to distinguish between healthy and cancerous samples.We showed that the analysis of the expression levels of SOX6, SOX8 and SRY correlated with the AUC (area under the curve) estimated to 0.6360, 0.7588 and 0.7188, respectively, p < 0.05.SOX2-OT, SOX21 and SOX30 did not meet the criteria of the diagnostic markers (p > 0.05, AUC < 0.6), Figure 1B.
To analyze the correlation between all SRY-related genes in HNSCC samples, the Spearman test was used.The results showed that SOX2-OT expression was positively correlated with SOX21 (R = 0.6153, p < 0.0001), and also, SOX6 expression was positively correlated with SOX21 (R = 0.3130, p < 0.0001).No correlations (p > 0.05) between SOX30 and SOX6, SOX30 and SRY or SRY and SOX2-OT were indicated, Figure 1C.

Expression Levels of SRY-Related Transcription Factors Depend on Tumor Localization
In our analyses, we divided the patients into three groups, oral cavity (n = 346), pharynx (n = 88) and larynx (n = 130), depending on the tumor localization based on the National Institutes of Health (NIH) classification.The gene expression in each group was compared for SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes.For the oral vs. pharynx comparison, there were significant differences between expressions in all genes except SOX6 (p > 0.05).There were no significant differences between the pharynx and larynx in all genes other than SOX2-OT, SOX30 and SRY.Between the oral cavity and larynx, there were significant differences in SOX2-OT, SOX8 and SOX21, Figure 2.

Expression Levels of SRY-Related Transcription Factors Depend on Specific Clinicopathological Parameters
Differences in expression levels of all studied genes in HNSCC patients were analyzed in the context of selected clinicopathological parameters.In all parameters, the expression of at least one of the six studied genes showed a significant difference.SOX2-OT, SOX6, SOX8, SOX21 and SRY genes were connected with tobacco smoking; SOX2-OT, SOX6, SOX21 and SOX30were connected with lymph node neck dissection and lymphovascular invasion; and SOX2-OT, SOX8 and SOX30 were connected with HPV in p16 test two with regard to gender (SOX21, SRY), cancer stage (SOX6, SOX21) and grade (SOX8, SOX30) and one gene with age (SRY), alcohol (SRY), T stage (SOX8) and N stage (SOX30).Data are summarized in detail in Tables 1 and 2.

Higher Survival Rates Are Associated with High SOX21 and SRY Expression Levels
Patients were divided into two groups with low and high expression levels of the selected genes.The high expression group was distinguished from the low expression group on the basis of the median of all patients gene expression levels.For each gene, both groups had their overall survival and disease-free survival rates compared.There were no significant differences (p > 0.05) in any of the analyzed genes, except for disease-free survival in SOX21 and for overall survival for the SRY, which had a better survival rate when the expression of this gene was higher than the median.HNSCC patients with higher levels of the six examined SRY-related transcription factors have longer DFS and OS in comparison to the patients with lower levels of these transcripts, Figure 3.

SOX Genes' Expression Levels Correlate with Important Cellular Processes
Positively and negatively (Spearman positive correlation: R > 0.3; negative correlation: R < −0.3) correlated genes with SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY were downloaded from cBioPortal.com, accessed on 2 January 2021, see Supplementary Materials.The gene-pathways association analysis was performed using the Reactome web tool reactome.org(accessed on 1 March 2021).Five pathways were selected representing each set with the lowest p value.SRY was not included in the analysis due to the insufficient number of documented correlations.For SOX6, negatively correlated genes were associated with chromatin organization and FGFR2 signaling, and positively correlated genes were associated with developmental biology processes, keratinization and formation of the cornified envelope.For SOX8, processes such as O-glycosylation of the TSR domain-containing proteins, fiber formation and potassium channels were connected with negatively correlated genes, and those which were positively correlated were similar to the SOX6 with additional processes connected with the gap junction.In the case of SOX30, positively correlated genes were also similar to those described for SOX6 and SOX8, but negatively correlated genes were associated with DNA replication processes.The last protein-coding gene, SOX21, correlated with genes associated with very different processes in comparison to the rest of SRY-related genes.Positively correlated genes were connected with immunological processes and cellular transport, and negatively correlated ones with tight junction interactions and with N-acetylgalactosaminyltransferase (GALNT3 and GALNT12).Finally, our analysis indicated that SOX2-OT lncRNA were positively correlated with genes associated with immunological processes and cellular transport, similar to SOX21.On the other hand, negatively correlated genes were connected with DNA repair.All data are presented in Figure 4 and in Supplementary Table S1.group on the basis of the median of all patients gene expression levels.For each gene, both groups had their overall survival and disease-free survival rates compared.There were no significant differences (p > 0.05) in any of the analyzed genes, except for disease-free survival in SOX21 and for overall survival for the SRY, which had a better survival rate when the expression of this gene was higher than the median.HNSCC patients with higher levels of the six examined SRY-related transcription factors have longer DFS and OS in comparison to the patients with lower levels of these transcripts, Figure 3.  with immunological processes and cellular transport, and negatively correlated ones with tight junction interactions and with N-acetylgalactosaminyltransferase (GALNT3 and GALNT12).Finally, our analysis indicated that SOX2-OT lncRNA were positively correlated with genes associated with immunological processes and cellular transport, similar to SOX21.On the other hand, negatively correlated genes were connected with DNA repair.All data are presented in Figure 4 and in Supplementary Table S1.

Tumor Composition Is Associated with Differences in the Expression of SRY-Related Transcription Factors
Next, we focused on the tumor composition using a stromal score that indicates the presence of stroma in tumor tissue, an immune score describing the infiltration of immune cells in tumor tissue and an estimate score that describes the tumor purity.
We found that the stromal score, immune score and estimate score were lower in patients with a higher expression of SOX2-OT and SOX21 and with a lower expression of SOX8.Stromal and estimated scores were higher in samples with lower SOX6 expression.The immune score was higher in samples with higher SOX30 expression, Figure 5 and Supplementary Table S2.
For genes with significant correlations with the immune scores, we analyzed the specific populations of immune cells, Figure 6.We found that in patients with a higher expression of SOX2-OT, the population of lymphocytes was significantly elevated, and the populations of neutrophils, mast cells and macrophages were lowered.Similar results were found in SOX8, SOX21 and SOX30 analyses.The population of neutrophils and dendritic cells were lowered in patients with a higher expression of SOX8 and SOX30.A higher expression of SOX8, SOX21 and SOX2-OT correlated with the lowered population of macrophages, Figure 6A and Supplementary Table S3A.
Next, we focused on the tumor composition using a stromal score that indicates the presence of stroma in tumor tissue, an immune score describing the infiltration of immune cells in tumor tissue and an estimate score that describes the tumor purity.
We found that the stromal score, immune score and estimate score were lower in patients with a higher expression of SOX2-OT and SOX21 and with a lower expression of SOX8.Stromal and estimated scores were higher in samples with lower SOX6 expression.The immune score was higher in samples with higher SOX30 expression, Figure 5 and Supplementary Table S2.For genes with significant correlations with the immune scores, we analyzed the specific populations of immune cells, Figure 6.We found that in patients with a higher expression of SOX2-OT, the population of lymphocytes was significantly elevated, and the populations of neutrophils, mast cells and macrophages were lowered.Similar results were found in SOX8, SOX21 and SOX30 analyses.The population of neutrophils and dendritic cells were lowered in patients with a higher expression of SOX8 and SOX30.A higher expression of SOX8, SOX21 and SOX2-OT correlated with the lowered population of macrophages, Figure 6A and Supplementary Table S3A.We also analyzed the subpopulations of B cells, macrophages and T cells.The subpopulation of naive B cells was higher in patients with a higher expression of SOX2-OT, SOX8 and SOX30, and in patients with a higher expression of SOX21, the subpopulation of memory B cells was higher.The significant differences of the subpopulation of macrophages were found in SOX21.The subpopulation of M0 and M1 macrophages were lower in pa-Figure 6. Immunological profile of HSNCC patients depending on the expression levels of SOX2-OT, SOX8, SOX21 and SOX30.Differences in the population of lymphocytes, neutrophils, eosinophils, mast cells, dendritic cells and macrophages (A), and in the specific subpopulation of B cells, macrophages and T cells (B).t-test or Mann-Whitney U test; p < 0.05 considered as significant; ns: not significant, * p < 0.05; ** p < 0.01; *** p < 0.001; **** p < 0.0001.
We also analyzed the subpopulations of B cells, macrophages and T cells.The subpopulation of naive B cells was higher in patients with a higher expression of SOX2-OT, SOX8 and SOX30, and in patients with a higher expression of SOX21, the subpopulation of memory B cells was higher.The significant differences of the subpopulation of macrophages were found in SOX21.The subpopulation of M0 and M1 macrophages were lower in patients with a higher expression of this gene.There were no significant differences in subpopulations of T cells in the SOX30 gene.In SOX8 and SOX21, the subpopulation of follicular helper T cells and regulatory Tregs were higher in patients with a higher expression, and in SOX2-OT, the subpopulation of follicular helper T cells was higher in patients with a higher expression.All data are presented in Figure 6B and in Supplementary Table S3B.

Validation of the Results for SRY-Related Transcription Factors Using the GEO Datasets
Next, we decided to validate the TCGA data using the GSE65858 and GSE3292 GEO datasets [35,36].First, we analyzed the GSE65858 dataset and analyzed only HPV(−) patients.We observed no significant differences (p > 0.05) in expression levels of all SRY-related transcription factors and gender, age, alcohol consumption, smoking status, cancer stage, T-stage or N-stage.Moreover, there were no differences (p > 0.05) in the case of HPV(+) and HPV(−) patients, nor in the case of viral activity, when the active (DNA+RNA+) and inactive infection (DNA+RNA-) status were compared, for all SRYrelated transcription factors were indicated in the set of GSE65858 data.However, in the case of molecular clusters, we observed no differences between the atypical vs. basal vs. classical vs. mesenchymal subtypes in the case of SOX30 and SRY (p = 0.1622 and p = 0.883, respectively).However, in the cases of the apical vs. basal subtypes for SOX8 and atypical vs. classical for SOX6, SOX8 and SOX21, as well as of the atypical vs. mesenchymal for SOX8 and SOX21, significant differences were noticed (p = 0.0012; p = 0.0007, p < 0.0001 and p < 0.0001; p = 0.0003 and p < 0.0001, respectively).Surprisingly, we observed a similar pattern of expression in the case of SOX2-OT, SOX6 and SOX21 genes between the basal vs. classical as well as the classical vs. mesenchymal subtypes (p = 0.0018, p = 0.0033, p = 0.0094; and p = 0.0005, p < 0.0001, p < 0.0001; respectively).Only the expression level of SOX21 was significantly higher in the basal than in the mesenchymal subtype of HNSCC (p < 0.0001).Next, we analyzed the association between high and low expression levels of SRY-related transcription factors and patients OS, and no significant differences between high and low expression levels were observed in the groups of all patients and only HPV(+).However, when expression levels of SOX30 were compared in only HPV(−) groups of patients, we observed significantly longer PFS (p = 0.0338 and p = 0.0715) and OS for the patients with a lower expression level of SOX30 than those with a higher expression of this gene (p = 0.0109 and p = 0.0313).All data are presented in Figure 7A and in Supplementary Table S4.
For the GSE329 dataset, we chose only HPV(−) patients and differences in expression levels of SRY-related transcription factors, depending on the clinicopathological parameters that were measured.We observed differences in the case of age (for SOX30: p = 0.0395), alcohol consumption (for SOX30: p = 0.0093; for SRY: p = 0.0102), pathological stage (for SOX21; p = 0.0064), pathological T-stage (for SOX21; p = 0.0005) and pathological N-stage (for SOX21; p = 0.0114 and for SOX30: p = 0.0114) and no differences in the expression levels of all SRY-related transcription factors for gender, tumor grade, clinical stage and clinical cervical lymph nodes status (p > 0.05).

Discussion
The SOX gene family, also known as SRY-related HMG-box genes, constitutes the transcription factors which bind to the minor groove of DNA and regulate the transcription.In humans, 20 different protein-coding SOX genes have been described [37].The SOX gene family plays a role in cancer development including important cancer-related processes such as epithelial-mesenchymal transition (EMT), the maintenance of cancer stem cell (CSC) population, metastasis and the regulation of drug resistance [38,39].In this study, the selected SRY-related transcription factors, including lncRNA SOX2-OT, and protein-coding SOX6, SOX8, SOX21, SOX30 and SRY genes were analyzed for the first time in HNSCC.Our results clearly point to SOX6, SOX8 and SRY as factors which are Next, the association between HPV status and expression levels of SRY-related transcription factors was analyzed.We observed that only for SOX30 a significant up-regulation of expression level in the case of HPV+ patients (p < 0.0001) and no significant differences for SOX2-OT, SOX6, SOX8, SOX21 and SRY genes (p > 0.05).All data are presented in Figure 7B and in Supplementary Table S5.

Discussion
The SOX gene family, also known as SRY-related HMG-box genes, constitutes the transcription factors which bind to the minor groove of DNA and regulate the transcription.In humans, 20 different protein-coding SOX genes have been described [37].The SOX gene family plays a role in cancer development including important cancer-related processes such as epithelial-mesenchymal transition (EMT), the maintenance of cancer stem cell (CSC) population, metastasis and the regulation of drug resistance [38,39].In this study, the selected SRY-related transcription factors, including lncRNA SOX2-OT, and protein-coding SOX6, SOX8, SOX21, SOX30 and SRY genes were analyzed for the first time in HNSCC.Our results clearly point to SOX6, SOX8 and SRY as factors which are characteristic for a tumor tissue.We correlated the expression levels of the SOX genes with a panel of clinicopathological parameters and observed the significant correlations between the gene expression and cancer site (oral cavity vs. pharynx or larynx).Moreover, we found a longer disease-free survival for patients with a higher SOX21 expression and better survival rate for patients with a higher SOX6 expression.Our study also showed the relevance of the analyzed genes with the immune response, as there were significant correlations between the expression of SOX genes and the estimate scores.
SOX2-OT is the lncRNA SOX2 overlapping transcript.Its gene consists of 10 exons and more than two transcription start sites.Some SOX2-OT lncRNA transcription variants can be up-or down-regulated in various cancer cell lines and tissues, since they are related to the cancer and the stem-cell-related pathways affecting tumor progression [40].The SOX2-OT gene is placed within over 700 kb of a highly conserved region and is overexpressed in embryonic stem cells, and its expression decreases upon differentiation.In breast cancer, SOX2-OT takes part in the induction and/or maintenance of SOX2 expression [38].SOX2 is a transcription factor responsible for maintaining the self-renewal and pluripotency of undifferentiated embryonic stem cells [21].Its overexpression has been associated with tumorigenicity in HNSCC.SOX2 drives the proliferation and expansion of tumor cells and inhibits their apoptosis.SOX2 is described as the stem cell marker, and it has been detected in various cancer stem cell subpopulations including lung squamous cell carcinoma, medulloblastoma or CD44-positive prostate cancer [38,41].We discovered that SOX2-OT expression is positively correlated with SOX21 expression.In glioma, the increased SOX21 expression inhibits SOX2 leading to cancer cell apoptosis.A decreased SOX2/SOX21 ratio results in the loss of stem cell features in glioma cells [42].
SOX6 plays a role in cell growth, development and proliferation, especially during the development of the central nervous system, and the shaping of early embryonic muscle tissue and cartilage.SOX6 can also be involved in tumorigenesis as an oncogene or tumor suppressor in different types of cancer [18].It was demonstrated that transgenic mice with SOX6 overexpression developed antitumor activity, caused by cytotoxic T cell activity.SOX6 overexpression correlated with a longer survival in these mice compared to untreated mice [38].The analyses of The Cancer Genome Atlas, Cancer Cell Line Encyclopedia and Genotype-Tissue Expression datasets, indicate that SOX6 expression is decreased in breast cancer samples.Oppositely, SOX6 overexpression inhibits the proliferation of cancer cells and promotes apoptosis.Moreover, methylation of the SOX6 promoter is elevated when compared to non-cancerous tissues [43].It has been demonstrated that SOX6 expression is positively correlated with SOX21 expression in glioma patients.Both SOX6 and SOX21 high-expression patients had a worse survival prognosis than those with a low expression of these genes [42].
SOX8 is a transcription factor involved in the nervous system and mammalian testis development.It was identified as an oncogene, and it maintains the stemness in cancer stem cells [20].In the study on SOX8 in tongue squamous cell carcinoma, it was discovered that the knockout of this gene inhibited stem-like capacities, mesenchymal features and tumor metastasis in cancer chemoresistant cells.In tongue squamous cell carcinoma (TSCC), patients' higher expression of SOX8 caused lymph node metastasis, chemotherapeutic resistance and a poor prognosis [20].Shuwei et al. conducted a study in which they identified SOX8 as a regulator of a common oncogene-Golgi phosphoprotein 3 (GOLPH3) in TSCC.The knockdown of SOX8 correlated with the suppression of GOLPH3 promoter activity and the inhibition of TSCC cell proliferation in vivo and in vitro.Moreover, overexpressed SOX8 was linked to GOLPH3 promoter activation and elevated TSCC development.Addi-tionally, SOX8 was upregulated in tumor tissues in comparison to a non-cancerous control, which again correlated with GOLPH3 levels [44].In breast cancer patients, amplification of SOX8 caused poor overall survival [45].
Another member of the SRY-related family, SOX21, is expressed in the anterior neural plate of mouse embryos and causes the progression of neurogenesis.The role of SOX21 in human neural development is still unclear [14].In glioma, SOX21 is a tumor suppressor with a repression activity during tumorigenesis.It also reduces the stemness of tumor cells, leading to differentiation.The increased expression of SOX21 in glioma cells inhibits the tumor progression and reduces the tumor size [38].It is consistent with our results, as patients with a higher expression of SOX21 have a longer disease-free survival.It can be assumed that a higher expression of SOX21 reduces the number of cancer stem cells, resulting in a longer DFS.
SOX30 participates in spermatogonial differentiation and the regulation of spermatogenesis.It can also function as a tumor suppressor in various cancers [46].For example, in lung cancer, SOX30 inhibits cell proliferation, migration, invasion, growth and tumor metastasis.The product of this gene stimulates p53 production, which promotes apoptosis [38].Kumar et al. indicated that a lower expression of SOX30 correlates with a poor prognosis and is associated with the malignant tumor type in bladder cancer [20].Similarly, Yang et al. studied SOX30 mRNA expression in bladder cancer and they discovered that it was lower in cancer cells than in noncancerous control tissues.Moreover, low SOX30 expression correlated with reduced survival rates and its overexpression impaired cell proliferation, invasion and migration, and at the same time, it elevated the apoptosis of bladder cancer cells [46].Another study indicated that the expression level of SOX30 might be used as a prognostic biomarker in patients with ovarian cancer.Fai Han et al. demonstrated that a high SOX30 expression is associated with a better overall survival in patients with the advanced-stage disease, but it is not associated with early stage patients.SOX30 can also inhibit tumor metastasis by upregulating e-cadherin and downregulating n-catherin, vimentin and fibronectin and preventing epithelial-mesenchymal transmission (EMT) [47].

Institutional Review Board Statement:
This study is based on the analysis of freely available datasets and does not need any ethics committee's agreement and does not violate the rights of other persons or institutions.

Informed Consent Statement:
This study used the data from the TCGA and GEO datasets, which are freely available and do not disturb any rights of people or institutions.

Figure 2 .
Figure 2. Expression levels of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes in patients with head and neck squamous cell carcinoma depending on the localization of the tumor.Graphs

Figure 2 .
Figure 2. Expression levels of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes in patients with head and neck squamous cell carcinoma depending on the localization of the tumor.Graphs show the mean value of transcripts per million; t-test or Mann-Whitney U test; p < 0.05 considered as significant; ns: not significant, * p < 0.05; ** p < 0.01; **** p < 0.0001.

Figure 3 .
Figure 3. Disease-free survival and overall survival rates of HNSCC patients with low and high expression of each of the transcripts (A) separately and (B) together.Patients were divided into low and high groups based on the median expression of each gene; graphs generated using GEPIA2 tool; p < 0.05 considered as significant.

Figure 3 .
Figure 3. Disease-free survival and overall survival rates of HNSCC patients with low and high expression of each of the transcripts (A) separately and (B) together.Patients were divided into low and high groups based on the median expression of each gene; graphs generated using GEPIA2 tool; p < 0.05 considered as significant.

Figure 4 .
Figure 4. Positive and negative correlation of SOX6, SOX2-OT, SOX8, SOX21 and SOX30 with genes involved in the important cellular processes.Only genes with Spearman's correlation R > 0.3, R < −0.3 and p < 0.05 were indicated in the REACTOME pathway analysis.Green color indicates negatively correlated genes, and positively correlated genes are in orange.

Figure 4 .
Figure 4. Positive and negative correlation of SOX6, SOX2-OT, SOX8, SOX21 and SOX30 with genes involved in the important cellular processes.Only genes with Spearman's correlation R > 0.3, R < −0.3 and p < 0.05 were indicated in the REACTOME pathway analysis.Green color indicates negatively correlated genes, and positively correlated genes are in orange.

Figure 5 .
Figure 5. Differences between patients with low and high expression of SRY-related genes in context of their stromal score, immune score and estimate score; t-test or Mann-Whitney U test; p < 0.05 considered as significant; ns: not significant, * p < 0.05; ** p < 0.01; **** p < 0.0001.

Figure 7 .
Figure 7. Validation of the results for SRY-related transcription factors using the GSE65858 and GSE3292 datasets.Expression level of SRY-related transcription factors depending on the viral status, molecular clusters and association between expression level of SOX30 and patients' OS and PFS in all patients, only in the HPV(+) and only in the HPV(−) groups of patients using the mean (Low vs. High) of gene expression level as cut-off based on GSE65858 dataset (A).Differences of SRYrelated transcription factors expression levels depending on the viral status HPV(−) and HPV(+) based on GSE3292 dataset (B).t-test or Mann-Whitney U test, or ANOVA with Kruskal-Wallis posttest; p A -log-rank (Mantel-Cox) test; p B -Gehan-Breslow-Wilcoxon test; p < 0.05 considered as significant; ns: not significant, ** p < 0.01; *** p < 0.001; **** p < 0.0001.

Figure 7 .
Figure 7. Validation of the results for SRY-related transcription factors using the GSE65858 and GSE3292 datasets.Expression level of SRY-related transcription factors depending on the viral status, molecular clusters and association between expression level of SOX30 and patients' OS and PFS in all patients, only in the HPV(+) and only in the HPV(−) groups of patients using the mean (Low vs. High) of gene expression level as cut-off based on GSE65858 dataset (A).Differences of SRY-related transcription factors expression levels depending on the viral status HPV(−) and HPV(+) based on GSE3292 dataset (B).t-test or Mann-Whitney U test, or ANOVA with Kruskal-Wallis post-test; p A -log-rank (Mantel-Cox) test; p B -Gehan-Breslow-Wilcoxon test; p < 0.05 considered as significant; ns: not significant, ** p < 0.01; *** p < 0.001; **** p < 0.0001.
T.K., U.K. and K.L.; funding acquisition, T.K. and K.L.; T.K. and Z.G. have contributed equally to this work.All authors have read and agreed to the published version of the manuscript.Funding: This work was supported by Poznan University of Medical Sciences (Department of Cancer Immunology, Chair of Medical Biotechnology-budget for scientific activities) and by the Greater Poland Cancer Center grant no.: 1/2023(263), 13/01/2023/PGN/WCO/001.

Table 1 .
Differences in expression levels of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes depending on chosen clinical parameters.t-test or Mann-Whitney U test; p < 0.05 considered as significant.

Table 2 .
Differences in expression levels of SOX2-OT, SOX6, SOX8, SOX21, SOX30 and SRY genes depending on chosen pathological parameters.t-test or Mann-Whitney U test; p < 0.05 considered as significant.