NF-YA Overexpression in Lung Cancer: LUSC

The CCAAT box is recognized by the trimeric transcription factor NF-Y, whose NF-YA subunit is present in two major splicing isoforms, NF-YAl (“long”) and NF-YAs (“short”). Little is known about the expression levels of NF-Y subunits in tumors, and nothing in lung cancer. By interrogating RNA-seq TCGA and GEO datasets, we found that, unlike NF-YB/NF-YC, NF-YAs is overexpressed in lung squamous cell carcinomas (LUSC). The ratio of the two isoforms changes from normal to cancer cells, with NF-YAs becoming predominant in the latter. NF-YA increased expression correlates with common proliferation markers. We partitioned all 501 TCGA LUSC tumors in the four molecular cohorts and verified that NF-YAs is similarly overexpressed. We analyzed global and subtype-specific RNA-seq data and found that CCAAT is the most abundant DNA matrix in promoters of genes overexpressed in all subtypes. Enriched Gene Ontology terms are cell-cycle and signaling. Survival curves indicate a worse clinical outcome for patients with increasing global amounts of NF-YA; same with hazard ratios with very high and, surprisingly, very low NF-YAs/NF-YAl ratios. We then analyzed gene expression in this latter cohort and identified a different, pro-migration signature devoid of CCAAT. We conclude that overexpression of the NF-Y regulatory subunit in LUSC has the scope of increasing CCAAT-dependent, proliferative (NF-YAshigh) or CCAAT-less, pro-migration (NF-YAlhigh) genes. The data further reinstate the importance of analysis of single isoforms of TFs involved in tumor development.


Introduction
The process of cellular transformation entails profound changes in gene expression. As a consequence of genetic and/or epigenetic changes, cancer cells are characterized by alteration in their transcriptomes, compared to normal counterparts. To understand the logic behind these changes, huge efforts have been made to characterize transcription profilings of cancer cells. In general, transcription is dictated by the precise and often synergistic binding of transcription factors (TFs) to regulatory regions (promoters and enhancers) of genes [1]. Indeed, alteration of the structure and/or expression levels of many TFs leads to tumorigenesis through profound changes in gene expression.
NF-Y is a TF binding with high specificity to the CCAAT box, an important DNA element found in promoters and enhancers. It is a heterotrimer formed by the sequence-specific NF-YA and the histone fold domain (HFD) dimer NF-YB/NF-YC [2]. NF-YA, at the N-terminal, and NF-YC, at the C-terminal, harbor Gln-rich trans-activation domains (TAD). Both are involved in alternative splicing events, generating two major isoforms of NF-YA (differing in 28 amino acids) and several isoforms of NF-YC [3,4]. These isoforms are expressed at various levels, with no immediately obvious logic, in different tissues.
NF-Y genes are neither frequently mutated nor amplified in human cancers, yet the trimer appears to play a role in tumor development, or progression [5]. First, microarray profilings and

RNA-seq Datasets
All available TCGA data on LUSC were obtained from the http://firebrowse.org/ web page: as of January 2019, there were RNA-seq data on 552 LUSC samples, including 501 primary tumor samples and 51 non-tumor (normal) lung tissues. Of the 198 NSCLC included in the GSE81089 dataset, we first partitioned LUSC and LUAD samples by performing Pearson correlation of each samples with a 42 genes classifier previously validated [28]. This allowed the identification of 78 LUSC samples. Samples with correlation values ranging from −0.17 and 0.17 were not further classified and were not considered while values above 0.17 are predicted to be LUAD while values below −0.17 are predicted to be LUSC.

Classification of All TCGA LUSC Tumors
We extended the partitioning of all LUSC TCGA tumors in subtypes. The 207 genes previously validated as signatures for the four subtypes were used for this task [25]; each gene was median centered on all 501 LUSC samples and Pearson correlations were calculated between the predictor centroids and TCGA samples. A TCGA tumor's subtype prediction was given by the centroid with the largest correlation value.

Global Gene Expression Analysis
Differentially Expressed Genes (DEG) analysis of RNA-seq data was performed using R package DESeq2 [29]. The tumor versus normal (T/N) expression fold change (FC) denotes upregulation or downregulation according to the FC value. Log 2 FC, and the corresponding false discovery rate (FDR) were all reported by the R package. FDR < 0.01 and |log 2 FC| > 2 were set as inclusion criteria for DEG selection in tumor/subtype versus normal samples.
To calculate DEG in tumors with low or high NF-YAs/NF-YAl ratios, we used FDR < 0.01 and |log 2 FC| > 1. Samples with values within the 1st quartile were considered as low ratio, those within the 2nd to 3rd quartile were considered as intermediate ratio, while samples of the 4th quartile were considered as high ratio. The above mentioned samples had clinical data available.
For NF-YA expression analysis in microarray data of LUSC subtypes [25], we used the GEO2R tool available on NCBI website, comparing each subtype with all other mRNA samples.

Transcription Factor Binding Site (TFBS) and De Novo Motif Discovery
To detect over-represented TFBSs, we run Pscan on the Pscan web interface [30], selecting promoter region of −450 +50 nucleotides from the transcription start site and using Jaspar_2016 descriptors. To identify novel motifs de novo we used Weeder 2.0 [31], providing as input the promoter sequences (-450 +50 nucleotides from the transcription start site) from co-regulated genes. To calculate the statistical significance, we loaded the matrices obtained from Weeder 2.0 into Pscan together with the list of enriched genes, and we obtained the Bonferroni-corrected z-test p values. The position weight logo obtained were transformed from the Weeder 2.0 matrix output with seqLogo in R [32].

Analysis of Clinical Data
We retrieved clinical data relative to the TCGA LUSC samples, including Progression Free Interval (PFI) time records of 318 patients, from the https://nborcherding.shinyapps.io/TRGAted/ web page [33]. Survival analysis was performed according to the Kaplan-Meier analysis and log-rank test [34]. Cox proportional hazard modelling of ratio and covariates was calculated to determine their independent impact on patient's survival and to estimate the corresponding hazard ratio setting either low global NF-YA expression or intermediate ratio (NF-YAs/NF-YAl) as reference values.

Statistical Analysis
All data analyses were performed in the R programming environment (version 3.2.5), using several packages (ggplot2, ggsignif, reshape, ggpubr, clinfun, heatmap.plus). Single comparisons between two groups were performed with the Wilcoxon rank sum test, while we calculated the Jonckheere's Trend test to verify statistically significant trends between two genes of interest in a pre-ranked dataset.

NF-YA is Overexpressed in Lung Tumors
We used Firebrowse (http://firebrowse.org/viewGene.html) to gather a bird's eye view on the levels of NF-Y subunits in 18 tumor types analyzed by TCGA: NF-YA, not NF-YB/NF-YC, is elevated in the majority of tumors of epithelial origin [35]. Specifically, LUSC showed high statistical significance. We thus decided to analyze systematically the LUSC dataset of TCGA [26]: quantification of RNA-seq data indicates that the levels of NF-YA are indeed increased in cancer samples compared to normal controls (p value 10 −12 ); instead, NF-YB shows comparable levels in tumors and normal cells ( Figure 1A). tumors and normal cells ( Figure 1A). NF-YC also shows increased levels, albeit with modest statistical significance (p value 10 −2 ). Note that the global levels of HFDs mRNAs, particularly NF-YC, are higher than those of NF-YA. To confirm the TCGA findings, we interrogated another dataset GSE81089, derived mostly from a Swedish cohort of patients [27]: to the best of our knowledge, this is the largest, 198 samples, non-TCGA study reporting on RNA-seq of NSCLC. This classification contains two major types of lung cancers, LUSC and LUAD: therefore, it was first necessary to partition samples in these two types, by using a gene expression signature previously described [28]. Correlation plot and Principal Component Analysis (PCA) show substantial separation of the two groups ( Figure S1A,B); the heatmap of signature genes previously identified confirms the partitioning ( Figure S1C). This procedure allowed the identification of 78 LUSC tumors in this dataset. The complete list of LUAD and LUSC samples are in Table S1. We then focused on LUSC samples and assessed mRNA levels of the three NF-Y subunits, as done with the TCGA data: NF-YA is substantially increased in tumors (p value 10 −5 ), unlike NF-YB and NF-YC ( Figure 1B). These latter results confirm NF-YA overexpression, as well as normal expression of the HFD subunits, in LUSC.

Splicing Isoforms of NF-YA Are Differentially Regulated in LUSC
NF-YA and NF-YC genes are involved in alternative splicing, generating two major NF-YA ( Figure 2A) and multiple NF-YC ( Figure S2) isoforms. We analyzed them individually in LUSC, first in the TCGA dataset. Figure 2B shows that in normal cells there are balanced levels of the two NF-YA isoforms; the "short" (NF-YAs) is substantially increased (p value 10 −23 ), and NF-YAl decreased (p value 10 −7 ), in tumor samples. As a result, a robust change in the ratio of isoforms levels is produced ( Figure 2C). As for NF-YC, the predominant 37 kD and 50 kD isoforms are concomitantly increased (p value 10 −8/9 , Figure 2D).
We then turned to the GSE81089 data: NF-YAs was statistically increased (p value 10 −5 , Figure  2E), whereas NF-YAl and the NF-YC isoforms are overall similar in normal and tumor samples ( Figure 2E,G). Largely because of the NF-YAs increase, the NF-YAs/NF-YAl ratio is indeed increased in tumors ( Figure 2F). Although differences are scored between the two datasets, mostly regarding the degree of NF-YAl decrease in tumors, the data concur that NF-YAs levels are increased in LUSC To confirm the TCGA findings, we interrogated another dataset GSE81089, derived mostly from a Swedish cohort of patients [27]: to the best of our knowledge, this is the largest, 198 samples, non-TCGA study reporting on RNA-seq of NSCLC. This classification contains two major types of lung cancers, LUSC and LUAD: therefore, it was first necessary to partition samples in these two types, by using a gene expression signature previously described [28]. Correlation plot and Principal Component Analysis (PCA) show substantial separation of the two groups ( Figure S1A,B); the heatmap of signature genes previously identified confirms the partitioning ( Figure S1C). This procedure allowed the identification of 78 LUSC tumors in this dataset. The complete list of LUAD and LUSC samples are in Table S1. We then focused on LUSC samples and assessed mRNA levels of the three NF-Y subunits, as done with the TCGA data: NF-YA is substantially increased in tumors (p value 10 −5 ), unlike NF-YB and NF-YC ( Figure 1B). These latter results confirm NF-YA overexpression, as well as normal expression of the HFD subunits, in LUSC.

Splicing Isoforms of NF-YA Are Differentially Regulated in LUSC
NF-YA and NF-YC genes are involved in alternative splicing, generating two major NF-YA ( Figure 2A) and multiple NF-YC ( Figure S2) isoforms. We analyzed them individually in LUSC, first in the TCGA dataset. Figure 2B shows that in normal cells there are balanced levels of the two NF-YA isoforms; the "short" (NF-YAs) is substantially increased (p value 10 −23 ), and NF-YAl decreased (p value 10 −7 ), in tumor samples. As a result, a robust change in the ratio of isoforms levels is produced ( Figure 2C). As for NF-YC, the predominant 37 kD and 50 kD isoforms are concomitantly increased (p value 10 −8/9 , Figure 2D).
Genes 2019, 10, x; doi: FOR PEER REVIEW www.mdpi.com/journal/genes tumors. We conclude that there is an isoform switch in NF-YA isoforms in LUSC, from normal to tumor cells.

NF-YA Levels Correlate with Proliferative Markers
To correlate the levels of NF-YA expression to proliferation of lung tumor cells, the 501 TCGA tumors were binned according 10 different NF-YA expression levels, from high to low ( Figure 3, upper panel). Thereafter, we measured expression of five genes commonly considered as proliferative markers, Ki67, TOPO IIa, BUB1, CENP2, FOXM1, including in lung cancers [36][37][38][39]; in the 10 NF-YA cohorts, all showed progressively decreasing levels from high-to-low NF-YA We then turned to the GSE81089 data: NF-YAs was statistically increased (p value 10 −5 , Figure 2E), whereas NF-YAl and the NF-YC isoforms are overall similar in normal and tumor samples ( Figure 2E,G). Largely because of the NF-YAs increase, the NF-YAs/NF-YAl ratio is indeed increased in tumors ( Figure 2F). Although differences are scored between the two datasets, mostly regarding the degree of NF-YAl decrease in tumors, the data concur that NF-YAs levels are increased in LUSC tumors. We conclude that there is an isoform switch in NF-YA isoforms in LUSC, from normal to tumor cells.

NF-YA Levels Correlate with Proliferative Markers
To correlate the levels of NF-YA expression to proliferation of lung tumor cells, the 501 TCGA tumors were binned according 10 different NF-YA expression levels, from high to low ( Figure 3, upper panel). Thereafter, we measured expression of five genes commonly considered as proliferative markers, Ki67, TOPO IIa, BUB1, CENP2, FOXM1, including in lung cancers [36][37][38][39]; in the 10 NF-YA cohorts, all showed progressively decreasing levels from high-to-low NF-YA expressing tumors ( Figure 3). The calculated statistical significance for each gene is high (p values 10 −10/16 ). We conclude that elevated expression of NF-YA correlates with that of a number of proliferative marker genes in LUSC.
Genes 2019, 10, x FOR PEER REVIEW 6 of 20 expressing tumors ( Figure 3). The calculated statistical significance for each gene is high (p values 10 −10/16 ). We conclude that elevated expression of NF-YA correlates with that of a number of proliferative marker genes in LUSC.

NF-Y Sites are Enriched in Promoters of Genes Overexpressed in LUSC
We analyzed global RNA-seq gene expression in the 501 LUSC TCGA tumors, compared to normal tissues present in the datasets. The lists of Differentially Expressed Genes (DEG) using a |log2 FC| > 2, FDR < 0.01 threshold, are in Table S2. 1905 genes are overexpressed and 1238 are downregulated, the majority, 77%, are classified as unchanged ( Figure 4A). We then analyzed promoter

NF-Y Sites are Enriched in Promoters of Genes Overexpressed in LUSC
We analyzed global RNA-seq gene expression in the 501 LUSC TCGA tumors, compared to normal tissues present in the datasets. The lists of Differentially Expressed Genes (DEG) using a |log 2 FC| > 2, FDR < 0.01 threshold, are in Table S2. 1905 genes are overexpressed and 1238 are down-regulated, the majority, 77%, are classified as unchanged ( Figure 4A). We then analyzed promoter sequences, from −450 to +50 from the TSS, of DEG with Pscan [30], an algorithm identifying enriched DNA matrices present in TFs databases (JASPAR). Figure 4B (left panel) shows that NF-Y is the most enriched matrix in up-regulated promoters. Note that NF-YA (first) and NF-YB (fourth) are essentially identical CCAAT matrices. In promoters of down-regulated genes, instead, matrices of SRF, RFX and zinc-fingers TFs, such as ZNF263 and GATA, predominate, but CCAAT is absent ( Figure 4B, Right Panel).  Figure 4B (left panel) shows that NF-Y is the most enriched matrix in up-regulated promoters. Note that NF-YA (first) and NF-YB (fourth) are essentially identical CCAAT matrices. In promoters of down-regulated genes, instead, matrices of SRF, RFX and zinc-fingers TFs, such as ZNF263 and GATA, predominate, but CCAAT is absent ( Figure 4B, Right Panel).
We then pursued analysis of Gene Ontology (GO) terms present in DEG genes, using the KOBAS software: we show enrichment of cell-cycle, particularly G2/M annotations such as mitosis, M phase, prometaphase, chromosome condensation, and signaling, notably Rho GTPases ( Figure 4C). In downregulated genes, we found a more variegated set of terms, stemming from signaling to immunity, coagulation and metabolism. We conclude that CCAAT boxes predominate in promoters of genes overexpressed in LUSC, mostly related to cell-cycle and signaling functions.  We then pursued analysis of Gene Ontology (GO) terms present in DEG genes, using the KOBAS software: we show enrichment of cell-cycle, particularly G2/M annotations such as mitosis, M phase, prometaphase, chromosome condensation, and signaling, notably Rho GTPases ( Figure 4C). down-regulated genes, we found a more variegated set of terms, stemming from signaling to immunity, coagulation and metabolism. We conclude that CCAAT boxes predominate in promoters of genes overexpressed in LUSC, mostly related to cell-cycle and signaling functions.

Classification of All TCGA LUSC Tumors in Subtypes
LUSC tumors are classified in basal, classical, primitive and secretory, according to molecular criteria: the global NF-YA overexpression observed above could be limited to one (or more) of the subtypes. In TCGA, subtypes analysis was previously performed on a third (178) of the 501 available tumor samples [26]. We figured that it could be challenging to gain statistically significant results on NF-YA expression for primitive tumors, for which only 27 samples were available from TCGA. Hence, we decided to extend the subtypes classification to all 501 tumors with RNA-seq data. To do so, we took advantage of a 207 genes signature originally described by Wilkerson et al. [25], and subsequently used by TCGA. Figure 5A shows the heatmap of expression levels of 16 representative genes in LUSC samples, showing the expected partitioning. Figure S3 shows box plots of expression of genes hallmarks for each subtype: MCM10 and TYMS (primitive), TP63 and TXN (classical), ARHGDIB and TNFSRF14 (secretory), S100A7 and MMP13 (basal). Venn diagrams of the previous and new, complete classification is shown in Figure 5B; note that the relative proportions are essentially identical: basal are now 138, Primitive 75, secretory 119 and classical 169. Finally, Principal Component Analysis (PCA) of all tumor samples shows robust clustering of the subgroups ( Figure 5C). The complete list with all TCGA LUSC tumors partitioned according to the four subtypes is in Table S3.

Classification of All TCGA LUSC Tumors in Subtypes
LUSC tumors are classified in basal, classical, primitive and secretory, according to molecular criteria: the global NF-YA overexpression observed above could be limited to one (or more) of the subtypes. In TCGA, subtypes analysis was previously performed on a third (178) of the 501 available tumor samples [26]. We figured that it could be challenging to gain statistically significant results on NF-YA expression for primitive tumors, for which only 27 samples were available from TCGA. Hence, we decided to extend the subtypes classification to all 501 tumors with RNA-seq data. To do so, we took advantage of a 207 genes signature originally described by Wilkerson et al. [25], and subsequently used by TCGA. Figure 5A shows the heatmap of expression levels of 16 representative genes in LUSC samples, showing the expected partitioning. Figure S3 shows box plots of expression of genes hallmarks for each subtype: MCM10 and TYMS (primitive), TP63 and TXN (classical), ARHGDIB and TNFSRF14 (secretory), S100A7 and MMP13 (basal). Venn diagrams of the previous and new, complete classification is shown in Figure 5B; note that the relative proportions are essentially identical: basal are now 138, Primitive 75, secretory 119 and classical 169. Finally, Principal Component Analysis (PCA) of all tumor samples shows robust clustering of the subgroups ( Figure  5C). The complete list with all TCGA LUSC tumors partitioned according to the four subtypes is in Table S3.

Expression of NF-YA Isoforms in LUSC Subtypes
With the complete list of LUSC subtypes on hand, we compared the levels of the three subunits in the four subtypes with normal lung tissues. First, we compared the global levels of NF-YA (both isoforms) in the four LUSC subtypes, as found in microarray datasets [25]: primitive tumors showed the highest expression and classical the lowest ( Figure S4A). Note that normal counterparts are absent in these data. We then moved to TCGA LUSC RNA-seq datasets: the same trend of NF-YA levels (both isoforms) is observed in the same subtypes as above ( Figure S4B). As for the specific levels of the two isoforms, NF-YAl drops significantly in Classical and Basal ( Figure 6A), NF-YAs increases in all (p value 10 −18/19 ). As a consequence, the NF-YAs/NF-YAl ratio raises substantially in all tumors ( Figure 6B). As for NF-YC, there is a generalized increase of the 37 kD, particularly in primitive ( Figure 6C), indicating that the global increase observed in this subtype is essentially due to this NF-YC isoform: in fact, the NF-YC 50 kD isoform is lowly expressed. NF-YB is either not changed, or modestly decreased, in basal tumors ( Figure 6D). In conclusion, the data concur that overexpression of NF-YA is widespread and not restricted to a specific subtype, and changes in HFD subunits are overall modest.

Genes Overexpressed in All LUSC Subtypes Have CCAAT in Promoters
We proceeded with RNA-seq gene expression analysis of the four partitioned subtypes, with the same threshold criteria used above. The lists of DEG genes are in Table S4. As for up-regulated genes, we found subtype-specific signatures of 105 genes in basal, 191 in classical, 426 in primitive and 72 in secretory tumors ( Figure 7A). Pairwise overlaps ranged from negligible (secretory/classical 3 genes) to relatively modest (classical/primitive 182 genes). Note that the vast majority of genes (1357) shows overexpression in all subtypes. We searched TFBS in the promoters of these commonly overexpressed genes with Pscan: we found a very robust enrichment of NF-Y sites (p values 10 −9/12 ), whose matrices are at the top of the list, together with different flavors of E2Fs and Sp1/KLFs DNA motifs ( Figure  7A). Remarkably, NF-Y matrices are absent from signatures of the specific subtypes ( Figure 7A). To

Genes Overexpressed in All LUSC Subtypes Have CCAAT in Promoters
We proceeded with RNA-seq gene expression analysis of the four partitioned subtypes, with the same threshold criteria used above. The lists of DEG genes are in Table S4. As for up-regulated genes, we found subtype-specific signatures of 105 genes in basal, 191 in classical, 426 in primitive and 72 in secretory tumors ( Figure 7A). Pairwise overlaps ranged from negligible (secretory/classical 3 genes) to relatively modest (classical/primitive 182 genes). Note that the vast majority of genes (1357) shows overexpression in all subtypes. We searched TFBS in the promoters of these commonly overexpressed genes with Pscan: we found a very robust enrichment of NF-Y sites (p values 10 −9/12 ), whose matrices are at the top of the list, together with different flavors of E2Fs and Sp1/KLFs DNA motifs ( Figure 7A). Remarkably, NF-Y matrices are absent from signatures of the specific subtypes ( Figure 7A). To confirm these findings, the commonly overexpressed genes were run on Weeder, an algorithm identifying de novo DNA matrices [31]. Figure 7B shows retrieval of CCAAT as the only statistically enriched DNA site. Figure S5 shows that the most statistically enriched categories in genes commonly overexpressed are cell-cycle and signaling, while gene signatures of individual subtype have other, more specific terms (not shown).

Clinical Outcomes of NF-YA Isoforms Overexpression
To investigate the potential impact on the clinical outcome of the disease, we first correlated the global levels of NF-YA with patient survival, by analyzing Progression Free Intervals (PFI), a measure that best predicts clinical outcomes [40]. We classified all tumors for which clinical features are available (318) in three cohorts: first quartile in the high levels, last quartile in the low, and the When the same exercise was done on the subtype specific cohorts, different groups of TFBSs emerged: HOX and bZip (OTX, Jun/FOS) in basal and secretory, zinc fingers (Sp1/3/8 and KLFs) in Classical, E2Fs and zinc fingers in primitive ( Figure 7A). The lower statistical significance is to be expected, given the fewer samples.
Finally, we performed the same type of analysis in down-regulated genes ( Figure S6: in the DEG common to the four subtypes (697) no CCAAT matrix was detected, but rather the SRF, GATA, RFX5 and FOXO sites, indeed reminiscent of those of Figure 4B. We conclude that genes overexpressed in all subtypes of LUSC have proliferative signatures, and their promoters harbor CCAAT boxes as well as sites of other pro-growth TFs.

Clinical Outcomes of NF-YA Isoforms Overexpression
To investigate the potential impact on the clinical outcome of the disease, we first correlated the global levels of NF-YA with patient survival, by analyzing Progression Free Intervals (PFI), a measure that best predicts clinical outcomes [40]. We classified all tumors for which clinical features are available (318) in three cohorts: first quartile in the high levels, last quartile in the low, and the remaining grouped in the intermediate. The curves of Figure 8A show a significant drop of PFI in patients with high and intermediate NF-YA levels. We calculated the hazard ratio in the three bins of NF-YA levels: Stage II and Stage IV are predictably skewed toward highest ratio ( Figure 8B). None of the other parameters, with the exception of intermediate NF-YA levels, showed a statistically significant change.
Next, we repeated the analysis based on NF-YAs/NF-YAl ratios: the first quartile with NF-YAl high , the last quartile with NF-YAs high and the two middle quartiles having intermediate levels of the isoforms. The results of Figure 8C shows a drop in PFI with high or low NF-YAs/NF-YAl ratios compared to tumors with intermediate ratios (p value 0.045).
As for hazard ratio analysis, the results are shown in Figure 8D: hazard ratios are indeed shifted in tumors with high (2.1, p value 0.004) and low (2, p value 0.004) NF-YAs/NF-YAl ratio; in secretory and primitive (2 and 1.9, p values 0.033), but not in the other subtypes. It progressively increases from the reference Stage I to 1.8 (Stage II), 2.2 (Stage III) and 29.9 in the few (n = 3) Stage IV patients. We conclude that the relative levels of the two NF-YA isoforms have an impact on the clinical outcome of LUSC: surprisingly, worst prognosis is observed not only in patients with high NF-YAs/NF-YAl ratios, but equally in those with very low.

LUSC Tumors with Low NF-YAs/NF-YAl Ratio Have a Distinct Gene Signature
The clinical data shown above spurred us to further investigate gene expression of the cohort of 80 tumors with low NF-YAs/NF-YAl ratios, most of which have relatively high levels of NF-YAl. We first compared DEG -|log2FC| >1, FDR < 0.01-in low and high NF-YAs/NF-YAl ratios with intermediate ones (not shown) and then proceeded to directly compare low to high ratios. There are  (I-IV), gender, sample type (basal set as reference) and NF-YAs/NF-YAl ratios. P values are calculated using a Cox proportional hazards regression analysis (See Materials and Methods). * p value < 0.05, ** p value < 0.01, *** p value < 0.0001.

LUSC Tumors with Low NF-YAs/NF-YAl Ratio Have a Distinct Gene Signature
The clinical data shown above spurred us to further investigate gene expression of the cohort of 80 tumors with low NF-YAs/NF-YAl ratios, most of which have relatively high levels of NF-YAl. We first compared DEG -|log 2 FC| >1, FDR < 0.01-in low and high NF-YAs/NF-YAl ratios with intermediate ones (not shown) and then proceeded to directly compare low to high ratios. There are 807 up-regulated and 438 down-regulated genes in low vs. high ratios cohorts (Figure 9): GO terms in down-regulated genes are mostly associated with metabolism (xenobiotics, drugs, retinol); this is in keeping with the general gene expression analysis of LUSC tumors shown in Figures 4 and 7, despite the lack of the cell-cycle term. Interestingly, tumors with low ratios, those generally NF-YAl high , show enrichment of new GO terms, linked to extracellular matrix, collagen, integrin. As the second word is specifically degradation of the extracellular matrix, they can be collectively associated to increased cell migration features (Figure 9). In the promoter of this cohort of up-regulated genes, Pscan analysis does not detect any enrichment of NF-Y binding sites. We conclude that the sub-group of LUSC tumors with worst prognosis, singled-out because of their comparatively higher levels of NF-YAl, is marked by high expression of CCAAT-less pro-migration genes.  Figure 9): GO terms in down-regulated genes are mostly associated with metabolism (xenobiotics, drugs, retinol); this is in keeping with the general gene expression analysis of LUSC tumors shown in Figures 4 and 7, despite the lack of the cell-cycle term. Interestingly, tumors with low ratios, those generally NF-YAl high , show enrichment of new GO terms, linked to extracellular matrix, collagen, integrin. As the second word is specifically degradation of the extracellular matrix, they can be collectively associated to increased cell migration features ( Figure 9). In the promoter of this cohort of up-regulated genes, Pscan analysis does not detect any enrichment of NF-Y binding sites. We conclude that the sub-group of LUSC tumors with worst prognosis, singled-out because of their comparatively higher levels of NF-YAl, is marked by high expression of CCAAT-less pro-migration genes.

Discussion
Nothing was previously known about NF-Y in lung squamous cells carcinomas. In addition to finding an abundance of CCAAT in promoters of "cancer" genes as in other tumor types, three new findings are reported: (i) overexpression of NF-YA, notably the "short" isoform, correlating with known proliferation markers; (ii) identification of a distinct cohort of tumors with higher NF-YAl levels and a CCAAT-less pro-migration signature; (iii) negative clinical outcome of tumors with global increase of NF-YA levels as well as an opposite spectra, with low, and high, NF-YAs/NF-YAl isoforms ratios.
Abundance of CCAAT boxes in promoters of genes overexpressed in cancer was reported in microarray profiling experiments, culminating with an unbiased search for TFBSs in the vast Oncomine microarray dataset, which found the NF-Y matrix as one of the most represented [6]. On the other hand, CCAAT was never reported in promoters of genes down-regulated in cancer. The concept was later reinforced in small sets of RNA-seq data [9,10]. Overall, these studies retrieved "proliferative" cancer signatures, often enriched in cell-cycle, signaling and metabolism terms. NF-Y is

Discussion
Nothing was previously known about NF-Y in lung squamous cells carcinomas. In addition to finding an abundance of CCAAT in promoters of "cancer" genes as in other tumor types, three new findings are reported: (i) overexpression of NF-YA, notably the "short" isoform, correlating with known proliferation markers; (ii) identification of a distinct cohort of tumors with higher NF-YAl levels and a CCAAT-less pro-migration signature; (iii) negative clinical outcome of tumors with global increase of NF-YA levels as well as an opposite spectra, with low, and high, NF-YAs/NF-YAl isoforms ratios.
Abundance of CCAAT boxes in promoters of genes overexpressed in cancer was reported in microarray profiling experiments, culminating with an unbiased search for TFBSs in the vast Oncomine microarray dataset, which found the NF-Y matrix as one of the most represented [6]. On the other hand, CCAAT was never reported in promoters of genes down-regulated in cancer. The concept was later reinforced in small sets of RNA-seq data [9,10]. Overall, these studies retrieved "proliferative" cancer signatures, often enriched in cell-cycle, signaling and metabolism terms. NF-Y is acknowledged as essential for activation of many cell-cycle genes, and the majority, if not all, G2/M genes [5]. Here, the cluster of 1905 genes globally overexpressed in LUSC tumors corresponds, by and large, to the pro-proliferative signature previously identified upon analysis of the 178 tumors by TCGA [26]. Analysis of the four subtypes reinforces that the CCAAT box takes little, if any, part in regulation of subtype-specific signatures, but rather in the core group of overexpressed genes. We find the NF-Y matrix accompanied by two classes of TFBSs, E2Fs and Sp/KLFs, often associated with CCAAT. Indeed, there is robust evidence of colocalization of NF-Y with the respective DNA-binding proteins, E2F1/4, Sp1, Sp2, on active promoters in vivo [14,[41][42][43][44]. In particular, the role of overexpressed E2F family members in lung cancer is well documented [45][46][47][48].
Our finding of global NF-YA overexpression, while the HFD subunits are not, lends support to the idea that the former is the regulatory, rate-limiting subunit of the trimer. The original indication came from two systems of post-mitotic cells: unlike their cycling precursors, monocytes and myotubes have little NF-YA, and robust levels of NF-YB [49][50][51]. It is reasonable to think that NF-YA overexpression in tumors will form functional trimers with a potential excess of HFD dimers, switching on, or keeping at high constitutive levels, CCAAT-dependent growth-promoting genes. To settle this question in a definitive way, one would need to assess the relative amounts of the three proteins in nuclei.
Unfortunately, the precise calculation of the actual number of endogenous molecules of a single TF in a single nucleus is essentially not possible.
The existence of two major NF-YA splicing isoforms was documented a long time ago [2], but their respective roles have been unclear ever since. Structurally, NF-YAl contains 28 additional amino acids at the N-terminal of the protein, within the large glutamine and hydrophobics-rich trans-activation domain (TAD); these 28 aminoacids have the same relative composition (Gln, Ile/Leu and absence of charged residues) as the surrounding part. On the other hand, both isoforms share the C-terminal subunits-interaction and DNA-binding domains, hence trimerizing and binding CCAAT with identical affinities. In essence, NF-YAl and NF-YAs might impact in differential activation of transcription, rather than selective recognition of distinct CCAAT boxes. Data supporting this view were reported in mESCs [52].
First hints at different biological roles of the isoforms came from stem cell systems, human hematopoietic and mouse embryonic, indicating that NF-YAs is more abundant in "stem", NF-YAl in differentiated cells [51][52][53][54]. NF-YAl is found relatively high in cells of normal lung ( Figure 2B, 2E), but it's impossible to partition isoforms expression in the many types of cells present in lungs. Tumors samples also contain different cell types, including normal infiltrating immune cells, which are expected to express mainly NF-YAl. Hence, the majority of LUSC show a remarkable increase in NF-YAs, likely to be ascribed to growing epithelial tumor cells within the mass. NF-YAs high expressing tumors are associated with the above-mentioned proliferative signature of CCAAT-dependent genes and, indeed, with the proliferative markers shown in Figure 3. Incidentally, all their promoters are bona fide NF-Y targets, with the exception of Ki67.
Another interesting finding was the identification of a group of tumors, showing the opposite isoforms composition, namely low ratios of NF-YAs/NF-YAl. These tend to be NF-YAl high , although attempts aimed at purely correlate the expression levels of NF-YAl alone with either prognosis or a particular gene signature yielded negative results (not shown). It is rather the relative ratio of the two isoforms, hence taking into account also NF-YAs levels, that marks this population. The PFI curves of such tumors are very similar to the NF-YAs high cohort, and clearly worse than those of tumors with intermediate isoforms levels. Analysis of the gene expression profiles gave very different, almost opposite results in the two cohorts: metabolism and cell-cycle genes up-regulated in NF-YAs high tumors are rather down-regulated in this cohort, which instead shows up-regulation of a pro-migration, signature. Interestingly, the substantial absence of CCAAT in the promoters of these genes suggest that NF-YAl per se does not directly activates these genes, as NF-YAs does for metabolism and cell-cycle genes, but it might promote them indirectly, by activating yet to be identified TF(s).
This scenario is somewhat reminiscent of what we recently found in breast carcinomas: NF-YAs is also generally overexpressed in these tumors, but we identified a distinct subclass within basal-like tumors, ER − /PR − /ERBB2 − , triple negatives, which are NF-YAl high [35]. In this case, we were able to further characterize NF-YAl high tumors, being Claudins low , with high levels of EMT markers and low levels of basal Keratins. This BRCA Claudin low cohort is known to be more aggressive and prone to metastasize. In accordance with this, analysis of several BRCA cells lines found an excellent agreement between the relative levels of NF-YA isoforms mRNA and proteins [35]: although the same is likely to happen in LUSC as well, this point was not evaluated here, and will have to be a further line of future experimentation. The partitioning of NF-YA high /NF-Yl low in LUSC is less sharp, but we observe the same trend. We therefore hypothesize that NF-YAl is involved in a program of increased expression of mesenchymal genes, not targeted by NF-YAs. The molecular comprehension of the transcriptional mechanisms by which NF-YAl potentially promotes-indirectly-the expression of CCAAT-less pro-migration genes becomes a priority.
Finally, it will also be interesting to verify such findings in the other major type of lung cancer, adenocarcinoma (LUAD): the partitioning of GEO samples described here retrieved 91 LUAD tumors: as organized here, these data will be helpful in determining the relative expression levels of the isoforms and the presence of the CCAAT in overexpressed genes.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2073-4425/10/11/937/s1, Figure S1: partitioning of GSE81089 data. A. Correlation plot B. PCA. A. Heatmap of signature genes expression across sample classified. Figure S2: splicing of NF-YC. Figure S3: expression of genes hallmarks for each subtype. Figure S4: Global NF-YA expression levels in LUSC subtypes. A. Comparison of relative NF-YA global levels in LUSC subtypes in microarray data. B. Global NF-YA levels in TCGA LUSC subtypes according to RNA-seq data. Figure S5: KOBAS analysis of commonly up-regulated genes in LUSC subtypes. Figure S6: analysis of commonly down-regulated genes in LUSC subtypes. A. Venn diagrams. B. Pscan analysis of promoter sequences of commonly down-regulated genes in all LUSC subtypes. C. KOBAS analysis of commonly down-regulated genes in all LUSC subtypes. Table S1: complete list of LUAD and LUSC samples of GSE81089.