Characteristics and Prognosis of 8p11.23-Amplified Squamous Lung Carcinomas

Background: Copy number alterations are common genetic lesions in cancer. In squamous non-small cell lung carcinomas, the most common copy-number-altered loci are at chromosomes 3q26-27 and 8p11.23. Which genes are drivers in squamous lung cancers with 8p11.23 amplifications is unclear. Methods: Data on copy number alterations, mRNA expression and protein expression of genes located in the 8p11.23 amplified region were extracted from several sources, including The Cancer Genome Atlas, the Human Protein Atlas and the Kaplan Meier Plotter. Genomic data were analyzed using the cBioportal platform. Survival analysis of cases with amplifications compared with nonamplified cases was performed using the Kaplan Meier Plotter platform. Results: The 8p11.23 locus is amplified in 11.5% to 17.7% of squamous lung carcinomas. The most frequently amplified genes include NSD3, FGFR1 and LETM2. Only some of the amplified genes show concomitant overexpression at the mRNA level. These include NSD3, PLPP5, DDHD2, LSM1 and ASH2L; other genes show a weaker correlation, and some genes in the locus show no mRNA overexpression compared with copy-neutral samples. The protein products of most locus genes are expressed in squamous lung cancers. No significant difference in overall survival is observed between 8p11.23-amplified and nonamplified squamous cell lung cancers. In addition, mRNA overexpression of none of the amplified genes is associated with worse relapse-free survival. Conclusion: Several genes in the commonly amplified locus 8p11.23 in squamous lung carcinomas are putative oncogenic candidates. A subset of genes in the centromeric part of the locus, which is amplified more commonly than the telomeric part, shows high concomitant mRNA expression.


Introduction
Lung cancer is the most prevalent cancer worldwide and the leading cause of cancer death [1]. Two main types of lung cancer are distinguished histologically: non-small cell lung cancers (NSCLCs), which are the most common, and small cell lung cancers (SCLCs), which represent about 15% to 20% of all lung cancers. NSCLC is divided into two main subtypes, adenocarcinomas and squamous lung cancers, which have distinct molecular pathogenesis. Based on their molecular abnormalities, some adenocarcinomas are currently treated with targeted therapies against the EGFR, ALK or ROS1 kinases [2]. Immunotherapies inhibiting CTLA-4 or the PD-L1/PD-1 ligand/receptor pair are also effective for subsets of patients with both subtypes of NSCLC [3,4]. Apart from immunotherapies, no targeted therapies currently exist for squamous cell lung cancers. Thus, there is a need for rationally developed targeted therapies for this type of lung cancer based on its molecular defects. The molecular landscape of squamous cell lung cancer has been elucidated by TCGA and consists of recurrent mutations in 11 genes, a mean of 360 exonic mutations per tumor and a mean of 323 copy number alterations [5]. Copy number alterations (CNAs), both gains and losses, are common molecular lesions in cancer and complement mutations and epigenetic changes in neoplastic pathogenesis [6]. Recurrent CNAs may be selected because they increase the expression of oncogenes or cause the loss of tumor suppressors. However, the gained or lost region often contains several genes, most of which are passenger alterations with no pathophysiologic benefit to the cancer cell. In many cases of recurrent CNAs, the driver oncogene(s) or tumor suppressor(s) are not well defined.
In squamous NSCLC, the most common recurrently amplified region, present in about 40% of cases, is at chromosome locus 3q26-27, which includes the oncogene SOX2, a transcription factor and member of a panel of factors that can reprogram differentiated cells into pluripotent stem cells [7]. SOX2 is involved in lung organogenesis and squamous differentiation [8]. Other oncogenes in this locus include PIK3CA and MECOM, which encodes EVI1. Another amplified region in squamous NSCLC is at chromosome 8p11.23, the focus of this report. This locus is amplified in about one in six squamous NSCLCs and has also been reported to be amplified in breast cancers and urothelial bladder carcinomas [9,10,11].

Methods
Publicly available genomic data on the squamous subtype of NSCLC from The Cancer Genome Atlas (TCGA) were extracted using the cBio Cancer Genomics Portal (cBioportal, http://www.cbioportal.org, last accessed: 12 November 2022), a site that allows interrogation of genetic alterations, such as mutations and CNAs, as well as mRNA expression of any gene of interest [5,12]. CNAs in TCGA are computed using the Genomic Identification of Significant Targets in Cancer (GISTIC) algorithm, which assigns a putative amplification status to genes with a score of 2 or higher. A global aneuploidy status of each case is also provided in TCGA as an aneuploidy score (AS), the number of chromosome arms with gains or losses in Affymetrix 6.0 SNP arrays. For the calculation of AS, a whole chromosome arm is defined as copy-number-altered when somatic copy number alterations span more than 80% of its length, as determined by the ABSOLUTE algorithm [13]. In contrast, arms with alterations in less than 20% of their length are defined as not altered, while arms with alterations involving 20% to 80% of their length are not called and are thus considered not altered. The RSEM algorithm was used for normalization of mRNA expression [14].
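The arm-level rule behind the AS, together with the GISTIC amplification threshold, can be sketched as follows. This is a minimal illustration and not the TCGA/ABSOLUTE implementation: it assumes that the fraction of each arm covered by gains and by losses has already been computed, and the function names and input dictionaries are hypothetical.

```python
from typing import Mapping, Optional

# Thresholds described in the text for arm-level calls used in the aneuploidy score (AS).
ALTERED_ABOVE = 0.80   # > 80% of arm length altered -> arm counted as altered
NEUTRAL_BELOW = 0.20   # < 20% of arm length altered -> arm counted as not altered


def is_gistic_amplified(gistic_score: int) -> bool:
    """Putative amplification: GISTIC score of 2 or higher."""
    return gistic_score >= 2


def arm_call(altered_fraction: float) -> Optional[bool]:
    """True = altered, False = not altered, None = not called (20%-80% of the arm)."""
    if altered_fraction > ALTERED_ABOVE:
        return True
    if altered_fraction < NEUTRAL_BELOW:
        return False
    return None


def aneuploidy_score(gain_fraction: Mapping[str, float],
                     loss_fraction: Mapping[str, float]) -> int:
    """Count arms called as gained or lost; uncalled arms count as not altered."""
    arms = set(gain_fraction) | set(loss_fraction)
    return sum(
        1
        for arm in arms
        if arm_call(gain_fraction.get(arm, 0.0)) is True
        or arm_call(loss_fraction.get(arm, 0.0)) is True
    )


# Hypothetical tumor: 8q gained over 95% of its length, 3q gained over only 50% (not called),
# 8p lost over 90%, 5q lost over 10% (not altered).
example_as = aneuploidy_score({"8q": 0.95, "3q": 0.50}, {"8p": 0.90, "5q": 0.10})
```

In this toy case `example_as` is 2, because only 8q and 8p pass the 80% threshold; the partially gained 3q is not called and therefore does not contribute.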
The Human Protein Atlas (www.proteinatlas.org, last accessed: 19 November 2022), a publicly available database of protein expression in human normal and neoplastic tissues, was interrogated for the expression of proteins encoded by genes located at chromosome 8p11.23 in squamous lung cancer [15]. The Human Protein Atlas contains data from a semi-quantitative, immunohistochemistry-based evaluation of protein expression in human tissues. For many proteins, evaluations have been carried out with several different commercially available antibodies.
The overall survival (OS) of 8p11.23-amplified squamous lung cancers (defined as NSD3-amplified per the GISTIC algorithm) and nonamplified squamous lung cancers was compared in the TCGA cohort. The association of the mRNA expression level of each 8p11.23 gene with the relapse-free survival (RFS) of squamous lung cancer patients was tested using data from a series contained in the publicly available online platform Kaplan Meier Plotter (www.kmplot.com, last accessed: 18 November 2022) [16]. For each gene, patients were dichotomized at the upper quartile of mRNA expression (highest quartile versus the three lower quartiles); of the cut-offs provided by the kmplot platform, this was the closest to the percentage of lung cancer cases with 8p11.23 amplification.
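The upper-quartile dichotomization can be sketched as below. Kaplan Meier Plotter performs this split internally and its exact quantile convention is not stated here; this sketch assumes Python's `statistics.quantiles` with its default "exclusive" method and a strict "greater than the upper quartile" rule, and the patient identifiers are hypothetical.

```python
import statistics
from typing import Dict


def upper_quartile_split(expression: Dict[str, float]) -> Dict[str, str]:
    """Label each sample 'high' (above the upper quartile) or 'low' (three lower quartiles)."""
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3]
    q3 = statistics.quantiles(expression.values(), n=4)[2]
    return {sample: ("high" if value > q3 else "low")
            for sample, value in expression.items()}


# Hypothetical mRNA expression values for eight patients
toy = {f"patient{i}": float(i) for i in range(1, 9)}
groups = upper_quartile_split(toy)
high = sorted(s for s, g in groups.items() if g == "high")
```

With these eight toy values the upper quartile is 6.75, so `high` contains `patient7` and `patient8` (25% of the cohort). How ties at the cut point are assigned is a design choice that can shift group sizes slightly in real data.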
Categorical data were compared with Fisher's exact test or the χ² test, and continuous data with the t test. The log-rank test was used to compare Kaplan-Meier survival curves. All statistical comparisons were considered significant at p < 0.05, except for the RFS analysis according to the mRNA expression levels of the different genes, for which significance was set at p < 0.0005 to account for multiple comparisons.
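For 2x2 comparisons of categorical data, a two-sided Fisher's exact test can be written directly from the hypergeometric distribution. The software used in the study is not specified; this is a from-scratch sketch (libraries such as SciPy provide `scipy.stats.fisher_exact` for routine use), and the table layout `[[a, b], [c, d]]` and the toy counts are assumptions of the example.

```python
from math import comb


def fisher_exact_two_sided(a: int, b: int, c: int, d: int) -> float:
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]].

    With row and column totals held fixed, the top-left cell follows a
    hypergeometric distribution; the p-value sums the probabilities of all
    tables that are no more likely than the observed one.
    """
    row1, col1, n = a + b, a + c, a + b + c + d
    denominator = comb(n, col1)

    def table_prob(x: int) -> float:
        # Probability that the top-left cell equals x given the fixed margins
        return comb(row1, x) * comb(n - row1, col1 - x) / denominator

    p_observed = table_prob(a)
    lowest, highest = max(0, col1 - (n - row1)), min(row1, col1)
    total = 0.0
    for x in range(lowest, highest + 1):
        p = table_prob(x)
        if p <= p_observed * (1 + 1e-7):  # small tolerance for floating-point ties
            total += p
    return min(1.0, total)


# Example: rows = amplified / nonamplified, columns = mutated / wild type (toy counts)
p_toy = fisher_exact_two_sided(3, 1, 1, 3)
```

For this toy table `p_toy` is 34/70, about 0.486. Summing all tables at least as extreme as the observed one by probability (rather than doubling a one-sided tail) is one common two-sided convention.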

Results
Squamous lung cancers with amplification of the 8p11.23 locus, as defined by NSD3 amplification, do not differ from non-8p11.23-amplified carcinomas in the mean age of patients at presentation, the percentage of patients older than 65 years, or the sex and race distribution (Table 1). The stage at diagnosis is also similar between the two groups. Genes located at the 8p11.23 locus are amplified in 11.5% to 17.7% of squamous NSCLCs (Table 2). The frequency of amplification is highest in the most centromeric part of the locus, with the genes NSD3, FGFR1 and LETM2 showing the largest number of amplified cases in TCGA (Figure 1). Genes in the central and more telomeric parts of the locus display progressively lower frequencies of amplification, with the most telomeric genes, ERLIN2 and ZNF703, amplified in 60% to 70% of NSD3-amplified cases (Figure 1). Conversely, in NSD3-nonamplified cases, the other genes of the locus are amplified only rarely, in isolated cases (not shown). The global copy number alteration burden, as measured by the AS, did not differ between 8p11.23-amplified and nonamplified squamous NSCLC, with most cases in both groups having an intermediate AS between 4 and 24 (Figure 2). No case in the amplified group had an AS below 4, and even in the nonamplified group, only 4% of cases did. The mean AS of the amplified group was 16.29 (SD: 6.45), and the mean AS of the nonamplified group was 16.08 (SD: 6.64, unpaired t test p = 0.78). Individual chromosome arms with more frequent gains in the 8p11.23-amplified group compared with the nonamplified group included 8q (45.3% in the 8p11.23-amplified group versus 31.4% in the nonamplified group, p = 0.01) and 11q (16.3% versus 8%, respectively, p = 0.02) (Figure 3A). In contrast, arm 8p was not gained in any case of the 8p11.23-amplified group, whereas it was gained in 6% of the nonamplified group.
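The conditional co-amplification frequencies described above (the share of NSD3-amplified cases in which each other locus gene is also amplified) can be computed from a sample-by-gene table of GISTIC calls. A minimal sketch, assuming a hypothetical nested dictionary of calls rather than any specific cBioPortal export format:

```python
from typing import Dict

GisticCalls = Dict[str, Dict[str, int]]  # sample -> gene -> GISTIC score


def coamplification_rates(calls: GisticCalls, anchor: str = "NSD3") -> Dict[str, float]:
    """Fraction of anchor-amplified samples in which each gene is also amplified (score >= 2)."""
    anchored = [s for s, genes in calls.items() if genes.get(anchor, 0) >= 2]
    if not anchored:
        return {}
    all_genes = sorted({g for genes in calls.values() for g in genes})
    return {
        gene: sum(calls[s].get(gene, 0) >= 2 for s in anchored) / len(anchored)
        for gene in all_genes
    }


# Toy data: three hypothetical tumors
toy_calls = {
    "tumorA": {"NSD3": 2, "FGFR1": 2, "ZNF703": 2},
    "tumorB": {"NSD3": 2, "FGFR1": 2, "ZNF703": 1},
    "tumorC": {"NSD3": 0, "FGFR1": 0, "ZNF703": 0},
}
rates = coamplification_rates(toy_calls)
```

Here `rates` gives 1.0 for NSD3 and FGFR1 and 0.5 for ZNF703: only the two NSD3-amplified tumors form the denominator, which mirrors how a telomeric gene can be amplified in a fraction of NSD3-amplified cases.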
Chromosome arms with losses occurring more frequently in 8p11.23-amplified squamous NSCLC included 8p (72.1% versus 48.6% in the nonamplified group, p = 0.0001) and 5q (84.9% versus 68.6% in the nonamplified group, p = 0.003) (Figure 3B). Chromosome arm 11q was lost more frequently in the 8p11.23-nonamplified group, although the difference was not statistically significant (26.4% versus 16.3% in the amplified group, p = 0.06).
The tumor mutation burden (TMB) was similar in 8p11.23-amplified and nonamplified squamous NSCLC: about half of the patients in both groups had a TMB between 200 and 500 mutations, and about 5% of the amplified group and a slightly higher percentage of the nonamplified group had a TMB above 500 (Figure 4). The combined frequency of the two categories with more than 200 mutations, which are associated with a higher probability of response to immune checkpoint inhibitors, did not differ significantly between the 8p11.23-amplified and nonamplified groups (Fisher's exact test p = 0.9). The mean mutation number of the 8p11.23-amplified group was 271.9 (SD: 152.8), similar to that of the nonamplified group, 270.9 (SD: 199.6, unpaired t test p = 0.9). Among individual genes, TP53 mutations were more common in the 8p11.23-amplified group (90.7% versus 81.9% in the nonamplified group, Fisher's exact test p = 0.05, borderline significance), while NFE2L2, the master transcription factor of the oxidative stress response, was mutated more often in the nonamplified group, although not significantly (16.1% versus 9.3% in the 8p11.23-amplified group, Fisher's exact test p = 0.13). The prevalence of PIK3CA mutations in the amplified group (8.1%) did not differ from that in the nonamplified group (11.6%, Fisher's exact test p = 0.44). Figure 5 shows the frequency of mutations in the most frequently mutated genes in squamous NSCLC in the two groups.
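The unpaired t tests comparing mean AS and mean mutation counts can be recomputed from group summaries (mean, SD, group size). The group sizes are not reported in this section, so any values passed in are assumptions. The sketch below computes the pooled-variance (Student's) t statistic exactly but, to stay within the standard library, approximates the two-sided p-value with the normal distribution, which is adequate only for large degrees of freedom; `scipy.stats.ttest_ind_from_stats` gives the exact t-distribution p-value.

```python
import math
from typing import Tuple


def t_test_from_summary(mean1: float, sd1: float, n1: int,
                        mean2: float, sd2: float, n2: int) -> Tuple[float, int, float]:
    """Student's unpaired t test from summary statistics.

    Returns (t statistic, degrees of freedom, approximate two-sided p-value).
    The p-value uses a normal approximation to the t distribution, so it is
    only reliable when n1 + n2 - 2 is large.
    """
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * sd1 ** 2 + (n2 - 1) * sd2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    t = (mean1 - mean2) / standard_error
    p_approx = math.erfc(abs(t) / math.sqrt(2))  # two-sided normal tail area
    return t, df, p_approx
```

Pooling the variances matches the classic Student's test; if the two groups had clearly different SDs (as for the mutation counts), a Welch test would be the usual alternative.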
The most frequently amplified chromosome region in squamous NSCLC is 3q26, which harbors the oncogenes SOX2, PIK3CA and MECOM. In 8p11.23-amplified and nonamplified samples, SOX2, PIK3CA and MECOM were coamplified in similar percentages of cases (40.7% versus 39.7% for SOX2, 38.4% versus 35.2% for MECOM and 38.4% versus 37.7% for PIK3CA, respectively).
Amplification of the amplicon genes correlates with increased mRNA expression in some genes but not in others. NSD3, the most frequently amplified gene of the amplicon, shows mRNA overexpression in over 90% of the amplified samples. In contrast, the two neighboring genes FGFR1 and LETM2, which are almost invariably coamplified, are overexpressed at the mRNA level in about half or fewer of the amplified cases (Figure 6). Four other genes show high mRNA expression (a z score above 2 relative to diploid samples in over 70% of amplified cases): PLPP5, DDHD2, LSM1 and ASH2L.
Four genes, ADGRA2, GOT1L1, ADRB3 and STAR, are not overexpressed in any amplified case. Apart from ADGRA2, ADRB3 and STAR, which are not expressed, the protein products of 8p11.23 genes are generally expressed in squamous NSCLC with at least one of the antibodies tested (Table 3). However, staining varied considerably depending on the antibody used.
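The overexpression criterion used here (a z score above 2 relative to diploid samples) can be sketched as follows. cBioPortal supplies precomputed z-scores; this sketch only illustrates the idea under simplifying assumptions: for one gene, each sample is scored against the mean and SD of the copy-neutral (diploid) samples on whatever expression scale is supplied, and all sample labels and values are hypothetical.

```python
import statistics
from typing import Dict, List


def zscores_vs_diploid(expression: Dict[str, float], diploid: List[str]) -> Dict[str, float]:
    """z-score every sample against the mean and SD of the diploid (copy-neutral) samples."""
    reference = [expression[s] for s in diploid]
    mu, sd = statistics.mean(reference), statistics.stdev(reference)
    return {s: (value - mu) / sd for s, value in expression.items()}


def fraction_overexpressed(z: Dict[str, float], amplified: List[str], threshold: float = 2.0) -> float:
    """Share of amplified samples whose z-score exceeds the threshold."""
    return sum(z[s] > threshold for s in amplified) / len(amplified)


# Toy gene: three diploid samples and three amplified samples
expr = {"dip1": 1.0, "dip2": 2.0, "dip3": 3.0, "amp1": 5.0, "amp2": 3.5, "amp3": 4.5}
z = zscores_vs_diploid(expr, ["dip1", "dip2", "dip3"])
share = fraction_overexpressed(z, ["amp1", "amp2", "amp3"])
```

The diploid reference here has mean 2 and SD 1, so the amplified samples score 3.0, 1.5 and 2.5, and two of three exceed the threshold. Using diploid samples as the reference, rather than all samples, keeps the amplified cases from inflating the baseline.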
Table 3. Protein expression by immunohistochemistry (IHC) of 8p11.23 proteins in squamous lung carcinomas from the Human Protein Atlas. In the IHC staining columns, staining intensity has been grouped as none/low and medium/high. The second column shows the commercial catalogue number, type and company of the antibody. r pAb: rabbit polyclonal antibody, r mAb: rabbit monoclonal antibody, m mAb: mouse monoclonal antibody, NA: not available.
Although OS of the 8p11.23-amplified squamous lung carcinoma group in TCGA was better than that of the nonamplified group, the difference did not reach statistical significance (log-rank p = 0.12, Figure 7). RFS of squamous NSCLC patients in the upper quartile of NSD3 mRNA expression did not differ from that of patients in the three lower quartiles (not shown). Surprisingly, RFS of patients in the upper quartile of FGFR1 mRNA expression was improved compared with patients in the three lower quartiles (not shown). PLPP5 overexpression was likewise associated with better RFS, whereas none of the other genes showed significant differences in RFS between groups (not shown).
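The log-rank comparison of survival curves can be sketched from its definition: at each distinct event time, the observed number of events in one group is compared with the number expected if both groups shared the same hazard. This is a minimal two-group illustration, not the implementation used by cBioPortal or Kaplan Meier Plotter; the inputs are hypothetical follow-up times with event indicators (1 = event, 0 = censored).

```python
import math
from typing import List, Tuple


def logrank_test(times_a: List[float], events_a: List[int],
                 times_b: List[float], events_b: List[int]) -> Tuple[float, float]:
    """Two-group log-rank test. Returns (chi-square statistic with 1 df, p-value)."""
    data = [(t, e, 0) for t, e in zip(times_a, events_a)] + \
           [(t, e, 1) for t, e in zip(times_b, events_b)]
    event_times = sorted({t for t, e, _ in data if e})
    observed_minus_expected = 0.0
    variance = 0.0
    for t in event_times:
        at_risk = sum(1 for ti, _, _ in data if ti >= t)
        at_risk_a = sum(1 for ti, _, g in data if ti >= t and g == 0)
        deaths = sum(1 for ti, e, _ in data if ti == t and e)
        deaths_a = sum(1 for ti, e, g in data if ti == t and e and g == 0)
        share_a = at_risk_a / at_risk
        observed_minus_expected += deaths_a - deaths * share_a
        if at_risk > 1:
            # Hypergeometric variance of the number of events in group A at this time
            variance += deaths * share_a * (1 - share_a) * (at_risk - deaths) / (at_risk - 1)
    if variance == 0:
        return 0.0, 1.0
    chi2 = observed_minus_expected ** 2 / variance
    # Upper tail of a chi-square with 1 df: P(X > c) = erfc(sqrt(c / 2))
    return chi2, math.erfc(math.sqrt(chi2 / 2))
```

Because the statistic sums observed-minus-expected events over the whole follow-up, it is most sensitive to consistent differences in hazard and can miss curves that cross.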

Discussion
Chromosomal locus 8p11.23 is the second most frequently amplified locus in squamous lung cancers, and squamous NSCLC is among the cancer types with the highest frequency of amplification at this locus across all histologies and primary sites. In contrast, lung adenocarcinomas display amplifications at this locus less frequently (about 2.5% to 3% of cases in the TCGA lung adenocarcinoma study). Genes located at 8p11.23 include the receptor tyrosine kinase FGFR1; the methyltransferase NSD3 and ASH2L, a core subunit of the mixed lineage leukemia (MLL) methyltransferase complex; two lipid-metabolizing enzymes, the phospholipase DDHD2 and the phospholipid phosphatase PLPP5; and the transcription regulators ZNF703 and BRF2 (Table 1). Two other proteins encoded at 8p11.23, EIF4EBP1 and LSM1, are involved in mRNA translation and metabolism. A list of additional genes amplified in squamous NSCLC is shown in Table 1. Previous studies have examined the roles of some 8p11.23 genes in lung cancer. ZNF703 is a transcription factor with roles in development and in estrogen receptor (ER)-positive breast cancers, where it is associated with more aggressive subsets [17]. In lung cancer, samples with ZNF703 amplification displayed variable mRNA overexpression, suggesting an imperfect correlation [18]. Another transcription regulator from 8p11.23, BRF2, is a subunit of transcription factor TFIIIB [19]. TFIIIB is required for transcription by RNA polymerase III, the polymerase that transcribes tRNA genes and other small noncoding RNAs. Thus, BRF2 plays an important role in the regulation of protein synthesis, with implications for proliferating cancer cells. In lung cancer, pathways upregulating BRF2 have been shown to favor cancer progression [20,21]. BAG4, an antiapoptotic protein encoded by a gene at 8p11.23, has been shown to transform breast cells and may be functionally relevant in lung cancer, where it is commonly coamplified with FGFR1 and NSD3 [22,23].
The current study examines the 8p11.23-amplified region and the nineteen genes located at this locus in squamous NSCLC. Analysis of published genomic data shows that the genes of the locus are amplified en bloc in most amplified cases, while in fewer cases only a subset of the 8p11.23 genes is amplified. The highest frequency of amplification is observed in the most centromeric genes, including NSD3, LETM2 and FGFR1. Amplification of 8p11.23 has no significant influence on the TMB or the AS, suggesting that genes in the locus do not have a major role in aneuploidy or DNA repair in these cancers. In addition, 8p11.23 amplification does not influence the prevalence of the other frequent CNA in squamous NSCLC, 3q26 amplification. mRNA expression of the amplified genes is variable, with a stronger correlation between amplification and overexpression in several genes in the most centromeric part of the locus (NSD3, PLPP5, DDHD2, LSM1 and ASH2L) and in a few genes located toward the telomeric end (BRF2 and PLPBP). Several other genes show little or no increase in mRNA expression in amplified cases. Interestingly, increased mRNA expression was not associated with worse RFS for any of the genes amplified at 8p11.23. A caveat of this survival analysis is that a substantial fraction of patients in the high-expression group, roughly one-third or more, may be nonamplified, because the cut-off was set at the upper quartile of expression, which encompasses more patients than the 11.5% to 17.7% frequency of 8p11.23 amplification. These data suggest that genes amplified at 8p11.23 do not confer a survival or other selective advantage to squamous NSCLC cells but may be amplified as part of an underlying defect that makes the locus prone to repeated DNA replication.
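The dilution caveat for the high-expression group can be quantified with simple arithmetic. If the upper quartile (25% of patients) is compared with the rest and a fraction p of the cohort carries the amplification, then even if every amplified patient fell into the high-expression group, at least (0.25 - p)/0.25 of that group would be nonamplified. The sketch assumes that the Kaplan Meier Plotter cohort has an 8p11.23 amplification frequency similar to the 11.5% to 17.7% observed in TCGA, which is not established for that cohort.

```python
def min_nonamplified_share(amplified_fraction: float, high_group_fraction: float = 0.25) -> float:
    """Lower bound on the share of nonamplified patients in the high-expression group.

    Best case: every amplified patient is in the high-expression group, so the
    remaining (high_group_fraction - amplified_fraction) of the cohort that fills
    the group must be nonamplified.
    """
    if amplified_fraction >= high_group_fraction:
        return 0.0
    return (high_group_fraction - amplified_fraction) / high_group_fraction


# Assumed amplification frequencies spanning the TCGA range reported in the Results
for freq in (0.115, 0.177):
    print(f"amplified {freq:.1%}: at least {min_nonamplified_share(freq):.0%} of the top quartile nonamplified")
```

This prints lower bounds of 54% and 29% for amplification frequencies of 11.5% and 17.7%; in practice the share is likely higher, because not every amplified tumor overexpresses each gene.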
However, given the limitations of the survival analysis, it cannot be excluded that increased copy numbers of one or more 8p11.23 genes confer a survival benefit on squamous NSCLC cells. The centromeric part of the 8p11.23 amplicon, which contains NSD3, LETM2, FGFR1 and TACC1 and shows the highest amplification frequency in squamous NSCLC, is homologous to a region of chromosome 4p that contains a related gene for each of these 8p11 genes: NSD2, LETM1, FGFR3 and TACC3. This 4p region, although rarely amplified, is the site of fusions between FGFR3 and TACC3 in a small percentage (1.2%) of squamous lung carcinomas, while FGFR1 and FGFR2 fusions are observed in isolated cases [5]. In bladder cancer, which also harbors 8p11.23 amplifications, the same fusions are present at the homologous 4p location. In addition, bladder cancers carry FGFR3 mutations, which make them susceptible to treatment with FGFR inhibitors [24]. The presence of genetic abnormalities in these homologous regions, which harbor genes with similar functions, in the same cancers argues for a functional benefit conferred by one or more of these genes. The prime candidates are the FGFR family members, for which clinical data on the efficacy of inhibition in other cancers exist, implying that cancers carrying FGFR genetic lesions depend functionally on them, at least in some cases. Regarding FGFR1 targeting in 8p11.23-amplified squamous lung carcinomas, it is worth noting that, as shown here, amplification is not always associated with mRNA overexpression. This is important when considering FGFR1 amplification as a potential biomarker for the development of treatments with FGFR inhibitors or with inhibitors of downstream cascades, such as PI3K inhibitors [25,26].
An improved understanding of recurrent copy number alterations in cancer and the genes they affect, together with the characterization of candidate driver genes in altered regions, presents therapeutic opportunities. The proof of principle was provided decades ago, when 17q amplifications in a subset of breast cancers were shown to involve ERBB2 and became the basis for effective therapies that have changed the outcomes of HER2-positive breast cancers [27,28]. A similar opportunity may arise with 8p11.23 amplifications and the development of FGFR inhibitors. Clinical translation will require a better understanding of the driver genes in the locus and the identification of any additional drivers. The methyltransferase NSD3 is a strong candidate; it has been shown to promote squamous lung carcinoma proliferation in experimental models in vitro and in vivo by promoting oncogene transcription through methylation of histone H3 at lysine 36 [29,30]. It will be important to elucidate the pathways through which driver genes create tumor dependency. For FGFRs, for example, transactivation of additional receptor tyrosine kinases may be at play, with implications for the effectiveness of inhibition in cancers carrying defects in these other receptors, such as HER2 amplification [31]. As the case of HER2 has shown, the development of companion diagnostics that measure the amplification with the cut-off most predictive of clinical efficacy will also be of paramount importance.
Funding: This research received no external funding.
Institutional Review Board Statement: As this study was a reanalysis of data available in the public domain, no approval by the Ethics Review Board was required or obtained.