Comprehensive Characterization of Human Lung Large Cell Carcinoma Identifies Transcriptomic Signatures with Potential Implications in Response to Immunotherapy

Lung cancer is the leading cause of cancer mortality worldwide, with non-small cell lung cancer (NSCLC) being the most prevalent histology. While immunotherapy with checkpoint inhibitors has shown outstanding results in NSCLC, the precise identification of responders remains a major challenge. Most studies attempting to overcome this handicap have focused on adenocarcinomas or squamous cell carcinomas. Among NSCLC subtypes, the molecular and immune characteristics of lung large cell carcinoma (LCC), which represents 10% of NSCLC cases, are not well defined. We hypothesized that specific molecular aberrations may impact the immune microenvironment in LCC and, consequently, the response to immunotherapy. To that end, it is particularly relevant to thoroughly describe the molecular genotype–immunophenotype association in LCC–to identify robust predictive biomarkers and improve potential benefits from immunotherapy. We established a cohort of 18 early-stage, clinically annotated, LCC cases. Their molecular and immune features were comprehensively characterized by genomic and immune-targeted sequencing panels along with immunohistochemistry of immune cell populations. Unbiased clustering defined two novel subgroups of LCC. Pro-immunogenic tumors accumulated certain molecular alterations, showed higher immune infiltration and upregulated genes involved in potentiating immune responses when compared to pro-tumorigenic samples, which favored tumoral progression. This classification identified a set of biomarkers that could potentially predict response to immunotherapy. These results could improve patient selection and expand potential benefits from immunotherapy.


Introduction
Lung cancer is responsible for the second highest incidence of tumors worldwide, representing 11.4% of newly diagnosed cancer cases in 2020. It is also the leading cause immunotherapy [21], thus making the identification of new biomarkers an unmet need, urgently requiring attention.
One notable example of a potential biomarker for NSCLC is the tumor mutational burden (TMB), which is defined as the total number of mutations per megabase of the tumor genome-encoding area [34][35][36]. TMB has shown promise as a predictive biomarker in NSCLC, a tumor type featuring a high prevalence of somatic mutations [37][38][39]. Theoretically, a higher number of mutations would increase the presence of tumor neoantigens that would trigger the immune response [35]. However, TMB presents with similar pitfalls to PD-L1 expression as a predictor of response to immunotherapy [21,31]. Other examples of immunotherapy response biomarkers under investigation include the density, tissue location and functional phenotype of tumor infiltrating lymphocytes (TILs) [31,40], HLA-I diversity [31,41], transcriptional and epigenetic signatures and even the microbiome [31]. Importantly, many other components of the immune system beyond lymphocytes could influence responses to checkpoint inhibitors [31,42]. As cancer complexity is likely to be the reason why a single optimal biomarker to predict response to immunotherapy remains elusive, many studies have therefore focused on searching for a multiparametric biomarker signature, based on clinical, molecular and immune characteristics.
In general terms, the LCC histology is exhibited by a subgroup of patients with poor prognosis that encompass a heterogeneous set of tumors lacking a proper classification. In the era of personalized medicine, the application of tailored treatments according to tumor histology and its molecular/immune landscape, employing targeted therapies or immunotherapies requires a more exhaustive description of LCC [6]. We hypothesized that there may be an association between the underlying molecular aberrations of LCC tumor cells and the immune phenotype in the tumor microenvironment. The aim of this study was thus to confirm such an association and to explore its implications in the treatment decision. From a cohort of N = 200 patients with NSCLC tumors, we gathered tumor samples from 18 of these patients with a diagnosis of early-stage LCC, collected in our hospital from 2002 to 2011. Most of the cohort was treatment naïve at the time of the surgical resection. This clinically annotated cohort was subjected to a comprehensive clinical, molecular and immune evaluation. Associated parameters arising from genomic, transcriptomic and immunophenotypic results allowed the identification of subgroups of patients characterized by a specific set of molecular and immune features.
This work provides an important step towards a comprehensive description of LCC as a proper entity with specific underlying features. The identification of patient subsets could enhance treatment decision outcomes and ultimately improve the benefits obtained from immunotherapy.

Lung Large Cell Carcinoma Samples and Patient Cohort
We established a cohort of 18 patients from whom early-stage lung LCC tumors were resected. After signed consent had been provided by the patients, resected tumor samples were formalin-fixed and paraffin-embedded (FFPE) and fresh-frozen with optimal cutting temperature (OCT) compound and stored in the Pathology Department of the 12 de Octubre University Hospital. The described procedures were approved by the Ethics Committee of the 12 de Octubre Hospital with Identification Number #17/454. All tumor samples were successfully described with all the applied methodologies, a summary of which can be found in Figure 1. Clinical and demographic information of the patient cohort is presented in Table 1. Samples were required to fulfill the following criteria in order to be eligible for the study: LCC histology, early stage, sufficient sample availability, complete clinical annotation and agreed consent provided by the patient for further analyses to be performed. clinical annotation and agreed consent provided by the patient for further analyses to be performed. Figure 1. Experimental design of the study. Samples from resected early-stage large cell carcinoma (LCC) of a clinically annotated cohort of 18 patients were subjected to molecular and immune profiling and integrative bioinformatics to find associations permitting patient subgroup classification according to potential responses to immunotherapy. ND stands for "Not Determined". NA stands for "Not Available".

DNA Extraction, Quantification and Quality Measurement
For each of the 18 tumor samples from the cohort, 5 sections (thickness 4 µm) of fresh-frozen tissue in OCT (−80 • C) were used to extract DNA. For this purpose, the DNeasy Blood and Tissue kit (#69504, QIAGEN, Hilden, Germany) was used according to the manufacturer's instructions. After DNA extraction, quantification was performed with the Quantifluor dsDNA system (#E2670, Promega, Madrid, Spain), with an average yield of 75.07 ng/µL obtained for the 18 samples, surpassing quantity requirements for subsequent applications. Finally, DNA quality was determined with the Illumina FFPE QC kit (#15013664, Illumina, San Diego, CA, USA), comparing the amplification efficiency to a control DNA provided by the manufacturer. IQ SYBR Green Supermix (#170-8880, Biorad, Hercules, CA, USA) was used to perform the amplification reaction. All DNA samples showed adequate quality required for use in subsequent protocols, and all procedures were performed by the same person at the 12 de Octubre Hospital to ensure experimental consistency. Information related to DNA extraction, quantification and quality can be found in Table S1A.

RNA Extraction, Quantification and Quality Measurement
For the 18 tumor samples from the cohort, a further 5 sections (thickness 4 µm) of fresh-frozen tissue in OCT (−80 • C) were used to extract RNA. To accomplish this goal, the RNeasy Mini kit (#74104, QIAGEN, Hilden, Germany) was used, according to the manufacturer's instructions. RNA quantification was performed with the Quantifluor RNA system (#E3310, Promega, Madrid, Spain), with an average yield of 141.09 ng/µL obtained, again exceeding the amounts required for subsequent procedures. RNA quality was evaluated using the RNA 6000 Nano kit (#5067-1511, Agilent, Santa Clara, CA, USA) or RNA 6000 Pico kit (#5067-1513, Agilent, Santa Clara, CA, USA) depending on the RNA concentration. According to their RNA Integrity Number (RIN), all samples were of sufficient quality for use in subsequent methodologies. All procedures were performed by the same person at the 12 de Octubre Hospital as described above. Information related to RNA extraction, quantification and quality can be found in Table S1B.

Tumor Genetic Sequencing
Specific molecular aberrations in the 18 samples were detected with the TruSight Tumor 170 kit (#20028821, Illumina, San Diego, CA, USA), a panel that targets 170 genes frequently mutated in cancer. For this purpose, both DNA (to detect single nucleotide variants (SNVs), copy number variants (CNVs) and indels) and RNA (to detect fusions and splice variants) were used. The DNA input was 120 ng and the RNA input was 85 ng per sample. Molecular aberrations in all samples were successfully described. Libraries were prepared according to the manufacturer's instructions. Although initial steps for DNA and RNA sample handling were different, subsequent steps were identical for both types of input.
Briefly, DNA was sheared with an M-220 Ultrasonicator (Covaris). Total DNA input was 120 ng in a total volume of 52 µL (DNA fragmentation concentration = 2.31 ng/µL). Samples were sheared in microTUBEs-50 AFA Fiber Screw Cap (#520166, Covaris, Woburn, MA, USA) and AFA-Grade Water (#52101, Covaris, Woburn, MA, USA). An M220 Holder XTU (PN 500414) and an M220 holder XTU Insert microtube 50 µL (PN 500488) were used. With respect to the experimental settings, a peak power of 75 W was used, with a duty factor of 15% and 1000 cycles per burst for a total treatment of 360 s per sample at a temperature of 20 • C. Sheared DNA fragments were analyzed with a High Sensitivity DNA kit (#5067-4626, Agilent, Santa Clara, CA, USA) and a Bioanalyzer system (Agilent), to evaluate the peak size of the generated fragments.
RNA samples were retrotranscribed as a first step with a total input of 85 ng per sample, following the manufacturer's instructions.
From this point, both DNA and RNA samples were treated in the same manner. Briefly, an end repair and an A-tailing step to convert the 5 and 3 overhangs into blunt ends were performed. The 3 to 5 exonuclease activity removed the 3 overhangs and the 5 to 3 polymerase activity filled in the 5 overhangs. The 3 ends became A-tailed during this reaction in order to prevent undesired ligations. The ligation of adapters was then performed, followed by the removal of the remaining ligation reagents. Afterwards, indices to identify each library were added to every sample. Once libraries were generated, enrichment steps were performed. A hybridization-capture was performed (twice), in which a pool of oligos specific to 170 genes was hybridized to the generated libraries, which were then captured with biotin probes attached to streptavidin magnetic beads. Once the regions of interest were enriched, amplification procedures were performed. Thereafter, amplified libraries were quantified with the Qubit dsDNA HS Assay kit (#Q32854, Invitrogen, Waltham, MA, USA). A library concentration of at least 3 ng/µL was necessary to ensure an efficient beadbased library normalization. Quality control and library preparation details are provided in Table S1C.
Sequencing was carried out in a NextSeq 500 sequencer with NextSeq 500/550 High Output kit v2.5 reagents, using 300 cycles (High Output kit #20024908, Illumina, San Diego, CA, USA). PhiX control V3 (#150117666, Illumina, San Diego, CA, USA) was used as a sequencing control. An initial data analysis was performed with the TruSight Tumor 170 application in BaseSpace (Illumina). Sample sequencing metrics can be found in Table S1C. Molecular alterations detected for each sample are described in Table S2.

Gene Expression Analysis with the Oncomine Immune Response Research Assay
The expression of 395 genes involved in the tumor-immune system communication was evaluated with the Oncomine Immune Response Research Assay (OIRRA) (#A32928, ThermoFisher Scientific, Waltham, MA, USA). An input of 10 ng of RNA was used for each sample. Briefly, a retro-transcription step was performed with the SuperScript Vilo cDNA Synthesis kit (#11754-250, Invitrogen, Waltham, MA, USA). Libraries were then generated with the automated Ion Chef System (ThermoFisher Scientific) with the following settings: number of groups of primers = 1, amplification cycles of target genes = 17, annealing and extension time = 4 min. Resulting libraries (8 libraries per run) were quantified with the Ion Library TaqMan Quantitation kit (#4468802, Applied Biosystems, Waltham, MA, USA) and diluted to a concentration of 33 pM in a total volume of 25 µL, thereby assuring maximum monoclonality with the used sequencing chip (Ion 530 TM Chip, ThermoFisher Scientific, Waltham, MA, USA). Automated library template and chip loading was subsequently performed with the Ion Chef System. Finally, libraries were sequenced in an Ion S5 System (ThermoFisher Scientific). Among the resulting files, absolute read counts were used to perform subsequent analyses, as described below. Quality control and library preparation details are shown in Table S1D. Raw and normalized gene expression data can be found in Table S3.
Immunohistochemistry scores from all markers analyzed can be found in Table S4. Three comparison groups of samples were selected based on the expression of these markers according to their assigned score: Score 0 (<1% of expression), Score 1 (1-10% of expression) and Score 2 (>10% of expression).
In the case of staining for PD-L1, FFPE sections were stained with anti-PD-L1 22C3 mouse monoclonal primary antibody by utilizing the EnVision FLEX visualization system on a Dako Autostainer Link 48 system with negative control reagents and cell line run controls as described in the PD-L1 IHC 22C3 pharmDx package insert [43].

Bioinformatic Analyses
For the analysis of molecular data, custom scripts were used for annotation and filtering of small variants (SNVs and indels). The following public databases were used for interpretation: gnomAD, Cosmic, ClinVar and an in-house database of allele frequencies from non-cancer patients. We filtered out variants with a minor allele frequency (MAF) ≥0.1% and non-exonic variants. Finally, the Integrative Genomics Viewer (IGV) was used for manual review of the remaining set of variants. CNVs, fusions and splice variants were considered as valid only when reported as high-confident by the TruSight Tumor 170 application (Illumina). Frequently altered genes described in the literature and the most frequently mutated genes in our cohort were represented in the figures shown.
Raw transcriptomic data from the OIRRA (absolute read counts) were normalized with the Variance Stabilizing Transformation (VST) algorithm, using the DESeq2 R package [44]. Following data normalization, tumor clusters and gene clusters were defined with the algorithm 'Consensus Clustering' using the ConsensusClusterPlus R package, through 10,000 and 1000 iterations, respectively [45].
The over-representation analysis (ORA) was performed based on gene categories defined by ThermoFisher for the OIRRA. Briefly, a comparison between the obtained percentage of genes of a certain category versus the expected percentage was performed. ORA pie charts were constructed using the Caroline R package [46]. Statistical analyses of over/under-representation were performed with a Fisher's exact test.
Differential gene expression analysis between the two LCC clusters of tumors was evaluated using the limma R package and its "voom" function specifically designed for RNA-Seq data. Genes with a false discovery rate (FDR) ≤ 0.05 and a Log 2 Fold Change ≥ |1| were considered as significant and used in subsequent analyses.
Since the number of significant molecules was extensive, we selected our set of potential biomarkers based on the following criteria: (1) Strict statistical significance: only genes differentially expressed (FDR ≤ 0.01 and a log 2 Fold Change ≥ |1|) were considered; (2) Biological role related to immunomodulation or tumor progression; (3) Genes suggested as predictive of response to immunotherapy as described in previous literature.

Statistical Analyses
Statistical analyses were performed with GraphPad Prism. Chi Square/Fisher's exact test were used for categorical data and log-rank test was used for survival analyses. Multivariant analyses were performed with SPSS using a Chi Square test.

Experimental Design
The objective of the study was to perform a complete description of the molecular and immune profiles of resected samples (Figure 1). The detailed molecular characteristics of the cohort were described with a targeted DNA-Seq panel (TruSight Tumor 170, TST170) that detects molecular aberrations in 170 genes, including SNVs, CNVs, insertions, deletions, amplifications, rearrangements, fusions and splice variants. The immune features of the cohort were evaluated with a targeted RNA-Seq panel which evaluated the expression of 8 of 20 395 genes involved in the tumor-immune system communication (OIRRA). In parallel, we evaluated the tumor immune infiltrate level of B cells, T cells and macrophages, and the expression of the immunomodulatory protein PD-L1 by immunohistochemistry, which is the only currently approved predictive biomarker for response to immunotherapy in NSCLC. This multiparametric clinical, molecular and immune data were evaluated with integrative bioinformatic pipelines in order to define novel subgroups of LCC and to identify new candidate predictive biomarkers for response to immunotherapy (Figure 1).

Description of the LCC Cohort
Demographic and clinical characteristics of the patient cohort can be found in Table 1. Briefly, most of the LCC patients in the cohort were males (16/18) and diagnosed with chronic obstructive pulmonary disease (COPD, 10/18). All the LCC tumors were earlystage and half of them were stage II. Interestingly, all patients were current or former smokers, with no never-smokers in the cohort. Some tumors had molecular aberrations in frequently altered genes in NSCLC such as ALK, EGFR, KRAS or ROS1, as determined with the TruSight Tumor 170 sequencing panel in-house. However, it is important to emphasize that all these genetic alterations were of uncertain significance in our bioinformatic analysis with the Varsome database, with the exception of a G12C mutation in KRAS, which may lead to treatment adaptation, considering the new KRAS inhibitors under investigation [47] and the potential utility of targeted therapies in LCC as proposed by previous research [8]. These specific molecular alterations can be found in Table S2. Most patients (15/18) were treatment-naïve at the time of surgery, with only one case confirmed for previous treatment. Half of the patients relapsed and the majority of the patients died by the end of the study period ( Figure 1, Table 1).

Evaluation of the Level of Infiltration of Immune Cell Populations and the Expression of PD-L1 Detected by Immunohistochemistry
To evaluate the infiltration of various immune cell populations and to identify expression of the immunomodulatory protein PD-L1 we stained five slides from each tumor sample with the markers for the following: CD4 + T cells, CD8 + T cells, CD20 + B cells, CD68 + macrophage/monocytic populations and PD-L1. Each immune cell marker was evaluated with the following score: 0 (<1% of expression, negative), 1 (1-10%, low expression) and 2 (>10%, high expression) ( Figure 2A). PD-L1 was evaluated as negative (<1%) or positive (≥1%), with the percentage of expression specified ( Figure 2B). Taking the whole cohort (N = 18) into consideration, the most abundant infiltrating immune cells were macrophages (100% of tumors with a score of 1 or above), followed by the T CD8 + cells (83% of tumors with a score of 1 or more) and B cells (78% of tumors with a score of 1 or more). T CD4 + cells were the least infiltrating immune cell population, with only a 22% of tumors with a score of at least 1. Only a small fraction of patients (17%) was positive for PD-L1 expression based on a positivity cutoff of 1% (Table S4).
We also examined the possible correlation between the PD-L1 expression and the level of infiltration of the evaluated immune cell populations. The onlythree patients who were positive for PD-L1 expression were also positive for T CD8 + infiltration and negative Taking the whole cohort (N = 18) into consideration, the most abundant infiltrating immune cells were macrophages (100% of tumors with a score of 1 or above), followed by the T CD8 + cells (83% of tumors with a score of 1 or more) and B cells (78% of tumors with a score of 1 or more). T CD4 + cells were the least infiltrating immune cell population, with only a 22% of tumors with a score of at least 1. Only a small fraction of patients (17%) was positive for PD-L1 expression based on a positivity cutoff of 1% (Table S4).
We also examined the possible correlation between the PD-L1 expression and the level of infiltration of the evaluated immune cell populations. The onlythree patients who were positive for PD-L1 expression were also positive for T CD8 + infiltration and negative for T CD4 + infiltration, which suggests a connection between an ongoing immune response with the tumor cell immune evasion mechanisms. Nonetheless, these observations need to be confirmed in larger cohorts of LCC with more PD-L1 + tumors. No clear relationship was found between the PD-L1 expression and the infiltration of B cells (CD20 + ) or macrophages (CD68 + ). These results can be found in Figure S1.
The whole cohort was classified as LCC according to the 2004 WHO guidelines, since all tumors lacked morphological differentiation. Of the eighteen cases, nine did not express any adenocarcinoma or squamous cell carcinoma markers and were therefore re-classified as LCC-Null, following the 2015 WHO guidelines [3]. Four tumors were positive for TTF-1 expression and were re-classified as adenocarcinomas. Five tumors were positive for p40 and were re-classified as squamous cell carcinomas. However, the frontiers of the LCC histology are yet to be well established, beyond a pathological perspective. Therefore, we followed the approach described in a recently published report [8] and reclassified these tumors as LCC-ADC or LCC-SCC, respectively. TTF-1 and p40 results can be found in Table S4.

Definition of Novel LCC Tumor Subgroups by Consensus Clustering with Specific Molecular, Immune and Clinical Features
We employed a publicly available algorithm named 'Consensus Clustering' that allowed the clustering of patients and genes based on expression data obtained from the OIRRA assay, i.e., according to tumor-immune system crosstalk. Briefly, this method performs a hierarchical clustering that distributes the information optimally. Importantly, the output classification is highly reproducible, contrary to most methods based on hierarchical clustering. As a result, we defined two novel LCC tumor subgroups and three clusters of genes ( Figure 3A). These groups were robustly obtained through 10,000 iterations for patients and 1000 iterations for genes. By assessing the functional characteristics of the three different gene clusters, we identified a transcriptional orientation of the patient subgroups to be either more immunogenic (named 'pro-immunogenic tumors') or less immunogenic (named 'pro-tumorigenic tumors') ( Figure 3A). This was addressed through an ORA ( Figure 3B), in which gene clusters A and C showed an over-representation of gene functions related to immune processes, such as lymphocyte infiltration, type II interferon signaling, antigen processing, TCR co-expression, cytokine signaling, B cell markers and innate immune responses. Gene cluster B, on the other hand, showed over-represented genes related to tumor markers and antigens, adhesion and migration ( Figure 3B). The functional categories that reached a statistically significant over or under-representation are shown in Figure 3B. The pro-tumorigenic group was characterized by an upregulation of gene cluster B and a downregulation of gene cluster A, the opposite to that of the proimmunogenic group. The gene expression profile of cluster C was not different between groups. Representative genes of each cluster and selected OIRRA gene categories can be found in Figure S2. A complete list of the OIRRA genes, their cluster and their gene category can be found in Table S5.  Each colored sector represents a functional category whose size illustrates over-or under-representation of genes in that cluster within a specific function. Over-represented functions allow patient groups to be classified as pro-tumorigenic or pro-immunogenic according to their tumor immunogenicity. The over or under-representation of each functional category was statistically evaluated with a Fisher's exact test. * (p < 0.05), ** (p < 0.01), *** (p < 0.001); (C) Immune cell population distribution in the proimmunogenic and pro-tumorigenic LCC subgroups.
We compared the clinical, molecular and immune annotations between the proimmunogenic and the pro-tumorigenic subgroups ( Figure 3A), but none of these parameters were significantly different between the two groups, which is likely due to the limited number of patients in the study. However, certain annotations suggested that molecular and immune events were occurring predominantly in one subgroup or the other. While the pro-tumorigenic group accumulated patients with alterations in RET, PIK3CA, PIK3CB and FGF10 and a higher proportion of relapsed patients, the pro-immunogenic group encompassed tumors with an alteration in ALK, ARID1A, MET, KRAS or FGF3 and with a high level of T CD4 + or B cell infiltration (above 10%) ( Figure 3A). In addition, the pro-immunogenic group included more patients in which all of immune cell populations evaluated were present, which was more apparent for CD8 + T cells and CD20 + B cells ( Figure 3C and Table 2). Importantly, no bias was observed with respect to the subtypes of LCC (LCC-ADC, LCC-SCC and LCC-Null) since they were not significantly associated with any specific subgroup and showed an even distribution. Cases with more than 1% of positive staining were considered to be positive. Avg. Score is the average score value in each subgroup for each marker. Score 0 < 1%, Score 1 ≥ 1%; <10%, Score 2 ≥ 10%. NA stands for "Not Available".

Analysis of Differentially Expressed Genes between the Pro-Immunogenic and Pro-Tumorigenic LCC Subgroups
Once tumors had been classified as pro-immunogenic or pro-tumorigenic based on the transcriptomic profile of the tumor-immune system crosstalk, we performed differential gene expression analyses to identify which genes characterized each subgroup according to their significant up-or down-regulation. For this purpose, we applied the Limma-Voom algorithm and identified 152 genes with significantly different expression between the pro-immunogenic and the pro-tumorigenic subgroups based on an FDR ≤ 0.05 and a log 2 Fold Change ≥ |1| as statistical cut-offs. Of these 152 genes, 115 were significantly upregulated in the pro-immunogenic tumors, while 37 were significantly upregulated in the pro-tumorigenic tumors ( Figure 4A).

Clinical Relevance of the Defined LCC Subgroups
We addressed whether the transcriptomic profile of the defined LCC subgroups was relevant for the clinical outcome regardless of the treatment received. Therefore, we sought to identify any possible associations between the tumor subgroups and progression-free survival (PFS) ( Figure 5A) and overall survival (OS) ( Figure 5B) data, but no significant differences were observed. However, despite the reduced number of patients, the analysis suggested a clear trend towards a greater PFS and OS in the pro-immunogenic group, as might be expected.
A PFS analysis according to the stage of the tumors and their transcriptomic subgroup suggested that these factors had an impact on relapse. Pro-tumorigenic tumors with stage III were those with a lower PFS ( Figure S3).

Transcriptional Signatures of the Defined LCC Subgroups and Potential Implications in Response to Immunotherapy
We subsequently identified a transcriptomic signature with a reduced number of genes that could be used to classify patients in the defined pro-immunogenic and protumorigenic subgroups in order to translate our findings into an experimentally applicable test with possible clinical interest. This signature consists of 20 differentially expressed genes between the subgroups with known roles in immune response and tumor progression ( Figure 5C,D). Genes in this selection that are distinctive of pro-immunogenic tumors were involved in adaptive immune response stimulation, apoptosis, immune chemotaxis, immune response regulation and iron metabolism. Conversely, genes characteristic of the pro-tumorigenic tumors were related to cell cycle regulation, DNA replication and repair and iron metabolism. In summary, genes related to pro-immunogenic tumors favored immune functions, while those related to pro-tumorigenic tumors supported tumoral progression ( Figure 5C,D).
The final objective of this study was to identify predictive biomarkers of response to immunotherapy in LCC. Ideally, the identification of these biomarkers should be performed in an advanced stage LCC cohort treated with immunotherapy. However, this approach presents two main handicaps in the current lung cancer clinical practice. First, the number of advanced LCC patients undergoing immunotherapy treatment is quite reduced. Second, for most of these cases the amount and quality of available tumor tissue would not allow a comprehensive description of LCC such as the one provided in this work. In this context, findings identified in early stage LCC tumors could be useful in order to overcome these obstacles.
Therefore, we propose that our defined transcriptional signatures could have a potential utility in predicting the response to immunotherapy in LCC tumors. Considering the relevance of the immune landscape for the potential of a NSCLC patient to respond to immunotherapies, we speculate that patients falling into our newly defined pro-immunogenic group of LCC tumors might be better candidates to receive these treatments, whereas patients classified as pro-tumorigenic might not benefit so much.
Based on this selection of genes, pro-immunogenic and pro-tumorigenic tumors could be clearly identified and showed an opposing transcriptomic profile ( Figure 5E). This set of biomarkers may allow an accurate LCC tumor classification to be made and a prediction of response to immunotherapy that could be eventually implemented in clinical practice. Figure 5. Survival analysis of the defined LCC subgroups and selection of candidate predictive biomarkers for response to immunotherapy. Kaplan-Meier analysis of progression free survival (PFS) (A); and overall survival (OS) (B) between pro-immunogenic (red) and pro-tumorigenic (blue) tumors. OS and PFS data could not be retrieved from all 18 LCC patients. Statistical analyses were performed with a log-rank (Mantel-Cox) test for which a p-value below 0.05 was considered Figure 5. Survival analysis of the defined LCC subgroups and selection of candidate predictive biomarkers for response to immunotherapy. Kaplan-Meier analysis of progression free survival (PFS) (A); and overall survival (OS) (B) between pro-immunogenic (red) and pro-tumorigenic (blue) tumors. OS and PFS data could not be retrieved from all 18 LCC patients. Statistical analyses were performed with a log-rank (Mantel-Cox) test for which a p-value below 0.05 was considered significant; (C) Selected candidate predictive biomarkers of response to immunotherapy characteristic of pro-immunogenic LCC tumors. FDR = False Discovery Rate; (D) Selected candidate predictive biomarkers of response to immunotherapy characteristic of pro-tumorigenic LCC tumors; (E) Heatmap of expression of the selected biomarkers, grouped by clusters, in pro-tumorigenic tumors (left) and pro-immunogenic tumors (right).

Discussion
NSCLC management has changed markedly with recent advances in immunotherapy. However, there is still room for improvement and a clear need to accurately identify immunotherapy responders through the definition of better predictive biomarkers. Most research in this area is focused on adenocarcinomas and squamous cell carcinomas, the two main histologies of NSCLC [25,48,49]. However, an exhaustive description of Large Cell Carcinoma, which shows a poor prognosis and a more aggressive behavior, is still lacking [6,9]. An urgent unmet need is for comprehensive studies to be performed on LCC in order to translate the discoveries of predictive biomarkers for response to immunotherapy into clinical benefits for this subset of patients.
In this work, we established a cohort of 18 clinically annotated, early-stage LCC tumors, collected from 2002 to 2011 in our hospital, and described them thoroughly from a multiparametric perspective, taking into account molecular, immune and clinical characteristics. Our main goal was to define novel subgroups of LCC tumors with distinctive molecular, immune and clinical features and to identify specific biomarkers of the cited subgroups that could improve the prediction of response to immunotherapy with checkpoint blockers.
In terms of immune infiltration, our cohort showed high positivity for macrophages (CD68 + ), T CD8 + cells and B cells. In contrast, the cases positive for T CD4 + cells or PD-L1 were less frequent. With respect to PD-L1 expression, only 17% of tumors were positive, which was clearly below previous descriptions of independent cohorts of resected LCC [8]. PD-L1 evaluation has several associated pitfalls [21,31] that are currently being addressed [32,33]. For example, PD-L1 is expressed by several cell populations and compartments and it can be altered by therapy regimes or tumor heterogeneity, among others. These limitations may explain, at least in part, the differences observed. When evaluating the correlation between PD-L1 expression and the infiltration of immune cells, we observed that all PD-L1 positive tumors (N = 3) were positive for T CD8 + infiltration and negative for T CD4 + infiltration. However, these observations require further validation in larger LCC cohorts with more PD-L1 + tumors.
Taking into consideration the histological markers evaluated, nine tumor samples did not express any adenocarcinoma or squamous cell carcinoma markers and were thus re-classified as LCC-Null according to the 2015 WHO guidelines. Four tumors were positive for TTF-1 expression and five were positive for p40 and were therefore re-classified as LCC-ADC or LCC-SCC, respectively, following the approach of a previously published report [8]. Importantly, FFPE blocks were appropriately managed and all the tumors in our cohort were thoroughly examined to rule out biphasic characteristics, ensuring a sole histology. Beyond a pathological perspective, the frontiers of LCC classification are diffuse and yet to be properly defined as discussed in the literature. For example, the cited report proposed that the LCC-ADC subtype is an entity with different phenotypic and genetic characteristics with respect to classic ADC [8]. It also discussed that the LCC-Null subtype could be considered as a TTF-1 negative adenocarcinoma [8]. Other works suggest that the LCC histology could eventually become extinct with the recent re-classifications of these tumors into poorly differentiated forms of the major subtypes of NSCLC [9]. These key questions are yet to be solved and will likely imply new approaches for the clinical management of these subtypes of patients.
We defined two novel subgroups of LCC tumors based on the expression of genes involved in the tumor-immune system communication and subsequently compared their clinical, molecular and immune characteristics. Importantly, the LCC subtypes (LCC-ADC, LCC-SCC and LCC-Null) did not imply a bias as they were not statistically associated with any of the defined subgroups. Tumors defined as pro-tumorigenic showed a trend to accumulate certain molecular aberrations and were found in a higher proportion of relapsed patients. On the other hand, tumors defined as pro-immunogenic had a higher positivity for all immune markers (except for CD68 + macrophages), as evaluated by immunohistochemistry, especially with regards to T CD8 + cells, and encompassed all tumors with a high infiltration of T CD4 + and B cells. Interestingly, one report showed that a higher infiltration of B cells was associated with a better response to immunotherapy [50]. Of note, the pro-immunogenic subgroup exhibited unique molecular alterations, gathering all tumors with alterations in ALK, ARID1A, MET, KRAS or FGF3. To this end, there are no targeted therapies directed to all KRAS mutations, the most common genetic alteration in NSCLC, but these tumors are considered good candidates for immunotherapies [51].
In agreement with our previous findings, pro-tumorigenic tumors show upregulated pro-tumoral genes involved in processes such as proliferation, adhesion or migration. In contrast, genes with increased expression in pro-immunogenic tumors were associated with immune functions like antigen processing and presentation, immune signaling and stimulation or lymphocyte activation and infiltration, among others. One study reached similar conclusions when analyzing immune hot and immune cold NSCLC tumors characterized by specific molecular, immune and clinical features. Immune hot tumors showed higher T CD8 + infiltration and the upregulation of genes related to pro-immune functions such as T cell trafficking and cytotoxicity. Immune cold tumors had lower levels of T CD8 + infiltration and lower expression of the cited immune functions [48]. Of note, the cited article reported that certain immune hot-related gene expression signatures were associated with the response to immunotherapy. Nonetheless, recent studies suggest that classifying tumors into hot and cold subsets according to the immune landscape could be a simplistic perspective, and that additional factors should be evaluated [52].
Taken together, these observations suggest that an association exists between the molecular genotype, the immune phenotype and the clinical features of tumors, as previous research reported [48,49,53]. The association of infiltrating immune cells with longer overall survival has been widely demonstrated in cancers with different histological features and anatomical location [42]. When evaluating the possible clinical implications of the LCC pro-immunogenic and pro-tumorigenic subgroups, pro-immunogenic tumors showed a trend towards a better survival, both in terms of PFS and OS, the full extent of which was limited by the cohort size. An additional analysis of PFS, according to the stage of the tumors and their transcriptomic subgroup, suggested these two factors had an impact on relapse.
We identified a transcriptional signature composed of 20 differentially expressed genes between the pro-immunogenic and the pro-tumorigenic LCC tumors that could be used to classify LCC patients into the defined subgroups. Those genes characteristic of the pro-immunogenic tumors were mainly involved in promoting immune responses while those distinctive of pro-tumorigenic tumors favored tumor progression and development.
The ultimate goal of our study was to identify predictive biomarkers of response to immunotherapy in LCC. As previously discussed, these biomarkers should be ideally defined in advanced stage LCC patients treated with immunotherapy. However, the amount of LCC tumors following these treatment regimens that could undergo a comprehensive description successfully is scarce. In this context, thorough descriptions of early stage LCC samples could provide a solid foundation to extrapolate findings into the advanced disease setting.
Therefore, we propose that our defined transcriptional signatures could potentially improve the prediction of response to immunotherapy in LCC. It could be hypothesized that pro-immunogenic tumors are better candidates to receive immunotherapy, as from both a transcriptomic and immune infiltrate perspective they seem more prone to stimulate immune responses. To this end, it has been demonstrated that a previous ongoing immune response is crucial in order to respond to immunotherapy [42,52].
The potential clinical implications of some of the genes of our transcriptional signatures have already been suggested in previous studies. For example, an elevated expression of FCRLA in lung adenocarcinoma, among other genes, was associated with a better response to immunotherapy [50]. Expression of CD137 (TNFRSF9) and PSBM9 in NSCLC tumors was increased in immunotherapy responders [49], while an elevated expression of PSBM9 was associated with higher PFS. Interestingly, PSBM9 was not selected under our criteria but almost reached statistical significance in our study (FDR = 0.01 and log 2 Fold Change = 0.96, upregulated in the pro-immunogenic subgroup) which suggests that our biomarker filtering criteria could perhaps be reassessed in order to identify additional genes with potential as biomarkers. In contrast, previous research proposed that a highly or poorly proliferative tumor microenvironment was associated with a worse response to immunotherapy in NSCLC compared to moderately proliferative tumors. This proliferative status was assessed by the evaluation of expression of a 10-gene signature, including some of our pro-tumorigenic biomarkers such as CCNB2, KIAA0101, MAD2L1 and TOP2A [54]. Of note, MKI67, a proliferation marker, was also significantly upregulated in the pro-tumorigenic tumors of our study, supporting the proliferative-prone status of this subset. The association of our selected markers and potential responses to immunotherapy is not exclusive of NSCLC. It was described, for example, that melanoma responders to antiPD-1 therapy had significantly increased numbers of CD69 + NK cells [55]. Another study reported that CRTAM was the most upregulated gene in responders to immunotherapy with diffuse large B-cell lymphoma [56].
Precision oncology is defined as the designation of the right treatment for the right patient. In the context of NSCLC, great efforts are ongoing to provide more accurate and individualized descriptions of adenocarcinomas and squamous cell carcinomas as these are the main pathology in approximately 90% of NSCLC patients. However, current approaches might be omitting less frequent histologies such as LCC. Our study reports that comprehensively describing tumors from LCC patients can result in the identification of key molecular and immune characteristics that could be exploited to increase the benefit obtained from immunotherapy. Nevertheless, more robust data will require similar research to be performed in larger patient cohorts. Furthermore, the predictive value of the selected biomarkers needs to be validated in patients who have previously received immunotherapy. Previously mentioned research has demonstrated the remarkable value of multiparametric studies to improve immunotherapy outcomes. It must be ensured that all tumor types, and ultimately all patients, benefit from these new research approaches that are paving the way towards personalized medicine.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/jcm11061500/s1, Figure S1: PD-L1 expression and level of infiltration of the evaluated immune cell populations, Figure S2: Representative genes of each cluster and respective OIRRA gene categories, Figure S3: Progression-free survival analysis of the LCC tumors according to their stage and transcriptomic subgroup, Table S1: Quality control metrics for DNA, RNA and sequencing libraries, Table S2: Molecular alterations detected with TruSight Tumor  170, Table S3: Expression data evaluated by Oncomine Immune Response Research Assay, Table S4: Level of infiltration of several immune cell populations, PD-L1 expression and histological markers evaluated by immunohistochemistry, Table S5: Analysis of differential expression profiles between the pro-immunogenic and pro-tumorigenic LCC tumors.