The Clinically Actionable Molecular Profile of Early versus Late-Stage Non-Small Cell Lung Cancer, an Individual Age and Sex Propensity-Matched Pair Analysis

Background: Despite meticulous surgery for non-small cell lung cancer (NSCLC), relapse is as high as 70% at 5 years. Many institutions do not conduct reflexive molecular testing on early-stage specimens, although targeted gene therapy may extend life by years in the event of recurrence. This ultimately delays definitive treatment, with additional biopsy risking suboptimal tissue acquisition and quality for molecular testing. Objective: To compare molecular profiles of genetic alterations in early and late NSCLC to provide evidence that reflexive molecular testing provides clinically valuable information. Methods: A single-center propensity-matched retrospective analysis was conducted using prospectively collected data. Adults with early and late-stage NSCLC had tissue subjected to targeted panel-based NGS. Frequencies of putative drivers were compared, with 1:3 matching on the propensity score; p < 0.05 was deemed statistically significant. Results: In total, 635 NSCLC patients underwent NGS (59 early, 576 late); 276 (43.5%) females; age 70.9 (±10.2) years; never smokers 140 (22.0%); 527 (83.0%) adenocarcinomas. Unadjusted frequencies of EGFR mutations were higher in the early cohort (30% vs. 18%). Following adjustment for sex and smoking status, similar frequencies for both early and late NSCLC were observed for variants in EGFR, KRAS, ALK, MET, and ROS1. Conclusion: The frequency of clinically actionable variants in early and late-stage NSCLC was found to be similar, providing evidence that molecular profiling should be performed on surgical specimens. This pre-determined profile is essential to avoid treatment delay for patients who will derive clinical benefit from targeted systemic therapy, in the high likelihood of subsequent relapse.


Background
Each year lung cancer kills more people than colon, breast, and prostate cancers combined [1]. In fact, 26% of all cancer-related deaths are attributable to lung cancer [1]. The World Health Organization (WHO) classifies lung cancer into two broad categories on the basis of tumor biology, treatment, and prognosis: non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC) [2,3]. The only patients with a prospect of being cured are those with early (stage I/II/IIIA) NSCLC who are amenable to surgical resection [4]. Molecular testing with next-generation sequencing (NGS) of resected early-stage NSCLC for clinically actionable genetic alterations is not routinely conducted in many institutions following standard of care curative intent pulmonary resection. Adjuvant targeted systemic therapy has only recently become available and is currently reserved for select cases with EGFR (epidermal growth factor receptor) sensitizing mutations, based on a demonstrated benefit in 36-month relapse-free survival [5].
Further targeted systemic therapy is limited to non-resectable advanced stage patients with tumor genetic alterations such as ALK (anaplastic lymphoma kinase), MET (Met proto-oncogene), RET (Ret proto-oncogene), and ROS1 (c-ROS proto-oncogene 1) [6][7][8][9][10][11][12][13][14][15]. Novel targeted therapeutics are also under development for KRAS G12C (Kirsten rat sarcoma viral oncogene homolog) [11,16,17]. Despite undergoing curative intent surgery for early-stage disease, up to 70% of patients with early-stage NSCLC at presentation still suffer recurrent metastatic disease [1]. NSCLC patients with recurrent metastatic disease are subsequently eligible for targeted systemic therapy if the tumor harbors clinically actionable genetic alterations in genes such as EGFR, ALK, MET, RET, and ROS1. They are furthermore potentially eligible for participation in clinical trials involving KRAS G12C targeted therapeutics. Prior NGS studies in early-stage lung cancer have demonstrated that putative driver mutations present at the time of surgical resection are prognostic [18]. Additionally, although an element of subclonal heterogeneity may be present at the time of recurrent lung cancer following initial curative intent surgery, the dominant clinically relevant clonal "oncogenic driver" identified at the time of surgery was also present at the time of recurrence, conferring sensitivity to targeted therapy [19]. At the time of recurrent disease, technical challenges often exist with respect to the ability to biopsy and acquire sufficient quantities of malignant tissue to undergo robust molecular profiling with NGS [15]. We hypothesized that early and late-stage NSCLC display similar odds of driver mutation and fusion molecular alteration profiles in our population, regardless of disease stage at presentation.
This study would provide clinical rationale for routine targeted panel next-generation sequencing testing of all resected early-stage NSCLC formalin fixed paraffin embedded (FFPE) surgical specimens, providing crucial knowledge that could alter postoperative treatment options for this life-threatening disease.

Objective
The objective of the current study is to compare the molecular profiles of clinically actionable genetic alterations in early and late non-small cell lung cancer in our cohort.

Study Design and Setting
A propensity-matched cohort study was conducted at BC Cancer (Vancouver Cancer Centre, VCC) and the Vancouver General Hospital (VGH) Division of Thoracic Surgery. Tumor molecular data were derived during the period August 2019 to August 2021. This study, #H18-03295-A009, was approved by the BC Cancer institutional research ethics board.

Participants and Data Sources
Prospectively collected molecular, demographic, and smoking status variables for consecutive cases of early and late-stage NSCLC were retrospectively analyzed. The early-stage NSCLC dataset was derived from a separate pilot study database assessing the use of targeted cancer gene panels in early-stage NSCLC patients. Early-stage NSCLC patients were eligible for targeted molecular testing as part of this study following surgical resection for early stage I/II NSCLC (American Joint Committee on Cancer [AJCC] Staging 8th edition) [4] if their tumor was ≥10 mm in maximal diameter and solid in morphology on preoperative CT chest. The early-stage NSCLC patient group additionally underwent standard of care physiologic assessment including detailed pulmonary function testing and clinical staging with fluorodeoxyglucose positron emission tomography (FDG-PET) scan [15]. A selective approach to preoperative invasive mediastinal staging was undertaken. The late-stage NSCLC variables were derived from the Cancer Genetics & Genomics Laboratory (CGL) testing database of patients who presented to BC Cancer for management of unresectable late-stage NSCLC.

Primary Outcomes
The primary outcomes of interest were frequency of clinically actionable lung tumor genetic alterations in the early and late-stage cohorts. Potential confounding variables of interest identified from the literature included sex, age, and tobacco smoking status (current, former, never) [15].

Targeted NGS Panels for NSCLC Genetic Alterations
For both the early and late-stage cohorts, the determination of clinically actionable molecular alteration status was conducted using formalin fixed paraffin embedded (FFPE) tumor tissue.
Early-stage surgically resected NSCLC FFPE slides were assessed for tumor content by a thoracic pathologist and sent to the Canexia Health laboratory for DNA/RNA extraction and sequencing. DNA was extracted using the Qiagen Generead kit with UNG treatment according to the manufacturer's instructions. RNA was extracted using the Promega RNA FFPE kit on the Promega Maxwell RSC instrument. The Canexia Health Find It™ assay, an FFPE solid tumor DNA-based assay, and the RNA-based Fusions assay were performed on the early-stage surgically resected NSCLC FFPE tumor tissue specimens. The Find It assay is an amplicon-based targeted multiplex NGS test that is focused on clinically actionable hotspot gene content for detection of single nucleotide variants (SNVs) and insertions and deletions (indels). The Fusions assay is an RNA-based gene partner agnostic multiplex NGS panel that can identify clinically relevant structural rearrangements or gene fusions. In brief, the Find It assay amplifies FFPE DNA in 2-3 separate primer pools using a maximum of 25 ng DNA in each pool. PCR template products were then pooled, purified, and amplified with Nextera XT Index kit V2 adapters or IDT for Illumina UDIs (unique dual indexes) for sequencing using the Illumina MiSeq v2 300 cycle kits. The in-house developed Canexia Health cloud-based bioinformatics pipeline uses BWA to align reads to the human reference genome GRCh37/hg19, applies multiple data filtering and QC steps, then utilizes artificial intelligence models trained to identify SNVs down to variant allele frequencies (VAFs) of <1%. Post alignment, indels were analyzed using Strelka [20]. The Fusions assay, in brief, uses reverse transcription to convert FFPE-derived RNA to cDNA. The cDNA was then subjected to amplification, ligation, then PCR for targeted amplification. The libraries were amplified using Illumina UDI adapters, purified, and sequenced using the Illumina MiSeq v2 300 cycle kits.
The in-house developed fusions analysis pipeline and algorithm identifies total unique fusion reads to identify high, medium, and low confidence fusion events. Immunohistochemistry was used to orthogonally validate the ALK gene fusion event.
The DNA-based hybrid-capture multiplex NGS assay ("oncopanel") from the Cancer Genetics & Genomics Laboratory (CGL) at BC Cancer was utilized for the late-stage NSCLC cohort. Genomic DNA was extracted with an automated system (Promega Maxwell) followed by FFPE repair, ligation-based library construction, PCR amplification, hybridization capture, and sequencing on a HiSeq2500 platform. Single-strand consensus sequences were generated from UMI-indexed reads using fgbio and aligned to the GRCh37 human genome reference using BWA. Variant calling of DNA mutations and insertions/deletions (indels) was performed using samtools and VarScan2. Annotation and filtering of variants were performed with Agilent's Alissa Interpret platform. For gene fusions in the late-stage cohort, immunohistochemistry was employed to determine aberrant protein expression of ALK, RET, and ROS1 from matched FFPE slides.

Statistical Methods
Continuous variables were summarized by mean and standard deviation and analyzed using the Student's t-test. Categorical variables were expressed as frequencies and percentages and compared by Chi-square or Fisher's exact tests, as appropriate. Single-factor and multi-factor analyses in relation to outcomes (e.g., EGFR) were performed using logistic regression.
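As an illustration of the categorical comparison described above, the following is a minimal sketch of a two-sided Fisher's exact test on a 2x2 table, written in plain Python rather than the SAS procedures used in the study. The function name and the table layout are assumptions made for this example only.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact p-value for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows are stage groups (early, late),
    columns are (mutation present, mutation absent).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability of k events in row 1 with margins fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lowest = max(0, col1 - row2)
    highest = min(row1, col1)
    p_obs = prob(a)
    # Sum all tables that are no more likely than the observed one;
    # the small tolerance guards against floating-point ties.
    total = sum(p for p in map(prob, range(lowest, highest + 1))
                if p <= p_obs * (1 + 1e-9))
    return min(1.0, total)


# Counts back-calculated from the reported percentages (30.5% of 59 and
# 18.2% of 576), so they are approximations, not the study's raw data.
p_value = fisher_exact_two_sided(18, 41, 105, 471)
print(f"Fisher's exact p (any EGFR, early vs. late): {p_value:.3f}")
```

The exact test is the usual choice when expected cell counts are small, as for the rarer alterations in the 59-patient early-stage group; for larger cells a Chi-square test gives similar answers.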
A propensity-matched comparison was conducted to control for potentially confounding variables. Age, sex, smoking status, and tumor characteristics were used in a logistic regression model to generate a propensity score for each patient with early stage of NSCLC or late stage of NSCLC. The matched cohort was derived using 1:3 matching with a maximum allowable absolute difference between the propensity scores of 0.20. The type of matching optimization was by closeness. The quality of the matching was assessed by using the standardized mean difference as well. A robust variance estimator was used to account for the clustering within matched sets when using a logistic regression model to compare the variables or to regress the outcomes on the stage of NSCLC.
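The matching step can be sketched as follows. This is a simplified greedy nearest-neighbor matcher, not the SAS procedure used in the study; it assumes propensity scores have already been estimated and treats the 0.20 caliper as an absolute difference on the score scale, as the text states. The function names, identifiers, and scores in the usage lines are hypothetical.

```python
from math import sqrt


def greedy_match(early, late, ratio=3, caliper=0.20):
    """Match each early-stage patient to `ratio` late-stage patients.

    early, late: dicts mapping patient id -> propensity score.
    Returns a dict mapping early id -> list of matched late ids.
    Only complete 1:ratio sets within the caliper are kept, and each
    late-stage patient is used at most once.
    """
    available = dict(late)
    matches = {}
    for e_id, e_ps in early.items():
        # Rank the unused late-stage patients by closeness of score.
        ranked = sorted(available, key=lambda l_id: abs(available[l_id] - e_ps))
        picked = [l_id for l_id in ranked[:ratio]
                  if abs(available[l_id] - e_ps) <= caliper]
        if len(picked) == ratio:
            matches[e_id] = picked
            for l_id in picked:
                del available[l_id]
    return matches


def smd_binary(p1, p2):
    """Standardized mean difference for a binary covariate (two proportions)."""
    pooled_sd = sqrt((p1 * (1 - p1) + p2 * (1 - p2)) / 2)
    return (p1 - p2) / pooled_sd if pooled_sd else 0.0


# Hypothetical scores, for illustration only.
early_scores = {"e1": 0.30, "e2": 0.90}
late_scores = {"l1": 0.28, "l2": 0.33, "l3": 0.35, "l4": 0.10, "l5": 0.60}
print(greedy_match(early_scores, late_scores))

# Balance check on never-smoker proportions before matching,
# using the percentages reported in the Results (35.6% vs. 20.7%).
print(round(smd_binary(0.356, 0.207), 2))
```

In this toy input, "e2" has no late-stage patient within the caliper and is dropped, which mirrors how some early-stage patients can go unmatched. A standardized mean difference near zero after matching indicates that a covariate is balanced between groups.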
The conventional level of statistical significance (p < 0.05) is used throughout the study as an indicator of a potential effect. All tests were two-sided. Additionally, all statistical analyses were performed with SAS software version 9.4 (SAS Institute, Cary, NC, USA).

Results
The final study cohort included 635 NSCLC patient samples (59 early-stage and 576 late-stage) with targeted panel NGS-based molecular profiling for tumor mutations and fusions. The early-stage group included 22 (37.3%) females, 21 (35.6%) never smokers, and 49 (83.1%) adenocarcinoma histology, with a mean age of 68.0 (±10.3) years. The late-stage group was composed of 254 (44.1%) females, 119 (20.7%) never smokers, 478 (83.0%) adenocarcinomas, and a mean age of 71.2 (±10.2) years. A total of 17 late-stage patients missing smoking status data were excluded from the final analysis. Of the major NSCLC histologic WHO subgroups, we reported 527 (83.0%) adenocarcinomas, 19 (3.0%) squamous cell carcinomas, and 89 (14.0%) other pulmonary carcinomas such as large cell and adenosquamous carcinoma. Baseline characteristics by stage of NSCLC among all patients are depicted in Table 1. Using the propensity-matched comparison method, we identified a total of 53 out of the 59 patients from the early-stage group that matched with 159 of the 576 patients from the late-stage group. A total of 212 patients in the matched cohort were obtained. Baseline characteristics by stage of NSCLC among matched patients are depicted in Table 2. Molecular alteration outcomes by NSCLC stage group are presented in Table 3 for all patients and Table 4 for the propensity-matched patients.
In the 576-sample late-stage lung cohort, the most common mutations identified were also EGFR (18.2%) and KRAS (39.6%), including KRAS G12C (18.2%), MET exon 14 skipping (2.8%), ERBB2 (2.6%), ALK (2.4%), and ROS1 (0.2%) (Figure 2, late-stage oncoprint; Table 3). Both cohorts showed similar frequencies of TP53 mutations with 45.8% in the early-stage cohort and 51.4% in the late-stage cohort. Mutations in PIK3CA (10.2% early-stage, 3.5% late-stage) and BRAF (1.7% early-stage, 6.8% late-stage) were also identified in both the early and late-stage cohorts at varying frequencies.
Unadjusted analysis revealed a significantly higher frequency of any EGFR mutations in the early-stage group compared to the late-stage group (30.5% versus 18.2%; p = 0.023) (Table 3). However, after matching on the propensity score for sex and smoking status, no significant difference in the EGFR mutation frequency was observed (32.1% versus 28.3%; p = 0.65) (Table 4). Unadjusted analysis also revealed a significantly higher frequency of uncommon EGFR mutations (tyrosine kinase inhibitor (TKI) sensitizing mutations such as EGFR G719X, S768I, and L861Q) in the early-stage group compared to the late-stage group (8.5% versus 3.1%; p = 0.05) (Table 3). With propensity matching, no statistically significant difference remained for uncommon TKI-sensitive EGFR variants (9.4% versus 4.4%; p = 0.17) (Table 4). The late-stage cohort harbored a higher frequency of KRAS variants (28.8% early-stage vs. 39.6% late-stage); however, this was not statistically significant. There were also no statistical differences between the cohorts when comparing the frequency of gene mutations found in MET, ERBB2, TP53, ALK, RET, ROS1, PIK3CA, and BRAF.
A regression analysis was performed to summarize the odds of mutation and fusion molecular alteration profiles comparing early to late-stage NSCLC for all patients and propensity-matched patients (Table 5). This analysis revealed similar odds of tumors with clinically actionable NSCLC molecular profiles between groups, notably for any EGFR mutation with adjustment for age, sex, and smoking status (p = 0.58). In contrast, odds of uncommon EGFR TKI-sensitive variants were increased in the early-stage group (p = 0.025).
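To make the notion of "odds" in this comparison concrete, the sketch below computes a crude (unadjusted) odds ratio with a Wald 95% confidence interval from a 2x2 table. It is not the adjusted logistic regression with robust variance reported in Table 5; the function name is an assumption for this example, and the counts are back-calculated from the reported percentages.

```python
from math import exp, log, sqrt


def odds_ratio_wald(a, b, c, d, z=1.96):
    """Crude odds ratio and Wald confidence interval for [[a, b], [c, d]].

    Hypothetical layout: a, b = early-stage with / without the alteration;
    c, d = late-stage with / without the alteration. All cells must be > 0.
    """
    odds_ratio = (a * d) / (b * c)
    # Standard error of log(OR) from the four cell counts.
    se_log_or = sqrt(1 / a + 1 / b + 1 / c + 1 / d)
    lower = exp(log(odds_ratio) - z * se_log_or)
    upper = exp(log(odds_ratio) + z * se_log_or)
    return odds_ratio, lower, upper


# Approximate counts for any EGFR mutation: 18 of 59 early (30.5%)
# and 105 of 576 late (18.2%); these are derived, not raw, data.
or_, lo, hi = odds_ratio_wald(18, 41, 105, 471)
print(f"crude OR = {or_:.2f} (95% CI {lo:.2f} to {hi:.2f})")
```

A crude estimate like this ignores sex, age, and smoking status, which is why the adjusted and propensity-matched results can differ from the unadjusted comparison.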
* Not compared due to lack of events in the groups. ** Uncommon missense change in exon 21 of EGFR (also known as EGFR L861Q); uncommon EGFR G719X, S768I, and L861Q mutations.

Discussion
Multiple genetic alterations have been identified that impact the selection of systemic therapy to improve survival for patients with recurrent or late-stage NSCLC. Testing of tumor tissue for these alterations is important to identify potentially efficacious targeted therapy for patients, as well as to inform treatment plan thereby avoiding systemic therapeutic options unlikely to provide clinical benefit. Even early-stage lung cancers amenable to surgical resection have a high risk of recurrence, with 5-year survival ranging from 36% to 82% depending on NSCLC stage at presentation [4]. In the event of recurrent NSCLC, adequate tissue for molecular profiling diagnosis may be difficult with small needle biopsies or simply not accessible for biopsy due to anatomic location. This is a major limitation to obtaining important molecular testing information to guide treatment options, as these small samples may provide insufficient substrate for histologic, biomarker, and molecular testing. The very process of the need to re-acquire and await molecular testing results in the setting of disease relapse introduces important delays to timely delivery of systemic therapy.
While some centers are fortunate to have reflexive molecular testing workflows established postoperatively, many institutions still do not conduct reflexive molecular testing that includes an expansive cancer gene panel for resected early-stage NSCLC. Further to this, at institutions where molecular testing is conducted for resected early-stage NSCLC, it is often limited to common sensitizing EGFR variants. However, as determined in this study, uncommon TKI sensitizing EGFR variants such as EGFR G719X, S768I, and L861Q mutations were detected in the early-stage cohort, despite the small group size [21][22][23]. Systematic testing of large samples of FFPE tumor tissue from initial surgical resection for clinically actionable molecular alterations with an expanded targeted cancer gene panel-based approach for NGS is the ideal option to inform therapeutic options that will improve survival in the event of recurrence and avoid needless delay in systemic treatment selection. This assumes that the frequency of such targetable alterations is similar in those with early and late-stage NSCLC. In this study, we provide the first report comparing the frequency of molecular alterations in early and late NSCLC in propensity-matched cohorts.

Molecular Alteration Frequency Profiles-Early versus Late-Stage NSCLC
The mutations most frequently reported in NSCLC occur in the KRAS and EGFR genes, which are typically mutually exclusive [14]. This was also observed in our matched early and late-stage cohorts, consistent with previous reports. Of interest, in the unadjusted analysis our late-stage cohort had a comparatively high KRAS mutation frequency (for all variants) relative to the early-stage cohort. The presence of a KRAS mutation is a known prognostic marker of poor survival compared to patients whose tumors do not harbor KRAS mutations [15]. The observed frequency of KRAS mutations in the late-stage sub-group is congruent with its status as a marker of aggressive tumor biology, portending poor clinical prognosis. We observed a similar frequency of KRAS G12C in both the early and late cohorts. This is particularly relevant as novel systemic therapeutics are under development and USFDA approved to clinically target KRAS G12C [11,16,17].
EGFR mutations in lung adenocarcinoma are known to occur in 10-50% of NSCLC depending on the population, tending to occur more often in females, Asians, and never smokers [14]. These mutations typically cluster around exons 18, 19, 20, and 21. We observed a higher frequency of EGFR mutations in the early-stage cohort compared to the late-stage cohort in the unadjusted analysis; however, this was not significantly different in pair-matched multivariable analysis. The higher observed frequency of EGFR mutations in the unadjusted early-stage cohort may be related to the higher frequency of never smokers in this group. Never smoking status has previously been reported to be associated with EGFR mutations [14,15]. Likewise, current or former smoker status has been reported to be associated with KRAS mutations in NSCLC [14]. We did observe a higher frequency of current or former smokers in the late-stage cohort (64.4% early vs. 79.4% late).
All other frequencies were observed to be similar between the early and late-stage sub-groups. This included similar frequencies of the most common clinically actionable EGFR mutation variants (L858R and exon 19 deletion). These mutations confer sensitivity to targeted therapies with tyrosine kinase inhibitors (TKI), such as Osimertinib [24,25]. Interestingly, frequent EGFR insertion mutations in exon 20 were identified in the cohort. This is a clinically relevant finding, as such mutations may confer resistance to first line TKI systemic therapy [26]. However, the USFDA has recently approved the use of Mobocertinib as an irreversible TKI specifically for activating EGFR exon 20 insertion mutations, as well as the monoclonal antibody Amivantamab [27][28][29].
Genetic alterations in ALK occur in 2-7% of NSCLC patients, primarily as chromosomal inversions or translocations that commonly result in the ALK-EML4 fusion [6,7,14]. ALK gene alterations noted in our report for both the early and late-stage sub-groups occurred at frequencies similar to those reported in the literature. Chromosomal rearrangements in the ROS1 proto-oncogene are reported to occur in approximately 1-2% of NSCLC patients, a frequency similarly observed in our early-stage cohort [14,30,31]. MET exon 14 "skipping" mutations occur in approximately 1-5% of NSCLC [14,32]. Similarly, the findings in our study are concordant with the frequency in the literature for both our early and late-stage sub-groups.
Such individuals with alterations in MET, ALK, and ROS1 may be eligible for treatment with targeted therapy [15,33].
RET rearrangements occur in 1-2% of NSCLC, portending sensitivity to a number of targeted inhibitors [34][35][36]. In the early stage cohort, we observed two patient samples both harboring KIF5B-RET fusions; however, we did not observe any RET-rearranged NSCLC in the late-stage cohort. This finding is highly clinically relevant, as previous reports show RET to be associated with potentially aggressive disease biology [33][34][35][36].

Study Limitations
Our findings are not without limitation, including the retrospective nature of the analysis which inherently introduced information bias despite the prospective nature of database variable procurement. The retrospective design also limits our assessment of tumor clonal heterogeneity both in space and time in that we are unable to map clonal evolution [37]. Furthermore, although we observed interesting molecular alteration profiles in our cohort, the small sample size of the early-stage sub-group in particular limits study power. A larger sample size is necessary to inform on the clinical relevance of the uncommon EGFR variants detected in our early-stage group. Additionally, this work represents the experience of a single large tertiary level regional thoracic surgical center of excellence. As such, the generalizability of these findings to the populations served by other treatment centers remains unknown.

Conclusions
Unadjusted analysis revealed a higher frequency of common and uncommon sensitizing EGFR mutations in the early-stage NSCLC group. However, the propensity-matched analysis controlling for sex and smoking status demonstrated a similar frequency and odds of clinically actionable EGFR molecular alterations in early and late-stage NSCLC. This was in addition to the identification of many additional potential therapeutic targets, for example, KRAS G12C, MET exon 14 skipping, and gene fusion events in ALK, RET, and ROS1. Given the high risk of disease relapse even in those presenting with early-stage disease undergoing surgical resection, and similar clinically actionable mutation profiles compared to late-stage NSCLC, strong consideration should be given to reflexive panel-based targeted molecular profiling of FFPE tissue. With the adoption of this molecular tumor tissue testing approach, more patients will derive clinical benefit from timely targeted systemic therapy delivery in case of subsequent relapse, guided by the predetermined molecular profile of their tumor.