Real-World Outcomes of Patients with Advanced Epidermal Growth Factor Receptor-Mutated Non-Small Cell Lung Cancer in Canada Using Data Extracted by Large Language Model-Based Artificial Intelligence

Real-world evidence for patients with advanced EGFR-mutated non-small cell lung cancer (NSCLC) in Canada is limited. This study's objective was to use the previously validated DARWEN™ artificial intelligence (AI) to extract data from the electronic health records of patients with non-squamous NSCLC at University Health Network (UHN) and to describe EGFR mutation prevalence, treatment patterns, and outcomes. Of 2154 patients with non-squamous NSCLC, 613 had advanced disease. Of these, 136 (22%) had common sensitizing EGFR mutations (cEGFRm; ex19del, L858R), 8 (1%) had exon 20 insertions (ex20ins), and 338 (55%) had EGFR wild type tumours. One-year overall survival (OS) (95% CI) for patients with cEGFRm, ex20ins, and EGFR wild type tumours was 88% (83, 94), 100% (100, 100), and 59% (53, 65), respectively. Three of the eight patients with ex20ins (38%) received experimental ex20ins-targeting treatment as their first-line therapy. A total of 57 patients (36%) with cEGFRm received osimertinib as their first-line treatment, and 61 (39%) received it as their second-line treatment. One-year OS (95% CI) following the discontinuation of osimertinib was 35% (17, 75) post-first-line and 20% (9, 44) post-second-line. In this real-world AI-generated dataset, survival after osimertinib was poor in patients with cEGFRm. Patients with ex20ins in this cohort had better outcomes, possibly owing to ex20ins-targeting treatment. These findings highlight the need for more effective treatments for patients with advanced EGFR-mutated NSCLC.


Introduction
Lung cancer is the most common cancer diagnosis in Canada, with an estimated 1 in 15 Canadians receiving a diagnosis in their lifetime [1]. While the prognosis and outcomes of lung cancer have improved in recent decades, largely as a result of novel, innovative therapies and increased awareness of risk factors, this disease remains the deadliest cancer in Canada [1,2]. Approximately 85% of patients with lung cancer present with NSCLC, up to two-thirds of whom harbour actionable driver mutations, most commonly in the epidermal growth factor receptor (EGFR) gene [3][4][5]. EGFR mutations can be categorized by mutation type and by the exon in which they occur. Exon 19 deletions (ex19del) and exon 21 L858R point mutations account for up to 90% of all EGFR mutations and are often referred to as common sensitizing EGFR mutations (cEGFRm) [6]. The third most frequent type, exon 20 insertion mutations (ex20ins), represents approximately 1–12% of all EGFR mutations and 0.1–4% of all NSCLC mutations [7]. However, real-world estimates of these mutations remain uncertain, partly because testing methods have evolved; recent guidelines recommend next-generation sequencing (NGS) for identifying actionable driver alterations such as EGFR [8,9]. Compared with the historical standard, polymerase chain reaction (PCR), NGS has improved sensitivity, can detect mutations using a smaller amount of DNA, and sequences a greater part of the gene. PCR is limited to specific loci and can miss up to 50% of ex20ins mutations, although it requires a smaller tissue sample than NGS [10][11][12].
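To make the categories above concrete, the following Python sketch maps the EGFR variants reported for a patient to the study groups used here. It is illustrative only: the normalized variant labels (`"ex19del"`, `"L858R"`, `"ex20ins"`), the `tested` flag, and the precedence given to common sensitizing mutations when several variants are present are assumptions, not the rules used by DARWEN™ or the study investigators.

```python
from enum import Enum


class EgfrGroup(Enum):
    """Study groups; labels follow the abbreviations used in the text."""
    CEGFRM = "cEGFRm"            # exon 19 deletion or exon 21 L858R
    EX20INS = "ex20ins"          # exon 20 insertion
    OTHER_EGFR = "other EGFR"    # any other EGFR variant
    WILD_TYPE = "EGFR wild type"
    NOT_TESTED = "not tested"


# Hypothetical normalized labels; real reports use many different spellings.
COMMON_SENSITIZING = {"ex19del", "l858r"}
EXON20_INSERTION = {"ex20ins"}


def classify_egfr(tested: bool, variants: list[str]) -> EgfrGroup:
    """Assign one group per patient (assumed precedence: cEGFRm > ex20ins > other)."""
    if not tested:
        return EgfrGroup.NOT_TESTED
    labels = {v.strip().lower() for v in variants}
    if not labels:
        # A negative EGFR test: wild type for EGFR; other drivers are not excluded.
        return EgfrGroup.WILD_TYPE
    if labels & COMMON_SENSITIZING:
        return EgfrGroup.CEGFRM
    if labels & EXON20_INSERTION:
        return EgfrGroup.EX20INS
    return EgfrGroup.OTHER_EGFR


print(classify_egfr(True, ["L858R"]).value)  # cEGFRm
```

Keeping "not tested" separate from "wild type" mirrors the distinction the study draws between patients with a negative EGFR test and patients without testing at the centre.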
The treatment of patients with EGFR mutations has been revolutionized by tyrosine kinase inhibitor (TKI) targeted therapy. The recommended first-line therapy for patients with advanced-stage cEGFRm NSCLC in Canada is the third-generation EGFR TKI osimertinib [13,14]. However, the long-term benefit of this therapy is limited by acquired resistance, which develops via multiple mechanisms [15]. Several new options for overcoming osimertinib resistance have recently emerged [16]: amivantamab + lazertinib; chemotherapy; local therapy (surgery or radiation); chemotherapy + amivantamab/lazertinib; antibody-drug conjugates (ADCs) such as patritumab deruxtecan and datopotamab deruxtecan; and combined targeted therapies against emergent targetable alterations (e.g., osimertinib + savolitinib or tepotinib + osimertinib for MET amplification). These emerging options are particularly important because many patients with cEGFRm who are treated with a first-line TKI die before receiving second-line therapy [17]. There therefore remains a high unmet need for effective and safe therapies early in patients' treatment journeys, and real-world evidence (RWE) on patients with cEGFRm who may benefit from these therapies is currently lacking, particularly in the Canadian setting.
Independent of acquired resistance, ex20ins are associated with limited response to TKIs [18]. Compared with patients with other EGFR mutations, patients with ex20ins have an especially poor prognosis, with markedly reduced sensitivity to approved EGFR TKIs [18][19][20]. Until recently, treatment options for patients with ex20ins were limited, and the recommended first-line treatment was either platinum-based chemotherapy or enrolment in a clinical trial [13]. However, the Canadian treatment landscape is evolving: the results of the phase III PAPILLON study have established amivantamab + chemotherapy as a new first-line standard for this patient population [21]. As the treatment landscape changes, a better understanding of the patients who may benefit from these newer therapies is needed.
Over the past two decades, RWE generated from electronic health record (EHR) systems has provided new insights into the prevalence of lung cancer subtypes and into the disease characteristics and clinical outcomes of these patients. Because EHRs capture clinical information routinely, real-world data (RWD) from EHRs can be harnessed to study disease progression and treatment patterns and to measure survival outcomes over time. Recent advances in artificial intelligence (AI) and natural language processing (NLP) have enabled the extraction and analysis of RWD from clinical documentation and unstructured text (such as clinical notes and laboratory results) within EHR systems, with higher accuracy and at a much greater scale than manual abstraction, the current standard practice for extracting RWD from EHRs [22,23]. These technologies are increasingly recognized as playing an important role in clinical medicine by giving clinicians and researchers access to previously inaccessible data, which can be used to inform clinical decision making and enhance clinical care [24].
The aim of this study was to leverage a previously validated, commercially available AI technology, Pentavere's DARWEN™, to identify patients and extract RWD from EHRs at the University Health Network Princess Margaret Cancer Centre (UHN-PMCC), the largest cancer treatment centre in Canada, in order to describe the prevalence, treatment patterns, and clinical outcomes of patients diagnosed with advanced NSCLC harbouring cEGFRm (ex19del or exon 21 L858R) or ex20ins mutations.

Study Design
This was a retrospective cohort study of data elements extracted from EHRs stored at the UHN-PMCC using AI technology. The AI engine combines large language models with an ensemble of other techniques and has previously been evaluated and validated against manual abstraction across multiple disease domains, including lung cancer [22,25], breast cancer [26], dermatology [27], and infectious diseases [28], at multiple Canadian institutions, including the UHN-PMCC.
The study period extended from 1 January 2017 to 1 March 2022, and the study used the institutional Cancer Registry. All patients aged ≥18 years with non-squamous NSCLC who were seen at the UHN-PMCC during the study period were included. Follow-up data from EHRs were included to the extent that they were available within the study period. The initial list of patients was provided by the UHN-PMCC's Molecular Testing Database.

Data Extraction
Extracted clinical features included mutation status, clinical and demographic characteristics, treatment information, and clinical outcomes. Data were extracted directly from the EHRs of all patients with non-squamous NSCLC seen at the UHN-PMCC between 1 January 2017 and 1 March 2022. The AI engine was installed on the UHN-PMCC's infrastructure and used to extract relevant data variables directly from the source systems where available. Clinical outcomes, including time to treatment discontinuation (TTD) and overall survival (OS), were derived from the extracted data. All features were extracted following a set of pre-defined rules and definitions developed by the UHN-PMCC Principal Investigator. DARWEN™ AI has previously been validated against manual chart review for the same clinical features at the UHN-PMCC, using a process described previously [22].
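Validation against manual chart review is typically summarized with agreement metrics computed per feature. The sketch below assumes a binary feature (for example, "patient has ex20ins") abstracted both by the AI engine and by a human reviewer for the same patients, with manual abstraction treated as the reference standard. The metric set shown (accuracy, sensitivity, positive predictive value, F1) is a common choice and an assumption here, not necessarily the set reported in Supplementary Table S1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Agreement:
    accuracy: float
    sensitivity: float  # recall: share of manually confirmed positives found by the AI
    ppv: float          # precision: share of AI positives confirmed manually
    f1: float


def _ratio(num: int, den: int) -> float:
    # Undefined metrics (zero denominator) are reported as NaN.
    return num / den if den else float("nan")


def agreement(ai: list[bool], manual: list[bool]) -> Agreement:
    """Compare AI-extracted labels with manual abstraction, patient by patient."""
    if len(ai) != len(manual) or not ai:
        raise ValueError("need two non-empty label lists of equal length")
    pairs = list(zip(ai, manual))
    tp = sum(a and m for a, m in pairs)
    fp = sum(a and not m for a, m in pairs)
    fn = sum(m and not a for a, m in pairs)
    tn = len(pairs) - tp - fp - fn
    return Agreement(
        accuracy=(tp + tn) / len(pairs),
        sensitivity=_ratio(tp, tp + fn),
        ppv=_ratio(tp, tp + fp),
        f1=_ratio(2 * tp, 2 * tp + fp + fn),
    )


# Hypothetical labels for four patients, not study data.
print(agreement([True, True, True, False], [True, True, False, False]))
```

For rare features such as ex20ins, accuracy alone can look high even when the AI misses positives, which is why sensitivity and PPV are usually reported alongside it.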

Outcomes
The primary outcome of interest was mutation prevalence. Other outcomes of interest included the frequency of patients receiving each type of therapy by line of therapy (LoT), time from diagnosis to treatment initiation for each LoT, and clinical outcomes, including TTD, OS, and OS post-osimertinib. TTD was measured from the date of treatment initiation for a given line of therapy to the last known treatment date for that same line and was derived for first-line, second-line, and third-line therapies. OS was measured from the date of diagnosis to the date of death, and from the date of treatment initiation to the date of death for first-line and second-line therapies. OS specifically for patients who had discontinued osimertinib was explored separately and measured from the osimertinib stop date to the date of death; it was derived from the end of first-line osimertinib and from the end of second-line osimertinib. For all OS analyses, patients who did not experience the event before the end of the study period were censored at their date of last follow-up or the study end date, whichever came first.
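The index dates and censoring rule above can be expressed as a small function that turns extracted dates into the (time, event) pairs a survival analysis needs. This is a sketch under stated assumptions: months are computed as days divided by 365.25/12, and the function and argument names are hypothetical rather than part of the study's pipeline. Only the index date changes between analyses: the diagnosis date for OS, the first-line or second-line start date for OS by line, and the osimertinib stop date for OS post-osimertinib.

```python
from datetime import date
from typing import Optional, Tuple

STUDY_END = date(2022, 3, 1)
DAYS_PER_MONTH = 365.25 / 12  # assumed day-to-month conversion


def time_to_event(
    index_date: date,
    event_date: Optional[date],
    last_follow_up: date,
    study_end: date = STUDY_END,
) -> Tuple[float, bool]:
    """Return (months from index date, event observed?).

    A death recorded on or before the study end counts as an event; otherwise
    the patient is censored at last follow-up or the study end, whichever
    comes first.
    """
    if event_date is not None and event_date <= study_end:
        end, observed = event_date, True
    else:
        end, observed = min(last_follow_up, study_end), False
    if end < index_date:
        raise ValueError("end of follow-up precedes the index date")
    return (end - index_date).days / DAYS_PER_MONTH, observed


# OS post-osimertinib: the index date is the osimertinib stop date (hypothetical dates).
months, died = time_to_event(date(2021, 6, 1), None, date(2021, 12, 15))
print(f"{months:.1f} months, event={died}")  # 6.5 months, event=False
```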

Statistical Analyses
Descriptive analyses were performed to summarize patient demographics, disease characteristics, treatment patterns, and outcomes of interest across the study cohort. Continuous variables were described using the mean and standard deviation (SD) and the median and range. Categorical variables were described by frequencies and percentages. The number of missing observations was reported for all variables. Time-to-event outcomes (e.g., OS) were described using Kaplan-Meier curves, which estimate the distribution of event times while accounting, through standard censoring rules, for patients in whom the event had not yet occurred. Numbers at risk and the cumulative number of events were reported for each curve.
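For readers less familiar with the method, the Kaplan-Meier estimator can be sketched in a few lines of Python, producing the numbers at risk and cumulative events reported alongside each curve. The 95% confidence limits here use Greenwood's variance on the log-survival scale (the default interval type in R's survival package); the paper does not state which interval method was used, so treat that choice, and all names below, as assumptions.

```python
import math
from collections import Counter

Z95 = 1.959963984540054  # standard normal quantile for a two-sided 95% interval


def kaplan_meier(times, events, z=Z95):
    """Kaplan-Meier estimate with log-transformed Greenwood confidence limits.

    times: follow-up in months; events: True for the event (death or
    discontinuation), False for censoring. Returns one row per distinct event
    time: (time, n_at_risk, n_events, cumulative_events, survival, lower, upper).
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    events_at = Counter(t for t, e in zip(times, events) if e)
    leaving_at = Counter(times)  # events and censorings both leave the risk set
    n_at_risk, surv, greenwood, cum_events = len(times), 1.0, 0.0, 0
    rows = []
    for t in sorted(leaving_at):
        d = events_at.get(t, 0)
        if d:
            surv *= 1 - d / n_at_risk
            cum_events += d
            if n_at_risk > d:
                greenwood += d / (n_at_risk * (n_at_risk - d))
                half_width = z * math.sqrt(greenwood)
                lower = surv * math.exp(-half_width)
                upper = min(1.0, surv * math.exp(half_width))
            else:  # S(t) = 0, so the log-scale interval is undefined
                lower = upper = float("nan")
            rows.append((t, n_at_risk, d, cum_events, surv, lower, upper))
        n_at_risk -= leaving_at[t]  # patients censored at t count as at risk at t
    return rows


def survival_at(rows, t):
    """Read the step function at time t; before any event S = 1 with limits (1, 1)."""
    estimate = (1.0, 1.0, 1.0)
    for time, _n, _d, _cum, s, lower, upper in rows:
        if time > t:
            break
        estimate = (s, lower, upper)
    return estimate


# Hypothetical cohort (months, event?), not study data.
rows = kaplan_meier([1, 2, 2, 3, 4, 5, 14, 20],
                    [True, True, False, True, False, True, False, False])
s, lo, hi = survival_at(rows, 12.0)
print(f"1-year OS {s:.0%} (95% CI {lo:.0%}, {hi:.0%})")
```

With no events in a group, the estimate stays at 1 with limits (1, 1), which is the form of the "100% (100, 100)" 1-year OS reported for the small ex20ins group.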

Patients
Between 1 January 2017 and 1 March 2022, 2154 patients with non-squamous NSCLC were identified as having been seen at the UHN-PMCC. Of these, 613 had advanced-stage disease: 136 (22%) had cEGFRm at diagnosis, 8 (1%) had ex20ins at diagnosis, 338 (55%) had EGFR wild type tumours at diagnosis, and 131 (21%) did not undergo mutation testing at the UHN-PMCC at diagnosis. A flow diagram of the included patients is presented in Supplementary Figure S1.
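As a quick arithmetic check, the subgroup percentages above follow from the counts with the 613 advanced-stage patients as the denominator. The four groups account for every advanced-stage patient, and the rounded percentages sum to 99% only because of rounding. A minimal Python sketch (the function name is illustrative):

```python
def rounded_shares(counts: dict[str, int]) -> dict[str, int]:
    """Percent of the total for each group, rounded to the nearest whole percent."""
    total = sum(counts.values())
    return {group: round(100 * n / total) for group, n in counts.items()}


advanced = {
    "cEGFRm": 136,
    "ex20ins": 8,
    "EGFR wild type": 338,
    "no mutation testing at UHN-PMCC": 131,
}
assert sum(advanced.values()) == 613  # every advanced-stage patient is in exactly one group
print(rounded_shares(advanced))
# {'cEGFRm': 22, 'ex20ins': 1, 'EGFR wild type': 55, 'no mutation testing at UHN-PMCC': 21}
```

Python's `round()` rounds exact halves to the nearest even integer, so a value such as 5/8 = 62.5% would print as 62 rather than the 63% conventionally reported; none of the shares above is an exact half.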
Across all 613 patients with advanced-stage disease, the median age at advanced diagnosis was 67 years; 51% of patients were male, 84% had adenocarcinoma, and 38% had never smoked. At advanced diagnosis, 30% of patients presented with bone metastases and 14% with brain metastases (Table 1). Most patients (81%) were diagnosed at the UHN-PMCC. Of the 131 patients who did not have mutation testing at the UHN-PMCC, 56% were also not diagnosed at the UHN-PMCC; none of these 131 patients were included in the clinical outcome analyses. The median (range) duration of follow-up from diagnosis for all patients was 12.3 months (0.0–61.8) (Table 1). AI validation metrics for the AI-extracted clinical features are presented in Supplementary Table S1.

Table 1 notes: a Patients could have had multiple metastatic sites at diagnosis and could also have had metastases to sites other than the bone, brain, lung, and liver; therefore, percentages may not add up to 100%. b Includes patients with a negative EGFR test within 3 months of NSCLC diagnosis but does not exclude the possibility of other mutations. ECOG: Eastern Cooperative Oncology Group; NSCLC: non-small-cell lung cancer; SD: standard deviation; UHN: University Health Network.

Treatment Patterns
Treatment patterns were assessed from the date of diagnosis until the date of death, the date of last follow-up, or the end of the study period, whichever came first. Of the advanced-stage patients with cEGFRm at diagnosis, 129/136 (95%) received first-line therapy, and 124/129 (96%) of these received an EGFR TKI as first-line treatment (Figure 1A; Supplementary Table S2). Of the patients with cEGFRm, 62/136 (46%) did not go on to receive second-line treatment during the study period (Figure 1A); of these 62 patients, 34 had received first-line osimertinib and 19 had received first-line gefitinib, and 21 (34%) died. Among patients who did go on to receive second-line (74/136 [54%]) and third-line therapies (27/136 [20%]), EGFR TKIs were again the most common treatment type in those lines (Figure 1A). Between 2017 and 2019, gefitinib was the most common first-line EGFR TKI for patients with cEGFRm: 81% of patients who initiated an EGFR TKI in 2017–2019 received gefitinib (Table 2). Coinciding with provincial funding as of January 2020, osimertinib was the most frequently used first-line EGFR TKI from 2020 to 2022, received by 93% of patients who initiated an EGFR TKI in this period (Table 2). Of the advanced-stage patients with ex20ins at diagnosis, 7/8 (88%) received first-line therapy (Figure 1B), and 3/8 (38%) received the experimental ex20ins-targeting TKI poziotinib (Supplementary Table S1). Second-line therapy was received by 5/8 (63%) patients (4/8 received chemotherapy), and 1/8 (13%) went on to receive third-line therapy (Figure 1B). For advanced-stage patients with EGFR wild type tumours at diagnosis, treatment patterns were generally heterogeneous across all lines of therapy (Supplementary Table S1).

Table 2 notes: Bolded N includes patients who initiated a first-line EGFR TKI in the specified year. EGFR: epidermal growth factor receptor; NSCLC: non-small-cell lung cancer; TKI: tyrosine kinase inhibitor.
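The period comparison in Table 2 amounts to grouping first-line EGFR TKI starts by initiation year and computing each drug's share within a period. The sketch below assumes a hypothetical list of (year, drug) records, one per patient who initiated a first-line EGFR TKI; it illustrates the calculation only and does not reproduce the study data.

```python
from collections import Counter, defaultdict


def period_of(year: int) -> str:
    """Map an initiation year to the periods compared in the text."""
    if 2017 <= year <= 2019:
        return "2017-2019"
    if 2020 <= year <= 2022:  # 2022 covers starts before the 1 March 2022 study end only
        return "2020-2022"
    raise ValueError(f"{year} is outside the study period")


def first_line_tki_shares(starts: list[tuple[int, str]]) -> dict[str, dict[str, float]]:
    """Share of first-line EGFR TKI starts by drug within each period."""
    counts = defaultdict(Counter)
    for year, drug in starts:
        counts[period_of(year)][drug] += 1
    return {
        period: {drug: n / sum(c.values()) for drug, n in c.items()}
        for period, c in counts.items()
    }


# Hypothetical records, not study data.
starts = [(2017, "gefitinib"), (2018, "gefitinib"), (2019, "other EGFR TKI"),
          (2019, "gefitinib"), (2020, "osimertinib"), (2021, "osimertinib")]
print(first_line_tki_shares(starts))
```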
The median time from advanced diagnosis to first-line treatment initiation for patients with cEGFRm, ex20ins, and EGFR wild type tumours was 0.8, 2.5, and 1.5 months, respectively (Supplementary Table S1). The longer time to first-line treatment observed for patients with ex20ins likely reflects the lack of clear treatment options for these patients and the time required for clinical trial enrolment.
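The median time to first-line treatment is computed over treated patients only, from the advanced diagnosis date to the first-line start date. A minimal sketch, assuming 365.25/12 days per month and hypothetical dates; the function name is illustrative:

```python
from datetime import date
from statistics import median

DAYS_PER_MONTH = 365.25 / 12  # assumed day-to-month conversion


def median_months_to_first_line(pairs: list[tuple[date, date]]) -> float:
    """pairs: (advanced diagnosis date, first-line start date) for treated patients."""
    if not pairs:
        raise ValueError("no treated patients")
    gaps = [(start - diagnosis).days / DAYS_PER_MONTH for diagnosis, start in pairs]
    if min(gaps) < 0:
        raise ValueError("treatment start precedes diagnosis")
    return median(gaps)


# Hypothetical patients: 30, 60, and 100 days from diagnosis to first-line start.
pairs = [
    (date(2020, 1, 1), date(2020, 1, 31)),
    (date(2020, 1, 1), date(2020, 3, 1)),
    (date(2020, 1, 1), date(2020, 4, 10)),
]
print(f"{median_months_to_first_line(pairs):.1f} months")  # 2.0 months
```

Untreated patients have no start date and are excluded rather than assigned a gap, which is why the median describes treated patients only.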

Discussion
This study identified Canadian patients with non-squamous NSCLC at the largest cancer treatment centre in Canada and used AI-extracted data to describe the real-world characteristics, treatment patterns, and clinical outcomes of patients with advanced ex19del, exon 21 L858R, and ex20ins EGFR mutations. As expected, patients with cEGFRm were primarily treated with EGFR TKIs, and TKI use changed over time with the approval of novel therapies: from 2020, osimertinib became the most frequently administered EGFR TKI, in line with treatment guidelines. Importantly, patients with cEGFRm treated with osimertinib progressed on therapy and had poor survival after discontinuing treatment, emphasizing the need for more efficacious therapies earlier in patients' treatment journeys. Several patients with ex20ins were treated with the experimental ex20ins TKI poziotinib and may have had better survival as a result.
Among the 2154 patients with non-squamous NSCLC seen at the UHN-PMCC during the study period, 613 had advanced disease, of whom 1% had ex20ins at diagnosis; this is consistent with other real-world estimates in Canada and at the UHN-PMCC [29][30][31]. The median time from advanced diagnosis to first-line therapy initiation was longer for patients with ex20ins than for patients with cEGFRm (2.5 versus 0.8 months), likely owing to the absence of a clear first-line targeted treatment option for these patients, coupled with the time required for clinical trial enrolment.
A recent European RWE registry study investigated the use of different treatment types and their impact on survival among patients with EGFR ex20ins mutations. Novel targeted agents, including amivantamab, mobocertinib, and poziotinib, were associated with improved survival in the first-line setting, and in the multivariate analysis, treatment type (novel targeted therapy versus chemotherapy) had a significant effect on OS (p = 0.03) [32]. In the present study, 38% of patients with ex20ins received the experimental exon 20-targeting TKI poziotinib as first-line therapy, and patients with ex20ins had better survival than patients with cEGFRm or EGFR wild type tumours, emphasizing the potential benefit of novel targeted therapies. However, the survival analyses for the ex20ins group are limited by the small sample size associated with this rare mutation. In addition, serious adverse events, including grade ≥3 diarrhoea and rash, were observed in the phase II trial of poziotinib and led to treatment interruptions; this could explain the shorter first-line TTD for patients with ex20ins than for patients with cEGFRm in this study. Further, the recent phase III trial of mobocertinib as first-line therapy for patients with ex20ins was terminated early owing to futility. These results highlight the need for efficacious and safe exon 20-targeting therapies to improve survival outcomes for these patients, in alignment with the evolving treatment landscape.
Over the study period, treatment patterns for patients with cEGFRm evolved with the introduction of novel third-generation EGFR TKIs: gefitinib was the predominant first-line EGFR TKI from 2017 to 2019, followed by osimertinib from 2020 to 2022. Notably, however, 62/136 (46%) patients with cEGFRm (34 of whom had received first-line osimertinib) did not go on to receive second-line therapy during the study period, and 21/62 (34%) of these patients died. For patients with cEGFRm who received osimertinib as first-line or second-line therapy, OS following osimertinib discontinuation was poor (1-year OS [95% CI] was 35% (17, 75) post-first-line and 20% (9, 44) post-second-line osimertinib), in line with findings from the RWE study of US databases by Girard et al. (2023) [33]. These observations highlight the importance of effective novel treatment options early in patients' treatment journeys. Future studies could investigate the specific risk factors associated with mortality before second-line therapy.
As a retrospective study of data extracted from EHRs, this study was subject to limitations in the availability and accuracy of the data captured in the EHR. For example, many patient deaths occur in the community rather than in hospital, and dates of death are only collected when the hospital is notified of a patient's death, which may have resulted in missing mortality data. This could have led to more censoring in the Kaplan-Meier curves and survival analyses and potentially to overestimated survival. Additionally, at the UHN-PMCC, oral therapy prescription data are only dictated into clinical notes, so these records are susceptible to incompleteness and human dictation error. Finally, because this study was conducted at a single urban treatment site in Toronto, Ontario, the cohort may not accurately represent the wider provincial or national population, and the findings may not be directly reproducible in other settings; however, the prevalence rates observed in this study align with those of previous studies in Canada and at the UHN-PMCC [29][30][31].

Conclusions
This study identified patients with non-squamous NSCLC at one of Canada's largest cancer treatment centres using previously validated AI technology. Such technologies allow previously unavailable data to be extracted more consistently, efficiently, and scalably than manual chart review [22]. The results of this study highlight the importance of effective novel targeted therapies for improving survival outcomes in patients with EGFR ex20ins mutations, in alignment with the evolving first-line treatment landscape. The findings also emphasize the need for optimal therapies early in the treatment of patients with cEGFRm.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/curroncol31040146/s1, Table S1: Evaluation of DARWEN™ AI; Table S2: First-line treatments in advanced-stage NSCLC patients stratified by mutation status at diagnosis; Figure S1: Summary of included patients.

Figure 1 .
Figure 1. Overall treatment patterns in advanced-stage NSCLC patients by mutation status at diagnosis. Line of therapy is denoted by the number followed by the treatment regimen, with first-line on the left and subsequent lines to the right. "Other" includes capmatinib, savolitinib, poziotinib, mobocertinib, lazertinib, and telisotuzumab. (A) Common sensitizing EGFR mutations. (B) Exon 20 insertion mutations. EGFR: epidermal growth factor receptor; NSCLC: non-small-cell lung cancer; TKI: tyrosine kinase inhibitor.

Table 2 .
First-line EGFR TKI treatment patterns in advanced-stage NSCLC patients stratified by year of initiating treatment and mutation status at diagnosis.

Figure 2 .
Figure 2. TTD in advanced-stage NSCLC patients stratified by mutation status. (A) TTD1 (first line). (B) TTD2 (second line). (C) TTD3 (third line). a Probability of remaining on the given line of treatment. EGFR: epidermal growth factor receptor; NSCLC: non-small-cell lung cancer; TTD: time to treatment discontinuation.

Figure 3 .
Figure 3. OS from the end of first-line or second-line osimertinib in patients with common sensitizing EGFR mutations. (A) OS from the end of first-line osimertinib; (B) OS from the end of second-line osimertinib. EGFR: epidermal growth factor receptor; OS: overall survival.

Table 1 .
Clinical, demographic, and disease characteristics of advanced-stage NSCLC patients stratified by EGFR mutation status at diagnosis.

Table 3 .
Time to event analyses for patients stratified by mutation status at diagnosis.
CI: confidence interval; EGFR: epidermal growth factor receptor; NA: Not applicable either due to small sample size or confidence interval not reached; OS: overall survival; TTD: time to treatment discontinuation.