An All-In-One Transcriptome-Based Assay to Identify Therapy-Guiding Genomic Aberrations in Nonsmall Cell Lung Cancer Patients

Simple Summary Treatment of patients diagnosed with advanced pulmonary adenocarcinoma depends on the presence of genomic aberrations that are targetable for a specific tyrosine kinase inhibitor. Subsequent treatment lines depend on presence of mutations that are associated with emerging resistance. These aberrations include a variety of gene activating mutations, including single nucleotide variants, small insertion-deletions, exon skipping events, and gene fusions. At this moment different assays are used to detect these aberrations in the clinic. In this paper we introduce a novel method that can detect these genomic alterations in a single, RNA-based, assay. The design of the all-in-one assay is flexible allowing addition of new targets in subsequent designs. We show that this all-in-one assay has a high accuracy even on formalin-fixed-paraffin-embedded tissue samples, making it readily applicable in a clinical diagnostic setting. Abstract The number of genomic aberrations known to be relevant in making therapeutic decisions for non-small cell lung cancer patients has increased in the past decade. Multiple molecular tests are required to reliably establish the presence of these aberrations, which is challenging because available tissue specimens are generally small. To optimize diagnostic testing, we developed a transcriptome-based next-generation sequencing (NGS) assay based on single primed enrichment technology. We interrogated 11 cell lines, two patient-derived frozen biopsies, nine pleural effusion, and 29 formalin-fixed paraffin-embedded (FFPE) samples. All clinical samples were selected based on previously identified mutations at the DNA level in EGFR, KRAS, ALK, PIK3CA, BRAF, AKT1, MET, NRAS, or ROS1 at the DNA level, or fusion genes at the chromosome level, or by aberrant protein expression of ALK, ROS1, RET, and NTRK1. A successful analysis is dependent on the number of unique reads and the RNA quality, as indicated by the DV200 value. In 27 out of 51 samples with >50 K unique reads and a DV200 >30, all 19 single nucleotide variants (SNVs)/small insertions and deletions (INDELs), three MET exon 14 skipping events, and 13 fusion gene transcripts were detected at the RNA level, giving a test accuracy of 100%. In summary, this lung-cancer-specific all-in-one transcriptome-based assay for the simultaneous detection of mutations and fusion genes is highly sensitive.

A recurrent problem, especially for advanced-stage NSCLC patients, is the small biopsy size in combination with the frequently low tumor content. This hampers comprehensive molecular testing using a combination of different tests to reliably screen for the presence of all types of clinically relevant genomic aberrations. To overcome this limitation, a comprehensive test to detect all aberrations in a single assay is needed. In several studies, a combination of DNA-and RNA-based next-generation sequencing (NGS) tests was applied, limiting the screening to two parallel tests [11][12][13][14][15]. In a study using a targeted RNA-based NGS test on frozen cytological samples from lung cancer and thyroid cancer, both fusion genes and mutations were simultaneously identified [16]. However, the currently available all-in-one approaches that cover all the different types of aberrations are not yet commonly applied in routine clinical settings.
In this study, we design a lung-cancer-specific targeted all-in-one transcriptome-based assay based on single primed enrichment technology (SPET) to simultaneously identify mutations, gene fusions, and exon skipping events. The assay covers all the gene loci that are currently relevant for selecting optimal targeted therapy in advanced stage NSCLC patients and does not require prior knowledge about the fusion gene partners. We tested the effectiveness of our comprehensive assay in samples with known aberrations, either based on the literature (i.e., cell lines) or on our routine molecular diagnostic test results. We specifically aimed to test its feasibility on formalin-fixed paraffin-embedded (FFPE) tissue samples.

Sequencing Results
In total, we analyzed 51 samples derived from 37 patients and 11 cell lines that carried, in total, 60 known genomic alterations ( Figure 1). All available RNA samples (n = 51) were subjected to our assay, irrespective of RNA quality and quantity, to gain insight into the overall performance of our assay. DV200 values were available for 42 samples. The median number of total reads obtained was 2.3 M (range: 1.6 to 5.1 M), and the median number of unique reads was 156 K (range: 1.2 to 1.3 M).
For an overview of all quality-control data, see Table S1.

Detection of SNVs and INDELs
Visual inspection of the aligned reads of our transcriptome assay in the Integrated Genomics Viewer (IGV 2.6.3, Cambridge, MA, USA) [17] revealed the presence of 34 of the 42 expected variants, with at least three mutant reads (Table 1). These included 25 SNVs, 6 INDELs, and the consequences of three MET exon 14 skipping mutations at the transcript level ( Figure 2).

Detection of SNVs and INDELs
Visual inspection of the aligned reads of our transcriptome assay in the Integrated Genomics Viewer (IGV 2.6.3, Cambridge, MA, USA) [17] revealed the presence of 34 of the 42 expected variants, with at least three mutant reads (Table 1). These included 25 SNVs, 6 INDELs, and the consequences of three MET exon 14 skipping mutations at the transcript level ( Figure 2).

Detection of SNVs and INDELs
Visual inspection of the aligned reads of our transcriptome assay in the Integrated Genomics Viewer (IGV 2.6.3, Cambridge, MA, USA) [17] revealed the presence of 34 of the 42 expected variants, with at least three mutant reads (Table 1). These included 25 SNVs, 6 INDELs, and the consequences of three MET exon 14 skipping mutations at the transcript level ( Figure 2).   Of the eight variants that we did not detect in IGV, we observed no mutant reads for seven cases and one mutant read for one case ( Table 1). The total read depth was below 15 at the position of the expected mutation in all eight samples. To quantify the expression level of the wild-type and mutant alleles in the cases where we did not detect the mutation at the transcriptome level, we set up a highly sensitive RNA-based ddPCR assay. We first tested the efficiency of the ddPCR assay in 13 samples harboring 14 confirmed SNVs/INDELs using 1-8 ng of RNA input and detected all SNVs and INDELs. A good correlation was observed between the variant allele frequencies (VAFs) determined by our all-in-one assay and by the ddPCR assay using the same batch of RNA (R-squared 0.83; Figure 3A and Table S2). For six of the eight samples in which we did not observe the expected variant in our assay, RNA was available for ddPCR. Despite using a high RNA input (497 ng), only eight wild-type and no mutant KRAS p.G12A droplets were detected for patient P32, indicating a very low KRAS expression level. For the other five samples, we did observe mutant droplets with a frequency ranging from 10% to 71% using an RNA input ranging from 25 to 370 ng. The number of mutant droplets was still much lower than the numbers observed in the positive control samples, for which we used a much lower RNA input. This implies that the abundances of both the wild-type and mutant transcripts were below the detection limit of our assay, due to either poor RNA quality, low expression of the gene, or insufficient unique reads in all six samples.
Cancers 2020, 12, x 6 of 18 Of the eight variants that we did not detect in IGV, we observed no mutant reads for seven cases and one mutant read for one case ( Table 1). The total read depth was below 15 at the position of the expected mutation in all eight samples. To quantify the expression level of the wild-type and mutant alleles in the cases where we did not detect the mutation at the transcriptome level, we set up a highly sensitive RNA-based ddPCR assay. We first tested the efficiency of the ddPCR assay in 13 samples harboring 14 confirmed SNVs/INDELs using 1-8 ng of RNA input and detected all SNVs and INDELs. A good correlation was observed between the variant allele frequencies (VAFs) determined by our all-in-one assay and by the ddPCR assay using the same batch of RNA (R-squared 0.83; Figure  3A and Table S2). For six of the eight samples in which we did not observe the expected variant in our assay, RNA was available for ddPCR. Despite using a high RNA input (497 ng), only eight wildtype and no mutant KRAS p.G12A droplets were detected for patient P32, indicating a very low KRAS expression level. For the other five samples, we did observe mutant droplets with a frequency ranging from 10% to 71% using an RNA input ranging from 25 to 370 ng. The number of mutant droplets was still much lower than the numbers observed in the positive control samples, for which we used a much lower RNA input. This implies that the abundances of both the wild-type and mutant transcripts were below the detection limit of our assay, due to either poor RNA quality, low expression of the gene, or insufficient unique reads in all six samples. Comparison of the variant allele frequencies (VAFs) as detected by the all-in-one transcriptome-based assay and ddPCR. The Y-axis represents VAFs of the mutations, as assessed by our all-in-one nextgeneration sequencing (NGS) assay. The X-axis represents the fraction abundance calculated from ddPCR. (B) Overview of unique read counts (Y-axis) in samples for which we did and did not observe the genomic aberrations with our all-in-one transcriptome-based assay. •Blue dots indicate samples with DV200 above 30. •Red dots indicate samples with DV200 below 30. •Black dots indicate samples for which the DV200 value was not measured. Dashed line indicates the cut-off level of 50,000 unique reads.
Next, we tested the performance of the inhouse pipeline in calling the variants observed by IGV 2.6.3. The three MET exon 14 skipping events were excluded as our transcriptome-based assay does not allow the detection of intronic variants. We analyzed all 51 samples with our pipeline and were able to call 28 of the 31 SNVs and INDELs observed by IGV. The three variants that were observed in IGV but not called by the pipeline were (1) AKT Serine/Threonine Kinase 1 (AKT1) p.E17K in P35, with 81 mutant reads out of 292 total reads, (2) ALK p.G1269A in P13, with 44 mutant reads out of 240 total reads, and (3) EGFR E19 DEL in P22, with 13 mutant reads out of 38 total reads. A second variant caller, Freebayes, did call the AKT1 p.E17K mutation but not the other two. The most likely reason that our pipeline did not call the EGFR E19 DEL is the improper alignment of part of the mutant reads ( Figure S1). In the samples with confirmed EGFR E19 DELs, the number of reads with Next, we tested the performance of the inhouse pipeline in calling the variants observed by IGV 2.6.3. The three MET exon 14 skipping events were excluded as our transcriptome-based assay does not allow the detection of intronic variants. We analyzed all 51 samples with our pipeline and were able to call 28 of the 31 SNVs and INDELs observed by IGV. The three variants that were observed in IGV but not called by the pipeline were (1) AKT Serine/Threonine Kinase 1 (AKT1) p.E17K in P35, with 81 mutant reads out of 292 total reads, (2) ALK p.G1269A in P13, with 44 mutant reads out of 240 total reads, and (3) EGFR E19 DEL in P22, with 13 mutant reads out of 38 total reads. A second variant caller, Freebayes, did call the AKT1 p.E17K mutation but not the other two. The most likely reason that our pipeline did not call the EGFR E19 DEL is the improper alignment of part of the mutant reads ( Figure S1). In the samples with confirmed EGFR E19 DELs, the number of reads with the deletion ranged from 20 to 311 in IGV, whereas the reported read counts ranged from 8 to 128 according to the pipeline. Apparently, our pipeline missed a subset of the reads containing the EGFR E19 DEL due to improper alignment or short read length. It is unclear why our pipeline did not call the other two variants.
In addition to the expected mutations, we detected four novel mutations ( Table 1). Three of them were frameshift mutations (NRAS p.G15fs, BRAF p.Y472fs, and KRAS p.L56fs) that are not included in the molecular diagnostic reports because they are, as yet, not relevant in making therapeutic decisions. The fourth mutation was an EGFR p.V834L mutation observed in an FFPE sample (P03), in which we also detected a KRAS p.G12A mutation. The KRAS mutation was reported in the molecular diagnostics, whereas the EGFR mutation was not observed. Reanalysis of the same RNA sample with our SPET assay did not reveal this uncommon EGFR variant, leading us to conclude that this was a false-positive observation.

Fusion Gene Detection
We identified fusion transcripts for 13 of the 18 fusions reported by diagnostic tests (clinical samples) or literature (cell lines) using two fusion detection pipelines. No additional chimeric transcripts were identified. During visual inspection of the aligned reads in IGV, we observed partly unaligned reads for three additional samples. BLAST of the unaligned sequences indicated the presence of fusion transcripts that were not called by the two pipelines. Thus, of the 18 expected fusion transcripts, we confirmed 16 with our all-in-one transcriptome assay (11 ALK, three ROS1, one RET, and one NTRK1; Table 2). In addition to the detection of the target fusion genes, our assay also pinpointed the intron in which the break occurred and the fusion partner in all cases. Ten of the 11 ALK fusion transcripts correspond to previously published fusion transcripts: seven Echinoderm Microtubule Associated Protein Like 4 (EML4)_E6-ALK_E20, one Dynactin Subunit 1 (DCTN1)_E26-ALK_E20, and one Kinesin Family Member 5B (KIF5B)_E24-ALK_E20. For P36, we observed an uncommon breakpoint region located in intron 18 of the ALK gene that resulted in an EML4_E6-ALK_E18 fusion transcript. In P42, a novel ALK fusion transcript, Myosin Phosphatase Rho Interacting Protein (MPRIP)_E21-ALK_E20, was identified. The five non-EML4_ALK fusions were defined as Ezrin (EZR)_E10-ROS1_E34 in two cases, CD74_E6-ROS1_E34 in one case, KIF5B_E15-RET_E12 in one case, and a Tropomyosin 3 (TPM3)_E7-Neurotrophic Receptor Tyrosine Kinase 1 (NTRK1)_E9 in one case.
For two FISH-break and/or IHC-positive cases, no fusion gene transcripts were identified by our assay. P07 was scored as positive based on both ALK IHC and FISH, but no fusion transcript was observed with our transcriptome assay. Of note, we also missed the SNV in ALK for this patient due to the low total read count. For P08, RET was scored positive based on a FISH break pattern in 27% of the cells (2% true split and 25% extra red signals), while an ALK break was seen in 63% of the cells. With our assay, we only observed ALK fusion transcripts and no RET fusion transcripts.
As an independent validation, we applied NanoString to detect the fusion gene transcripts. We validated the assays on five confirmed cases and were able to identify the expected fusion transcripts by NanoString (Table S3). We next examined the two cases in which we did not find the expected fusion transcripts using our all-in-one assay. For P07, NanoString detected an EML4_E6-ALK_E20 fusion transcript. In P08, NanoString detected the ALK, but not the RET fusion transcript, consistent with our all-in-one assay. Thus, the negative result of our all-in-one assay was consistent with NanoString for one case.
As a previous study showed a potential association between survival on TKI treatment and the fusion gene partner [18], we also analyzed progression-free survival (PFS) in relation to the fusion partner of the nine ALK-positive patients (Table S4). Five patients with the canonical EML4_E6-ALK_E20 fusion transcripts were treated with crizotinib and had PFS of 6,8,9,14, and 15 months. Patient P36, with an EML4_E6-ALK_E18 fusion gene, had PFS of 24 months. Patient P42, with MPRIP_E21-ALK_E20, had PFS of 8 months. Patient P08, with a KIF5B_E24-ALK_E20 fusion gene, had PFS of 19 months. Patient P14, with DCTN1_E26-ALK_E20 fusion transcript, did not respond to crizotinib, and treatment was changed to alectinib, again with no tumor response.

RNA Input Limit
It is challenging to set a clear RNA input limit for transcriptome-based assays. The tumor content of the sample and the expression level of the genes in nontumor cells are just two of the variables that play a role in setting the minimum amount of RNA needed for the assay. Despite these issues, we tried to get some insight into the detection limit of our assay. Eight samples were resequenced with an 8-fold lower library input. As a result, the number of unique reads for these samples decreased (Table S5). In all eight cases, the expected aberrations were again successfully called by the pipeline, with VAFs similar to those observed under standard conditions (Table S5). In addition, we repeated library preparation for three RNA samples using 4-fold and 20-fold lower RNA input. For two cases, this resulted in a 3-and 4-fold decrease of the number of unique reads with a 4-fold lower RNA input, and 5-and 12-fold decrease with 20-fold lower RNA input. For the third case, the pattern was less consistent for the 4-fold lower RNA input library (Table S6). Again, we were able to detect the expected variants for all samples irrespective of the RNA input. This indicates that a total RNA input of as low as 10 ng for library preparation is feasible for samples with sufficient RNA quality and high tumor cell content.

Quality Criteria for Successful Mutation Detection
We next established quality criteria for the successful detection of mutations. Without setting any quality criteria, the overall sensitivity of the assay was 87% (Table 3 and Table S7). The main factor for the successful identification of aberrations is the number of unique reads obtained in the NGS analysis (Table 3 and Figure 3B). To identify factors associated with the percentage of unique reads, we performed a univariable linear regression analysis. A significant correlation was observed with panel design version (1, 2, or 3), material type (FFPE, non-FFPE), RNA input, DV200, and the number of cycles used to amplify the library. In a multivariable analysis, material type and DV200 remained significant (Table S8). As FFPE is the standard material type in a routine diagnostic setting, we decided to reanalyze the sensitivity of our assay using a threshold of 50 K unique reads in combination with a DV200 threshold of 30 as quality criteria. In addition to the good quality samples, 13 FFPE samples fulfilled these quality criteria ( Figure 3B and Table 3). When both criteria were applied, e.g., DV200 >30 and unique reads >50 K, all 35 expected variants were detected, leading to a sensitivity of 100% (Table 3). In the analyses of all the hotspots for which our series of samples were supposed to be negative (including the five nonmutated NSCLC cases), we had one false-positive observation. This also indicates that the specificity of the assay is close to 100%.

Discussion
The currently used tests to select therapy for advanced-stage NSCLC patients include sequencing-based methods to detect mutations in hotspot regions, FISH techniques to detect chromosomal breaks, and IHC to detect aberrant protein expression. A major advantage of the targeted transcriptome-based sequencing assay we report here is that it can efficiently pinpoint all types of somatic mutations that result in an aberrant transcript in a single test. As we interrogate the transcriptome, our assay also provides information on the expression of mutant alleles. In addition, our assay provides information on fusion partners and shows the consequence of MET exon skipping mutations at the transcript level. By setting quality criteria for both RNA (DV200 >30) and the total number of unique reads (>50 K reads), our assay identified all the expected mutations at the transcriptome level, with almost no false-positives, and thus reached an accuracy of close to 100%. The application of these two criteria to our FFPE samples would have resulted in the exclusion of 8 out of 21 FFPE samples for which DV200 data were available. For four of these cases, the DV200 value was too low, whereas for the other four samples, the unique read counts were too low. In a diagnostic setting, a new tissue sample would have been requested for all eight cases. The main reason for potential dropout in a routine setting will be related to the availability of sufficient tissue in combination with tumor cell content and RNA quality. The use of freshly prepared FFPE blocks in combination with the macrodissection of tumor-cell-rich regions, which is standard procedure in diagnostic settings, will most likely increase the number of successfully analyzed samples. We expect that the dropout frequency for fresh FFPE blocks will be similar to the current dropout using DNA-based NGS assays, which are also dependent on the availability of sufficient tissue in combination with tumor content and quality of the isolated DNA. Thus, our all-in-one transcriptome-based assay is expected to have an overall good performance on clinical FFPE samples using predefined quality criteria. In a related study, we successfully used input RNA amounts of as low as 150 ng, isolated from fresh FFPE material. A limited number of experiments performed in our current study indicated that with fresh frozen material, even 10-ng quantities will suffice (Table S6).
For fusion genes, both the juxtaposed exons of the target gene as well as the specific fusion partners were identified. In addition to the common fusion products, we found several uncommon fusion partners. The MPRIP_E21-ALK_E20 fusion product has only been reported once in a conference abstract [19]. A second uncommon fusion transcript involving ALK had a break in intron 18 of the ALK gene, and the resulting EML4_E6-ALK_E18 fusion transcript has only been reported once [20]. Detailed knowledge of the fusion partner and/or breakpoints could have clinical implications. In a recent study on ALK FISH-positive lung cancer, patients with canonical fusion partners involving EML4 were found to have better overall survival (20.6 months vs. 5.4 months, p < 0.01) than those with noncanonical ALK fusions [18]. In our study of a limited number of patients, the PFS of patients with canonical and noncanonical breaks were similar. Still, the implementation of techniques to identify the fusion gene partner may become important in routine diagnostics in situations where knowledge about the fusion gene does predict drug response. These studies further underscore the importance of implementing transcriptome-based tests in the diagnostic setting.
At the DNA level, several specific mutations in introns 13 and 14 of MET have been linked to exon 14 skipping at the transcript level, but for other novel mutations, it remains unclear whether this indeed leads to MET exon 14 skipping [21]. Detection of MET exon skipping using a transcriptome-based NGS method, as described in this study, directly measures the consequence of the mutation even though the actual mutation causing the exon skipping event will not be identified.
Another potential application of our assay might be the assessment of overexpression of specific genes and its use as an indirect method to identify gene amplifications. This is relevant for MET and ERBB2, as amplification of their gene loci has been reported as a resistance mechanism for targeted TKI treatment. Implementation of this application will require additional validation experiments that are beyond the scope of the current study. Application of an all-in-one transcriptome-based assay maximizes the success rate of detecting aberrations, especially for lung cancer biopsies with generally limited tissue volume. Previously, a SPET-based method has been shown to successfully identify fusion transcripts [14]. In current molecular diagnostic settings, a few methods are available to simultaneously identify SNVs, INDELs, exon skipping, and gene fusions. A bait-based library enrichment method used to detect SNVs, INDELs, translocations, inversions, and copy number variations (CNVs) in FFPE-derived DNA confirmed the presence of all 34 known aberrations [22]. In addition, they identified ALK fusions, including the fusion partner in six out of seven ALK IHC-positive cases. In another study, 100% concordance was found between 10 paired FFPE and frozen biopsy cases [16]. Nowadays, a few commercial platforms use RNA to detect gene fusions. A comparison between their performances has very recently been published [23]. Several platforms combine the RNA-based part with a DNA-based procedure to enable the detection of SNVs and small indels. Our assay competes with currently available commercial panels, which have a much larger number of target genes. The large commercial panels require higher total numbers of reads per sample, which effect the number of samples that can be pooled in one sequence run, and thus increase the per-sample costs. Moreover, our assay can easily be adapted when the number of therapy-guiding mutations increases. An additional advantage of our assay is that it not only efficiently identifies various types of driver/actionable mutations but, at the same time, verifies the expression of the mutant alleles in the tumor cells. The SPET strategy, allowing target enrichment using a single gene-specific landing-probe designed close to the mutational hotspot, can give clinically relevant results even for samples with highly fragmented RNA. The use of UMIs allows reliable filtering for unique reads and, thereby, allows reliable identification of all variants, including unknown fusion partners. Although the use of RNA instead of DNA may be regarded as challenging, RNA-based assays are being used more regularly in diagnostic settings in academic hospitals. As several steps of the procedure can be automated, the amount of hands-on time will be similar to currently used DNA-based tests. Overall, we anticipate that our test will be competitive with current techniques for both hands-on time and costs.
To further assess the applicability of our all-in-one assay in a clinical diagnostic setting, a prospective study on routinely requested lung tumor samples should be carried out.
In total, these 51 samples covered 60 known variants, i.e., 39 SNVs/INDELs, three MET exon skipping, and 18 fusion genes ( Figure 1). Five lung tumor tissue samples without molecular aberrations, according to the routine molecular tests, were included as nonmutated NSCLC cases. All patient samples were obtained from the UMCG pathology biobank and were anonymized for the investigators. The study protocol is consistent with the Research Code of the University Medical Centre Groningen (https://www.umcg.nl/SiteCollectionDocuments/English/ Researchcode/umcg-research-code-2018-en.pdf, accessed on 22 January 2020) and national ethical and professional guidelines ("Code of conduct; Dutch federation of biomedical scientific societies", htttp://www.federa.org/codes-conduct, accessed on 1 June 2019).

RNA Isolation
RNA from cell lines and PE samples was isolated using the GeneJET RNA Purification Kit (Thermo Fisher Scientific, Waltham, MA, USA), and RNA from frozen tissue samples was isolated using a TRIzol (Invitrogen, Waltham, MA, USA)-based standard laboratory protocol, including a phase separation step with chloroform and subsequent RNA precipitation with isopropanol. The RNeasy FFPE Kit (QIAGEN GmbH, Hilden, Germany) was used for RNA isolation from total tissue sections of FFPE tissue samples without enrichment of tumor-cell-rich areas. For all kits, isolation procedures were done according to the manufacturer's protocol. The quantity and quality of the RNA samples were analyzed using nanodrop (Thermo Fisher Scientific, Waltham, MA, USA) and Fragment Analyzer (Advanced Analytical, Armes, IA, USA), respectively. The obtained DV200 value obtained indicates the percentage of RNA fragments that are longer than 200 nucleotides (Table S1).

Design of All-In-One Lung Cancer Assay
Our assay is based on the single primer enrichment technology developed by NuGEN © (SPET, United States Patent 9,650,628; San Carlos, CA, USA). The target region for the assay was designed to cover all clinically relevant genomic aberrations (Table S9) Over the course of this study, we developed three versions of our design. Minor changes in landing probe regions were subsequently made to optimize coverage of the hotspot regions, whereas the target regions remained the same. Furthermore, the highly expressed housekeeping genes added in Design 1 were replaced by less abundantly expressed housekeeping genes in later designs. Landing probes in genes relevant for immunotherapy were added in the third design, but these were not analyzed in detail in this study. The number of landing probes for each design is indicated in Table S9. The genomic locations of the landing probes are indicated in the File S1.

Library Preparation
We aimed for an RNA input of 200 ng for non-FFPE and 500 ng for FFPE samples for library preparation, as measured by nanodrop, and cDNA synthesis was done using the cDNA Module for Target enrichment (NuGEN), following the recommendations of the manufacturer. Library preparation for the SPET procedure was done according to the protocol provided by the manufacturer NuGEN © (for a graphical outline, see https://ww.nugen.com/products/technology) and described in detail elsewhere [14,37]. Briefly, after ds-cDNA synthesis, adaptors containing an 8-nt sample-specific barcode, a 6-nt unique molecular identifier, and a universal forward primer were ligated to the fragments. The resulting fragments were denatured, and landing probes containing the universal reverse primer were hybridized overnight, followed by an extension step. Subsequently, a test qPCR was done to determine the optimal number of cycles for library amplification. The number of cycles used for the library amplification was 0 to 4 above the cycle threshold determined by the test qPCR, as recommended by the manufacturer. After amplification, TapeStation measurement and/or Kapa qPCR were done to determine the molarity of the library. Eight or sixteen libraries were mixed in equimolar amounts and subjected to NGS on a MiSeq platform (Illumina, San Diego, CA, USA), with a 150 bp paired-end sequencing protocol provided by the manufacturer. Adaptor and primer sequences are shown in Figure S2.
For eight samples, the library was sequenced a second time with 1/8 of the standard input. For three high-quality RNA samples, libraries were prepared using three different amounts of RNA input: 200, 50, and 10 ng (Table S7).

NGS Data Analysis
The FASTQ files were processed with an inhouse pipeline. Alignment of reads was done using Hisat2, and Genome Analysis Toolkit (GATK 3.8.0, Cambridge, MA, USA) human genome reference build GRCh37, with decoys from the GATK bundle [38,39]. Picard Tools was used for format conversion and marking duplicates, including the unique molecular identifier information of the reads. We carried out a manual check using the IGV browser V2.6.3 for all known SNVs and INDELs, starting from the aligned reads. In addition, we designed a pipeline for variant detection. Haplotype Caller was used for the integrated calling of the variants for all samples. Variants were annotated using SnpEff/SnpSift with the Ensembl release 75 gene annotations and the dbNSFP2.7, dbsnp 138, Cosmic v72, 1000 genomes phase 3, and the ExAC 0.3 databases [40][41][42]. Our variant-calling pipelines for SNVs and INDELs are an adaptation of the GATK workflow and use Molgenis Compute [43] as the workflow management software. The data were filtered for quality metrics similar to GATK recommendations, using custom filters for population frequency and variant effect. Synonymous mutations, variants present in the 1000 human genome project at a frequency of more than 2%, variants with less than three altered read counts or a variant allele frequency (VAF) less than 5%, and variants with CADD scores less than 20 were filtered out [44]. Fusion gene detection was done with Fusion Catcher and Strand NGS software (Strand Genomics, San Francisco, USA) [45]. We focused on fusion gene analyses on ALK, ROS1, RET, and NTRK1. We only report inframe fusion transcripts with the tyrosine kinase domain of the indicated genes as well as those in which the sum of spanning and splitting read counts was at least five. Recurrent mutations that exclusively occur at the end of the reads were excluded because these most likely represent technical artefacts. Reads that could only be aligned to part of the fusion gene region, i.e., indicative of a fusion gene breakpoint split read, were subjected to BLAT analysis to identify the fusion gene partner [46].

Detection of Fusion Gene Transcripts by NanoString
Using the Lung Gene Fusion Panel (NanoString Technologies, Seattle, WA, USA), we detected fusion transcripts in ALK, RET, and ROS1. A total amount of 100-200 ng RNA was hybridized overnight following the manufacturer's protocol. The next day, samples were loaded on streptavidin-coated cartridges and analyzed on an nCounter ® SPRINT Profiler (NanoString technologies). The raw barcode counts were background-adjusted with a truncated Poisson correction using negative-control spikes and normalized relative to the positive-control spikes. Samples with good hybridization quality, as determined by good signals for housekeeping genes and counts below 30 for negative controls, were included for calling fusion transcripts. A t-test between the 3 and 5 probe counts was applied to identify imbalance probes. The presence of a fusion transcript is defined as positive based on the following criteria: the p-value is <0.01 for the 3 and 5 count difference, the 3 /5 ratio and the 3 /negative control count ratio are both >1.5, the absolute counts of fusion-specific probes are >20 (ALK and ROS1) or >30 (RET), and counts of the fusion-specific probe are >2× SD of the mean probe count across the gene, except for the outlier counts, which are above the upper Tukey fence (Q3 + 1.5*IQR).

Variant Detection by Droplet Digital (dd) PCR
We applied ddPCR (Bio-Rad, Hercules, CA, USA) on cDNA to quantify expression of the mutant allele of the most commonly observed mutations, e.g., T790M, L858R, E19 DEL in EGFR, and G12A, G12D, G12F in KRAS. For RNA samples from cell lines, cDNA was synthesized with the RevertAid H Minus First Strand cDNA Synthesis Kit (Thermo Scientific, Madison, WI, USA). For RNA from clinical FFPE tissues and pleural effusions, cDNA was synthesized with the iScript cDNA Synthesis Kit (Bio-Rad). Negative and positive control samples were included for each variant in all experiments. For good quality samples, RNA input was 1-2 ng. For other samples, the input varied between 5 and 479 ng. Reaction mixes included 11 µL ddPCR Supermix for probes and 1 µL mutation assay in a final volume of 22 µL. Droplets were generated using the QX100 droplet generator after the addition of 70 µL droplet generation oil (Bio-Rad). PCR was performed on a T100 Thermal Cycler (Bio-Rad), using the following PCR conditions: 95 • C for 10 min, 39 cycles of 95 • C for 30 s, 59 • C for KRAS or 55 • C for EGFR and ALK for 60 s, 72 • C for 15 s, 98 • C for 10 min, followed by a cooling down to 4 • C. The temperature ramp change was 2 • C per second for all steps. Droplets were counted on a QX-100 droplet reader (Bio-Rad), and data were analyzed by Quantasoft software version 1.6.6 (Quantasoft, Prague, Czech Republic) for detection of FAM and HEX signals. For EGFR T790M, the forward and reverse primers were 3 -CAAGGAAATCCTCGATGAAGCC-5 and 3 -GTCTTTGTGTTCCCGGACATAGT-5 with a HEX-labelled wild-type probe 3 -ATGAGCTG CGTGATGAG-5 and a FAM-labelled mutant probe 3-ATGAGCTGCATGATGAG-5 .
For EGFR L858R, the forward and reverse primers were 3 -GCAGCATGTCAAGATCACAGATT-5 and 3 -CATCCACTTGATAGGCACTTTGC-5 with a HEX-labelled wild-type probe 3 -AGTTTGGCCAGCCCAA-5 and a FAM-labelled mutant probe 3 -AGTTTGGCCCGCCCAA-5 . For EGFR exon 19 deletions, primers used were 3 -GTGAGAAA GTTAAAATTCCCGTC-5 and 3 -TGGCCATCACGTAGGCTTC-5 with a FAM-labelled probe covering the deletion part of exon 19 3 -AAGGAATTAAGAGAAGCAACATCTCC-5' and a wild-type HEX control probe upstream of the commonly deleted region 3 -ATCGAGGATT TCCTTGTTGGCT-5 . Primer and probe details of KRAS are according to the literature [47]. Data were analyzed using Bio-Rad QuantaSoft™ Analysis Pro (Bio-Rad). The threshold for the mutant droplet signal was set manually. The number of mutant and wild-type copies was estimated from the Poisson distribution, as indicated by the manufacturer. Fractional abundance was calculated using mutant copies divided by the sum of mutant and wild-type copies.

Statistics
To estimate which sample and test conditions were important for an optimal result of the all-in-one transcriptome test, linear regression analysis was performed using SPSS (version 23.0, IBM, Armonk, NY, USA). The uniquely aligned sequencing reads were the dependent variable. The independent variables were design version, tissue origin (FFPE or non-FFPE), RNA input, DV200, and the number of PCR cycles used for library preparation. Parameters with p < 0.1 in the univariable analyses were further included in the multivariable analysis. To study the relationship between the ALK fusion partner and survival, PFS was calculated from the date that treatment was started to the date of progressive disease on CT imaging, according to the Response Evaluation Criteria in Solid Tumors (RECIST v1.1) [48].

Conclusions
This study demonstrates the feasibility and technical validity of a targeted all-in-one transcriptome-based assay for the simultaneous detection of mutations and fusions in relatively small FFPE tissue biopsies. We expect that for routine diagnostic testing using 150 ng RNA isolated from recent FFPE tissue samples in combination with an enrichment step to increase tumor cell content, the success rate will be similar to currently used diagnostic tests, with an overall accuracy close to 100%. To get the highest performance, it is important to set minimum requirements for RNA quality and total unique reads, similar to the standard procedures for DNA-based NGS diagnostic tests.

Supplementary Materials:
The following are available online at http://www.mdpi.com/2072-6694/12/10/2843/s1, Figure S1: IGV screenshot of EGFR exon 19 with some correctly aligned reads with additional soft-clipping reads as a result of exon 19 deletion in P22, Figure S2: Schematic presentation of the final cDNA fragments of the sequence libraries, Table S1: QC data of our all-in-one transcriptome-based NGS assay, Table S2: Overview of RNA-based ddPCR results, Table S3: Overview of NanoString results on samples with expected fusions, Table S4: Progression free survival of ALK positive patients, Table S5: Overview of variant calling for samples sequenced with low and normal library input, Table S6: Overview of variant calling for samples with normal and low RNA input, Table S7: Overview of variant calling results in relation to QC data, Table S8: Linear Regression analysis to identify variables associated with percentage of unique reads, Table S9: Number of landing probes per gene in the three designs, File S1: landingprobes.