1. Introduction
Lung cancer (LC) is a leading cause of cancer-related death in men and women worldwide, accounting for more than 14% of all newly diagnosed cancers [1,2]. Although understanding of lung cancer pathobiology has improved significantly over the last decades, prognosis remains poor, with five-year survival rates of only 17% and 7% for non-small cell lung cancer (NSCLC) and small cell lung cancer (SCLC), respectively [3,4]. As with other cancers, the poor prognosis is partly attributable to late-stage diagnosis, because early symptoms are few [5,6]. Indeed, about 67% of all lung cancer cases are diagnosed at a late stage, leading to high mortality rates [6]. Lung cancer treatment is stage-specific. Early-stage disease (stages I–II, UICC 8th edition) can be cured by surgical resection, whereas locally advanced disease requires multimodal treatment, including chemotherapy, radiotherapy, and, in selected cases, surgery. Recurrent metastatic disease is treated with palliative systemic therapy [7,8,9]. Patients with NSCLC diagnosed at an early stage have a five-year survival rate of about 71%, compared with less than 2% for patients diagnosed with stage IV disease [10]. Early diagnosis may thus improve patient outcome [11].
To this end, low-dose computed tomography (LDCT) was developed for screening high-risk groups, with a reported sensitivity of about 93% and a 20% reduction in mortality [12,13]. However, its widespread application is hampered by concerns about cost-effectiveness, high false-positive rates, the risk of radiation-induced cancers, and overdiagnosis [14,15]. There therefore remains a considerable unmet need for non-invasive, complementary biomarkers for the early detection of lung cancer.
Currently, protein- and microRNA (miRNA)-based markers are at advanced validation stages or already in clinical use [16]. Carcinoembryonic antigen (CEA) is one of the most widely used biomarkers for cancer screening in several entities, including lung cancer, and the cytokeratin-19 fragment (Cyfra21-1) is another lung cancer tumor marker [17,18]. However, frequent false-positive elevations of these markers in benign disease and their limited sensitivity restrict their use in routine screening for early cancer diagnosis [6,18,19,20]. Immune-based markers are attractive but show very low sensitivity; for example, one commercially available autoantibody test showed only 36% sensitivity, albeit with 91% specificity, in NSCLC [21]. miRNAs have also been tested in several studies on a plethora of platforms, including microarrays, next-generation sequencing (NGS), and RT-PCR. Although NGS- and microarray-based approaches are highly reproducible and specific, they lack the accuracy and sensitivity required for early detection, especially for low-copy transcripts [22].
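Why a specificity of 91% is still problematic for screening becomes clearer with a worked example of predictive values. The sketch below applies Bayes' rule to the sensitivity and specificity quoted above for the autoantibody test; the 1% prevalence is a hypothetical screening-population value chosen for illustration, not a figure from this study.

```python
def screening_metrics(sensitivity: float, specificity: float, prevalence: float) -> dict:
    """Positive and negative predictive values from Bayes' rule."""
    true_pos = sensitivity * prevalence
    false_pos = (1.0 - specificity) * (1.0 - prevalence)
    true_neg = specificity * (1.0 - prevalence)
    false_neg = (1.0 - sensitivity) * prevalence
    return {
        "ppv": true_pos / (true_pos + false_pos),
        "npv": true_neg / (true_neg + false_neg),
    }


# Sensitivity and specificity as quoted for the autoantibody test;
# the 1% prevalence is a hypothetical assumption for a screening population.
metrics = screening_metrics(sensitivity=0.36, specificity=0.91, prevalence=0.01)
print(f"PPV = {metrics['ppv']:.1%}, NPV = {metrics['npv']:.1%}")
```

At this assumed prevalence the positive predictive value is only about 3.9%, so most positive results would be false positives even though the negative predictive value exceeds 99%.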
The most attractive non-invasive approach to identifying biomarkers for the early detection of lung cancer is a simple blood test [23]. To date, very few studies have addressed the utility of cell-free RNA (cfRNA) for early cancer diagnosis, despite previous reports highlighting the potential clinical utility of this liquid compartment. Initially identified in patients with malignant melanoma [24] and nasopharyngeal carcinoma [25], cfRNA has since been reported in several other cancer entities, including breast cancer [26], colorectal cancer [27,28,29], follicular lymphoma [30], and hepatocellular carcinoma [31]. Recently, cfRNA has been used for non-invasive determination of gestational age, with better performance than standard techniques [32]. cfRNA is therefore a potential source of analytes for early-detection biomarker discovery. In light of the above, we herein report on the use of cfRNA for the early detection of solid cancers, with application to lung, pancreatic, bladder, and skin cancers. We use a combination of NGS-based cfRNA profiling and reverse transcription droplet digital PCR (RT-ddPCR) to analyze cfRNA for the early diagnosis of solid cancers.
3. Discussion
Early diagnosis of cancer may improve survival, as it allows for curative surgical intervention. Indeed, in lung cancer, the five-year survival rate of patients diagnosed at early stages is much higher than that of patients diagnosed at later stages [35,36]. The identification and validation of non-invasive biomarkers for early cancer detection is therefore a considerable unmet need. Ideally, a biomarker should offer high sensitivity and sufficient specificity in a cost- and time-efficient manner to allow for its implementation in routine clinical practice [37]. Early diagnosis of lung cancer using LDCT has been shown to reduce mortality by about 20% [38]. However, repeated radiation exposure, high cost, and high false-positive rates represent significant challenges, and better alternatives are still needed.
Liquid biopsy-based tests have been approved for the detection of EGFR mutations (mutEGFR) in metastatic lung cancer [39]. While this represents a significant breakthrough, mutEGFR is present in only about 10–40% of all NSCLC cases [40]. Moreover, the test targets patients with metastatic disease and thus offers little or no opportunity for early detection and potentially curative intervention. Although cfRNA was first reported in several cancer entities more than a decade ago [24,25,26,27,28,29,30,31], its diagnostic value in cancer has not been properly evaluated. Despite legitimate concerns regarding the stability of cfRNA, a recent study highlighted the predictive strength of cfRNA in pregnancy [32], demonstrating that cfRNA sequencing is feasible for biomarker identification.
This study aimed to explore a previously underexplored liquid biopsy compartment that offers significant opportunities for biomarker discovery. We used two independent, highly sensitive technologies (next-generation sequencing and droplet digital PCR) for profiling and absolute quantification of circulating cell-free and vesicle-encapsulated RNA in serum and plasma samples from four different solid tumor entities. Furthermore, we evaluated the expression of selected candidate transcripts in a large cohort of tumor samples and adjacent non-tumor tissue. We validated the expression pattern of one candidate in plasma samples from patients with early- and late-stage lung cancer, as well as in tumor and non-tumor tissue samples. Patient mutation status was not considered.
Unlike tissue-derived RNA, which is contributed predominantly by the cells of a specific organ, cfRNA usually originates from several sources throughout the body, as blood circulates through all tissues. We performed transcript signal deconvolution using reference maps from 45 different cell types and identified a distinct epithelial source of cancer-associated transcripts. Our study reveals that most transcripts upregulated in the plasma of cancer patients are in fact non-coding transcripts. Indeed, some studies [41,42] have reported non-coding RNAs as liquid biopsy-based biomarkers in cancer. Very few coding transcripts were identified, most probably because of endogenous cellular degradation of these transcripts by the CNOT complex following translation [43]. Interestingly, we identified a previously reported lncRNA species (HOTAIRM1) that is associated with colorectal cancer [34,44]. Furthermore, we identified a panel of transcripts that are highly expressed in plasma from patients with PDAC and lung cancer. We validated the previously reported cancer-associated transcript POU6F2-AS2 in plasma and serum samples from patients with lung cancer, pancreatic ductal adenocarcinoma, malignant melanoma, and bladder cancer. We showed that plasma and tissue levels of certain transcripts correlate to a high degree, suggesting that the plasma marker is specific for malignant tissue.
We investigated the performance of a candidate marker in patients with early- and late-stage lung cancer. POU6F2-AS2 transcript levels showed good diagnostic performance in both early- and late-stage disease, with AUCs of 0.82 and 0.76, respectively. This suggests that cfRNA transcripts may indeed serve as lung cancer biomarkers. Including tumor tissue in such discovery studies is crucial: we found that the lung cancer-associated lncRNA MALAT1, which is upregulated in tumor tissue [45], was upregulated in plasma from healthy donors. Because cfRNA arises from diverse cellular sources, differences in how these sources contribute to the cfRNA pool may affect the abundance of individual transcripts. Hence, only transcripts derived predominantly or exclusively from the tumor are likely to be informative. We therefore performed cell-type signal deconvolution on our cfRNA sequencing data and showed that only signals of epithelial origin distinguished patients from healthy donors. Additionally, monocytes contributed significantly more to the cfRNA pool in healthy controls than in patients. Furthermore, total plasma cfRNA abundance was similar in healthy donors and in lung cancer patients of all stages, suggesting that differences in total cfRNA abundance do not account for the observed transcript expression patterns. In contrast, total cfRNA abundance in serum was significantly higher in PDAC samples than in healthy controls. Although the diagnostic value of this difference was not explored here, it may represent an opportunity to be exploited.
While our study did not aim to evaluate the utility of cfRNA in disease surveillance and treatment monitoring, it does provide initial evidence of the usefulness of cfRNA for the early diagnosis of solid tumors, especially NSCLC. Most importantly, the cross-entity validation and tumor tissue association of selected candidates strengthen the data and open new opportunities for biomarker discovery. Finally, because transcriptomic changes occur relatively faster than genomic changes and the cellular transcriptional machinery amplifies the number of analyte molecules, cfRNA analysis may offer improved sensitivity and allow for the detection of patients irrespective of the presence or absence of measurable genetic alterations.
4. Materials and Methods
4.1. Study Design and Patient Population
Patients analyzed in this study were prospectively recruited at the different study sites, either within clinical trials or within clinical translational studies. Lung cancer patients were recruited at the outpatient unit of the Department of Medical Oncology at the West German Cancer Center, University Hospital Essen (stage IV), and at the Ruhrlandklinik (stages I–III) within the framework of the CEVIR study (https://www.transcanfp7.eu/index.php/abstract/cevir.html). Patients with either borderline resectable or locally advanced, non-metastatic pancreatic ductal adenocarcinoma were recruited within the framework of the NEOPLAP prospective, randomized, open-label, phase II clinical trial (https://clinicaltrials.gov/ct2/show/NCT02125136). Samples from bladder cancer patients were prospectively collected for an institutional biobank between 2008 and 2013, and samples from melanoma patients were collected within standard biobanking procedures at the Department of Dermatology of the University Hospital Essen. Samples from healthy individuals were obtained from blood donors at the Department of Transfusion Medicine, University Hospital Essen.
In our pilot study, we performed total cfRNA sequencing on plasma from 11 NSCLC patients, 4 PDAC patients, and 4 healthy blood donors. In the validation study, we performed RT-ddPCR on cfRNA from 45 stage IV and 39 stage I–III NSCLC samples, 20 PDAC, 12 melanoma, and 22 bladder cancer samples, and more than 61 healthy donor samples (serum/plasma). Additionally, RT-ddPCR was performed on a retrospective collection of 18 lung tumor FFPE samples (unmatched to plasma) and 9 adjacent non-tumor lung tissues. The local institutional review boards approved the studies at all sites (NSCLC: 17-7740-BO, 14-6056-BO; PDAC: 17-7729-BO and 92/14 ff; melanoma: 16-7132-BO; bladder cancer: 08-3942), and all participants provided written informed consent for biomedical research allowing for molecular analysis of tissue and plasma samples. All healthy controls included in the study provided consent for the use of their samples for molecular analysis, and ethical approval was obtained from the ethics committee of the Medical Faculty of the University of Essen (17-7729-BO).
4.2. Blood Sampling and Plasma/Serum Preparation
Blood samples were collected prior to treatment initiation. For plasma preparation, blood was collected in 7.5 mL EDTA tubes (ref # 01.1605.001, Sarstedt, Nümbrecht, Germany) and centrifuged at 200× g for 10 min at 4 °C. The upper phase was transferred into 2 mL tubes (cat # 296920064, Neolab, Heidelberg, Germany) and centrifuged at 800× g for 10 min at 4 °C. Finally, the supernatant was aliquoted into 2 mL tubes and centrifuged at 16,000× g for 10 min at 4 °C. The resulting plasma was transferred into new 2 mL tubes and stored at −80 °C until cfRNA extraction. When plasma could not be prepared immediately, the samples were kept at 4 °C and processed within 2 h of blood draw. Serum was prepared from blood collected in 10 mL BD Vacutainer® Plus plastic tubes (cat # 366430, BD, Heidelberg, Germany). Blood samples were allowed to stand at room temperature for 30 min and then centrifuged at 2000× g for 10 min at 4 °C. The resulting serum was aliquoted and stored at −80 °C until cfRNA isolation.
4.3. cfRNA Isolation and Quantification
Cell-free RNA was isolated using the Plasma/Serum RNA Purification Mini Kit (Cat. 55000, Norgen Biotek, Canada), which isolates both cfRNA and vesicle-encapsulated RNA, following the manufacturer's instructions with slight modifications. Briefly, plasma/serum samples were thawed on ice, 200 µL of plasma/serum was cleared by centrifugation at 16,000 × g for 2 min at 4 °C, and the supernatant was transferred into a new tube. The precleared plasma/serum was then combined with 3 volumes (600 µL) of Lysis buffer A containing 0.01% β-mercaptoethanol and mixed by vortexing. An equal volume of absolute ethanol was added to the lysate (800 µL) and mixed by vortexing for 5 s. The lysate was then loaded onto a mini column and allowed to stand at room temperature for 10 min for RNA binding. The columns were centrifuged at 6000 rpm for 3 min, washed three times with 400 µL of wash buffer A, and dried by centrifugation at 13,000 rpm for 2 min. RNA was eluted in 25 µL of elution buffer A after a 15 min incubation on the column. RNA concentration was measured using a Quantus fluorometer (Promega, Madison, Fitchburg WI, USA), and samples were stored at −80 °C until analysis.
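The protocol above mixes relative centrifugal force (× g) with rotor speed (6000 and 13,000 rpm). Because RCF depends on the rotor radius through RCF = ω²r/g, rpm values only transfer between centrifuges after conversion. A minimal Python sketch follows; the 7.3 cm radius is a hypothetical microcentrifuge value, not one reported for the instrument used in this study.

```python
import math

# Standard gravity in cm/s^2, so that the rotor radius can be given in cm.
G_CM_PER_S2 = 980.665


def rpm_to_rcf(rpm: float, radius_cm: float) -> float:
    """Relative centrifugal force (x g) at the given rotor radius."""
    omega = 2.0 * math.pi * rpm / 60.0  # angular velocity in rad/s
    return omega ** 2 * radius_cm / G_CM_PER_S2


def rcf_to_rpm(rcf: float, radius_cm: float) -> float:
    """Inverse conversion: rotor speed needed to reach a target RCF."""
    omega = math.sqrt(rcf * G_CM_PER_S2 / radius_cm)
    return omega * 60.0 / (2.0 * math.pi)


RADIUS_CM = 7.3  # hypothetical rotor radius, NOT taken from the protocol
for speed in (6000, 13000):
    print(f"{speed} rpm ~ {rpm_to_rcf(speed, RADIUS_CM):,.0f} x g at r = {RADIUS_CM} cm")
```

With this assumed radius, 13,000 rpm corresponds to roughly 13,800 × g and 6000 rpm to roughly 2900 × g; a rotor with a different radius gives different forces at the same speeds.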
4.4. RNA Isolation from Tumor/Non-Tumor Tissue
RNA isolation was performed using the simplyRNA tissue kit (cat# AS1340, Promega Corporation), following the manufacturer's instructions. Briefly, tissue pieces were cut on dry ice with a sterile scalpel blade and placed in a Precellys lysing kit tube (cat# P000912-LYSKO-A, Bertin Corp, Montigny-le-Bretonneux, France). Sample homogenization buffer containing 0.02% thioglycerol was added, and the tube was placed into a precooled Precellys device, which was run on the hard-tissue program for 2 min. After the tissue was completely disintegrated, 200 µL of lysis buffer was added to the cell suspension, which was then transferred into a cartridge of the Maxwell SimplyRNA tissue kit (cat# AS1340, Promega Corporation), and 5 µL of DNase I was added to the appropriate well of the cartridge. RNA isolation was performed following the installed kit-specific protocol, and RNA was eluted in 60 µL of nuclease-free water. Quantification was carried out using a Quantus fluorometer (Promega), and samples were stored at −80 °C until analysis.
4.5. RNA Isolation from FFPE Sections
RNA was isolated from FFPE sections using the Maxwell® RSC RNA FFPE kit (Promega, cat# AS1440), following the manufacturer's instructions. Briefly, 2.0 mm³ of tissue sections were placed in a 1.5 mL Eppendorf tube, and 300 µL of mineral oil was added and vortexed for 10 s. The samples were then heated to 80 °C for 2 min and brought to room temperature. The samples were digested and lysed, and the pellet was collected by centrifugation at 10,000× g for 20 s. The aqueous phase containing the pellet was heated at 56 °C for 15 min, then at 80 °C for 1 h, and then incubated at room temperature for 30 min. Samples were DNase-treated and loaded into the Maxwell cartridge, and RNA isolation was performed following the RSC RNA FFPE kit protocol on the fully automated system. RNA was eluted in 60 µL of nuclease-free water and stored at −80 °C until use.
4.6. In Silico Data Mining
To identify tumor-associated transcripts upregulated in plasma samples from cancer patients, we downloaded publicly available RNA sequencing data from the Gene Expression Omnibus (GEO). The lung cancer data set GSE81089, comprising both early- and late-stage cancers, was downloaded [46]. The data set was composed of 199 fresh-frozen NSCLC samples and 19 paired non-tumor lung tissues.
4.7. Total cfRNA Sequencing
Total RNA sequencing libraries were prepared from 450 pg of total cfRNA using the SMARTer Stranded Total RNA-Seq Kit v2 - Pico Input Mammalian (cat# 634412, Takara Bio, Mountain View, CA, USA), after depletion of ribosomal RNA. RNA libraries were sequenced on an Illumina NovaSeq 6000 with 100 bp paired-end reads. FASTQ files were quality-checked with FastQC, and reads were trimmed with Trimmomatic. Reads were mapped to the human genome (GRCh38), and features were quantified using HTSeq-count. Gene expression matrices with raw counts were processed with DESeq2 [47] for differential gene expression analysis, and the results were validated with edgeR and limma [48,49].
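Before testing for differential expression, DESeq2 corrects raw counts for differences in sequencing depth and library composition using median-of-ratios size factors. The NumPy sketch below re-implements only that normalization step, assuming a genes × samples matrix of HTSeq-style raw counts. It illustrates the method and is not the DESeq2 R API; the negative binomial model and dispersion shrinkage used for the actual test are omitted, and the toy matrix is hypothetical.

```python
import numpy as np


def median_of_ratios_size_factors(counts: np.ndarray) -> np.ndarray:
    """Size factors for a genes x samples matrix of raw counts."""
    counts = np.asarray(counts, dtype=float)
    # Genes with a zero in any sample have an undefined log geometric mean; skip them.
    usable = np.all(counts > 0, axis=1)
    if not usable.any():
        raise ValueError("no gene is detected in every sample")
    log_counts = np.log(counts[usable])
    # Per-gene pseudo-reference sample: the geometric mean across samples.
    log_reference = log_counts.mean(axis=1, keepdims=True)
    # Each sample's size factor is its median ratio to the pseudo-reference.
    return np.exp(np.median(log_counts - log_reference, axis=0))


def normalize(counts: np.ndarray) -> np.ndarray:
    """Divide each sample (column) by its size factor."""
    counts = np.asarray(counts, dtype=float)
    return counts / median_of_ratios_size_factors(counts)


# Hypothetical toy matrix: sample 2 was sequenced twice as deeply as sample 1,
# and the last gene is undetected in sample 1.
toy_counts = np.array([
    [10, 20],
    [30, 60],
    [5, 10],
    [0, 7],
])
print(median_of_ratios_size_factors(toy_counts))
print(normalize(toy_counts))
```

Genes with a zero count in any sample are excluded from the reference, as in DESeq2's default size-factor estimation; after normalization the first three genes have identical values in both samples.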
4.8. cfRNA Transcript Deconvolution
Cell-type cfRNA signal deconvolution was performed using reference maps from 45 different cell types, as previously described [50]. A complete list of all cell types included in the deconvolution is presented in Supplementary Table S3. We then used a random forest ensemble classifier to rank the importance of cell-type signals differentiating patient and healthy samples, as previously described [51]. Hierarchical clustering and statistical testing (Wilcoxon-Mann-Whitney U test) were performed in the R environment.
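The deconvolution idea can be illustrated with non-negative least squares (NNLS), a common way of expressing a bulk expression profile as a non-negative mixture of cell-type reference profiles. Whether the method of reference [50] uses NNLS or another regression is not stated here, so NNLS should be read as a swapped-in technique. The three-cell-type reference matrix and the sample fractions below are hypothetical stand-ins for the 45-cell-type maps; the estimated epithelial fractions of patients and controls are then compared with a one-sided Wilcoxon-Mann-Whitney U test via SciPy.

```python
import numpy as np
from scipy.optimize import nnls
from scipy.stats import mannwhitneyu

# Hypothetical reference: rows are marker genes, columns are cell types.
CELL_TYPES = ["epithelial", "monocyte", "neutrophil"]
REFERENCE = np.array([
    [50.0, 1.0, 2.0],    # epithelial-enriched gene
    [2.0, 40.0, 5.0],    # monocyte-enriched gene
    [1.0, 3.0, 30.0],    # neutrophil-enriched gene
    [10.0, 10.0, 10.0],  # ubiquitously expressed gene
    [5.0, 1.0, 0.5],
])


def deconvolve(reference: np.ndarray, sample: np.ndarray) -> np.ndarray:
    """Estimate cell-type fractions for one sample by NNLS, rescaled to sum to 1."""
    weights, _residual_norm = nnls(reference, sample)
    total = weights.sum()
    return weights / total if total > 0 else weights


def simulate_sample(epithelial_fraction: float) -> np.ndarray:
    """Synthetic, noise-free mixture; the non-epithelial signal is split 60/40."""
    rest = 1.0 - epithelial_fraction
    fractions = np.array([epithelial_fraction, 0.6 * rest, 0.4 * rest])
    return REFERENCE @ fractions


patients = [deconvolve(REFERENCE, simulate_sample(f)) for f in (0.20, 0.25, 0.30, 0.35)]
controls = [deconvolve(REFERENCE, simulate_sample(f)) for f in (0.02, 0.04, 0.05, 0.06)]

epi = CELL_TYPES.index("epithelial")
result = mannwhitneyu(
    [p[epi] for p in patients], [c[epi] for c in controls], alternative="greater"
)
print(f"epithelial fraction, patients vs controls: U = {result.statistic}, p = {result.pvalue:.4f}")
```

In a real analysis each sample vector would be a normalized cfRNA profile with noise, so the recovered fractions would only approximate the truth. The cell-type importance ranking described above would need a separate classifier step (for example, scikit-learn's `RandomForestClassifier` and its `feature_importances_` attribute), which is not shown.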
4.9. Transcript Quantification by RT-ddPCR
Two transcripts, POU6F2-AS2 and AC022126.1, neither of which was detected in healthy donors in the cfRNA-seq data, were further analyzed. One of these, POU6F2-AS2, has previously been associated with cancer pathogenesis. Circulating levels of POU6F2-AS2 and AC022126.1 were measured by RT-ddPCR using the 1-Step RT-ddPCR Advanced Kit for Probes (Bio-Rad, Hercules, CA, USA) in samples from our validation cohort, which included 45 stage IV lung cancer patients, 39 stage I–III lung cancer patients, 20 PDAC, 22 bladder cancer, and more than 65 healthy control samples. Samples from the validation cohort included both plasma and serum. All reactions were performed in duplicate using 2 µL of cfRNA from each sample. The reaction components were prepared according to the manufacturer's instructions to a final volume of 22 µL, of which 20 µL were used for droplet generation in a QX100™/QX200™ droplet generator (Bio-Rad). RT-ddPCR reactions were performed in a C1000 Touch™ thermocycler (Bio-Rad), and droplets were read in a QX100™/QX200™ droplet reader (Bio-Rad). The raw transcript concentration (copies/20 µL reaction) was used to determine the absolute transcript load per mL of plasma/serum using the formula:
where
For tissue-derived samples, RT-ddPCR was performed on 4 ng of total RNA per sample, and data were expressed per ng of RNA.
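The exact conversion formula is not reproduced in this section. As an illustration only, the hypothetical Python functions below show how copies per 20 µL reaction can be scaled to copies per mL of plasma/serum using volumes stated in Sections 4.3 and 4.9 (200 µL input, 25 µL eluate, 2 µL template in a 22 µL reaction of which 20 µL is partitioned). The partitioning correction, the averaging of duplicates, and the assumption that the 4 ng of tissue RNA went into a 22 µL reaction are all assumptions, so this should not be read as the calculation actually applied in the study.

```python
from statistics import mean
from typing import Sequence


def copies_per_ml(
    copies_per_20ul: Sequence[float],
    template_ul: float = 2.0,      # cfRNA volume added per reaction (Section 4.9)
    reaction_ul: float = 22.0,     # final reaction volume (Section 4.9)
    partitioned_ul: float = 20.0,  # volume used for droplet generation (Section 4.9)
    eluate_ul: float = 25.0,       # cfRNA elution volume (Section 4.3)
    input_ml: float = 0.2,         # plasma/serum volume extracted (Section 4.3)
) -> float:
    """HYPOTHETICAL back-calculation of transcript copies per mL of plasma/serum."""
    mean_copies = mean(copies_per_20ul)  # average of the technical duplicates
    # Assumption: only the partitioned 20 uL share of the template is counted.
    template_partitioned_ul = template_ul * partitioned_ul / reaction_ul
    copies_per_ul_eluate = mean_copies / template_partitioned_ul
    return copies_per_ul_eluate * eluate_ul / input_ml


def copies_per_ng(
    copies_per_20ul: Sequence[float],
    rna_ng: float = 4.0,           # assumed total RNA per tissue reaction
    reaction_ul: float = 22.0,     # assumed tissue reaction volume
    partitioned_ul: float = 20.0,
) -> float:
    """HYPOTHETICAL normalization of tissue RT-ddPCR results to copies per ng RNA."""
    rna_partitioned_ng = rna_ng * partitioned_ul / reaction_ul
    return mean(copies_per_20ul) / rna_partitioned_ng


print(copies_per_ml([10.0, 12.0]))
print(copies_per_ng([8.0]))
```

Under these assumptions, duplicate readings of 10 and 12 copies per 20 µL reaction correspond to about 756 copies/mL; ignoring the partitioning correction would give 687.5 copies/mL, which shows why the precise formula matters when comparing absolute loads across studies.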
4.10. Statistical Analysis
Student's t-test was used to compare the means of two groups, and one-way ANOVA was used to compare three or more groups. Categorical data were analyzed by Fisher's exact test or the chi-squared test. The strength of the relationship between cfRNA copies and amounts was assessed by Pearson correlation. Statistical significance was set at p < 0.05. The diagnostic performance of the marker was evaluated with the ROCR package [52]. Data analysis was performed in R (version 3.6) and GraphPad Prism version 8.3 (GraphPad Software, Inc., La Jolla, CA, USA).
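ROCR is an R package; as a language-neutral illustration, the Python sketch below computes the quantities typically reported for a single marker: the ROC curve, the AUC (both by the trapezoidal rule and via its equivalence to the Mann-Whitney probability that a randomly chosen case scores higher than a randomly chosen control, with ties counted as one half), and a Youden-index cut-off. The Youden cut-off is an additional assumption not stated in the methods, and the copy numbers are hypothetical.

```python
from itertools import product


def auc_mann_whitney(cases, controls):
    """AUC as P(case > control) + 0.5 * P(tie), over all case/control pairs."""
    score = 0.0
    for x, y in product(cases, controls):
        if x > y:
            score += 1.0
        elif x == y:
            score += 0.5
    return score / (len(cases) * len(controls))


def roc_points(cases, controls):
    """(FPR, TPR, threshold) for each observed value used as cut-off (value >= t is positive)."""
    points = []
    for t in sorted(set(cases) | set(controls), reverse=True):
        tpr = sum(x >= t for x in cases) / len(cases)        # sensitivity
        fpr = sum(y >= t for y in controls) / len(controls)  # 1 - specificity
        points.append((fpr, tpr, t))
    return points


def trapezoid_auc(points):
    """Area under the ROC curve by the trapezoidal rule, starting from (0, 0)."""
    area, prev_fpr, prev_tpr = 0.0, 0.0, 0.0
    for fpr, tpr, _t in points:
        area += (fpr - prev_fpr) * (tpr + prev_tpr) / 2.0
        prev_fpr, prev_tpr = fpr, tpr
    return area


def youden_threshold(cases, controls):
    """Cut-off maximizing Youden's J = sensitivity + specificity - 1."""
    return max(roc_points(cases, controls), key=lambda p: p[1] - p[0])[2]


# Hypothetical copies/mL in patients (cases) and healthy donors (controls).
cases = [12.0, 30.0, 45.0, 8.0, 60.0]
controls = [0.0, 5.0, 9.0, 10.0, 3.0]

print(f"AUC (Mann-Whitney) = {auc_mann_whitney(cases, controls):.2f}")
print(f"AUC (trapezoid)    = {trapezoid_auc(roc_points(cases, controls)):.2f}")
print(f"Youden cut-off     = {youden_threshold(cases, controls)}")
```

Both AUC computations return 0.92 for these values, illustrating their equivalence; the Youden cut-off (12 copies/mL here) is only one of several possible rules for choosing a diagnostic threshold.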