1. Introduction
Over the past decade, liquid biopsies have been introduced as an alternative to conventional needle (tissue) biopsies for several cancers. Several potential advantages have recently been discussed, including the use of different biofluids (blood, urine, sputum, saliva), early warning of emerging resistance, serial monitoring of patients undergoing multiple cycles of treatment, and lower risk to patients, especially those who are not candidates for invasive biopsy [1,2]. For example, tests can be conducted on a blood sample (i.e., non-invasively) to examine circulating tumour cells (CTCs) or circulating tumour DNA (ctDNA), with the aim of providing better insight into the real-time dynamics of a disease.
CTCs are tumour cells present in the bloodstream that have been shed from either the primary tumour or its metastases [3,4]. These cells differ from normal circulating blood cells and express tumour-specific characteristics [5,6]. A sub-population of CTCs, termed disseminated tumour cells (DTCs), has the functional capacity to enter distant sites and progress towards metastasis [7]. CTCs are of prognostic value in various cancers, but their clinical utility is still under intense investigation [8,9,10].
Advances have also been made in the use of ctDNA, fragmented DNA that originates directly from cancer cells [11]. Emerging studies relate ctDNA to tumour progression and to the rate of tumour cell turnover, a biological measure of tumour aggressiveness [12,13]. In a recent study, we demonstrated a potential use of Alu repeat ratios for prognostic purposes in patients with advanced lung cancer (LC) [14].
We therefore hypothesised that circulating blood markers can be assessed as surrogate real-time biopsies. In this study, we employed a wide repertoire of techniques to explore this possibility in patients with LC. LC remains one of the biggest killers in the western world, with incidence remaining high and as many as 46,403 new cases reported in the UK [15,16]. Some 30–50% of patients with non-small cell LC (NSCLC) treated with curative intent suffer recurrence, and variable response rates to chemotherapy drugs limit survival [17,18,19]. Thus, a routine biomarker monitoring system after radical therapy could be of significant prognostic value, detecting relapse earlier and allowing rapid initiation of treatment. Such a biomarker system would also benefit patients undergoing therapy (e.g., radiotherapy, chemotherapy) as an alternative or additional method of monitoring and predicting response, rather than relying entirely on imaging.
In this study, using a novel imaging flow cytometry technology, we detected and characterised CTCs from LC patients and identified an 18-gene signature based on RNA-seq analyses. We also demonstrated that measuring the chemical composition of plasma using Raman spectroscopy can be of diagnostic value. Finally, we provide evidence (based on chromosome number instability (CNI) scoring of material from two patients) that their liquid biopsies are truly representative of all the genetic changes in their tumours.
3. Discussion
In this study, using a wide repertoire of techniques, we demonstrate that liquid biopsy can become a valuable alternative tool for cancer screening. We provide evidence that characterisation and quantification of CTCs in the blood of LC patients without enrichment is possible using the ImageStream™ technology. Our findings corroborate previously published reports of the presence of CTCs in LC [24,25,26]. However, we identified substantially more CTCs per patient without enrichment (apart from removal of red blood cells (RBCs)) than published techniques in which EpCAM is used to enrich blood samples for CTCs [27,28]. One caveat of these approaches is that EpCAM-based enrichment might miss CTCs undergoing epithelial-to-mesenchymal transition (EMT) [29]. Current enrichment methods vary markedly in their approaches to CTC isolation and characterisation, from size- and gradient-based methods to dielectrophoresis and the use of surface antigens such as EpCAM. The EpCAM-based CellSearch™ System (Menarini-Silicon Biosystems, Castel Maggiore (BO), Italy) is the only FDA-approved method for monitoring CTC levels in patients with prostate, breast and colorectal cancers. This platform uses ferrofluid nanoparticles carrying EpCAM antibodies to capture CTCs, which are then verified by cytokeratin staining [30].
Moreover, the actual numbers of CTCs in cancer patients remain controversial. For example, a detailed review of the prognostic utility of CTCs reports an enormous range of CTC counts in LC patients (0–5986 cells per mL of blood) [31]. Furthermore, a study using a label-free microcavity array (MCA) system detected up to 2329 CTCs in a blood sample from LC patients [32]. We also acknowledge that our CTC counts differ from those of previously published studies [33]. These differences could stem from experimental procedures (e.g., choice of collection tubes, handling of blood samples, the half-life of CTCs in blood, choice of antibodies, use of IDEAS software) as well as from the inter-patient variation that is well documented in LC. To date, no standardisation has been achieved owing to the plethora of isolation and characterisation platforms, and further studies are needed to compare their sensitivity and specificity.
We would also like to put forward another possible explanation for the large numbers of cells detected. Recently, it has been shown that circulating endothelial cells (CECs) may be useful for the assessment of NSCLC patients [34]. Interestingly, CECs appear to be present in high numbers in cancer patients, raising the possibility that cytokeratin staining detects more than one cell type other than WBCs. We appreciate that this might be a limitation of our study, but it also provides preliminary evidence that warrants a far wider repertoire of studies to dissect the nature of these cells. Therefore, we propose that the scientific community working on liquid biopsies use the term “circulating tumour-related cells” rather than “circulating tumour cells” until a bona fide universal definition is adopted. Collectively, our data demonstrate a potential use of CTCs as a non-invasive screening tool in support of early diagnosis in LC.
Genomic analysis of ctDNA or total RNA is another growing area of interest in cancer diagnosis and/or prognosis. Exploration of RNA-seq data from the blood and tissue of LC patients and benign control patients revealed 21 genes matching strict criteria for potential biomarkers (all exhibited similar differential expression patterns in the blood and tissue of LC patients versus benign control patients). Interestingly, X inactive-specific transcript (XIST), a non-coding RNA gene, showed marked up-regulation in both the blood and tissue of cancer patients compared with benign control patients [35].
Long non-coding RNAs often contribute to the unrestricted growth and invasion of cancer cells, and XIST has been shown to be up-regulated in several cancers, including colorectal cancer, gastric cancer and NSCLC [36,37,38]. Silencing of XIST had tumour-suppressive effects, including inhibition of cell proliferation and invasion, and induction of apoptosis [39]. These findings suggest that XIST could act as a potential biomarker and/or target for therapeutic intervention [37].
Another liquid biopsy technique explored here was Raman spectroscopy, a method of measuring the biochemical composition of a material using the inelastic scattering of laser light [40,41]. Drop-coated deposition Raman spectroscopy (DCDRS) has been shown to enable protein quantification [42] in body fluids such as tears [43] and blood [44], and to be effective in identifying colon cancer [45]. In the present analysis, we observed a significant loss of carotenoids in LC patients compared with controls. Carotenoids are a structurally and functionally diverse group of natural polyene pigments, known to be very efficient physical and chemical quenchers of singlet oxygen (¹O₂) and potent scavengers of other reactive oxygen species (ROS) [46]. Furthermore, it has been proposed that carotenoids such as β-cryptoxanthin stimulate the expression of the anti-oncogene p73 (a p53-related gene) [47]. Thus, the observed loss of carotenoids could be regarded as a blood-based biomarker of tumorigenesis in LC patients.
These findings are similar to the metabolomic differences observed by Raman spectroscopy between primary breast cancer cell lines and normal cells [48]. Differences in the biochemical and structural make-up of tumour tissue were also noted in melanoma patients, allowing identification of the specific components affected in this cohort [23,49]. Furthermore, the complementary technique of Fourier transform infrared spectroscopy has revealed biochemical differences in the sputum of LC patients [50].
We acknowledge that our study has several limitations. One, frequently encountered in clinical studies, is inter-patient variability [51], which might mask trends in gene and protein changes, particularly if these changes are subtle. This was evident in the CTC levels of LC patients. As highlighted earlier, the exact mechanism of CTC shedding is not well understood [52,53], and both the frequency and the size of CTCs seem to differ from patient to patient. A larger cohort, including patients with LC at all stages, will provide better insight. Moreover, the use of additional antibodies, such as CK7 or TTF-1, will enable us to characterise CTCs within the entire pool of circulating cells detected. We also intend to correlate CNI, Raman spectra and CTC levels in a larger cohort of patients undergoing chemotherapy to determine whether these readouts are of prognostic value. With the advancement of isolation technologies, it will also be interesting to obtain molecular and CNI profiles from isolated single CTCs. Collectively, our data provide a broader repertoire of tests that can be performed on a single liquid biopsy (Figure 7). This approach could facilitate the development of novel therapeutics and provide new insights into the mechanisms and biology of the disease beyond LC [54,55,56].
4. Materials and Methods
4.1. Sample Collection and Preparation
Tissue and blood samples were collected from patients undergoing tissue biopsies or surgical resections for benign lung conditions and LC. Lung tissue was collected from Harefield Hospital, London, UK (Table 1, Table 2 and Table 3). All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the local Ethics Committee (IRAS number 151666). All tissue samples were retrieved as formalin-fixed, paraffin-embedded (FFPE) blocks. Tissue sections were cut at 5 µm using a microtome, followed by oil immersion to remove paraffin. Forty milligrams of lung tissue were lysed in a Tissue Lyser II (Qiagen, Manchester, UK) for 2 min with a 3 mm stainless steel ball bearing. The resulting RNA was stored at −80 °C. Blood samples were collected into EDTA tubes, inverted 10 times, and processed within 4 h of collection. RNA was extracted from 0.5 mL of whole blood using the Ribopure Purification Kit, blood (ThermoFisher, Waltham, MA, USA), according to the manufacturer’s instructions. Plasma was obtained by centrifuging whole blood samples at 2500 rpm. The resulting plasma layer (~2 mL) was subjected to a further spin at 2500 rpm to remove any impurities. The extracted plasma was then stored at −80 °C, and 1 mL of plasma was used for cfDNA extraction.
4.2. Cell Lines
A549 (ATCC® CCL-185™) and H1975 (ATCC® CRL-5908™) human adherent epithelial cell lines were used as in vitro models of human LC. A549 cells were grown in complete DMEM (Dulbecco’s Modified Eagle’s Medium; ThermoFisher, Waltham, MA, USA) with 10% fetal bovine serum (FBS; Gibco, ThermoFisher), 1% penicillin/streptomycin (Pen/Strep; Gibco) and 1% L-glutamine (Gibco). H1975 cells were grown in RPMI with 1% L-glutamine (Gibco), 10% FBS and 1% Pen/Strep. Cell lines were cultured at 37 °C in 5% carbon dioxide (CO₂) and subcultured when they approached 80% confluency, approximately twice a week. Cells were resuspended at concentrations of 5000, 2500, 1200, 600, 300 and 150 cells/mL and spiked into 1 mL of blood from a donor.
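A spike-in dilution series such as this is typically used to check how detected counts scale with the number of cells added. The text does not describe how recovery was analysed, so the sketch below is an assumption: it treats each nominal concentration as the number of cells spiked into 1 mL of blood, then reports per-level recovery and a least-squares straight-line fit. The `detected` values are placeholders chosen only to exercise the function, not study data, and `spike_in_linearity` is a hypothetical helper.

```python
import numpy as np


def spike_in_linearity(spiked, detected):
    """Per-level recovery plus a straight-line fit of detected vs. spiked cells (hypothetical helper)."""
    spiked = np.asarray(spiked, dtype=float)
    detected = np.asarray(detected, dtype=float)
    recovery = detected / spiked                        # fraction of spiked cells recovered per level
    slope, intercept = np.polyfit(spiked, detected, 1)  # least-squares line: detected ≈ slope * spiked + intercept
    r_squared = np.corrcoef(spiked, detected)[0, 1] ** 2
    return recovery, slope, intercept, r_squared


# Nominal spiking levels from the text (assumed to be cells per 1 mL of blood).
spiked = [5000, 2500, 1200, 600, 300, 150]
# Placeholder counts (NOT measured data): a constant 50% recovery.
detected = [0.5 * s for s in spiked]
recovery, slope, intercept, r2 = spike_in_linearity(spiked, detected)
```

A slope close to the mean recovery with an intercept near zero would indicate that detection scales linearly across the series.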
4.3. ImageStream Processing and Analysis
One mL of whole blood per patient was treated with 9 mL of red blood cell (RBC) lysis buffer (G Biosciences, St. Louis, MO, USA), inverted eight times and incubated for 10 min with gentle agitation. The solution was then centrifuged for 10 min at 2500 rpm and the supernatant removed. The pellet was resuspended in 3 mL of RBC lysis buffer, followed by a further 10 min incubation and 10 min centrifugation. The resulting pellet was resuspended in 1 mL of ice-cold 4% paraformaldehyde (PFA; Sigma-Aldrich, Gillingham, UK), transferred to a 1.5 mL microcentrifuge tube and kept on ice for 5 min. All subsequent centrifugation steps were performed for 3.5 min at 3500 rpm. The cell suspension was centrifuged and the PFA removed. The pellet was washed with phosphate-buffered saline (PBS) and centrifuged, and the PBS was removed. Samples were then blocked for 1 h in blocking buffer (10% bovine serum albumin (BSA) in PBS; Sigma-Aldrich, Gillingham, UK). Cells were then incubated with conjugated AE1/AE3 antibody (1:100) and conjugated CD45/LCA PE-Texas Red® (1:100; Life Technologies, Carlsbad, CA, USA) in BSA at 4 °C overnight with gentle agitation. Following the overnight incubation, the cells were centrifuged and washed with washing buffer (0.1% Tween in PBS) to remove any unbound antibody.
The washing buffer was removed and the cells were resuspended in 100 μL Accumax (Innovative Cell Technologies, San Diego, CA, USA) to dissociate any cellular aggregates. DRAQ5 DNA stain (0.5%; Biostatus Ltd., Loughborough, UK) was added as a nuclear stain before visualisation on the ImageStream™. All data files were then analysed using IDEAS® software (AMNIS, Seattle, WA, USA). Cells were gated by staining intensity and size. Cells positive for AF488 (green), negative for CD45 (orange) and positive for the DRAQ5 nuclear stain (red) were classed as CTCs. CTCs were quantified per 10,000 captured cells.
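The classification rule above (AF488-positive, CD45-negative, DRAQ5-positive, above a minimum size) and the normalisation to 10,000 captured cells can be expressed as a simple boolean gate. This is a minimal sketch, not the IDEAS® gating itself: the `Event` fields, the threshold values in `GATES` and the example events are all hypothetical, and it assumes, as the text implies, that AF488 reports the AE1/AE3 cytokeratin antibody.

```python
from dataclasses import dataclass


@dataclass
class Event:
    """One imaged event; intensities in arbitrary units (hypothetical fields)."""
    af488: float  # cytokeratin (AE1/AE3) channel, green
    cd45: float   # CD45 PE-Texas Red channel, orange
    draq5: float  # nuclear stain, red
    area: float   # object size, e.g. in square micrometres


# Hypothetical gate thresholds; in practice gates are set per experiment.
GATES = {"af488_min": 1000.0, "cd45_max": 500.0, "draq5_min": 1000.0, "area_min": 50.0}


def is_ctc(event: Event, gates: dict = GATES) -> bool:
    """CK-positive, CD45-negative, nucleated and large enough to count."""
    return (event.af488 >= gates["af488_min"]
            and event.cd45 < gates["cd45_max"]
            and event.draq5 >= gates["draq5_min"]
            and event.area >= gates["area_min"])


def ctcs_per_10k(events: list, gates: dict = GATES) -> float:
    """Number of gated CTCs normalised to 10,000 captured events."""
    if not events:
        raise ValueError("no captured events")
    n_ctc = sum(is_ctc(e, gates) for e in events)
    return 10_000 * n_ctc / len(events)


events = [
    Event(af488=3000, cd45=100, draq5=4000, area=120),  # CK+, CD45-, nucleated: CTC
    Event(af488=200, cd45=5000, draq5=3500, area=90),   # CD45+ leukocyte
    Event(af488=2500, cd45=50, draq5=100, area=110),    # CK+ but no nuclear signal
    Event(af488=2800, cd45=80, draq5=3000, area=100),   # CTC
]
print(ctcs_per_10k(events))  # 5000.0
```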
4.4. Raman Spectroscopy
For each sample, three replicate 1 μL drops of blood plasma were deposited onto a stainless-steel substrate and allowed to dry fully at room temperature and pressure (RTP) [57,58]. Two Raman maps (~300 × 300 μm) were collected from each drop in Streamline® mode using an InVia® system (Renishaw, Wotton-under-Edge, UK) with the following parameters: λex = 830 nm, 130 mW, 50× long working distance objective, 600 L/mm grating and 3 s exposure time. In total, 120 Raman maps (20 patients × 3 drops × 2 maps) were collected for LC and 60 for the control samples (10 participants × 3 drops × 2 maps). Each Raman map was averaged into a mean spectrum and baseline subtracted, and all data were normalised using the standard normal variate (SNV) approach. This accounted for any variations in signal that may have originated from subtle differences in focusing or in the thickness of the dried rings. Raman spectra contain a wealth of information on the biomolecules present. To avoid discarding information useful for discrimination, we included all the spectral data in the analysis, i.e., we did not preselect any known biochemical peaks. To ensure independent testing of any classification models developed here, we left all the data from each participant (i.e., the six mean spectra from the six maps) out of the model in turn.
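Map averaging and SNV normalisation are simple row-wise operations: SNV centres each spectrum on its own mean and divides by its own standard deviation, which removes multiplicative and additive intensity differences such as those caused by focusing or deposit thickness. The sketch below assumes spectra are stored as rows of a NumPy array; the function names are hypothetical, and baseline subtraction is omitted because the text does not specify the baseline method.

```python
import numpy as np


def mean_spectrum(map_spectra):
    """Average all pixel spectra of one Raman map (rows = pixels) into one spectrum."""
    return np.asarray(map_spectra, dtype=float).mean(axis=0)


def snv(spectra):
    """Standard normal variate: per-spectrum (row-wise) centring and scaling."""
    spectra = np.atleast_2d(np.asarray(spectra, dtype=float))
    centred = spectra - spectra.mean(axis=1, keepdims=True)
    return centred / spectra.std(axis=1, ddof=1, keepdims=True)


# A spectrum and a copy with a gain of 2.5 and an offset of 10 (e.g. a focusing difference)
spectrum = np.array([1.0, 3.0, 2.0, 5.0, 4.0])
scaled = 2.5 * spectrum + 10.0
out = snv(np.vstack([spectrum, scaled]))
# After SNV both rows are identical, with zero mean and unit standard deviation.
```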
We then performed multivariate analysis, first using principal component analysis (PCA) to reduce the dimensionality of the data. PCA identifies the directions of greatest spectral variance in the training dataset (all data except those of the left-out participant), so that each spectrum can be described by a small number of scores giving the relative contribution of each principal component. We then used the PCA scores of each spectrum in a supervised classification approach, linear discriminant analysis (LDA), to calculate a single discriminant function combining the PCA scores most relevant for discrimination. LDA minimises the spread within pathology groups and maximises the separation between them. With 30 participants, 30 training models were calculated, each from the remaining 29 participants, and the pathology of the left-out participant was predicted using the resulting classification model. This is called leave-one-out cross-validation (here, one participant is left out at a time). The results show the predicted pathology for each individual mean spectrum, i.e., six from each participant [59,60].
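The PCA–LDA workflow with participant-level leave-one-out validation can be sketched in NumPy as follows. This is an illustrative implementation under stated assumptions, not the authors' code: PCA is computed by singular value decomposition of the centred training spectra, LDA is the two-class Fisher discriminant with a midpoint threshold (equal priors), and the number of retained components (`n_components=5`) is a hypothetical choice because the text does not state it. The demo uses synthetic spectra, not study data.

```python
import numpy as np


def fit_pca_lda(X, y, n_components=5):
    """PCA on training spectra, then two-class Fisher LDA on the PC scores."""
    mean = X.mean(axis=0)
    Xc = X - mean
    # Rows of vt are principal-component loadings, ordered by explained variance.
    _, _, vt = np.linalg.svd(Xc, full_matrices=False)
    loadings = vt[:n_components].T
    scores = Xc @ loadings
    m0, m1 = scores[y == 0].mean(axis=0), scores[y == 1].mean(axis=0)
    d0, d1 = scores[y == 0] - m0, scores[y == 1] - m1
    within = d0.T @ d0 + d1.T @ d1              # pooled within-class scatter
    w = np.linalg.solve(within, m1 - m0)        # Fisher discriminant direction
    threshold = 0.5 * (m0 + m1) @ w             # midpoint rule (equal priors assumed)
    return mean, loadings, w, threshold


def predict(model, X):
    mean, loadings, w, threshold = model
    return ((X - mean) @ loadings @ w > threshold).astype(int)


def leave_one_participant_out(X, y, participant, n_components=5):
    """Refit the model once per participant, predicting only that participant's spectra."""
    pred = np.empty_like(y)
    for pid in np.unique(participant):
        held_out = participant == pid
        model = fit_pca_lda(X[~held_out], y[~held_out], n_components)
        pred[held_out] = predict(model, X[held_out])
    return pred


# Synthetic demo (NOT study data): 20 "LC" and 10 "control" participants, 6 spectra each.
rng = np.random.default_rng(0)
axis = np.arange(400)
band = np.exp(-((axis - 150) / 8.0) ** 2)
base = band + 0.6 * np.exp(-((axis - 300) / 10.0) ** 2)
participant = np.repeat(np.arange(30), 6)
y = (participant < 20).astype(int)               # 1 = LC, 0 = control
X = base + 0.05 * rng.standard_normal((180, 400))
X[y == 1] -= 0.4 * band                          # weaker band in the "LC" group
pred = leave_one_participant_out(X, y, participant)
accuracy = (pred == y).mean()
```

Holding out all six spectra of a participant together, rather than single spectra, prevents near-duplicate replicates of the same person from appearing in both training and test sets.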
4.5. Chromosome Number Instability Scoring and Analysis
cfDNA was extracted from 2 mL of plasma using the Large Volume Viral Nucleic Acids Extraction Kit (Roche, Basel, Switzerland) according to the manufacturer’s instructions, but without addition of carrier RNA. Extracted cfDNA samples were processed using the ThruPLEX DNA-seq Kit (Takara Bio USA, Mountain View, CA, USA) according to the manufacturer’s instructions with dual-indexed adapters. The resulting sequencing libraries were pooled and paired-end sequenced (38 bp/37 bp) on a NextSeq500 (Illumina, Cambridge, UK). DNA from fresh tissue was extracted using the DNeasy Blood & Tissue Kit (Qiagen) and DNA from FFPE tissue using the GeneRead DNA FFPE Kit (Qiagen). Sequencing libraries from tissue DNA were constructed using the NEBNext Ultra II DNA Library Prep Kit for Illumina (New England Biolabs, Ipswich, MA, USA). CNI scores were assessed as described previously [55,61]. Briefly, shotgun sequencing libraries were prepared from the extracted cfDNA and sequenced at low coverage (∼30 M reads per sample). Copy-number differences were detected using read-counting statistics after mapping to the reference genome (hg19). Read counts were obtained for 701 windows (each ∼5.5 Mbp) distributed over the whole genome, transformed into log2 read ratios and converted into Z values using values obtained from a reference group of 133 normal cfDNA samples. For the windows in which the null hypothesis (equality to reference) is rejected at a 0.2% false-positive rate, the absolute values of the Gaussian cumulative distribution function are summed to generate the CNI, which thus reflects the number and amplitude of aberrations in the tumour as well as the tumour fraction of cfDNA. The cfDNA input into library preparation for the CNI experiments was 8.6 ng for MAS4 and 4.1 ng for MAS12.
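The window-level steps of the CNI calculation (read fractions per window, log2 transformation, Z values against a reference panel, and flagging at a 0.2% false-positive rate) can be sketched as below. This is a simplified illustration, not the published CNI implementation [55,61]: it assumes a two-sided threshold, and it sums |Z| over flagged windows as a stand-in for the exact per-window transform described above. The counts are synthetic Poisson draws, and `cni_sketch` is a hypothetical helper.

```python
import numpy as np
from statistics import NormalDist


def log2_read_fractions(counts):
    """log2 of each window's share of the total reads (per sample / per row)."""
    counts = np.asarray(counts, dtype=float)
    return np.log2(counts / counts.sum(axis=-1, keepdims=True))


def cni_sketch(sample_counts, reference_counts, fpr=0.002):
    """Simplified CNI-like score; |Z| replaces the published per-window transform."""
    sample = log2_read_fractions(sample_counts)
    reference = log2_read_fractions(reference_counts)       # rows = reference samples
    z = (sample - reference.mean(axis=0)) / reference.std(axis=0, ddof=1)
    z_crit = NormalDist().inv_cdf(1 - fpr / 2)              # ≈ 3.09, assumed two-sided
    flagged = np.abs(z) > z_crit
    score = float(np.abs(z[flagged]).sum())
    return score, z, flagged


# Synthetic data (NOT study data): 701 windows, 133 reference samples.
rng = np.random.default_rng(1)
n_windows = 701
reference_counts = rng.poisson(40_000, size=(133, n_windows))
sample_counts = rng.poisson(40_000, size=n_windows).astype(float)
sample_counts[100:110] *= 1.25                              # simulated copy-number gain
score, z, flagged = cni_sketch(sample_counts, reference_counts)
```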
Circos plots show the results of the CNI analyses; each dot represents a genomic bin for which the copy number was calculated. Plasma: values that differ significantly from those of normal individuals are displayed as red (gain) or purple (loss) dots. Tumour: read counts per bin were normalised to the median read count over all bins. Ratios are displayed as log2 values; bins with log2 values > 0.15 (gain) or < −0.15 (loss) are displayed as red or purple dots, respectively.
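The tumour-tissue calls described above reduce to a median normalisation followed by a symmetric log2 threshold. A minimal sketch, assuming read counts per bin are already available as a list; `call_tumour_bins` is a hypothetical helper, and bins with zero reads would evaluate to −inf and be called as losses.

```python
import numpy as np


def call_tumour_bins(read_counts, threshold=0.15):
    """Median-normalise bin counts, take log2 and call gain/loss at ±threshold."""
    counts = np.asarray(read_counts, dtype=float)
    log2_ratio = np.log2(counts / np.median(counts))
    calls = np.full(counts.shape, "neutral", dtype=object)
    calls[log2_ratio > threshold] = "gain"
    calls[log2_ratio < -threshold] = "loss"
    return log2_ratio, calls


ratios, calls = call_tumour_bins([100, 100, 100, 130, 70])
print(list(calls))  # ['neutral', 'neutral', 'neutral', 'gain', 'loss']
```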
4.6. RNA-Seq Processing and Analysis
Extracted RNA samples from matched blood and tissue of LC patients (n = 3) and controls (n = 3) were sent to the Wellcome Trust Genomic Centre (Oxford University, Oxford, UK) for RNA sequencing. RNA samples were normalised to 630 ng total RNA, and libraries were prepared with the Illumina TruSeq Stranded mRNA Library Prep Kit (Illumina, Cambridge, UK), which isolates poly(A)-containing mRNA molecules using poly(T) oligo-attached magnetic beads. All libraries were pooled in equimolar amounts and sequenced on one lane of a HiSeq 4000 (75 bp paired-end reads) according to Illumina specifications.
The RNA-seq data were analysed using open-source software from the Tuxedo suite, namely TopHat2 [62] and Cufflinks [63]. The paired-end raw reads were mapped to the human reference genome GRCh37 (Ensembl 74) using the GENCODE 19 annotations [22], with TopHat2 (Bowtie 2) under standard conditions. The resulting alignments were filtered for high-quality hits using Samtools [64] with a minimum mapping quality threshold of 30. Next, we used Cufflinks to assemble the mapped reads into transcripts and quantify their expression levels in patient and control samples. Finally, we used Cuffdiff, part of the Cufflinks package, to identify differentially expressed genes and transcripts between pairs of states (cancer tissue vs. normal tissue, cancer blood vs. normal blood, cancer tissue vs. cancer blood, and normal tissue vs. normal blood). Functional enrichment analyses and Venn diagrams were performed in the freely available software FunRich [65]. The statistical cut-off for functional enrichment analyses in this stand-alone software was kept at the default setting of p < 0.05 after Bonferroni correction [66].
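The Bonferroni correction multiplies each p-value by the number of tests performed (capped at 1) before comparison with the significance level. The sketch below shows the textbook adjustment only; FunRich's internal implementation is not reproduced, and the p-values are illustrative.

```python
def bonferroni(pvalues, alpha=0.05):
    """Bonferroni-adjusted p-values (capped at 1) and significance flags."""
    m = len(pvalues)
    adjusted = [min(1.0, p * m) for p in pvalues]
    return adjusted, [p_adj < alpha for p_adj in adjusted]


adjusted, significant = bonferroni([0.001, 0.01, 0.04])
print(significant)  # [True, True, False]
```

The last test passes unadjusted (0.04 < 0.05) but fails after correction (0.12), which is the conservatism the correction is designed to introduce.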
4.7. Statistical Analyses
Statistical analyses were performed using one-way ANOVA followed by Tukey’s multiple comparison test, with significance set at p < 0.05.
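For readers reproducing this analysis in Python, a one-way ANOVA followed by Tukey’s test can be run with SciPy (`scipy.stats.f_oneway`, and `scipy.stats.tukey_hsd`, available from SciPy 1.8). The sketch also computes the F statistic by hand from the between- and within-group sums of squares as a cross-check. The three groups are illustrative values, not study data, and this is not necessarily the software the authors used.

```python
import numpy as np
from scipy import stats


def one_way_anova_f(*groups):
    """F = (between-group mean square) / (within-group mean square)."""
    all_values = np.concatenate(groups)
    grand_mean = all_values.mean()
    k, n = len(groups), all_values.size
    ss_between = sum(g.size * (g.mean() - grand_mean) ** 2 for g in groups)
    ss_within = sum(((g - g.mean()) ** 2).sum() for g in groups)
    return (ss_between / (k - 1)) / (ss_within / (n - k))


# Illustrative groups (NOT study data)
a = np.array([4.1, 3.9, 4.3, 4.0, 4.2])
b = np.array([5.0, 5.2, 4.8, 5.1, 4.9])
c = np.array([4.1, 4.2, 4.0, 4.3, 3.9])

f_manual = one_way_anova_f(a, b, c)
anova = stats.f_oneway(a, b, c)
tukey = stats.tukey_hsd(a, b, c)  # pairwise comparisons after the ANOVA
significant_pairs = [(i, j) for i in range(3) for j in range(i + 1, 3)
                     if tukey.pvalue[i, j] < 0.05]
```

Here the hand-computed F matches SciPy's, and Tukey’s test separates group `b` from both `a` and `c` while finding no difference between `a` and `c`.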