Multiplex Analysis of CircRNAs from Plasma Extracellular Vesicle-Enriched Samples for the Detection of Early-Stage Non-Small Cell Lung Cancer

Background: The analysis of liquid biopsies brings new opportunities in the precision oncology field. Under this context, extracellular vesicle circular RNAs (EV-circRNAs) have gained interest as biomarkers for lung cancer (LC) detection. However, standardized and robust protocols need to be developed to boost their potential in the clinical setting. Although nCounter has been used for the analysis of other liquid biopsy substrates and biomarkers, it has never been employed for EV-circRNA analysis of LC patients. Methods: EVs were isolated from early-stage LC patients (n = 36) and controls (n = 30). Different volumes of plasma, together with different number of pre-amplification cycles, were tested to reach the best nCounter outcome. Differential expression analysis of circRNAs was performed, along with the testing of different machine learning (ML) methods for the development of a prognostic signature for LC. Results: A combination of 500 μL of plasma input with 10 cycles of pre-amplification was selected for the rest of the study. Eight circRNAs were found upregulated in LC. Further ML analysis selected a 10-circRNA signature able to discriminate LC from controls with AUC ROC of 0.86. Conclusions: This study validates the use of the nCounter platform for multiplexed EV-circRNA expression studies in LC patient samples, allowing the development of prognostic signatures.


Introduction
With 350 deaths per day projected for 2022, lung cancer stands as the main cause of cancer-related mortality, leading the second highest incidence in the United States and The study was carried out in accordance with the principles of the Declaration of Helsinki, under an approved protocol of the institutional review board of Quirón Hospitals. We obtained and documented written informed consent from all the patients. A total of 36 samples from early-stage NSCLC (stages IA to IIIA) were selected from our institution, along with 30 samples from non-cancer controls (Table 1). Clinical information from patients and controls included age, gender, smoking status, tumor histology and stage, when applicable. All samples were de-identified before further processing for confidentiality purposes.

Plasma Processing
Around 10 mL of whole blood was collected from the participants enrolled in the study using sterile EDTA Vacutainer tubes (BD, Plymouth, UK) and processed within the next 2 h. Blood samples were centrifuged twice at 2000× g at room temperature (RT) in a Rotina 380 R centrifuge (Hettich, Tuttlingen, Germany) for 10 min to separate plasma from red/white blood cells, platelets, and cell debris. Aliquoted plasma samples were then stored at −80 • C until downstream processing.

Enrichment of EVs
EVs were isolated from plasma using differential ultracentrifugation (UC) as described previously [27] or the miRCURY Exosome Serum/Plasma Kit (Qiagen, Hilden, Germany).
In the case of UC, 500 µL plasma samples were transferred into 15 mL sterile highspeed centrifuge tubes (VWR-Avantor, Philadelphia, PA, USA), filled up with sterile 1× phosphate-buffered saline (PBS) and centrifuged twice at 10,000× g for 30 min at 4 • C in a Sorvall RC 6 Plus centrifuge (Thermo Fisher Scientific, Waltham, MA, USA). Supernatants were then transferred into UC tubes (Beckman Coulter, Brea, CA, USA), equilibrated with sterile 1× PBS, and spun twice at 70,000× g for 1 h at 4 • C in the Sorvall WX Ultra 100 centrifuge (Thermo Fisher Scientific). The EV enriched pellets were resuspended in 100 µL sterile PBS and stored at −80 • C until used. EV enrichment with the miRCURY Kit was performed as described [25]. Debris was cleared from 500 µL plasma samples with thrombin and subsequent centrifugation at 10,000× g for 5 min at RT. Samples were then incubated with Precipitation Buffer overnight at 4 • C and centrifuged twice (500× g, 5 min Pharmaceutics 2022, 14,2034 4 of 18 at RT). Supernatants were discarded, EV enriched pellets were resuspended in 270 µL of Resuspension Buffer and stored at −80 • C until used.

Transmission Electron Microscopy (TEM)
Visualization of EV samples was performed by the TEM service of the Universitat Autónoma de Barcelona (UAB). A volume of 3.9 µL of EV-enriched sample was blotted onto a Holey Carbon Film Supported Nickel Grid (Merck, Darmstadt, Germany) previously glow-discharged in a PELCO easiGlow glow cleaning system (Ted Pella Inc, Redding, CA, USA). Next, the grid containing the sample was plunged into a Leica EM GP cryo-work station (Leica, Wetzlar, Germany) comprising a liquid ethane bath cooled to −180 • C, and subsequently transferred and visualized in a JEOL 2011 TEM (Jeol Ltd., Tokyo, Japan) operating at 200 kV. Samples were kept at −180 • C during the observation and captures were obtained with a Gatan Model 895 UltraScan 4000 4k × 4k CCD camera (Gatan Inc, Pleasanton, CA, USA). Image processing was performed using ImageJ software (version 1.8.0, National Institutes of Health, Bethesda, MD, USA).

Nano-Flow Cytometry Measurements
The volume of EV samples was brought to 500 µL with sterile PBS. Size-exclusion chromatography (SEC) columns (qEVoriginal/35 nm, Izon Science, Oxford, UK) were equilibrated with 20-30 mL of sterile PBS and eluted using the same buffer. Collection started immediately after loading the sample into the column, according to manufacturer instructions. Eluted EV-enriched samples were directly analyzed with the nanoFCM (NanoFCM Ltd., Nottingham, UK), a nanoparticle flow cytometer. Instrument calibration with standard beads enabled accurate measurements of both size and concentration of 40-200 nm particles through the detection of their side scatter [28].

RNA Isolation and DNase Treatment
EV-enriched samples were treated with 4 µg/mL of RNase A (Sigma-Aldrich, Burlington, MA, USA) for 1 h at 37 • C, to eliminate any non-vesicular RNA. TRI Reagent (MRC, Cincinnati, OH, USA) was added to a final volume of 1 mL and incubated at RT for 20 min. Then, 200 µL of a Chloroform and Isoamyl Alcohol dilution (24:1) (Panreac Química SLU, Barcelona, Spain) were added, followed by vigorous shaking and centrifugation at 12,000× g for 15 min at 4 • C. Upper fraction was collected, and RNA was precipitated by adding 2.5 µL of glycogen (Merck) and 500 µL 2-propanol (Merck), followed by incubation at RT for 10 min and further centrifugation at 12,000× g for 10 min at 4 • C. RNA pellet was then washed with 75% ethanol, dried at 95 • C for 3 min and resuspended in 12 µL of nuclease-free water.
The DNA-free DNA Removal Kit (Thermo Fisher Scientific) was used to eliminate any DNA remaining in the samples. Following the manufacturer's protocol, 0.75 µL of DNase buffer and 1 µL enzyme were added to 7.5 µL RNA sample and incubated at 37 • C for 30 min. A volume of 0.75 µL of DNase inactivation reagent was then added to the reaction, incubated for 2 min at RT and centrifuged for 1.5 min at 10,000× g and RT. The supernatant containing EV-RNA was then transferred to a fresh tube and stored at −80 • C until further use.

RT-qPCR and Sanger Sequencing Analysis
RT-qPCR and Sanger sequencing of circRNA junction sites were performed as previously described [29]. Divergent primers and probe sets were designed using Primer Express 3.0 Software (version 3.0.1, Applied Biosystems, Waltham, MA, USA) with the probes spanning the circRNA junction site (Table 2). Five microliters of EV-RNA was converted into cDNA using the M-MLV reverse transcriptase enzyme and random hexamers (both from Invitrogen, Waltham, MA, USA). A 1:3 dilution of cDNA was performed, and 2.5 µL were added to the Taqman Universal Master Mix (Applied Biosystems) in a 12.5 µL reaction containing a specific pair of primers and probe for each circRNA. Three replicas of each sample were run for the quantification of the expression of each assessed circular transcript. Quantification of gene expression was performed using the QuantStudioTM 6 Flex System (Applied Biosystems) and the comparative Ct method. For Sanger sequencing, 10 µL of each PCR product was subjected to electrophoresis in a 2× agarose gel (100 V, 30 min) and visualized under UV light (E-Gel™ Safe Imager™ Real-Time Transilluminator, Invitrogen) after electrophoresis (E-Gel™ iBase™ Power System, Invitrogen). Five microliters of each cDNA sample were purified using the PCR ExoSAP-IT Product Clean up Reagent (Applied Biosystems). Sequencing PCRs were set up using the BigDye Terminator v3.1 Cycle Sequencing Kit (Applied Biosystems), forward primer, cDNA and water in a final volume of 20 µL and performed using a Verity 96-well thermal cycler (Applied Biosystems). After sequencing amplification, samples were loaded into a 96-well plate and subjected to Sanger sequencing using the 3130 Genetic Analyzer (Applied Biosystems).

nCounter Processing
The nCounter Low RNA Input Amplification Kit (NanoString Technologies, Seattle, WA, USA) was used to retrotranscribe and pre-amplify 4 µL of EV-derived RNA in a Verity thermal cycler (Applied Biosystems) following NanoString's guidelines. Briefly, samples were denatured at 95 • C for 10 min and hybridized for 18 h at 67 • C. Our custommade nCounter panel (including 78 circRNAs, 6 linear reference genes and 4 mRNAs [30] was used to analyze EV-derived pre-amplified cDNA according to the manufacturer's instructions. RCC files containing data outputted by the NanoString nCounter Flex System (NanoString Technologies) from each run were exported to the nSolver Analysis Software (Version 4.0.70, NanoString Technologies).

Differential Expression Analysis
Raw count nCounter values were exported to Microsoft Excel (Version 16.40, Microsoft, Redmond, WA, USA) using nSolver Analysis Software. The background was calculated for each sample as (geo)mean ± 2SD of the negative probe counts (NCs) Raw counts lower than the background were automatically excluded from further analysis. The raw circRNA counts were normalized using the total number of counts of the sample and multiplied by 10,000. Differential expression analysis was performed comparing the means of the normalized counts for each circRNA in the early-stage NSCLC vs. non-cancer controls. The circRNAs with a fold change >1 and p-value < 0.05 were considered as differentially expressed (DE).

Data Pre-Processing and Normalization for Signature Development
Raw RCC-formatted data files were exported from the nSolver Analysis Software (NanoString Technologies). R (Version 4.0.3, R Core Team and the R Foundation for Statistical Computing, Vienna, Austria) and R studio (Version 2021.09.0, RStudio PBC, Boston, MA, USA) were used for pre-processing and normalization analysis of the imported files. Initial evaluation of the quality and integrity of the RCC data was performed using the NanoStringQCPro (Version 1.22.0) package. During this process, we looked for potential outliers based on the performance of standard control metrics provided by NanoString, such as Imaging, Binding Density, Positive Control Linearity, and Limit of Detection. After this first pre-analytical step, samples were subjected to supplementary exploratory examination, including Principal Component Analysis (PCA) and interquartile range (1.5 IQR rule) analysis. Samples found as outliers by both methods were then excluded from downstream analyses.
NCs were employed to exclude lowly expressed circRNAs with excessive background noise. The arithmetic mean of the NC ± 2SD was subtracted from each endogenous circRNA for each sample. Any transcript scoring a value below 0 in more than 75% of the analyzed samples was then excluded from further analysis. PCA plot was used to re-assess the data after the aforementioned filtering step. Technical variability correction and normalization were performed using the RUVSeq/RUVg function (Version 1.24.0) and DESeq2 (Version 1.30.1) packages (RUVseq-DESeq2). First, the RUVg function was used to estimate the unwanted variation among samples based on the DE genes. DESeq2 and edgeR (Version 3.32.1) performed a first pass DE analysis and the intersected least significant genes (with adjusted p-value above 0.1) were used as "in silico empirical" negative controls. DESeq2 was then utilized with default parameters along with the RUV factors to perform the normalization of the raw filtered data. The normalization performance was assessed using the standard relative log expression (RLE) plot.

Machine Learning (ML) for Signature Development
The Recursive Feature Elimination (RFE) algorithm along with leave-one-out crossvalidation (LOOCV) and the random forest (RF) classifier were used to perform feature selection on the normalized data previously generated by RUVseq-DESeq2. The optimal number of features was automatically selected by keeping only those yielding best performance after cross-validation. These final features were to constitute the prognostic signature. To test the predictive power of the selected signature, extra trees classifier (ETC), k-nearest neighbor (KNN) and RF models were built using default parameters. The 5-fold cross validation (5-CV) algorithm was applied for this purpose. During this process, the dataset was randomly split into k-folds (k = 5), being 4/5 of the data used to train the model, while the remaining 1/5 was used to test its behavior. The classifier showing the highest area under the ROC curve (AUC ROC) value was selected as the final model. Signature scores for each sample were obtained from the final model. A confidence threshold of 0.5 was considered for the calculation of the positive and negative predictive values (PPV-NPV). Additional statistical indicators such as accuracy, sensitivity, specificity, and Cohen's κ were also calculated.

Univariant and Multivariant Analyses
Association between clinical characteristics and ML-generated signature was assessed with a univariate Cox proportional-hazard regression model. Odds ratios, with a Confidence Interval (CI) of 95% was calculated using the MedCal Statistical Software (MedCalc Software Ltd. Odds ratio calculator. https://www.medcalc.org/calc/odds_ratio.php. Accessed last on 5 September 2022). Multivariant analysis using logistic regression was performed using SAS software (v9.4, SAS Institute, Cary, NC, USA). Significance was set at p < 0.05 for all statistical tests.

Enrichment of Plasma EVs and Workflow Development for nCounter CircRNA Analysis
Two replicated 500 µL plasma samples from an early-stage NSCLC patient and a noncancer control were submitted to EV enrichment by ultracentrifugation (UC) or using the miRCURY Exosome Serum/Plasma kit. Enriched EVs were characterized by transmission electron microscopy (TEM) and nanoparticle flow cytometry via nanoFCM. TEM images revealed different clusters of diverse-sized EVs (30 to 300 nm, all within the reported EV size range [31][32][33]) in all samples regardless of the enrichment method used (Figure 1a). Samples extracted using the miRCURY kit showed a higher proportion of vesicles with an exosome-like size range (30-100 nm) by TEM, compared to the more heterogeneous UC samples (Figure 1a). NanoFCM analysis revealed a higher concentration of 40-100 nm particles in samples enriched using the miRCURY kit ( Figure 1b). In addition, nanoFCM indicated a higher number of particles/mL in the NSCLC patient sample when compared to the control, both in the UC and miRCURY preparations (Figure 1b).

Enrichment of Plasma EVs and Workflow Development for nCounter CircRNA Analysis
Two replicated 500 μL plasma samples from an early-stage NSCLC patient and a non-cancer control were submitted to EV enrichment by ultracentrifugation (UC) or using the miRCURY Exosome Serum/Plasma kit. Enriched EVs were characterized by transmission electron microscopy (TEM) and nanoparticle flow cytometry via nanoFCM. TEM images revealed different clusters of diverse-sized EVs (30 to 300 nm, all within the reported EV size range [31][32][33]) in all samples regardless of the enrichment method used ( Figure  1A). Samples extracted using the miRCURY kit showed a higher proportion of vesicles with an exosome-like size range (30-100 nm) by TEM, compared to the more heterogeneous UC samples ( Figure 1a). NanoFCM analysis revealed a higher concentration of 40-100 nm particles in samples enriched using the miRCURY kit ( Figure 1b). In addition, nanoFCM indicated a higher number of particles/mL in the NSCLC patient sample when compared to the control, both in the UC and miRCURY preparations (Figure 1b). Next, different volumes of plasma (500 μL, 1000 μL and 1500 μL) from a NSCLC patient were tested in duplicates to assess the effect of initial volumes on downstream circRNA analysis by nCounter using the custom panel we previously developed [30]. Since RNA concentration from EV enriched samples has been demonstrated to be insufficient for direct nCounter analysis [25], pre-amplification steps of 14 and 20 cycles were tested. The utmost total number of counts was achieved using an input of 500 μL both with 14 and 20 cycles (14,151 ± 1864 and 686,525 ± 345,655, respectively; Figure 2a). Consequently, 500 μL of plasma was also the volume allowing the detection of more circRNAs (n = 27.5 ± 4.95 and 33 ± 7.07 for 14 and 20 cycles, respectively; Figure 2b), even if only those with a score above 10 counts after background removal were selected (Table S1, Figure S1). Next, different volumes of plasma (500 µL, 1000 µL and 1500 µL) from a NSCLC patient were tested in duplicates to assess the effect of initial volumes on downstream circRNA analysis by nCounter using the custom panel we previously developed [30]. Since RNA concentration from EV enriched samples has been demonstrated to be insufficient for direct nCounter analysis [25], pre-amplification steps of 14 and 20 cycles were tested. The utmost total number of counts was achieved using an input of 500 µL both with 14 and 20 cycles (14,151 ± 1864 and 686,525 ± 345,655, respectively; Figure 2a). Consequently, 500 µL of plasma was also the volume allowing the detection of more circRNAs (n = 27.5 ± 4.95 and 33 ± 7.07 for 14 and 20 cycles, respectively; Figure 2b), even if only those with a score above 10 counts after background removal were selected (Table S1, Figure S1). Different amplification cycles (10, 12 and 14) were subsequently tested in a 500 µL plasma sample. The highest number of raw counts was obtained with 14 cycles (Figure 3a). Regarding the number of circRNAs, 10 and 12 cycles yielded similar results (n = 51.5 ± 9.19 and 52.5 ± 7.78 respectively). More circRNAs were detected at 14 cycles (n = 59 ± 16.97) with a high variability between replicates (Figure 3b, Table S2). In view of these results, we selected for EV-circRNA analysis a protocol that included 500 µL of plasma input, EV enrichment with the miRCURY kit, extravesicular RNA elimination with RNase A, EV lysis and RNA extraction with TRI reagent, retrotranscription and nCounter analysis with a 10-cycle preamplification step ( Figure 4). Different amplification cycles (10, 12 and 14) were subsequently tested in a 50 plasma sample. The highest number of raw counts was obtained with 14 cycles (F 3a). Regarding the number of circRNAs, 10 and 12 cycles yielded similar results (n = ± 9.19 and 52.5 ± 7.78 respectively). More circRNAs were detected at 14 cycles (n = 16.97) with a high variability between replicates (Figure 3b, Table S2). In view of results, we selected for EV-circRNA analysis a protocol that included 500 μL of pl input, EV enrichment with the miRCURY kit, extravesicular RNA elimination with R A, EV lysis and RNA extraction with TRI reagent, retrotranscription and nCounter a sis with a 10-cycle preamplification step ( Figure 4).   Different amplification cycles (10, 12 and 14) were subsequently tested in a 50 plasma sample. The highest number of raw counts was obtained with 14 cycles (Fi 3a). Regarding the number of circRNAs, 10 and 12 cycles yielded similar results (n = ± 9.19 and 52.5 ± 7.78 respectively). More circRNAs were detected at 14 cycles (n = 16.97) with a high variability between replicates (Figure 3b, Table S2). In view of t results, we selected for EV-circRNA analysis a protocol that included 500 μL of pla input, EV enrichment with the miRCURY kit, extravesicular RNA elimination with R A, EV lysis and RNA extraction with TRI reagent, retrotranscription and nCounter a sis with a 10-cycle preamplification step ( Figure 4). The repeatability of the protocol was first tested by submitting to nCounter duplicates of a preamplified plasma sample. A strong correlation between the normalized counts was found between the duplicates, represented by a Pearson's r of 0.98, p < 0.0001 (Figure 3c). When the same plasma sample was re-purified and re-analyzed, nCounter results also showed a strong correlation with the initial duplicates (Pearson's r = 0.90-0.91; p < 0.0001) (Figure 3d). results obtained in an independent nCounter assay of the same sample. Pearson's correlation coefficient is indicated.

Figure 4.
Final workflow established for the study of circRNAs from plasma extracellular vesicles (EVs) using the nCounter technology. A volume of 500 μL of plasma was used in the miRCURY kit to precipitate EVs. Rnase A was used to remove any non-vesicular RNA that could be present in the sample before proceeding with manual RNA extraction with TRI reagent. RNA samples were treated with DNase to eliminate any trace of genomic DNA, followed by retro-transcription and a pre-amplification step of 10 cycles. Finally, samples were hybridized overnight before nCounter processing.
The repeatability of the protocol was first tested by submitting to nCounter duplicates of a preamplified plasma sample. A strong correlation between the normalized counts was found between the duplicates, represented by a Pearson's r of 0.98, p < 0.0001 (Figure 3c). When the same plasma sample was re-purified and re-analyzed, nCounter results also showed a strong correlation with the initial duplicates (Pearson's r = 0.90-0.91; p < 0.0001) (Figure 3d).

CircRNA Expression in Plasma EV Samples
Plasma from 66 individuals, 36 early-stage NSCLC patients and 30 non-cancer donors, were analyzed using the protocol previously described in Section 3.1 (Figure 4). An average of 40 ± 14 EV-circRNAs per sample were detected in controls vs. 47 ± 9 in the NSCLC cohort. This difference was found to not be significant by the Mann-Whitney U test (Figure 5a). Among the 78 circRNAs included in the panel, 70 were detected in at least one NSCLC sample and 68 in at least one non-cancer control. A total of 66 EV-circRNAs were shared by both cohorts, while four EV-circRNAs were exclusive to NSCLC patients and two to non-cancer donors (Figure 5b, Table S3). Final workflow established for the study of circRNAs from plasma extracellular vesicles (EVs) using the nCounter technology. A volume of 500 µL of plasma was used in the miRCURY kit to precipitate EVs. Rnase A was used to remove any non-vesicular RNA that could be present in the sample before proceeding with manual RNA extraction with TRI reagent. RNA samples were treated with DNase to eliminate any trace of genomic DNA, followed by retro-transcription and a pre-amplification step of 10 cycles. Finally, samples were hybridized overnight before nCounter processing.

CircRNA Expression in Plasma EV Samples
Plasma from 66 individuals, 36 early-stage NSCLC patients and 30 non-cancer donors, were analyzed using the protocol previously described in Section 3.1 (Figure 4). An average of 40 ± 14 EV-circRNAs per sample were detected in controls vs. 47 ± 9 in the NSCLC cohort. This difference was found to not be significant by the Mann-Whitney U test (Figure 5a). Among the 78 circRNAs included in the panel, 70 were detected in at least one NSCLC sample and 68 in at least one non-cancer control. A total of 66 EV-circRNAs were shared by both cohorts, while four EV-circRNAs were exclusive to NSCLC patients and two to non-cancer donors (Figure 5b, Table S3).
DE analysis revealed eight circRNAs significantly upregulated in EV-enriched samples from NSCLC patients vs. controls; namely circular Erythrocyte Membrane protein Band 4.1 Like 2 (circEPB41L2), circular Core 1 Synthase, Glycoprotein-N-Acetylgalactosamine -3-Beta-Galactosyltransferase 1 (circC1GALT1), circular Zinc Finger RNA Binding Protein (circZFR), circular Ubiquitin Specific Peptidase 3 (circUSP3), circular Zinc Finger CCHC Domain-Containing Protein 6 (circZCCHC6), circular Cyclin B1 (circCCNB1), circular DENN Domain Containing 1B (circDENN1B) and circular Homeodomain Interacting Protein Kinase 3 (circHIPK3) (Figure 5c). Of them, only circZFR and circC1GALT1 showed <10 counts in each cohort (Table S4). To validate these results, we tested the expression circZCCHC6 and circHIPK3 by RT-qPCR. Divergent primers and probes spanning the junction sites were designed for the specific amplification of these two circular transcripts (Table 2) in samples previously assessed by nCounter with sufficient remaining material. Gel electrophoresis of the RT-qPCR products revealed bands matching the size of expected amplicons and subsequent Sanger sequencing confirmed the expected junction sites in the circRNAs (Figure 6a,b). Among the six samples analyzed by RT-qPCR, four and six samples produced satisfactory results for circZCCHC6 and circHIPK3 respectively. A trend between nCounter counts and RT-qPCR ∆∆Cts was observed for both circRNAs (Figure 6c-d), with circZCCHC6 showing a strong correlation (Pearson's r = 0.99; p = 0.0076) (Figure 6c).  (Figure 5c). Of them, only circZFR and circC1GALT1 showed <10 counts in each cohort (Table S4). To validate these results, we tested the expression circZCCHC6 and circHIPK3 by RT-qPCR. Divergent primers and probes spanning the junction sites were designed for the specific amplification of these two circular transcripts (Table 2) in samples previously assessed by nCounter with sufficient remaining material. Gel electrophoresis of the RT-qPCR products revealed bands matching the size of expected amplicons and subsequent Sanger sequencing confirmed the expected junction sites in the circRNAs (Figure 6a,b). Among the six samples analyzed by RT-qPCR, four and six samples produced satisfactory results for circZCCHC6 and circHIPK3 respectively. A trend between nCounter counts and RT-qPCR ∆∆Cts was observed for both circRNAs (Figure 6c-d), with circZCCHC6 showing a strong correlation (Pearson's r = 0.99; p = 0.0076) (Figure 6c).

Development of a CircRNA-Signature Associated with Early-Stage NSCLC
Interquartile range analysis classified 9/66 samples as potential outliers (Figure 7 and PCA revealed that they deviated from the main cluster of observations (Figure S Consequently, these nine samples were excluded from further analysis. Then, different R packages including DESeq2, edgeR, RUVSeq and their combinatio were tested in order to select the normalization approach that best adapts to our data. A a result, RLE plots indicated a superior performance of RUVSeq-DESeq2 versus the oth combinations (Figure 7b, Figure S3). Consequently, RUVSeq-DESeq2 normalization w selected for the rest of the study. Pearson's correlation coefficient is indicated. ns, not significant. ** means two grades of significant (p < 0.01).

Development of a CircRNA-Signature Associated with Early-Stage NSCLC
Interquartile range analysis classified 9/66 samples as potential outliers (Figure 7a) and PCA revealed that they deviated from the main cluster of observations ( Figure S2). Consequently, these nine samples were excluded from further analysis. and PCA revealed that they deviated from the main cluster of observations ( Figure S2). Consequently, these nine samples were excluded from further analysis.
Then, different R packages including DESeq2, edgeR, RUVSeq and their combination were tested in order to select the normalization approach that best adapts to our data. As a result, RLE plots indicated a superior performance of RUVSeq-DESeq2 versus the other combinations (Figure 7b, Figure S3). Consequently, RUVSeq-DESeq2 normalization was selected for the rest of the study. Then, different R packages including DESeq2, edgeR, RUVSeq and their combination were tested in order to select the normalization approach that best adapts to our data. As a result, RLE plots indicated a superior performance of RUVSeq-DESeq2 versus the other combinations (Figures 7b and S3). Consequently, RUVSeq-DESeq2 normalization was selected for the rest of the study.
Next, ML was performed using RFE along with RF classifier and LOOCV, as described in Methods, in order to obtain a signature associated with NSCLC. As a result, ETC was selected as the best model, with a signature of 10 circRNAs (including circular Family With Sequence Similarity 13 Member B -circFAM13B, circular ADAM Metallopeptidase Domain 22 -circADAM22, circular UBX Domain Protein 7 -circUBXN7, circZCCHC6, circular Integrin Subunit Alpha X -circITGAX, circular Retinol Dehydrogenase 11 -circRDH11, circEPB41L2, circular CDC Like Kinase 1 -circCLK1, circular Phenylalanyl-tRNA Synthetase Subunit Alpha -circFARSA, and circular Phosphoinositide-3-Kinase Regulatory Subunit 1 -circPIK3R1) showing an AUC ROC of 0.86 (Figure 8a). Signature scores were found to be statistically different when comparing early-stage NSCLC and non-cancer controls (Mann-Whitney U test, p < 0001; Figure 8b). The sensitivity and specificity of the ETC signature were of 90% (CI = 73.47-97.89%) and 81% (CI = 61.92%%-93.70%) respectively, outperforming the RF and KNN classifiers ( Table 3). The accuracy achieved with ETC was 86%, resulting in 49 out of the 66 cases being correctly classified (Figure 8c). Next, ML was performed using RFE along with RF classifier and LOOCV, as described in Methods, in order to obtain a signature associated with NSCLC. As a result, ETC was selected as the best model, with a signature of 10 circRNAs (including circular Family With Sequence Similarity 13 Member B -circFAM13B, circular ADAM Metallopeptidase Domain 22 -circADAM22, circular UBX Domain Protein 7 -circUBXN7, circZCCHC6, circular Integrin Subunit Alpha X -circITGAX, circular Retinol Dehydrogenase 11 -circRDH11, circEPB41L2, circular CDC Like Kinase 1 -circCLK1, circular Phenylalanyl-tRNA Synthetase Subunit Alpha -circFARSA, and circular Phosphoinositide-3-Kinase Regulatory Subunit 1 -circPIK3R1) showing an AUC ROC of 0.86 (Figure 8a). Signature scores were found to be statistically different when comparing early-stage NSCLC and non-cancer controls (Mann-Whitney U test, p < 0001; Figure 8b). The sensitivity and specificity of the ETC signature were of 90% (CI = 73.47-97.89%) and 81% (CI = 61.92%%-93.70%) respectively, outperforming the RF and KNN classifiers ( Table 3). The accuracy achieved with ETC was 86%, resulting in 49 out of the 66 cases being correctly classified (Figure 8c).     Then, a univariate analysis was performed to explore the association of the ETC circRNA signature with gender, age, smoking, cancer status and tumor stage (Figure 9a). A statistically significant correlation was found for the signature with age (odds ratio = 24.91, p < 0.0001), and particularly cancer status (odds ratio of 39.6, p < 0.0001). Then, a univariate analysis was performed to explore the association of the ETC circRNA signature with gender, age, smoking, cancer status and tumor stage (Figure 9a). A statistically significant correlation was found for the signature with age (odds ratio = 24.91, p < 0.0001), and particularly cancer status (odds ratio of 39.6, p < 0.0001). Univariate analysis exploring associations between presented 10-circRNA signature and patient characteristics. Forest plot represents the odds ratios with a 95% Wald confidence limit. (b) Multivariate analysis exploring associations between presented 10-circRNA signature with age and cancer status. Forest plot represents the odds ratios with a 95% Wald confidence limit.
To further evaluate the implication of age and cancer status on the ML-developed signature, we first performed an exploratory study assessing the interconnexion of both variables by performing a chi-square test. As a result, a strong association between age and cancer status was found, with a p < 0.0001 (Table 4).  Figure 9. Association between clinical characteristics and ML-generated 10-circRNA signature.
(a) Univariate analysis exploring associations between presented 10-circRNA signature and patient characteristics. Forest plot represents the odds ratios with a 95% Wald confidence limit. (b) Multivariate analysis exploring associations between presented 10-circRNA signature with age and cancer status. Forest plot represents the odds ratios with a 95% Wald confidence limit.
To further evaluate the implication of age and cancer status on the ML-developed signature, we first performed an exploratory study assessing the interconnexion of both variables by performing a chi-square test. As a result, a strong association between age and cancer status was found, with a p < 0.0001 (Table 4). Next, a multivariate analysis was carried out. Results not only demonstrated dependency of these two variables, but also showed a statistically significant correlation between the signature and cancer status (p = 0.0036, Table 5, Figure 9b). No correlation was found between age and presented signature, in this regard (p = 0.0784, Table 5, Figure 9b)

Discussion
EVs are released by most cell types and play an important role in cancer cell communication. Many publications have demonstrated the role of EVs as key modulators in cancer progression [34,35], which requires intercellular communication mediated by the horizontal transferring of biological information via the EV cargo of proteins, DNA and coding/non-coding RNA, including circRNAs. Therefore, analysis of EVs can provide a snapshot of the tumor and serve as a valuable tool to discover liquid biopsy biomarkers. CircRNAs are highly enriched in EVs [7] and show a relatively high stability compared to other forms of RNA [8]. Several studies have highlighted their potential as liquid biopsy biomarkers [12] but current limitations in circRNA quantification methods are limiting their implementation in the clinical setting. Consequently, new, and robust protocols for circRNA analysis are needed. The nCounter platform has gained popularity among translational investigators for transcriptional research not only for solid biopsies but also for EV samples. However, studies focusing on circRNA analysis by nCounter are limited and mostly restricted to tissue specimens [30,[36][37][38][39][40]. In particular, to the best of our knowledge, nCounter has never been applied to the analysis of circRNA in liquid biopsies of lung cancer patients. Consequently, we developed a comprehensive protocol for nCounter-based EV-circRNA expression analysis, from EV enrichment to differential expression and subsequent ML analysis. Key points in this protocol were the initial volume of plasma, the EV purification method and the number of cycles for the pre-amplification step prior to nCounter testing.
UC is currently still the method of choice for EV isolation in the research setting and we have previously demonstrated its utility for the downstream analysis of cell line-derived EV circRNAs [41]. However, ultracentrifuges are not usually available in clinical laboratories, while precipitation-based kits such as the miRCURY Exosome Serum/Plasma Kit represent an easily implementable option with a simple, on-the-bench protocol and short hands-on time. In our study, we compared the two methodologies using plasma samples from an NSCLC patient and a healthy donor. The presence of EV-like particles in all preparations was confirmed by TEM and nanoFCM. Interestingly, a more uniform EV population with an exosomal size-range was found by TEM in both cancer and control samples processed with the miRCURY kit, along with a higher concentration of 40-200 nm particles observed by nanoFCM. A possible explanation to this event could be a size-selective enrichment attributed to this type of precipitation-based preparations, as previously reported in serum samples [42]. This finding prompted us to select miRCURY for further assay development. In addition, a higher number of EV-like particles was observed in the cancer sample compared to the control, regardless the isolation method used. Although a higher number of samples should be analyzed for further confirmation, preliminary results are in agreement with previous reports indicating a higher abundance of EVs in cancer patients [43].
Finally, adding to the evidence provided by TEM and nanoFCM, a treatment with RNase A was applied to EV-enriched samples prior to EV lysis and incorporated into our protocol to eliminate any extravesicular RNA. The resulting and subsequently analyzed RNA proved to be protected from the digestion of cited ribonuclease, indicating a vesicular origin of the transcripts.
In a previous study, a volume of 500 µL of plasma was found to be sufficient for the analysis of EV-derived mRNA by nCounter [25]. Here, we compared several plasma volumes and found that 500 µL outperformed 1000 and 1500 µL for circRNAs analysis, both in terms of the number of circRNA molecules detected and total counts. A possible explanation for these results may rely on saturation issues with the circRNAs/reporterprobe complexes when a higher plasma input is applied, which impede a correct molecule identification by the digital analyzer. Regarding the number of cycles for the preamplification step, we investigated a range from 10 to 20 in an effort to reduce amplification-related background noise to a minimum, and we found that a 10-cycle pre-amplification step yielded adequate results.
Then, we applied our protocol to assess circular transcripts in early-stage NSCLC samples (n = 36) and to non-tumor controls (n = 30). We found that eight circRNAs were found differentially expressed between the two cohorts. Among them, circEPB41L2, circZCCHC6 and circHIPK3 showed the highest number of counts in early-stage cancer patients (Table S4). Interestingly, we previously found circEPB41L2 differentially expressed in FFPE tissues of early-stage lung cancer patients [30] and found that it displayed four binding sites with hsa-miR-942, which has been described as an activator of the Wnt/βcatenin signaling pathway [44,45] in colorectal and esophageal cancers. Our results warrant further investigation in the biology of this circRNA to characterize its role in lung cancer. Regarding circHIPK3, it has been extensively investigated in lung cancer and found to exert a dual activity over miR-149 [46] and mir-124 [47,48], inducing cell proliferation and inhibiting apoptosis. Our results are in agreement with these findings, since circHIPK3 was upregulated in EV samples from early-stage NSCLC patients. Finally, circZCCHC6 has been recently described to regulate lysophosphatidylcholine acyltransferase 1 (LPCAT1) levels via miR-433-3p [49] in lung cancer. We used circinteractome (www.circinteractome. nia.nah.gov) to investigate possible additional miRNA binding sites, finding matches for 7 additional transcripts (miR-579-3p, miR-623, miR-1197, miR-1304 miR-548l, miR-605 and miR-935). All these miRNAs have been reported to be downregulated in lung tumors and have been related with poor prognosis, tumor growth and metastases [50][51][52][53][54][55][56].
ML and other computational methods based on artificial intelligence (AI) have emerged in the last decade for multileveled analysis of different datasets. In particular, ML enables computers to make predictions by finding patterns within analyzed data [57], offering a novel approach for the development of predictive signatures that often reach a higher predictive value than biomarkers found by differential expression analyses. Consequently, we decided to use ML in our study. To this end, we developed a pipeline with several steps. First, using IQR and PCA plots, we identified nine outliers, which were excluded from downstream analyses. An RLE plot from each different normalization procedure was generated, showing a higher performance of the RUVSeq-DESeq2 function when compared to the other combinations (Figures 7b and S3A-C). Finally, we used RFE along with LOOCV and the RF classifier as the feature selection algorithm to automatically determine the most significant circRNAs which are best suited for the construction of the prognostic signature. The final 10-circRNA signature included two of the eight circular transcripts previously found by differential expression analysis and eight additional transcripts, including circ-FARSA. Interestingly, circFARSA has been described as a plasma biomarker of NSCLC [58], promoting tumor invasion and metastases via the PTEN/PI3K/AKT axis [59].
Since we did not sort EV populations, we could not verify the vesicular cell or tissue origin of the circRNAs included in the ML signature nor the origin of the circular transcripts, either cancer cells or tumor microenvironment. Also, we did not investigate the biological role of the circRNAs, being out of the scope of our work.
In addition, while multivariate analysis could demonstrate that classification accuracy of presented signature is based on cancer status and no other clinicopathological characteristics (Figure 9), the lack of > 60-year-old individuals was a limitation in the study. The inclusion of equivalent cohorts in terms of age should be taking into consideration for the design of forthcoming validation studies.
Finally, all 36 cancer samples included in this study were lung adenocarcinomas, with the exception of 4 squamous carcinoma and 5 NSCLC samples with unknown histological subtype. A uniform inclusion of the different lung cancer histologies is suggested for future validation studies to assess the predictive power of the signature for other subtypes of NSCLC.

Conclusions
We have demonstrated the feasibility of using nCounter for the multiplex study of plasma-EV circRNAs in liquid biopsies of lung cancer patients, including differential expression analysis and development of predictive ML signatures. Further studies of larger cohorts are warranted in order to determine the clinical applicability of such signatures.
Supplementary Materials: The following supporting information can be downloaded at: https: //www.mdpi.com/article/10.3390/pharmaceutics14102034/s1, Table S1: CircRNAs detected in the different plasma volumes of the same patient with 14 and 20 cycles of pre-amplification; Table S2: CircRNAs detected in the plasma of the same individual subjected to 10, 12 and 14 pre-amplification cycles; Table S3: CircRNAs identified in early-state NSCLC and non-cancer control cohorts; Table S4: Normalized counts of differentially expressed circRNAs found in the early-stage NSCLC cohort; Figure S1: Plasma input testing; Figure S2: Principal Component Analysis of the transformed raw data; Figure S3: Assessment of the different normalization processes by RLE plot analysis; Figure S4: Confusion matrices summarizing the performance of the different classification algorithms.  Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author (carlospedraz@icloud.com) upon reasonable request.