A Protocol for Low-Input RNA-Sequencing of Patients with Febrile Neutropenia Captures Relevant Immunological Information

Improved methods are needed for diagnosing infectious diseases in children with cancer. Most children have fever for other reasons than bacterial infection and are exposed to unnecessary antibiotics and hospital admission. Recent research has shown that host whole blood RNA transcriptomic signatures can distinguish bacterial infection from other causes of fever. Implementation of this method in clinics could change the diagnostic approach for children with cancer and suspected infection. However, extracting sufficient mRNA to perform transcriptome profiling by standard methods is challenging due to the patient’s low white blood cell (WBC) counts. In this prospective cohort study, we succeeded in sequencing 95% of samples from children with leukaemia and suspected infection by using a low-input protocol. This could be a solution to the issue of obtaining sufficient RNA for sequencing from patients with low white blood cell counts. Further studies are required to determine whether the captured immune gene signatures are clinically valid and thus useful to clinicians as a diagnostic tool for patients with cancer and suspected infection.


Introduction
Improved methods are needed for diagnosing infectious diseases in children with cancer. Due to the increased risk of possibly fatal infections in this population, they are hospitalised and given antibiotics when febrile [1,2]. However, the majority have fever for other reasons than bacterial infection and are exposed to unnecessary antibiotics and hospital admissions due to a lack of quick and accurate diagnostic tests [3]. This can lead to side effects and antibiotic resistance (in the society and the individual patient) [4][5][6].
Recent research has shown that host whole blood RNA transcriptomic signatures can distinguish bacterial from viral infection and inflammatory diseases [7][8][9]. This method is approaching clinical implementation and could change the diagnostic approach for children with suspected infection [10]. However, extracting sufficient mRNA from these patients to perform transcriptome profiling by standard methods is challenging due to their low white blood cell (WBC) counts. Only two studies have investigated transcriptome profiling from whole blood collected in RNA-stabilising blood tubes (PAXgene ® ) in patients with febrile neutropenia [11,12]. Wahlund et al. (2020) failed to extract sufficient RNA for sequencing in 20 out of 63 (32%) samples in children with febrile neutropenia, while Kelly et al. had to use a targeted gene panel (EdgeSeq) in 29 samples due to insufficient RNA for standard RNA-sequencing in a study of febrile neutropenia in adults. In a recent study, 2 of 12 Haeusler et al., obtained immune gene signatures in 73 of 80 (91%) patients with febrile neutropenia using EDTA tubes [13]. The disadvantage of using EDTA tubes is the need to freeze the samples within 2 h.
Advances in single-cell mRNA-sequencing (low input protocols) have pushed the boundaries for sensitive mRNA capture and quantification and may solve the challenges in obtaining sufficient RNA from patients with low WBC [14][15][16] Flow-based protocols such as 10x Genomics can capture many single cells at the expense of gene capture [17]. Here, we hypothesise that full-transcript plate-based low-input protocol can be both sensitive enough in terms of gene capture and practical in terms of handling in the laboratory to be used for patients with low WBC counts. We previously found that the SMART-Seq ® HT kit protocol performed well for single-cell sequencing, with high robustness and ease of implementation [18].
In 2019, we initiated a prospective nationwide cohort study investigating unique transcriptional signatures in whole blood from children with leukaemia and suspected infection. As in previous studies, we experienced challenges detecting RNA in samples with very low WBC counts. As a result, RNA concentrations and RNA integrity numbers (RIN) were low and standard RNA-sequencing was not possible.
In this study, we aimed to compare the SMART-Seq ® HT kit protocol (low-input protocol) with the TruSeq Stranded RNA Library Preparation Kit (standard RNA-sequencing protocol) to investigate if the low-input protocol could recapture the same transcriptional information. We also aimed to confirm the clinical utility of the approach by applying the low-input protocol to whole blood samples from children with leukaemia and suspected infection.
We sequenced the 15 samples with both methods to test whether the low-input protocol could recapture similar transcriptional information as the standard clinical RNA-seq protocol. Furthermore, we applied the low-input protocol to 88 samples to investigate the general clinical applicability.  Figure 1. Overview of the study population sequenced either by the low-input protocol or by both low-input and standard protocols as described in the methods. A total of 88 samples from 22 patients with leukaemia and suspected infection was sequenced by the low-input protocol (Takara SMART-seq HT) (96% succeeded) and 15 of these were also processed by the standard protocol (Truseq). Samples stated by the laboratory as WBC < 0.1 × 10 9 /L and absolute neutrophil count (ANC) < 0.1 × 10 9 /L were defined as WBC = 0 and ANC = 0, respectively. * Median (IQR) × 10 9 /L, ** median (range) × 10 9 /L. We sequenced the 15 samples with both methods to test whether the low-input protocol could recapture similar transcriptional information as the standard clinical RNA-seq protocol. Furthermore, we applied the low-input protocol to 88 samples to investigate the general clinical applicability.

Comparing the Low-Input RNA-Seq Protocol to the Standard RNA-Seq Protocol
The 15 samples sequenced by both low-input and standard protocols had a median WBC count of 2.8 × 10 9 /L (IQR 1.9-4.7) and an ANC of 2.1 × 10 9 /L (IQR 31.0-3.7). The median RIN was 7.4 (IQR 7.0-8.3), and one had undetectable RIN) ( Figure 1). The samples were selected to have enough cells and material to perform both protocols suitably. A comparison of methodological steps are in table 1.  *median (IQR) x 10 9 /l ** median (Range) x 10 9 /l Figure 1. Overview of the study population sequenced either by the low-input protocol or by both low-input and standard protocols as described in the methods. A total of 88 samples from 22 patients with leukaemia and suspected infection was sequenced by the low-input protocol (Takara SMART-seq HT) (96% succeeded) and 15 of these were also processed by the standard protocol (Truseq). Samples stated by the laboratory as WBC < 0.1 × 10 9 /L and absolute neutrophil count (ANC) < 0.1 × 10 9 /L were defined as WBC = 0 and ANC = 0, respectively. * Median (IQR) × 10 9 /L, ** median (range) × 10 9 /L.

Comparing the Low-Input RNA-Seq Protocol to the Standard RNA-Seq Protocol
The 15 samples sequenced by both low-input and standard protocols had a median WBC count of 2.8 × 10 9 /L (IQR 1.9-4.7) and an ANC of 2.1 × 10 9 /L (IQR 31.0-3.7). The median RIN was 7.4 (IQR 7.0-8.3), and one had undetectable RIN) ( Figure 1). The samples were selected to have enough cells and material to perform both protocols suitably. A comparison of methodological steps are in Table 1.

Library Sizes and Gene Expression
As expected, the samples sequenced by the standard protocol had significantly larger library sizes than the low-input protocol samples (p < 0.001) ( Figure 2A) and a correspondingly higher number of total detected genes (p < 0.001) ( Figure 2B). The number of captured immune signature genes was also significantly higher in the standard protocol samples (p < 0.001) ( Figure 2C). Even though the library size for the samples processed by the standard protocol on average was larger by a scale factor of 10.6×, the total number of captured genes differed only by a factor of 1.9×, and the number of immune signature genes only by a factor 1.2×. Hence, the loss of genes using the low-input protocol was smaller than the loss of sequencing reads.
-Clean-up and validation of amplified cDNA 2 Final library preparation and sequencing 2 Final library preparation and sequencing

Library Sizes and Gene Expression
As expected, the samples sequenced by the standard protocol had significantly larger library sizes than the low-input protocol samples (p < 0.001) ( Figure 2A) and a correspondingly higher number of total detected genes (p < 0.001) ( Figure 2B). The number of captured immune signature genes was also significantly higher in the standard protocol samples (p < 0.001) ( Figure 2C). Even though the library size for the samples processed by the standard protocol on average was larger by a scale factor of 10.6×, the total number of captured genes differed only by a factor of 1.9×, and the number of immune signature genes only by a factor 1.2×. Hence, the loss of genes using the low-input protocol was smaller than the loss of sequencing reads. The highest inter-protocol correlation was seen between samples originating from the same patient, based on the Spearman correlation coefficient ( Figure 3A-C), indicating that both methods recovered the same transcriptional signal. The correlation was most The highest inter-protocol correlation was seen between samples originating from the same patient, based on the Spearman correlation coefficient ( Figure 3A-C), indicating that both methods recovered the same transcriptional signal. The correlation was most prominent in the expression of the union of the 15 most differentially expressed genes and the 38-gene signature ( Figure 3A,C). As expected, the correlation of immune-related signatures showed greater inter-sample correlation because all samples were from children with suspected infection ( Figure 3B).
Between the two protocols, 75% of transcripts were overlapping ( Figure 4A), and no difference was found in the proportion of transcript types ( Figure 4B). prominent in the expression of the union of the 15 most differentially expressed genes and the 38-gene signature ( Figure 3A,C). As expected, the correlation of immune-related signatures showed greater inter-sample correlation because all samples were from children with suspected infection ( Figure 3B).
Between the two protocols, 75% of transcripts were overlapping ( Figure 4A), and no difference was found in the proportion of transcript types ( Figure 4B).

Application of the Low-Input Protocol to Samples with Low White Blood Cell Counts
Eighty-four of 88 (95%) samples sequenced with the low-input protocol succeeded. The four samples that failed had an ANC of 0.0. Three had RIN < 5 and RNA annotated as "too low" on HS Qubit. The last sample had a RIN of 9.1 with an RNA concentration of 6.62 ng/µL measured on Nanodrop. Thus, 25 of the 28 (89%) samples that initially had insufficient RNA for sequencing succeeded with the low-input protocol. We found a wide variation of RIN among the entire study population, but no significant difference was found in the WBC count between samples with RIN ≥ 5 and RIN < 5 ( Figure 5A,B).
Low-input protocol Standard RNA-seq protocol prominent in the expression of the union of the 15 most differentially expressed genes and the 38-gene signature ( Figure 3A,C). As expected, the correlation of immune-related signatures showed greater inter-sample correlation because all samples were from children with suspected infection ( Figure 3B).
Between the two protocols, 75% of transcripts were overlapping ( Figure 4A), and no difference was found in the proportion of transcript types ( Figure 4B).

Application of the Low-Input Protocol to Samples with Low White Blood Cell Counts
Eighty-four of 88 (95%) samples sequenced with the low-input protocol succeeded. The four samples that failed had an ANC of 0.0. Three had RIN < 5 and RNA annotated as "too low" on HS Qubit. The last sample had a RIN of 9.1 with an RNA concentration of 6.62 ng/µL measured on Nanodrop. Thus, 25 of the 28 (89%) samples that initially had insufficient RNA for sequencing succeeded with the low-input protocol. We found a wide variation of RIN among the entire study population, but no significant difference was found in the WBC count between samples with RIN ≥ 5 and RIN < 5 ( Figure 5A,B).
Low-input protocol Standard RNA-seq protocol

Application of the Low-Input Protocol to Samples with Low White Blood Cell Counts
Eighty-four of 88 (95%) samples sequenced with the low-input protocol succeeded. The four samples that failed had an ANC of 0.0. Three had RIN < 5 and RNA annotated as "too low" on HS Qubit. The last sample had a RIN of 9.1 with an RNA concentration of 6.62 ng/µL measured on Nanodrop. Thus, 25 of the 28 (89%) samples that initially had insufficient RNA for sequencing succeeded with the low-input protocol. We found a wide variation of RIN among the entire study population, but no significant difference was found in the WBC count between samples with RIN ≥ 5 and RIN < 5 ( Figure 5A,B).

Library Sizes and Gene Expression
We found a significant difference in library size between samples with RIN < 5 compared and samples with RIN ≥ 5 ( Figure 5C,D). On average, there were 254 immune signature genes detected in all 84 samples, although samples with RIN ≥ 5 (sd = 0.32) had a higher mean expression (p < 0.001). A larger variation was observed in the number of immune genes in samples with RIN < 5 (sd = 0.90), but several samples showed comparable numbers of immune signature genes to those of RIN ≥ 5 ( Figure 5E,F). The lower proportion of detected immune genes correlated with the lower number of total genes detected in samples with RIN < 5 ( Figure 5G,H).

Library Sizes and Gene Expression
We found a significant difference in library size between samples with RIN < 5 compared and samples with RIN ≥ 5 ( Figure 5C,D). On average, there were 254 immune signature genes detected in all 84 samples, although samples with RIN ≥ 5 (sd = 0.32) had a higher mean expression (p < 0.001). A larger variation was observed in the number of immune genes in samples with RIN < 5 (sd = 0.90), but several samples showed comparable numbers of immune signature genes to those of RIN ≥ 5 ( Figure 5E,F). The lower proportion of detected immune genes correlated with the lower number of total genes detected in samples with RIN < 5 ( Figure 5G,H).
A higher number of total genes (p < 0.001) was captured in samples with RIN ≥ 5 (sd = 3.80) compared to samples with RIN < 5 (sd = 6.65) ( Figure 6G,H). Yet, 70.6% of transcripts overlapped between the two groups ( Figure 7A). Furthermore, samples with RIN < 5 had a lower number of expressed genes and a higher number of unmapped genes than those with RIN ≥ 5 ( Figure 6B).

Library Sizes and Gene Expression
We found a significant difference in library size between samples with RIN < 5 compared and samples with RIN ≥ 5 ( Figure 5C,D). On average, there were 254 immune signature genes detected in all 84 samples, although samples with RIN ≥ 5 (sd = 0.32) had a higher mean expression (p < 0.001). A larger variation was observed in the number of immune genes in samples with RIN < 5 (sd = 0.90), but several samples showed comparable numbers of immune signature genes to those of RIN ≥ 5 ( Figure 5E,F). The lower proportion of detected immune genes correlated with the lower number of total genes detected in samples with RIN < 5 ( Figure 5G,H).
A higher number of total genes (p < 0.001) was captured in samples with RIN ≥ 5 (sd = 3.80) compared to samples with RIN < 5 (sd = 6.65) ( Figure 6G,H). Yet, 70.6% of transcripts overlapped between the two groups ( Figure 7A). Furthermore, samples with RIN < 5 had a lower number of expressed genes and a higher number of unmapped genes than those with RIN ≥ 5 ( Figure 6B).

Discussion
In this prospective cohort study, we succeeded in sequencing 95% of samples from children with leukaemia and suspected infection. Equivalent to the findings by Wahlund et al. (2020) [11], we found insufficient RNA for standard RNA-sequencing in one-third of the samples. Despite this, we succeeded in sequencing 89% of these samples using a lowinput protocol. This is to our knowledge the first study to successfully perform non-targeted sequencing on such a large number of blood samples collected in PAXgene tubes from patients with low WBC counts. The recent advancement in RNA-sequencing techniques has lowered the threshold for RNA detection and given rise to numerous commercial kits. However, obtaining adequate RNA quantity and quality to undertake standard RNAsequencing in samples with low white blood cell counts remained a challenge. In this study, we found a possible solution to that.

Comparing Low-Input to Standard RNA-Sequencing
We found that the low-input protocol was able to detect a similar level of immunerelated genes to the standard protocol. We also saw a correlation in immune transcript signatures between paired samples analysed with both protocols. This indicates a comparable ability between the two protocols to capture the same immune genes in the same patient sample. Furthermore, the percentage of total captured transcripts overlapped by 75% and no difference in the proportion of transcript types was observed. The results indicate that the low-input protocol performs comparably to the standard protocol requiring a 1000-fold larger RNA input. Furthermore, the low-input protocol can obtain useful information from samples with very low starting material.

The Impact of RIN and WBC Counts on the Performance of the Low-Input Protocol
We detected immune signature genes in all 84 samples, including samples from patients with very low or undetectable RIN and WBC counts. This suggests that the low-input protocol is a feasible solution for obtaining useful biological information for diagnostic purposes from children with cancer and suspected infection.
We found that high-quality RNA, quantified by RIN, resulted in high coverage both in total and immune-specific genes, whereas low-quality RNA more often showed low coverage. However, many samples with low WBC count yielded RIN ≥ 5, suggesting that RNA quality sufficient for sequencing can be retrieved from patients with low WBC count.
In samples with RIN ≥ 5, the number of immune signature genes and total genes was higher, indicating that the performance of the low-input protocol is more stable in samples with higher RIN. Furthermore, we found a lower number of expressed genes and a higher number of unmapped genes (70-80%) in samples with RIN < 5, showing that the quality of the starting material is reflected in the sequencing output. The number of correctly mapped reads to the reference genome in a high-quality sample is 70-90%. A high percentage of unmapped reads signals potential contamination during sequencing or sample preparation (Sangiovanni et al., 2019) [19]. The quality of our sequenced samples was acceptable, with 70-80% of correctly mapped reads.
Some samples with RIN < 5 showed comparable numbers of immune signature genes to those with RIN ≥ 5. However, samples with RIN ≥ 5 had significantly higher levels and less variation. Thus, an effort should be made to improve the RIN of the samples with low quality.
The samples are expected to show similar expression across immune-and Herberg signature genes, but the great majority of the low-input sequencing samples had the highest correlation with the matched standard sequencing protocol control ( Figure 3B,C).
It is likely that we could improve the RNA quality of samples by refining the handling process of the PAXgene tubes after sampling. Many samples were left at room temperature for 24-72 h before freezing according to the manufacturer's instructions. Guidelines for blood sampling were based on blood samples from patients without immunodeficiencies and with sufficient RNA for standard sequencing. We suggest treating samples from this patient group with extra care to preserve the small amount of RNA. A study of the PAXgene blood tube's preanalytical robustness showed significantly lower RIN in samples not being effectively inverted (8-10 times) upon sampling and left at room temperature for 24 h [20]. We cannot guarantee that all our samples were correctly inverted, and we could reduce the storage time at room temperature. According to several studies, the quantity and quality of RNA stored in frozen PAXgene Blood RNA tubes are stable for years, supporting our hypothesis that attention should be paid to the steps before freezing [21].

Perspectives
Host transcriptome profiling is considered a novel and promising way to support the clinical diagnosis of infectious diseases. Our study shows that it is possible to sequence whole blood samples with low WBC counts collected in PAXgene tubes using a low-input protocol. These findings are pivotal for the protocol to be used for diagnostic purposes of infectious diseases in patients with low WBC counts. Therefore, we see SMART-seq as feasible for general clinical implementation. Furthermore, the price per sample and hands-on time make it attractive compared to our standard clinical sequencing protocol ( Table 1). The price could be further reduced by exploring alternative library preparation options [11].
The main limitation of the study was the variation in RNA quality in samples. The robustness of the low input protocol would be strengthened if the RNA quality in all the samples were high and consistent. A further limitation was the small number of patient samples sequenced using both methods.
The main strength is that the low input protocol can extract useful information from most patient samples, even from patients with low RNA quality, low WBC counts, or both.

Materials and Methods
We enrolled children under 17 years with leukaemia and suspected infection admitted to the paediatric oncology department at Copenhagen University Hospital, Rigshospitalet, from 1 May 2019, to 30 September 2019. The children were from a prospective nationwide cohort investigating unique transcriptional signatures in whole blood.

Blood Sampling Procedure and Sample Purification
Whole blood was collected into PAXgene ® Blood RNA Tubes (cat. no. 762165, Qiagen, Hilden, Germany), incubated at room temperature for a minimum of two hours and frozen to −20 • C within 72 h ( Figure 7A). Before RNA extraction, the sample tubes were thawed at room temperature for at least two hours. Subsequently, we performed automated RNA purification by QiaSymphony (cat. no. 9001297, Qiagen, Hilden, Germany) and QiaQube (cat. no 9001293, Qiagen, Hilden, Germany) according to the manufacturers' instructions ( Figure 7B), thus including RNA from all blood cells for further analysis. Purified RNA was evaporated to 3 µL total sample volume.

Low-Input Protocol Procedure
All samples were processed using a full-length transcript RNA-sequencing method Takara SMART-seq (Switching Mechanism At the 5 end of RNA Template) High-Throughput (HT) kit (cat nr. 634862, Takara Bio Inc., Kusatsu, Shiga, Japan) (hereafter termed "lowinput protocol") in combination with Nextera XT DNA library preparation kit (cat nr. FC-131-1096, Illumina, San Diego, CA, USA) according to recommendations from the manufacturers ( Figure 7C,D).

Library Preparation
We sorted 3 µL purified RNA samples into a 9.5 µL FACS dispensing solution consisting of 0.95 µL 10× Lysis buffer, 0.05 µL RNase inhibitor and 1 µL 3 SMART-Seq CDS primer II A, 7.5 µL nuclease-free H 2 O. Subsequently, 20 cycles of PCR amplification were applied as recommended by the manufacturer (Figure 7C,D).

Sequencing
Each single cell library was diluted to a concentration of 4 nM in EB buffer + 0.1%. Between 20 and 3 µL of each 4 nM library was pooled in an Eppendorf tube. To denature the double-stranded cDNA, we mixed 5 µL of the 4 nM pool with 5 µL 0.2 nM NaOH and incubated for 5 min at room temperature. Next, the denatured sample pool was diluted to a concentration of 20 pM by mixing 10 µL 2 nM of the sample pool with 990 µL cold hybridization buffer 1 (HT1). Finally, we diluted the 20 pM sample pool to a concentration of 10 pM by mixing 500 µL of the 20 pM sample pool with 500 µL cold HT1. Samples were sequenced in groups on a Miseq benchtop sequencer (Illumina, San Diego, CA, USA) using a MiSeq reagent kit v2 300 cycles (cat nr. MS-102-2002, Illumina, USA) ( Figure 7E).

Alignment/Trimming
Illumina sequencing raw reads were exported as fastq files and processed on a bash shell. We performed two rounds of trimming using TrimGalore! [22]. The first trimming removed Nextera XT adaptors ("CTGTCTCTTATACACATCT"), and the second trimming removed cDNA amplification adaptors ("AAGCAGTGGTATCAACGCAGAGT"). Two rounds of FastQC were performed after each trim for quality assessment of the sequencing output [23]. We used "Spliced Transcripts Alignment to a Reference" (STAR) to the Genome Reference Consortium Human Build 38 (GRCh38) to align the trimmed sequences [24]. The STAR output file comprised reads per gene for each sample. The files were merged into a single tab-delimited file. Samples that failed at sequencing quality control were discarded from further analysis.

Standard RNA-Sequencing Protocol
Standard RNA-sequencing was performed using a TruSeq Stranded RNA Library Preparation Kit (cat no. 20020597, Illumina, San Diego, CA, USA) (hereafter termed "standard protocol") according to the manufacturer's recommendations with a total purified RNA input of 0.1-1 µg.

Conclusions
We found that a low-input protocol may be a solution to the challenge in obtaining sufficient RNA for sequencing from patients with low white blood cell counts.
The challenge remains to investigate whether the captured immune gene signatures are clinically valid, as shown in studies of patients with normal WBC counts. It is a limitation related to febrile neutropenia that antibiotic treatment sometimes cannot be avoided; however, if this protocol can be successfully implemented, it can potentially help clinicians to diagnose patients with cancer and suspected infection more efficiently and prevent unnecessary antibiotic treatment and hospital admissions.