The Contribution of Multiplexing Single Cell RNA Sequencing in Acute Myeloid Leukemia

Decades ago, the treatment for acute myeloid leukemia relied on cytarabine and anthracycline. However, advancements in medical research have introduced targeted therapies, initially employing monoclonal antibodies such as ant-CD52 and anti-CD123, and subsequently utilizing specific inhibitors that target molecular mutations like anti-IDH1, IDH2, or FLT3. The challenge lies in determining the role of these therapeutic options, considering the inherent tumor heterogeneity associated with leukemia diagnosis and the clonal drift that this type of tumor can undergo. Targeted drugs necessitate an examination of various therapeutic targets at the individual cell level rather than assessing the entire population. It is crucial to differentiate between the prognostic value and therapeutic potential of a specific molecular target, depending on whether it is found in a terminally differentiated cell with limited proliferative potential or a stem cell with robust capabilities for both proliferation and self-renewal. However, this cell-by-cell analysis is accompanied by several challenges. Firstly, the scientific aspect poses difficulties in comparing different single cell analysis experiments despite efforts to standardize the results through various techniques. Secondly, there are practical obstacles as each individual cell experiment incurs significant financial costs and consumes a substantial amount of time. A viable solution lies in the ability to process multiple samples simultaneously, which is a distinctive feature of the cell hashing technique. In this study, we demonstrate the applicability of the cell hashing technique for analyzing acute myeloid leukemia cells. By comparing it to standard single cell analysis, we establish a strong correlation in various parameters such as quality control, gene expression, and the analysis of leukemic blast markers in patients. Consequently, this technique holds the potential to become an integral part of the biological assessment of acute myeloid leukemia, contributing to the personalized and optimized management of the disease, particularly in the context of employing targeted therapies.


Background
Acute myeloid leukemia (AML) is a complex disease characterized by the infiltration of the bone marrow, blood, and other tissues by abnormal and often poorly differentiated cells of the hematopoietic system. This infiltration disrupts the normal production of blood cells, leading to the failure of normal hematopoiesis. AML exhibits genetic, epigenetic, and clinical heterogeneity, making it a challenging condition to treat effectively [1][2][3]. The disease predominantly affects the elderly population and is associated with severe complications and a high mortality rate, contributing significantly to cancer-related deaths. However, recent advancements in disease management have led to improved cure rates, reaching around 15% for patients over 60 years of age and approximately 40% for patients thymocytes disrupts T cell homeostasis in the spleen, resulting in a significant reduction in the population of MYC-deficient effector/memory T cells [37].
The application of single cell hashing in acute myeloid leukemia (AML) presents an intriguing prospect, given the delicate nature of these cells. The material used, circulating leukemia cells, is readily obtainable, and accurately identifying the cell type is paramountdistinguishing differentiated blasts from leukemia stem cells and normal stem cells is crucial. In this study, our objective is to assess the viability and dependability of the single cell hashing technique on primary AML cells. To achieve this, we conducted a comparative analysis of the transcriptomic profiles of AML cells using the conventional single cell RNA sequencing method and the cell hashing technique.

Sample Preparation and Single Cell Preparation for Chromium Single Cell RNA Sequencing (scRNA-seq)
In this study, Human Peripheral Blood Mononuclear Cells (PBMCs) were obtained with the consent of AML patients (refer to Table 1). The isolation of PBMCs from peripheral blood involved centrifugation to separate them from plasma and other components of the blood, following a ficoll-Hystopaque density gradient centrifugation method adapted from English and Andersen [38]. Subsequently, the PBMCs were cryopreserved for long-term storage by freezing them either at −80 • C or in liquid nitrogen at −195.79 • C. To facilitate cryopreservation, a freezing medium comprising 90% FBS and 10% DMSO was used to store the PBMCs. Thawing of the cryopreserved cells in liquid nitrogen was performed at 37 • C, followed by rinsing with RPMI 1640 medium to eliminate the freezing medium containing DMSO. To enhance the sample quality by enriching it with viable cells, a Dead Cell Removal Kit (MACs Miltenyi Biotec, BG, Deustchland) was employed, following the manufacturer's guidelines.

Single Cell RNA-seq Processing
For the implementation of the single cell RNA-seq technique, we followed the guidelines provided in the user guide for the "Chromium single cell 3' reagent kits v3" from 10 × Genomics (cat # CG000183 Rev C). Additionally, for the Cell Hashing technique incorporating Feature Barcoding technology for Cell Surface Protein, we referred to the user guide for the "Chromium Next Gem Single Cell 3 Reagent Kits v3.1" (cat # CG000206 Rev D). Prior to cell labeling, a 50-µL aliquot containing 1,000,000 cells with over 95% viability was prepared for each sample. The cells were suspended in a Cell Staining Buffer (CSB) obtained from Biolegend (cat # 420201). The antibodies utilized for labeling were TotalSeq -A0251 (HTO1) and -A0252 (HTO2) antibody-oligo, sourced from Biolegend (cat # 394601/394603). To proceed with the labeling step, 1µg of HTO was added to each cell suspension. The two labeled samples were then incubated at 4 • C for 30 min. Subsequently, the cells underwent three consecutive washes with 3.5 mL of CSB, followed by centrifugation at 400 g for 5 min at 5 • C. Finally, the cells were resuspended in PBS (cat # 11503387) at a concentration of 1.106 cells/mL.

Single Cell Library Generation
To prepare the single cell suspension, we employed the Single Cell 3 Reagents Kit v3, following the guidelines provided in the Chromium Single Cell 3' Reagent Kits User Guide (v3 Chemistry)-Library Prep-Single Cell Gene Expression-10x Genomics Support. Subsequently, the appropriate volume of each sample was diluted to recover 10,000 PBMCs. The single cell suspension, along with gel beads and oils, was then added to the 10 × Genomics single cell A chip. After generating droplets, the samples were transferred to PCR tubes for reverse transcription, which was performed using a C1000 Touch thermal cycler. Following reverse transcription, the cDNA was recovered and purified using silane DynaBeads, as specified in the user guide. Prior to cleanup with SPRIselect beads, we performed cDNA amplification for 12 cycles. The concentration of cDNA was measured using a Qubit 2.0 fluorometer (Invitrogen), while the quality of cDNAs was assessed through capillary electrophoresis using a fragment analyzer.

Sequencing
For the preparation of libraries targeting HTO and gene expression sequencing, we utilized the NextSeq 500/550 High Output Kit v2.5 (75 cycles, Illumina) following the manufacturer's recommendations. The library pool consisted of 8% HTO libraries and 92% gene expression libraries, and it was loaded onto the Flow Cell at a concentration of 1.8 pM. Subsequently, the libraries were sequenced on a Next Seq 500 system with the following parameters: 28 cycles for read 1, 8 cycles for index i7, and 91 cycles for read 2. This sequencing setup aimed to achieve an average sequencing depth of 32 million reads for HTO libraries and 368 million reads for gene expression libraries. The sequencing data underwent analysis using the Cell Ranger Pipeline, specifically version 6.0.2, for various steps including quality control, sample demultiplexing using mkfastq, barcode processing, alignment, and single cell 3' gene counting through cellranger count. Demultiplexing of samples was carried out with bcl2fastq, utilizing the 8-bp sample index, 10-bp UMI tags, and the 16-bp GemCode barcode. The alignment of the cDNA sequences presents in the 91-bp-long read 2 was performed using cell Ranger versions 3.0.0 and 3.1.0. UMI quantification, GEM-Code, and cell barcode filtering, based on Hamming distance error detection as described by Zheng et al., 2017 [39], were executed. Only confidently mapped transcriptome reads, excluding PCR duplicates, with valid barcodes and UMIs, were utilized to generate an unfiltered data matrix. Barcodes exceeding 10% of the 99th percentile of expected cell recovery (default = 3000) in total UMI counts were considered to contain cells and were selected to create a filtered gene-barcode matrix for subsequent analysis. The "gene and transcripts (UMI counts) per cell" metric was employed to compare the sensitivity of scRNA-seq. Additionally, the "Fraction Reads in Cells" parameter was calculated to assess the presence of background cell-free (ambient) RNA in the cell suspension, based on the fraction of cell-associated barcoded reads that confidently mapped to cell barcodes.

Normalization and Correlation of Gene Expression Levels
Normalization was performed individually for each sample using the "log-normalization" method applied to the Seurat object. To assess the correlation between the various samples, an initial Seurat filter was employed to retain genes detected in at least three cells and to include cells with a minimum of 200 UMI. Subsequently, the gene expression correlations across different samples were calculated by determining the average expression of genes shared by all samples.

PCA and tSNE Analysis for Cell Clustering and Classification, and Data Visualization
Secondary analysis was conducted using the Cell Ranger count and aggr pipelines. Prior to cell clustering, Principal Component Analysis (PCA) was applied to the genebarcode matrix, which had been normalized, log-transformed, centered, and scaled, to reduce the dimensionality of the feature (gene) space. The implementation of the IRLBA algorithm in Python was employed for this purpose, resulting in a projection of each cell onto the first N principal components. No filtering was performed on "low-quality" genes and cells as previously described, and the Seurat package utilized this projection in the subsequent PCA analysis. Following PCA, t-distributed Stochastic Neighbor Embedding (t-SNE) was utilized to visualize the cells in a 2-D space. Clustering was then performed to group cells with similar expression profiles based on their projection in the PCA space. Two clustering methods were employed: graph-based and k-means. Cell Ranger also generated a table indicating the genes that exhibited differential expression in each cluster compared to all other clusters. The classification of PBMCs was inferred from the annotation of cluster-specific genes, relying on the expression of well-known markers of immune cell types (marker-based classification). For comprehensive visualization of the entire dataset and interactive exploration of significant genes, cell types, and substructure within cell clusters, Loupe Cell Browser (v5.0) was employed [31].
To process multiplexed samples, the HTO data are processed using the "cite-seq count, v1.4.3" tool available at https://hoohm.github.io/CITE-seq-Count/ (accessed on 5 January 2022) and https://zenodo.org/record/2590196 (accessed on 5 January 2022). The resulting data are then read using the Seurat "Read10X" tool, accessible at https: //satijalab.org/seurat/articles/hashing_vignette.html (accessed on 5 January 2022). Only the cell barcodes that are common between the HTO data obtained from cite-seq count and the mRNA data obtained from cellranger are retained. These barcodes correspond to the cells detected by cellranger that also have an assigned HTO. Finally, the HTO arrays undergo normalization using the Log-Ratio Centered (CLR) method, which is recommended by the developers of Seurat. Subsequently, the "MULTIseqDemux" function is utilized to assign an HTO name to each cell.
To perform filtering on the multiplexed samples, the mRNA arrays obtained from cellranger count were loaded, and the cell barcodes were filtered based on the availability of an assigned HTO using MULTIseqDemux.

Gene Ontology Analysis for Molecular Function
Gene Ontology analysis for molecular function was performed using Panther version 16.0 database http://pantherdb.org (accessed on 6 June 2022)/.

Code Availability
All codes used in this study are available online at GitHub-SebastienNin/mw-Madaci2021.

Transcript and Gene Numbers from Single Cell RNA-seq and Cell Hashing Are Similar
In order to investigate the impact of multiplexing samples on the transcriptome through the utilization of a cell hashing technique, we employed peripheral blood mononu- clear cells (PBMCs) obtained from two patients diagnosed with acute myeloid leukemia (AML). These cells were labeled with antibodies that targeted a protein expressed on the surface of all cells, coupled with a hashtag oligonucleotide (HTO). Subsequently, the labeled cells were mixed together prior to sequencing.
The multiplexing single cell RNA-seq technique, also known as cell hashing, relies on labeling ubiquitous membrane proteins with HTOs. In the context of human cells, the B2M and ATP1B3 proteins, which are widely expressed in normal cells, have been chosen as targets for cell labeling. However, interactions between extracellular molecules and membrane proteins can potentially impact various metabolic pathways. It raises concerns about whether such interactions could alter the transcriptome of leukemia cells. To address this, we examined the expression of the B2M protein in the two acute myeloid leukemia (AML) samples sequenced using single cell RNA-seq. The results of this analysis revealed high expression of the B2M protein in both AML samples, indicating that the process of leukemogenesis did not affect the expression of this cellular target (data not shown). With the confirmation of our HTO target, we proceeded to sequence the two AML samples using both techniques.
We utilized 10X Genomics scRNA-seq technology to sequence a total of 4390 cells from the UPN23 Single  Table 2). The number of unique molecular identifiers (UMIs) captured, which represents the transcripts identified in each sequence read to prevent inflation due to PCR amplification during library preparation, was relatively higher in the individually sequenced samples (median of 4015 for UPN23 SC and 7765 for UPN29 SC) compared to the multiplexed samples (median of 3978 in UPN23 SC-m and 5554 in UPN29 SCm) (Figure 1b). Furthermore, we observed consistency between the two approaches in terms of the percentages of mitochondrial genes in each patient (Figure 1c). Based on this quality control analysis, we found consistency in the sequencing results obtained from both approaches.   (Figure 1d,e). Subsequently, we performed clustering analysis on the cells from each sample using Seurat software. In UPN23 SC, we identified 12 distinct clusters labeled as clusters 0-11, whereas in UPN23 SC-m, we observed 11 clusters labeled as clusters 0-10 (Supplementary data S1a,b). As for the UPN29 sample, we discovered 16 clusters labeled as clusters 0-15 in UPN29 SC, and 13 clusters labeled as clusters 0-12 in UPN29 SC-m (Supplementary data S1c,d).
Next, we proceeded to assign cell subtypes to the cells using the bioinformatic program "celldex". The results of this analysis enabled us to classify the cells from each sample into different subtypes, including leukemic cells, B cells, CD4+ T cells, CD8+ T cells, and NK lymphocytes. In UPN23SC, we identified approximately 95% leukemic cells, while in UPN23SC-m, we observed approximately 93% leukemic cells (Figure 1a). Similarly, in UPN29SC, we found approximately 90% leukemic cells, and in UPN29SC-m, we detected around 92% leukemic cells (Figure 1b). These findings indicate a strong correlation between the two approaches.

Gene Expression Levels Correlate between scRNA-seq and Cell Hashing
To complement our initial analysis, we investigated the correlation between the average gene expression in samples sequenced individually and those sequenced using the cell hashing technique. As depicted in Figure 1f-g, the expression levels of genes in both the UPN23 and UPN29 samples exhibited a strong correlation when comparing the two approaches (R = 0.996, p < 2.2 × 10 −16 for UPN23 and R = 0.998, p < 2.2 × 10 −16 for UPN29).  (Figure 1d,e). Subsequently, we performed clustering analysis on the cells from each sample using Seurat software. In UPN23 SC, we identified 12 distinct clusters labeled as clusters 0-11, whereas in UPN23 SC-m, we observed 11 clusters labeled as clusters 0-10 (Supplementary data S1a,b). As for the UPN29 sample, we discovered 16 clusters labeled as clusters 0-15 in UPN29 SC, and 13 clusters labeled as clusters 0-12 in UPN29 SC-m (Supplementary data S1c,d).
Next, we proceeded to assign cell subtypes to the cells using the bioinformatic program "celldex". The results of this analysis enabled us to classify the cells from each sample into different subtypes, including leukemic cells, B cells, CD4+ T cells, CD8+ T cells, and NK lymphocytes. In UPN23SC, we identified approximately 95% leukemic cells, while in UPN23SC-m, we observed approximately 93% leukemic cells (Figure 1a). Similarly, in UPN29SC, we found approximately 90% leukemic cells, and in UPN29SC-m, we detected around 92% leukemic cells (Figure 1b). These findings indicate a strong correlation between the two approaches.

Gene Expression Levels Correlate between scRNA-seq and Cell Hashing
To complement our initial analysis, we investigated the correlation between the average gene expression in samples sequenced individually and those sequenced using the cell hashing technique. As depicted in Figure 1f-g, the expression levels of genes in both the UPN23 and UPN29 samples exhibited a strong correlation when comparing the two approaches (R = 0.996, p < 2.2 × 10 −16 for UPN23 and R = 0.998, p < 2.2 × 10 −16 for UPN29). In summary, our data strongly indicate that the cell hashing approach effectively maintains the integrity of individual cell transcriptomes, making it a reliable method for investigating gene expression.
Following the quality control of the two samples sequenced using different approaches, our focus shifted to the analysis of genes utilized as markers for diagnosing leukemic blasts (Table 1). Specifically, we examined the expression of KIT (CD117), CD34, CD33, CD13, CD36, and CD64 in UPN23 patients sequenced individually and with the cell hashing technique (Figure 2a). The results demonstrated a positive correlation in the expression of these genes between UPN23 SC and UPN23 SC-m (Figure 2b). Similarly, we conducted the same analysis for UPN29, investigating the expression of KIT (CD117), CD34, CD33, and CD11 as leukemic blast markers (Figure 2c). Once again, a correlation was observed in the expression of these genes between UPN29 SC and UPN29 SC-m (Figure 2d). These findings highlight the concordance in the expression of leukemic blast markers across both approaches, despite a reduction in the number of genes and gene expression variability in the cell hashing approach compared to the standard single cell approach.
Following the examination of leukemic markers, we proceeded to investigate the cluster-specific genes in both samples. To achieve this, we performed an analysis of differentially expressed genes for each sample and approach. The outcomes of this analysis are depicted in a heatmap (Supplementary data S2), where each cluster is represented, along with the top overexpressed genes for each condition.
Diseases 2023, 11, x FOR PEER REVIEW 9 of 14 In summary, our data strongly indicate that the cell hashing approach effectively maintains the integrity of individual cell transcriptomes, making it a reliable method for investigating gene expression.
Following the quality control of the two samples sequenced using different approaches, our focus shifted to the analysis of genes utilized as markers for diagnosing leukemic blasts (Table 1). Specifically, we examined the expression of KIT (CD117), CD34, CD33, CD13, CD36, and CD64 in UPN23 patients sequenced individually and with the cell hashing technique (Figure 2a). The results demonstrated a positive correlation in the expression of these genes between UPN23 SC and UPN23 SC-m (Figure 2b). Similarly, we conducted the same analysis for UPN29, investigating the expression of KIT (CD117), CD34, CD33, and CD11 as leukemic blast markers (Figure 2c). Once again, a correlation was observed in the expression of these genes between UPN29 SC and UPN29 SC-m (Figure 2d). These findings highlight the concordance in the expression of leukemic blast markers across both approaches, despite a reduction in the number of genes and gene expression variability in the cell hashing approach compared to the standard single cell approach. The analysis of the top 10 marker genes expressed in each cluster of UPN23 and UPN29 revealed a substantial overlap. Specifically, we found 93 common top 10 marker genes between UPN23 SC and UPN23 SC-m (Figure 3a), and approximately 104 common top 10 marker genes between UPN29 SC and UPN29 SC-m (Figure 3b). This analysis demonstrates the consistent recovery of marker genes with both approaches. This consistency was further supported by Gene Ontology analysis of molecular function performed on the top 10 most expressed genes in each cluster of UPN23 ( Figure 3c) and UPN29 (Figure 3d). The results revealed that these genes could be categorized into six different functional categories, including transporter activity, structural molecular activity, molecular transducer activity, molecular function, catalytic activity, and binding. These findings highlight the agreement between the two approaches and provide evidence for the reliability of the cell hashing technique, which produces results comparable to those obtained through single cell RNA-seq.
Cell-multiplexed). (b) Histogram illustrating the percentage of expression for these markers in UPN23 SC and UPN23 SC-m. (c) UMAP representation showing comparison of leukemia cell marker expression (KIT, CD34, CD33 and CD11) between UPN29 SC and UPN29 SC-m. (d) Histogram displaying the percentage of expression for these markers in UPN29 SC and UPN29 SC-m.
Following the examination of leukemic markers, we proceeded to investigate the cluster-specific genes in both samples. To achieve this, we performed an analysis of differentially expressed genes for each sample and approach. The outcomes of this analysis are depicted in a heatmap (Supplementary data S2), where each cluster is represented, along with the top overexpressed genes for each condition.
The analysis of these genes revealed no differences in gene expression between UPN23 SC (Supplementary data S2a) and UPN23 SC-m (Supplementary data S2b). Specifically, we observed consistent expression levels of cell markers such as MS4A1, CD79A and B, CD3D, NKG7, GZMA, and GNLY, which are indicative of normal hematopoietic cells including B lymphocytes, T lymphocytes, and natural killer (NK) cells. Additionally, we observed similar expression patterns of genes involved in the leukemic process, such as MPO and SOX4. Similarly, in UPN29 SC and UPN29 SC-m (Supplementary data S2c,d), we observed consistent expression of markers for leukemic blasts (LB), lymphoid cells (LT), and NK cells (CD79A, NKG7, and GNLY), as well as genes implicated in leukemogenesis (MPO and SOX4).
The analysis of the top 10 marker genes expressed in each cluster of UPN23 and UPN29 revealed a substantial overlap. Specifically, we found 93 common top 10 marker genes between UPN23 SC and UPN23 SC-m (Figure 3a), and approximately 104 common top 10 marker genes between UPN29 SC and UPN29 SC-m (Figure 3b). This analysis demonstrates the consistent recovery of marker genes with both approaches. This consistency was further supported by Gene Ontology analysis of molecular function performed on the top 10 most expressed genes in each cluster of UPN23 ( Figure 3c) and UPN29 (Figure 3d). The results revealed that these genes could be categorized into six different functional categories, including transporter activity, structural molecular activity, molecular transducer activity, molecular function, catalytic activity, and binding. These findings highlight the agreement between the two approaches and provide evidence for the reliability of the cell hashing technique, which produces results comparable to those obtained through single cell RNA-seq. (b). The diagram reveals 93 shared genes between UPN23 SC (Single Cell RNA-sequencing) and UPN23 SC-m (Single Cell_multiplexed) (left), and 104 shared genes between UPN29 SC and UPN29 SC-m. (c,d) Bar chart presenting the Panther Gene Ontology Molecular Function analysis of the top 10 up-expressed genes in each cluster of UPN23 (c) and UPN29 (d). The genes were classified into six molecular functions, including transporter activity, structural molecular activity, molecular transducer activity, molecular function, catalytic activity, and binding.

Conclusions
In summary, our comparative study of single cell RNA sequencing methods reveals that the cell-hopping technique yields favorable outcomes while preserving transcriptomic information. We observed a strong correlation between gene expression and cellular markers, surpassing the traditional single cell approach. By employing cell labeling with HTOs, the cell-hopping approach allows for the sequencing of multiple samples in a single run, enabling a straightforward comparison between different samples and conditions without concerns about batch effects.
However, caution is advised when applying this approach, especially when identifying rare cell populations. As the number of samples increases, the number of sequenced cells per sample decreases, limiting the detection of infrequent cell populations. Based on our analysis capacity of approximately 30,000 cells per experiment, we find that using eight simultaneous samples is optimal in most cases. This allows for the analysis of relatively rare populations within each sample, even at low frequencies (e.g., 1%), which would still comprise around 40 cells using this technique. For smaller populations, the traditional single cell analysis would be more suitable.
Furthermore, a multiplexed single cell transcriptomic study has demonstrated its potential in assessing the sensitivity of leukemic cells to chemotherapy while elucidating underlying mechanisms [40]. Integrating the cell hashing technique with genetic and phenotypic approaches will facilitate the identification of distinct leukemic cell sub-populations, the discovery of new diagnostic and prognostic markers, and the establishment of targeted therapies for personalized clinical management of acute myeloid leukemia patients [41].
Supplementary Materials: The following supporting information can be downloaded at: https://www. mdpi.com/article/10.3390/diseases11030096/s1: UMAP projection of PBMCs from AML patients grouped into different clusters; S2: Marker genes per cluster in cells in UPN23 and UPN29.
Author Contributions: L.M., C.G. and S.N. performed and analyzed the single cell experiments; R.C., D.P. and B.L. was involved in the conception of single cell experiments, in conception, design and manuscript preparation; L.M., G.V., R.C. and P.R. were involved in the conception of single cell experiments, in manuscript preparation and data discussion. All authors have read and agreed to the published version of the manuscript. Institutional Review Board Statement: All subjects gave their informed consent for inclusion before they participated in the study. The study was conducted in accordance with the Declaration of Helsinki, and the protocol was approved by the the institutional review board of the Mediterranean V (Ref. 15.013) and French National Agency for Drug Safety (Ref. 150054B-11).

Informed Consent Statement: Not applicable.
Data Availability Statement: The snRNA-seq data was downloaded from NCBI Gene Expression Omnibus (GEO) public database (GSE212038).