Identification of a Maturation Plasma Cell Index through a Highly Sensitive Droplet Digital PCR Assay Gene Expression Signature Validation in Newly Diagnosed Multiple Myeloma Patients

DNA microarrays and RNA-based sequencing approaches are considered important discovery tools in clinical medicine. However, cross-platform reproducibility studies undertaken so far have highlighted that microarrays are not able to accurately measure gene expression, particularly when they are expressed at low levels. Here, we consider the employment of a digital PCR assay (ddPCR) to validate a gene signature previously identified by gene expression profile. This signature included ten Hedgehog (HH) pathways’ genes able to stratify multiple myeloma (MM) patients according to their self-renewal status. Results show that the designed assay is able to validate gene expression data, both in a retrospective as well as in a prospective cohort. In addition, the plasma cells’ differentiation status determined by ddPCR was further confirmed by other techniques, such as flow cytometry, allowing the identification of patients with immature plasma cells’ phenotype (i.e., expressing CD19+/CD81+ markers) upregulating HH genes, as compared to others, whose plasma cells lose the expression of these markers and were more differentiated. To our knowledge, this is the first technical report of gene expression data validation by ddPCR instead of classical qPCR. This approach permitted the identification of a Maturation Index through the integration of molecular and phenotypic data, able to possibly define upfront the differentiation status of MM patients that would be clinically relevant in the future.


Introduction
Heterogeneity is a hallmark of multiple myeloma (MM) genome and transcriptome, and several molecular tests have been employed in the last few years in order to dissect patients' molecular profiles, finally aiming at risk assessment and prognosis definition [1][2][3].
In this context, DNA microarrays and, more recently, RNA sequencing, can simultaneously measure the expression level of thousands of genes up to the entire transcriptome [4,5]. Such high-throughput expression profiling can be mainly used to compare the level of gene transcription in clinical setting in order to: (1) identify diagnostic or prognostic biomarkers; (2) classify diseases (e.g., tumors with different prognosis that are indistinguishable by microscopic examination); (3) monitor the response to therapy; and (4) understand the Int. J. Mol. Sci. 2022, 23, 12450 2 of 12 mechanisms involved in the genesis of disease processes [6]. For these reasons, DNA microarrays and RNA sequencing approaches are considered important discovery tools in clinical medicine. However, cross-platform reproducibility studies undertaken so far have highlighted that microarrays are not able to accurately measure gene expression, particularly when low [7][8][9][10].
In a previous work [11], we have demonstrated a differential expression of the Hedgehog (HH) pathway, a self-renewal signaling involved in tumorigenesis and maintenance of cancer cells, among newly diagnosed multiple myeloma (NDMM) patients. In particular, patients were stratified in two clusters according to a 10 HH genes signature (Ligands: SHH, DHH, IHH; Receptors: PTCH1, PTCH2; Intermediate: SUFU; Transcription factors: GLI1, GLI2, GLI3) expression profiles. Cluster 1 patients were characterized by an overall downregulation of the pathway, whereas Cluster 2 patients displayed marked overexpression of the pathway. Although several studies pinpoint the relevant role of self-renewal mechanisms in neoplastic diseases [12,13], overall, we observed that this pathway was expressed at low level in MM patients compared to normal samples [14]. Therefore, a robust technique should be employed in order to validate these data.
Droplet digital PCR assay (ddPCR) technology is a digital PCR method utilizing a water-oil emulsion droplet system. The creation of tens of thousands of droplets means that a single sample can generate tens of thousands of data points rather than a single result, bringing the power of statistical analysis inherent in digital PCR into practical application [15,16]. This technique requires the input of small sample quantities, which reduces costs and preserves precious samples. For all these reasons, ddPCR technology offers several advantages compared to real-time quantitative PCR (qPCR), including the attainment of absolute quantitative results, improved precision, reproducibility, accuracy, sensitivity, and great tolerance to PCR inhibitors. Moreover, nucleic acid quantitation is independent of reaction efficacy, providing greater ability to detect and accurately quantify low-abundance targets [17].
Here, we consider the employment of a ddPCR assay to validate a gene signature, named the 10 HH genes signature, previously identified by gene expression profile. Results show that the designed assay is able to validate gene expression data, both in a retrospective as well as in a prospective cohort. In addition, the plasma cells' differentiation status determined by ddPCR could be further confirmed by other techniques, such as flow cytometry. To our knowledge, this is the first technical report of gene expression data validation by ddPCR instead of classical qPCR. The use of ddPCR permitted the identification of a Maturation Index capable of discriminating between patients with mature plasma cells (PCs) and patients with immature PCs.

10 HH Genes Were Expressed at Low Level across Different Cohorts of NDMM Patients
Since the current aim of the work was to validate the previously defined 10 HH genes signature, we evaluated which technique might best reproduce the benchmark results.
As we recently reported, the 10 HH genes are expressed at low level in MM patients [11]. To further confirm this, we compared gene expression data obtained from nine different cohorts of NDMM patients, downloaded from the GEO database (Table S1), including a total of 3023 patients, to gene expression data obtained from 27 donor samples, with a focus on the 10 HH genes expression data ( Figure 1). In order to avoid platform biases, only cohorts including at least 50 NDMM patients were considered, preferably profiled by Gene Chip Plus 2.0 (Affymetrix, Thermo Scientific). The comparisons confirmed that Hedgehog pathway genes were low when expressed in MM compared to donor samples. Indeed, the 10 HH genes are not included in the gene list commonly implicated in MM disease onset; in addition, their fold-change (FC) values fall within the −2; 2 FC interval (10 HH genes FC range: from −0.56 to 0.32; Table 1).
For this reason, in order to validate the signature, we considered ddPCR the best approach available, as it is able to detect low abundant targets more efficiently than conventional qPCR. ing a total of 3,023 patients, to gene expression data obtained from 27 donor samples, with a focus on the 10 HH genes expression data ( Figure 1). In order to avoid platform biases, only cohorts including at least 50 NDMM patients were considered, preferably profiled by Gene Chip Plus 2.0 (Affymetrix, Thermo Scientific). The comparisons confirmed that Hedgehog pathway genes were low when expressed in MM compared to donor samples. Indeed, the 10 HH genes are not included in the gene list commonly implicated in MM disease onset; in addition, their fold-change (FC) values fall within the -2; 2 FC interval (10 HH genes FC range: from -0.56 to 0.32; Table 1).
For this reason, in order to validate the signature, we considered ddPCR the best approach available, as it is able to detect low abundant targets more efficiently than conventional qPCR.

Selection of Suitable Housekeeping Gene
In order to properly perform the 10 HH genes quantification via ddPCR, the intrinsic expression values variability need to be normalized using a house-keeping (HK) gene.

Selection of Suitable Housekeeping Gene
In order to properly perform the 10 HH genes quantification via ddPCR, the intrinsic expression values variability need to be normalized using a house-keeping (HK) gene.
To this end, the normalized expression (NE) values of 13 known HK genes (B2M, CDKN1A, CTBP1, GUSB, MTBP, PPIH, PTBP2, PUM1, SETBP1, TBP, TBPL1, TFRC, and YWHAZ) were extrapolated from our 126-gene expression array data, and three HK genes, whose expressions were quantitatively similar to those of the 10-HH genes signature (median NE values = 45.0), were selected in order to perform data normalization. In particular, a high (PUM1: 726.03 NE), an intermediate (PPIH: 145.08 EV), and a low (MTBP: 22.09) HK gene were identified ( Figure 2). Of these three HK genes, MTBP was selected since its expression was most similar to that of the 10 HH gene signature (22.09 vs. 45.0 NE values) and, more importantly, because it is located out of chromosomal regions frequently affected by copy number variation (CNV) (i.e., on chr8q24.12).
genes, whose expressions were quantitatively similar to those of the 10-HH genes signature (median NE values = 45.0), were selected in order to perform data normalization. In particular, a high (PUM1: 726.03 NE), an intermediate (PPIH: 145.08 EV), and a low (MTBP: 22.09) HK gene were identified ( Figure 2). Of these three HK genes, MTBP was selected since its expression was most similar to that of the 10 HH gene signature (22.09 vs. 45.0 NE values) and, more importantly, because it is located out of chromosomal regions frequently affected by copy number variation (CNV) (i.e., on chr8q24.12).

Design and Set-up of the 10 HH Genes Signature ddPCR Assay
In order to design a 10 HH gene signature ddPCR assay, we extrapolated from the Plus 2.0 gene chip array the exact genomic sequences of each HH gene's probe (~500 bp each) (Table S2). Custom ddPCR assays consist of two primers, forward and reverse, and an intermediate probe conjugated with FAM and HEX fluorophores, allowing the possibility ofall the assays being multiplexed in a unique cartridge.
Custom assays were first tested on a MM cell line, NCI-H929, which represented the ideal positive control of Hedgehog pathway expression [18]. Preliminary data were produced by employing four different bona fide human MM cell lines (MM1.S, KMS12BM, NCI-H929, and RPMI-8226), whose CEL file was downloaded from Array Gene Express and GEO database (E-GEOD-22759 and GSE53798, respectively). We first processed the gene expression profiles of cell lines as described in the Patients and Methods section in order to assign them to either Cluster 1 or Cluster 2, according to the HH-gene signature. As shown in Table 2, the signature was able to classify each cell line as belonging either to Cluster 1 (MM1.S and RPMI-8226) or to Cluster 2 (KMS12.BM and NCI-H929). Assays were initially performed as a single assay, and a non-template control (NTC) was included in the experiment for each gene. Then, we tested different target concentrations by amplifying 5, 25, and 50 ng of input material (cDNA). In order to verify the reproducibility between replicates and between different experiments, all three target concentrations were analyzed in replicates, two times in two independent experiments. Since low quantification results were obtained for some genes, particularly for SHH, IHH, and GLI2, we finally set the input cDNA material at 50 ng (the highest tested), the right amount required

Design and Set-Up of the 10 HH Genes Signature ddPCR Assay
In order to design a 10 HH gene signature ddPCR assay, we extrapolated from the Plus 2.0 gene chip array the exact genomic sequences of each HH gene's probe (~500 bp each) (Table S2). Custom ddPCR assays consist of two primers, forward and reverse, and an intermediate probe conjugated with FAM and HEX fluorophores, allowing the possibility ofall the assays being multiplexed in a unique cartridge.
Custom assays were first tested on a MM cell line, NCI-H929, which represented the ideal positive control of Hedgehog pathway expression [18]. Preliminary data were produced by employing four different bona fide human MM cell lines (MM1.S, KMS12BM, NCI-H929, and RPMI-8226), whose CEL file was downloaded from Array Gene Express and GEO database (E-GEOD-22759 and GSE53798, respectively). We first processed the gene expression profiles of cell lines as described in the Patients and Methods section in order to assign them to either Cluster 1 or Cluster 2, according to the HH-gene signature. As shown in Table 2, the signature was able to classify each cell line as belonging either to Cluster 1 (MM1.S and RPMI-8226) or to Cluster 2 (KMS12.BM and NCI-H929). Assays were initially performed as a single assay, and a non-template control (NTC) was included in the experiment for each gene. Then, we tested different target concentrations by amplifying 5, 25, and 50 ng of input material (cDNA). In order to verify the reproducibility between replicates and between different experiments, all three target concentrations were analyzed in replicates, two times in two independent experiments. Since low quantification results were obtained for some genes, particularly for SHH, IHH, and GLI2, we finally set the input cDNA material at 50 ng (the highest tested), the right amount required for the optimal target identification. Notably, good quantification results' reproducibility between replicates and experiments was also observed (rho = 0.9966) (Figure 3). for the optimal target identification. Notably, good quantification results' reproducibility between replicates and experiments was also observed (rho = 0.9966) (Figure 3).

Multiplexed ddPCR 10-HH Genes Assay: Signature Validation on Primary Patients' Samples
The different assays were then multiplexed, and the cartridge was optimized by coupling genes with similar expression values in the same well (Table 3). To this end, SMO, SHH, PTCH1, IHH, DHH, and MTBP were conjugated with FAM fluorophore, whereas SUFU, GLI1, GLI2, GLI3, and PTCH2 were conjugated with HEX fluorophore. Table 3. ddPCR multiplexed assay validation of a 10 HH gene signature. The different assays have been multiplexed in order to optimize the cartridge by coupling genes with similar values of expression in the same well. SMO, SHH, PTCH1, IHH, DHH, and MTBP were conjugated with FAM fluorophores, whereas SUFU, GLI1, GLI2, GLI3, and PTCH2 were conjugated with HEX fluorophores.

FAM HEX
A total of 126 patients was screened via the ddPCR assay: 39 derived from Cluster 1; 37 from Cluster 2; and 50 initially attributed to the test set. In order to normalize gene expression values, all the absolute quantification values were corrected by the MTBP gene expression. According to the ddPCR results, all patients were then assigned either to Cluster 1 or to Cluster 2. By comparing the median expression values of each gene of the signature, 6 out of 10 genes showed a significant difference between the two clusters of patients (p < 0.05). In particular, patients included in Cluster 2 displayed an overall activation of the pathway, accounted for by at least two ligands (DHH, IHH), two receptors (SMO, PTCH1), the intermediate (SUFU), and one transcription factor (GLI2) (Figure 4).
Notably, ddPCR results also made it possible to correctly assign patients, first included in the test set, to their own Cluster, previously defined according to the distance from either Cluster 1 or Cluster 2 in terms of gene expression values (Table S3). In particular, 31 patients were assigned to Cluster 1 and 19 to Cluster 2; the comparison of gene distances between Clusters, as obtained both from gene expression data and from ddPCR data overall, showed a good correlation (p = 0.0001; r2 = 0.98).
Therefore, ddPCR quantification of the 10 HH genes signature has been shown to be able to correctly reproduce results previously obtained by gene expression profiles, identifying patients with different expressions of HH pathway.

Prospective Testing of the 10 HH Genes Signature by Integration of Molecular and Immunophenotypic Data
Data from literature have suggested that the Hedgehog pathway plays an important role in cancer cells' replication, survival, and differentiation [18][19][20]. Accordingly, plasma

Prospective Testing of the 10 HH Genes Signature by Integration of Molecular and Immunophenotypic Data
Data from literature have suggested that the Hedgehog pathway plays an important role in cancer cells' replication, survival, and differentiation [18][19][20]. Accordingly, plasma cells might express the Hedgehog Pathway differently based on their differentiation status [11]. In order to prospectively validate the 10 HH genes signature by ddPCR, 22 NDMM patients were screened up front for an immunophenotypic profile, which included, among others, specific surface markers related to plasma cells differentiation (e.g., CD19, CD81). According to their expression, 10 of 22 patients were defined as CD19+/CD81+ (i.e., with immature/less differentiated PCs), whereas 12 out of 22 were defined as CD19-/CD81-(i.e., with mature/more differentiated PCs). Testing the 10 HH genes signature in this prospective cohort confirmed their differentiation status. Indeed, even though the genes' absolute quantification values were at least 10-fold higher with respect to those obtained in the validation cohort, the comparison of the median expression values of the 10-HH genes in the two immunophenotypic groups highlight that 3 out of 10 genes were significantly different between the two groups (SMO, PTCH1, and GLI1; p < 0.05). Interestingly, patients with a CD19-/CD81-profile, characterized by a more differentiated PC status, displayed an overall activation of the Hedgehog pathway with respect to patients with an immature CD19+/CD81+ PCs profile (Table 4).
Therefore, the integration of molecular and immunophenotypic information permitted the fine description of the MM PCs' differentiation status at diagnosis, allowing the definition of a PCs Maturation Index, possibly supporting the process of patients' stratification. Table 4. Digital PCR validation of the 10 HH genes signature across the prospective NDMM patients' cohort. A prospective cohort of 22 patients was employed to validate the ddPCR assay on CD138+ plasma cells. An immunophenotypic characterization of the neoplastic clone permit the stratification according to their differentiation status. In particular, 10 out of 22 were CD19+/CD81+ (CLU1immature/less differentiated phenotype) and 12 out of 22 were CD19-/CD81-(CLU2-mature/more differentiated phenotype). Screening of these patients with the 10 HH-genes signature confirms their differentiation status. Indeed, the comparison of the absolute quantification values between the two groups highlights the overexpression of 10 HH genes signature in CD19-/CD81-patients, the most differentiated group.

Discussion
For many years, gene expression profiling via microarray has represented a highresolution and valid technique that has enabled the identification of gene expression fluctuations under different experimental conditions [20][21][22]. Moreover, the extrapolation of a restricted list of genes, named "signature", has allowed patients' stratification in prognostic meaningful subgroups [23][24][25]. However, one of the pitfalls of this approach has been represented by the need to be validated by other techniques, which possibly should be more manageable and ready to use in daily clinical practice. From these perspectives, ddPCR technology might be considered the ideal validation approach, since it provides an absolute count of target DNA copies per input sample, without the need to run standard curves such as in the traditional quantitative real-time PCR [26,27]. Indeed, it permits the absolute quantification even of very low abundant targets, such as the 10 HH gene signature herein tested [14]. Moreover, it is faster, as compared to a conventional gene expression profile experiment, and results are available in a few hours (instead of in days).
The setup of a new custom ddPCR, such as the one presented in this work, has required the preliminary investigations of several experimental conditions on MM cell lines. For example, the choice of the MTBP gene as the housekeeping gene, to be employed for expression data normalization, has been mainly conditioned by three reasons, which overall have facilitated the normalization and the development of the assay: (1) MTBP gene expression was the most similar to those of the 10 HH genes; (2) in comparison to other low-expressed HK genes, such as PPIH (chr1p34.2) and PUM1 (chr1p35.2), MTBP is located in a chromosomal region (chr8q24.12) that is infrequently affected by copy number alterations; (3) after the set-up phase, we established that just one housekeeping gene should be maintained, in order to optimize the final cartridge layout to allow the analysis of one patient at a time [28]. In addition, the identification of the optimal input template, which is not entirely negligible. Indeed, the use of the highest possible amount of input material (i.e., 50 ng) has been considered necessary compared to 5 and 25 ng, since the HH genes have been shown to be low expressed in MM patients.
Recent published works suggested that the HH pathway plays an important role in cancer cells' replication, survival, and differentiation [20] and, accordingly, plasma cells might express the Hedgehog Pathway differently based on their differentiation status [11]. Therefore, the possibility to define upfront the differentiation status of MM patients, according to the expression of these genes, will be clinically relevant in the near future. In order to validate the HH genes' signature via ddPCR, we planned to test it both in a retrospective as well as a prospective cohort of MM patients.
Results on the retrospective cohort of patients demonstrated that the ddPCR assay has been able to reproduce the results obtained by gene expression profiles. Indeed, we showed that the HH pathway quantification by ddPCR correctly identified patients previously included in Cluster1 as effectively downregulating the HH pathway and, on the contrary, patients previously classified in Cluster2 were identified as overexpressing the pathway. Therefore, the 10 HH genes signature tested by ddPCR has been able to identify patients with an impaired self-renewal ability, ultimately correlated with a worse outcome.
To further confirm these data, the ddPCR assay was employed to evaluate the HH gene expression on a prospective cohort of patients. Although this patient population was relatively small, an immunophenotypic profile allowed the stratification of patients according to the presence of either mature or immature plasma cells according to the expression of CD19 and CD81 markers. As a general observation, the HH genes' absolute quantification values were at least 10-fold higher compared to those obtained in the validation cohort; this is probably due to the better quality of input RNA, which was extracted just before the ddPCR experiment, as opposed to the validation set's RNA, which was older (the quantity was always the same). Testing the 10 HH genes signature by ddPCR assay in this cohort of patients confirmed that patients with less differentiated PCs actually downregulated the HH pathway, as we previously observed through gene expression profiling.
This opens the possibility to integrate molecular and immunophenotypic data in a Maturation Index, which might be prospectively employed, in order to stratify patients according to the differentiation status of their PCs both at diagnosis and during the disease course, in order to evaluate whether therapy selective pressure might possibly change the PCs clone's molecular and immunophenotypic make-up.
Finally, the 10 HH genes signatures' ddPCR assay might represent a good example of validation and translation of gene expression profile data into a ready-to-use technique, which is possibly prospectively applicable. Validation on the retrospective cohort strongly supports its application in the near future. However, the development of a Maturation Index through molecular and immunophenotypic data integration needs to be further validated, even though these preliminary data suggested its relevance in order to stratify patients according to the differentiation status of their plasma cells.
To our knowledge, this represents the first technical report of a validation of gene expression data by ddPCR instead of classical real-time qPCR. In addition, it offers suggestions on how to develop a Maturation index that possibly defines the PCs' differentiation status of MM patients upfront, information that would be clinically relevant in the near future.

Patients
For study purposes, the 10 HH genes signature ddPCR assay was validated both in a retrospective cohort of one hundred twenty-six patients (aged 18-65 years) with NDMM, as well as in an independent prospective cohort of twenty-two NDMM patients. Written informed consent was obtained from each patient. Molecular characteristics of the retrospective cohort have been previously reported [11], and patients were divided into: Cluster1 (Clu1), including 39 patients who under-expressed the 10 HH genes signature; Cluster 2 (Clu2), composed of 37 patients who overexpressed the pathway; and "test-set", including 50 pts, whose affinity with one cluster or with the other has previously been estimated, according to the distance in terms of gene expression from either Clu1 or Clu2.

Sample Collection and Cell Fraction Enrichment
Bone marrow (BM) samples for molecular studies were obtained during standard diagnostic procedures. Mononuclear BM cells were obtained by Ficoll-Hypaque density gradient centrifugation. An immunomagnetic beads-based strategy (MACS system, Miltenyi Biotec, Auburn, CA) was employed to isolate both plasma cells and progenitor populations. More specifically, CD138+ cells were purified by positive selection with a specific anti-CD138 antibody; in 22 patients, the B-cell fraction was enriched by depletion of all non-B cells (T cells, NK cells, monocytes, dendritic cells, granulocytes, platelets, and erythroid cells) using a cocktail of biotinylated CD2, CD14, CD16, CD36, CD43, and CD235a antibodies. The purity of positively selected plasma cells was assessed by flow cytometry using CD138, CD38, and CD45 antibodies; similarly, the CD138 negative fraction was evaluated before and after separation for the presence of CD19 and CD27 markers (Miltenyi Biotech, Auburn, CA).

RNA Processing and Droplet Digital PCR (ddPCR)
RNA was isolated using the Maxwell ® 16 Total RNA purification kit and then at least 100 ng was processed with the SuperScript ® IV Reverse Transcriptase to obtain the cDNA. Next, 20 µL reaction mixtures containing the 2x ddPCR Supermix for probes (no dUTP), 20x target primers/probe mix (FAM/HEX), cDNA templates were prepared and loaded into the QX200™ Droplet generator. After droplet generation, droplets were carefully transferred into a 96-well plate and a thermal cycler was performed as follows: 95 • C for 10 min for enzyme activation, denaturation at 94 • C per 30 s, and annealing/extension at 58 • C for 1 min, repeated for 40 cycles, and, finally, enzyme deactivation for 10 min. Ramp rate was set at 2 • C/s and annealing temperature was maintained at 58 • C for all the designed assays. Droplet generation and transfer of emulsified samples to PCR plates were performed according to the manufacturer's instructions (QX200™ Droplet Digital PCR-ddPCR™ System-Bio-Rad; Instruction manual, QX200™ Droplet Generator-Bio-Rad, Hercules, CA, USA).

Data and Statistical Analysis
Gene expression data derived from microarray were normalized and processed as previously described [11]. Briefly, in order to obtain model-based expression values, all samples were normalized and analyzed by invariant set normalization using dChip software [29], where a baseline array was automatically selected according to its median brightness and good call percentage among the array group (MM_107); data were modeled, and perfect match intensities were background adjusted. In some cases, several probes in the array were annotated on the same gene. Pearson's correlation coefficient was evaluated between these multiple probes over all samples, and only those with the highest coefficients were selected. The absolute quantity of DNA per sample (copies/µL) was evaluated by QuantaSoft™ Software (v. 1.7.4). Comparisons between patients' groups were performed using Pearson's χ2 test or Fisher's exact test, as appropriate, for categorical data, and the Kolmogorov-Smirnov test for continuous data. The %CV (Standard deviation/Mean*100) was also used to assess variability.