High-Dimensional Analysis of Single-Cell Flow Cytometry Data Predicts Relapse in Childhood Acute Lymphoblastic Leukaemia

Simple Summary B-cell Acute Lymphoblastic Leukaemia is one of the most common cancers in childhood, with 20% of patients eventually relapsing. Flow cytometry is routinely used for diagnosis and follow-up, but it currently does not provide prognostic value at diagnosis. The volume and the high-dimensional character of this data makes it ideal for its exploitation by means of Artificial Intelligence methods. We collected flow cytometry data from 56 patients from two hospitals. We analysed differences in intensity of marker expression in order to predict relapse at the moment of diagnosis. We finally correlated this data with biomolecular information, constructing a classifier based on CD38 expression. Abstract Artificial intelligence methods may help in unveiling information that is hidden in high-dimensional oncological data. Flow cytometry studies of haematological malignancies provide quantitative data with the potential to be used for the construction of response biomarkers. Many computational methods from the bioinformatics toolbox can be applied to these data, but they have not been exploited in their full potential in leukaemias, specifically for the case of childhood B-cell Acute Lymphoblastic Leukaemia. In this paper, we analysed flow cytometry data that were obtained at diagnosis from 56 paediatric B-cell Acute Lymphoblastic Leukaemia patients from two local institutions. Our aim was to assess the prognostic potential of immunophenotypical marker expression intensity. We constructed classifiers that are based on the Fisher’s Ratio to quantify differences between patients with relapsing and non-relapsing disease. We also correlated this with genetic information. The main result that arises from the data was the association between subexpression of marker CD38 and the probability of relapse.


Preprocessing of Flow Cytometry Files
The files were first imported into FlowJo (Becton Dickinson, 10.6.1) and FACSDiva (Becton Dickinson, 8.0.1) and inspected manually. Quality control was performed and margin events, debris, dead cells, and doublets were removed, as shown in Figure 1 steps 1-4. The files were then further processed in R (3.6.0) and RStudio (1.2.1335). This software, in conjunction with Bioconductor (3.11), provides packages and methods for analysis of flow cytometry data. The tubes were compensated by means of the spillover matrix that was included in each file and then transformed with the Logicle transformation [21] included in the flowCore package (2.0.1) [22] with parameters w = 0.75, t = 262,144, m = 4.5, and a = 0. Our next step was to bring, into a single file, the information that is contained in each of the patient's tubes. Because each tube contains marker intensity for different markers and cells, the full set of 20 markers was not available for any of the cells, as shown in Figure 1 step 5. This posed a problem of missing data imputation, which is addressed in different ways in the context of flow cytometry [23][24][25][26]. We followed the methodology that is described in [23], which consists of nearest-neighbour imputation while using the common or backbone markers in all aliquots. The result of this process was a set of 56 files, one per patient, containing complete information on the IPT markers selected for the analysis. After this step, 10 5 events were randomly sampled from each file in order to have the same number of cells for each patient.
Because data of multicentre retrospective studies can be affected by batch effects and technical variations across time and centre, we performed a normalisation that was based on a modified min-max transformation. This transformation brings all data points to the range [0, 1], but it is sensitive to outliers. Instead of selecting the maximum and minimum values, we chose quantiles 0.05 and 0.95 and applied the following transformation: where x q0.05 is the 5th percentile and x q0.95 is the 95th percentile. Finally, we used the common B-cell antigen CD19 to select the B-cell subpopulation, as shown in Figure 1 step 6. We did this in an automated way by means of the flowDensity package (1.24.0) [27], which incorporates methods for density-based cell population identification. The files were finally imported into MATLAB (Mathworks, R_2020a) via the fca_readfcs function [28]. Step 1: Quality control Step 2: Remove margin events Step 3: Remove doublets Step 4: Remove debris Step 5: Normalize and Merge Step 6: Select B lymphocytes   Figure 1. Preprocessing pipeline of Flow Cytometry Files. Preprocessing was carried out in six steps. The first four were performed in FlowJo and they consisted in the removal of abnormal acquisitions (quality control), margin events, doublets, and debris. The files were then imported into R in step 5 and, for each patient, all tubes or aliquots were merged into a single file by means of nearest-neighbour imputation. Finally, in step 6, the CD19 + population (B cells) was automatically selected for further analysis.

Marker Expression Characterisation
In flow cytometry, the IPT marker expression level is usually indicated by the median fluorescence intensity (MFI). In addition to this value, we also consider the mean fluorescence intensity and standard deviation. These are single-parametric representations of marker expression profiles. We fitted the cumulative distribution of every marker ( Figure 2) to a generalized logistic equation in order to capture more information: Here, I represents normalized marker intensity and F I gives the accumulated intensity. The generalized logistic equation represents families of sigmoid functions. In particular, γ = 1 gives the conventional logistic equation and, as γ approaches zero, it yields the Gompertz equation. These are two well-known sigmoid functions. The carrying capacity K was set to 1 and initial value was fixed to the first percentile of the distribution (this determines parameter A). Parameter α represents the growth rate of the curve, and parameter γ provides information regarding the position of the point of maximum growth. Fitting was performed in MATLAB with function lsqcurvefit.  2). Histograms cell count of, respectively, j 1 = CD10 and j 2 = CD20. (C.1) and (C.2). Cumulative distribution of markers j 1 = CD10 and j 2 = CD20, respectively. In red, percentiles curve from 5th to 95th percentile. (D) Each percentile curve for each patient i and marker j results in a vector x ij ∈ R, where P represents the number of percentiles chosen.

Fisher's Linear Discriminant for Relapse Prediction
We consider x ij ∈ R P as vectors that were obtained for each patient i and each common feature j, for i = 1, ..., N patients, j = 1, ..., M IPT markers, and P a number of percentiles. Thus, for each patient i, this results in a matrix X ∈ R M×P of the P percentiles from all IPT markers M, as shown in Figure 2. Let us define the general Fisher's Ratio (FR) Matrix FR ∈ R M×P [29], where for each IPT marker j in each percentile k, for j = 1, .., M and k = 1, ..., P. In this case, µ R jk and µ N jk are the median of percentiles k for the distribution of the IPT marker j in each class of patients. R stands for relapsing patients, while N refers to the non-relapsing ones. Parameters σ R jk and σ N jk are the standard deviation measures within the classes. We can select the highest FR jk in order to construct a classifier, thus obtaining percentiles from several IPT markers with the lowest deviation and highest difference in median between each subset. Thus, we would obtain a general discriminant classifier of M * ≤ M markers and associated discriminant percentiles P * ≤ P.
Let us now consider a certain IPT marker j and percentile k. We can then associate it to a specific central measure µ R jk or µ N jk and dispersion measure σ R jk or σ N jk for each class of patients. Thus, we set two control points If we now consider a new patient that is not assigned to any set andv jk as the value of percentiles k of IPT marker j, we can compute a control point This point is normalised by the mean of both dispersion measures, as we consider P to be a non-assigned patient control point. Now, we can use a distance function d : to measure the separation between the new patientP jk and the control pointsR jk andN jk (Figure 3).
We construct a probability measure P for each IPT marker and percentile as The mean of the probability measures for all of the IPT markers selected for each patient may allow for us to classify the patient in the relapsing or non-relapsing classes.

Classifier Construction and Feature Relevance
When constructing classifiers, we split the dataset in test and train set. We used the train set to select the most significant IPT markers j according to the Fisher Ratio (FR jk > 0.5). We used the test set in order to assess the performance of the classffier by means of a receiver operating characteristic (ROC) curve and its associated Area Under Curve (AUC). Accuracy was obtained as the number of correctly classified samples divided by the total number of classified samples. Along with these magnitudes, we computed from each confusion matrices the sensitivity, specificity, positive predictive value (PPV), and negative predictive value (NPV).
K-fold and Leave-One-Out cross-validation (LOOCV) techniques were applied. We performed each method 20 times in order to obtain a more robust measure of the performance [30] and ensure that the selection of the most important features in each classifier was not dependent on the dataset partition. For both techniques, a minimum of one patient of each set was always in the training set. R < l a t e x i t s h a 1 _ b a s e 6 4 = " H / w S c U W I 2 J 8 X F v S 3 o b b f y t G y U V I = " > A A A B 7 n i c b Z D L S s N A F I Z P 6 q 3 W W 9 W l m 2 A R X J V E B F 0 W 3 b i s Y i / Q h H I y P W m H T i 7 O T I Q S + h q 6 E n X n w / g C v o 3 T m o W 2 / q t v z v 8 P n P 8 E q e B K O 8 6 X V V p Z X V v f K G 9 W t r Z 3 d v e q + w d t l W S S U Y s l I p H d A B U J H l N L c y 2 o m 0 r C K B D U C c b X M 7 / z S F L x J L 7 X k 5 T 8 C I c x D z l D b U a + F 6 D M P Y Y i v 5 t O + 9 W a U 3 f m s p f B L a A G h Z r 9 6 q c 3 S F g W U a y Z Q K V 6 r p N q P 0 e p O R M 0 r X i Z o h T Z G I f U M x h j R M r P 5 0 t P 7 Z M w k b Y e k T 1 / / 8 7 m G C k 1 i Q K T i V C P 1 K I 3 G / 7 n 9 T I d X v o 5 j 9 N M U 8 x M x H h h J m y d 2 L P u 9 o B L Y l p M D C C T 3 G x p s x F K Z N p c q G L q u 4 t l l 6 F 9 V n e d u n t 7 X m t c F Y c o w x E c w y m 4 c A E N u I E m t I D B A z z D G 7 x b q f V k v V i v P 9 G S V f w 5 h D + y P r 4 B B p S P 8 A = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H / w S c U W I 2 J 8 X F v S 3 o b b f y t G y U V I = " > A A A B 7 n i c b Z D L S s N A F I Z P 6 q 3 W W 9 W l m 2 A R X J V E B F 0 W 3 b i s Y i / Q h H I y P W m H T i 7 O T I Q S + h q 6 E n X n w / g C v o 3 T m o W 2 / q t v z v 8 P n P 8 E q e B K O 8 6 X V V p Z X V v f K G 9 W t r Z 3 d v e q + w d t l W S S U Y s l I p H d A B U J H l N L c y 2 o m 0 r C K B D U C c b X M 7 / z S F L x J L 7 X k 5 T 8 C I c x D z l D b U a + F 6 D M P Y Y i v 5 t O + 9 W a U 3 f m s p f B L a A G h Z r 9 6 q c 3 S F g W U a y Z Q K V 6 r p N q P 0 e p O R M 0 r X i Z o h T Z G I f U M x h j R M r P 5 0 t P 7 Z M w k b Y e k T 1 / / 8 7 m G C k 1 i Q K T i V C P 1 K I 3 G / 7 n 9 T I d X v o 5 j 9 N M U 8 x M x H h h J m y d 2 L P u 9 o B L Y l p M D C C T 3 G x p s x F K Z N p c q G L q u 4 t l l 6 F 9 V n e d u n t 7 X m t c F Y c o w x E c w y m 4 c A E N u I E m t I D B A z z D G 7 x b q f V k v V i v P 9 G S V f w 5 h D + y P r 4 B B p S P 8 A = = < / l a t e x i t > < l a t e x i t s h a 1 _ b a s e 6 4 = " H / w S c U W I 2 J 8 X F v S 3 o b b f y t G y U V I = " > A A A B 7 n i c b Z D L S s N A F I Z P 6 q 3 W W 9 W l m 2 A R X J V E B F 0 W 3 b i s Y i / Q h H I y P W m H T i 7 O T I Q S + h q 6 E n X n w / g C v o 3 T m o W 2 / q t v z v 8 P n P 8 E q e B K O 8 6 X V V p Z X V v f K G 9 W t r Z 3 d v e q + w d t l W S S U Y s l I p H d A B U J H l N L c y 2 o m 0 r C K B D U C c b X M 7 / z S F L x J L 7 X k 5 T 8 C I c x D z l D b U a + F 6 D M P Y Y i v 5 t O + 9 W a U 3 f m s p f B L a A G h Z r 9 6 q c 3 S F g W U a y Z Q K V 6 r p N q P 0 e p O R M 0 r X i Z o h T Z G I f U M x h j R M r P 5 0 t P 7 Z M w k b Y e k T 1 / / 8 7 m G C k 1 i Q K T i V C P 1 K I 3 G / 7 n 9 T I d X v o 5 j 9 N M U 8

< l a t e x i t s h a 1 _ b a s e 6 4 = " n Q T d e S / B N F c V a 9 O n + o 9 9 M u X 9 f z E = " > A A A B 7 n i c b Z D L S s N
n L n s Z 3 A J q U K j Z r 3 5 6 g 4 R l E c W a C V S q 5 z q p 9 n O U m j N B 0 4 q X K U q R j X F I P Y M x R q T 8 f L 7 0 1 D 4 J E 2 n r E d n z 9 + 9 s j p F S k y g w m Q j 1 S C 1 6 s

l a t e x i t s h a 1 _ b a s e 6 4 = " n Q T d e S / B N F c V a 9 O n + o 9 9 M u X 9 f z E = " > A A A B 7 n i c b Z D L S s N
n L n s Z 3 A J q U K j Z r 3 5 6 g 4 R l E c W a C V S q 5 z q p 9 n O U m j N B 0 4 q X K U q R j X F I P Y M x R q T 8 f L 7 0 1 D 4 J E 2 n r E d n z 9 + 9 s j p F S k y g w m Q j 1 S C 1 6 s

l a t e x i t s h a 1 _ b a s e 6 4 = " n Q T d e S / B N F c V a 9 O n + o 9 9 M u X 9 f z E = " > A A A B 7 n i c b Z D L S s N
n L n s Z 3 A J q U K j Z r 3 5 6 g 4 R l E c W a C V S q 5 z q p 9 n O U m j N B 0 4 q X K U q R j X F I P Y M x R q T 8 f L 7 0 1 D 4 J E 2 n r E d n z 9 + 9 s j p F S k y g w m Q j 1 S C 1 6 s

l a t e x i t s h a 1 _ b a s e 6 4 = " n Q T d e S / B N F c V a 9 O n + o 9 9 M u X 9 f z E = " > A A A B 7 n i c b Z D L S s N
n L n s Z 3 A J q U K j Z r 3 5 6 g 4 R l E c W a C V S q 5 z q p 9 n O U m j N B 0 4 q X K U q R j X F I P Y M x R q T 8 f L 7 0 1 D 4 J E 2 n r E d n z 9 + 9 s j p F S k y g w m Q j 1 S C 1 6 s  To obtain an idea of feature relevance, we performed 100 random splits with a training to test ratio of 75:25. For each split, we used the training data in order to construct a classifier and computed its performance on the test data as described above. As a result of this we gathered a collection of 100 classifiers and their performances. We next set an accuracy threshold and computed the frequency with which every marker was used in the set of classifiers that were above that threshold. This calculation was performed for different values of the accuracy threshold.
Finally, we performed 100 Random Forest classifications with 50 trees each and a 75:25 split of patients in order to contrast these results with other methods. Out-of-bag error and permutation feature relevance were obtained to assess feature importance. Figure 4 shows a summary of the dataset exploration, classifier construction and feature relevance analysis.

Statistical Analysis
Median, mean intensity, and standard deviation were compared with t-tests, when considering a p-value that is lower than 0.05. The parameters of the generalized logistic function fit were also compared with t-tests. Correlations between percentiles of marker expression and clinical and genetic data were done with Pearson correlation coefficient. T-tests were performed to assess the significance. These computations were carried out in Python with package Pingouin [31].

Results
Exploratory analysis reveals differences in marker expression and accumulated intensity profile. Marker CD38 showed statistically significant differences in the median in all datasets, with underexpression in relapsing patients. Marker CD66c showed significant differences in median in Dataset 2, this time being overexpressed in relapsing patients ( Figure 5). Mean fluorescence intensity showed the same statistical differences as the median fluorescence intensity (see Figure S1 in Supplementary Information). Statistical differences for standard deviation were only found in Dataset 2, for IPT markers CD22, and IGM. (Figure S2 in Supplementary Information) We also fitted cumulative intensity profiles to a a generalised logistic equation (see Equation (2)). We compared the parameters α and γ between relapsing and non-relapsing patients. Statistical differences between the averages of these parameters were found for markers CD38, CD45, and CD33. However, these differences disappeared when using median parameter values (see Figure S3). Figure S4 shows an example of this fitting. CD38 distribution differed significantly between relapsing and non-relapsing patients. While noting there were significant differences in MFI, we next considered the full distribution by means of the percentile curves. We employed Fisher's ratio in order to obtain a measure of the degree of separation between relapsing and non-relapsing patients.
The results for FR differences between relapsing and non-relapsing patients are shown in Figure 6. We also show the median cumulative distribution of the most relevant markers. For Dataset 1, CD38 FR was high in almost all percentiles, with FR jk > 0.3, as seen in Figure 6(A.1). IPT marker CD123 had high FR for the highest percentiles, with FR jk > 0.3 for k ∈ (50, 95). For Dataset 2, the differences between FR were significantly higher, with FR jk > 3.5 in percentiles j ∈ (20, 95) for IPT marker CD38, and mean FR jk > 2.5 for IPT marker CD66c, as shown in Figure 6(B.1). For the combination of both datasets, only CD38 achieved a high FR with mean FR jk > 0.9 in all percentiles, as shown in Figure 6(C.1). Immunophenotypical markers CD38 and CD123, for Dataset 1, and markers CD38 and CD66c, for Dataset 2, predicted relapse after repeated cross-validation. We next assessed the predictive value of the differences that were reported above, constructing classifiers, as explained in Section 2.7. We performed repeated K-fold cross validation and LOOCV. The results are shown in Table 2. We observed that the markers represented in Figures 6(A.2, A.3, B.2, B.3, C.2) were always present in the classifiers scoring high in accuracy. The maximal number of folds was determined by the number of relapsing patients (8 for Dataset 1, and 5 for Dataset 2). Train-test splitting revealed other markers with potential predictive value. We tested the predictive value of the variables by splitting all datasets (1, 2, and both combined) into training and test set with ratio 75:25. We used the training set in order to construct a classifier with the most relevant features according to FR. Figure 7A shows the frequency of the IPT markers used in the classifiers for the 100 different splits. IPT marker CD38 arose again as the marker used in all of the classifiers, while CD33 was used on almost 70% of them. We then assessed the performance of every classifier on their respective 25% test sets. Having obtained the accuracy for the 100 classifiers, we measure the frequency of IPT markers in the classifiers whose prediction accuracy was above a given threshold, as shown in Figure 7B. IPT markers CD13, CD24, CD33, CD38, CD45, and CD66c are consistently selected in classifiers with an accuracy above 50%.
Random-Forest analysis matched the results from the constructed classifiers. After 100 random forests, IPT markers CD33, CD38, and CD66c were the only ones with positive Out-of-bag feature importance ( Figure 7C). However, after repeating the simulations only considering these markers, Out-Of Bag Classification Error was not significantly lower in comparison to the analysis with the whole set of IPT markers (mean out-of-bag error of 0.28 versus 0.31, respectively) ( Figure 7D). Nevertheless, feature importance coincided with those markers with highest frequency in the previous analyses.  CD38 marker expression correlated with genetic information. We finally correlated marker CD38 expression with clinical and cytogenetic information. Figure 8 shows the results. For the sake of readability, we only selected percentiles 15, 50, and 85 (P 15 , P 50 and P 85 ) to represent low, normal, and high expression of CD38. The expected correlations include those between age and relapse age, as well as CD38 percentiles between themselves and with relapse. High CD38 expression as measured by P 85 showed significant direct correlation (p = 0.03) with the presence of chromosomic alteration t(12;21) and significant inverse correlation with hyperdiplod karyotype (p = 0.039). These correlations were not significant for lower percentiles. Hyperdiploid karyotype was also directly correlated with relapse and with time to relapse from diagnosis (p = 0.041 and p = 0.047, respectively). We also found that female patients showed a higher expression of CD38 than male patients (p = 0.02).

Discussion and Conclusions
The unprecedented amount and complexity of clinical data that is available nowadays has resulted in the proliferation of bioinformatics pipelines and artificial intelligence algorithms. There are a number of issues that still hamper the integration of AI and the respective clinical context. As happens in general with the relationship between mathematics and medicine, researchers at both ends often speak a different language [32]. Many AI algorithms behave as a "black boxes", providing an outcome directly from raw data and hindering a mechanistic interpretation of the underlying phenomena. For clinical use, it is highly desirable that the features that are uncovered by these algorithms can be interpretable and actionable. As Radakovich et. al. puts it, "Algorithms can only be as clinically meaningful as the outcomes that they are designed to predict" [33].
In this work, we designed an intuitive algorithm that purports to identify patients at diagnosis with potential of relapse versus those with no risk of relapse in B-cell childhood ALL. We used flow cytometry data that were obtained at diagnosis from two local institutions and based the analysis on two concepts that are already employed in this context; the intensity and range of surface markers expression and frequency of cells within that range. Currently, this information is used for the identification and quantification of the leukaemic population, but marker expression level is not used as a prognostic factor. In our analysis, we took this into account by assigning each patient and marker its percentile curve and then used the Fisher's ratio to look for meaningful differences between both groups of patients. That approach allowed for us to construct a classifier based on this measure in order to assess the significance of the previously obtained differences. Given the small sample size, we used the cross-validation routines to assess the validity of the Fisher's ratio-based measure. This was later correlated with genetic information from the patients. Despite the exploratory nature of the study, we were able to find some common trends in the data.
We first carried out a more conventional analysis of marker expression by means of median and mean fluorescence intensity. This showed significant differences between relapsing of non-relapsing patients in CD38 in Dataset 1 and 2, as well as in both combined. CD66c also showed significant differences in Dataset 2. In order to look further into this, we moved from a one-parametric representation to a bi-parametric representation by means of logistic curve fitting of intensity profiles. A comparison of the fitted parameters did not yield significant results, which suggests that differences are not in the shape of the distribution but on the level of expression.
In order to delve into these differences, we compared the whole distribution by means of percentile vectors. We observed that the Fisher's ratio displayed differences in the expression levels between relapsing and non relapsing patients. This was especially significant for the second dataset. Given that both of the datasets were pre-processed identically, the difference in the significance of the measure could be due to either sample size or different acquisition routines in either hospital. We expect to have a clearer understanding of this after increasing the number of patients in each dataset or the number of datasets as a whole. K-fold cross validation showed that, when restricting the analysis to the most important features according to the previously calculated Fisher's ratio, the algorithm was able to better separate between relapsing and non relapsing patients, always using data only available on diagnosis.
The measurements of performance yielded good values for this biomarker, as measured by Accuracy and AUC. However, although specificity was high, we obtained a low negative predictive value, i.e., the algorithm underperformed when detecting relapses. This could be due to the intrinsic unbalance in the datasets, with only 25% of relapsed patients. The relevant information, nonetheless, was the agreement in the selection of the most important features. The Monte-Carlo based and Random Forest feature importance computation later confirmed this.
The most consistent result, in the different analyses and for both local institutions, was the association between a lower expression of CD38 marker and relapse. CD38 is a surface receptor that is present in a broad variety of immune cells. It is considered to be a cell activation marker and it operates both as a receptor and an enzyme [34]. In the B cell compartment, both bone marrow precursors and terminally differentiated cells express CD38 [35]. In the context of haematological disease, high CD38 levels have been associated with worse prognosis in Chronic Lymphocytic Leukaemia [36]. Previous studies have suggested that CD38 is a suitable therapeutic target in both AML and ALL [37,38]. There has been some controversy concerning the existence of a CD34+/CD38-population of leukaemia initiating stem cells [39][40][41][42]. In B-ALL, the accumulated evidence indicates that lower levels of CD38 could be associated with a worse outcome in terms of survival [43][44][45][46]. Our results aligned with this evidence, which suggested that a higher frequency of low CD38 expressing B cells could be an early indicator of relapse risk.
Other markers that were found to be relevant in this study were CD33 and CD66c. These two markers are normally expressed in the cells of the myeloid lineage, and they have been linked to paediatric B-ALL in the context of myeloid antigen expressing B-cell malignancies. This refers to the fact that some malignant B cells can express markers from the myeloid line. CD66c is the most frequently observed aberrant myeloid antigen in B-cell ALL. Upon studying the correlation of the expression of this antigen with known prognostic factors, previous studies have found that CD66c is associated with BCR/ABL translocation, which has been shown to confer the worst prognosis [47][48][49]. Here, we found that relapsing patients were more prone to the overexpression of this marker on diagnosis. With respect to CD33, there has been some controversy regarding its prognostic value, but evidence suggests that the presence of high expressing CD33+ cells identifies patients with worse prognosis [50], contrary to the differences that are exhibited by the percentile curves here.
The immunophenotypical marker CD123 was also highlighted by Fisher's ratio, but only in Dataset 1. Its importance could not be further assessed since it was not available in Dataset 2. This marker was first described as a marker of Acute Myeloid Leukaemia stem cells. It was later shown to be uniformly expressed in B-ALL blasts, being proposed for the detection of minimal residual disease [51,52] and recently identified as a potential target for immunotherapies [53,54]. Interestingly, high expression of CD123 correlated with hyperdiploid karyotype, an indicator of favourable prognosis in childhood B-cell ALL [55]. In our cohort, we found a high proportion of CD123 expressing cells in relapsing patients.
Finally, we complemented the analysis of CD38 expression with clinical, cytogenetic, and molecular biology information, also relevant in the prognostic assessment of haematological diseases. CD38 intensity correlated inversely with the presence of hyperdiploid karyotypes, something that has been previously reported in the literature for the case of B-ALL [56,57]. This is interesting, because the presence of hyperdiploid karyotype is a favorable prognostic factor, while we and others found that the low expression of CD38 is an indicator of relapse. In fact, in our dataset hyperdiploid karyotype correlated with possible relapse. This is a matter of further exploration. On the contrary, the correlation between high expression of CD38 and the existence of translocation t(12;21) agrees on predicting a favourable outcome. Another interesting result is that percentile 85 presented significant correlation with a number of these features. This points out the importance of considering multi-parametric analysis of immunophenotypic markers, instead of the more conventional MFI.
In this sense, while differences in CD38 expression according to Fisher's Ratio were present across the whole range of expression of the marker, that was not the case for CD33 and CD123. For those markers, differences were only observed in the low expression region for the former and in the high expression region for the latter. The fact that there is less evidence for their prognostic value suggests that the method presented here only leads to significant results if there is a constant difference in expression levels between both sets of patients. Indeed, this is a limitation of the study; by representing the expression as a percentile curve, we may miss information that can be clinically relevant and that refers not to the frequency of cells or intensity of expression, but to the presence or absence of a given subpopulation. In this regard, we already mentioned that a subpopulation of CD34+/CD38-cells could be associated with leukaemia initiating cells, and the same could happen for a restricted subpopulation of CD34+/CD38-/CD123+; this one agrees with the results that are presented in this paper. Another downside of using percentile curves is the high degree of correlation between percentiles of a given marker. In this sense, alternative representations should be considered when relating marker expression information to other kind of data. For example, this could be done by considering cell percentage rather than fluorescence intensity.
Another limitation of our analysis is the data, a recurrent concern in artificial intelligence in haematology [33]. Given the limited size of the dataset, conclusions should be contrasted in an extended future study. This is why we focused on building a robust methodology, keeping the exploratory nature of the work in mind. Besides, the set of relapsing patients only represented 25% of the whole dataset and unbalance could introduce biases in the analysis. A larger dataset should allow for a 50/50 analysis. Finally, there is the issue of data variability, given that it was collected retrospectively, belonging to patients from different years and hospitals. This highlights the importance of the preprocessing routine, which is also amenable to improvements in order to ensure the comparability of the samples. These weaknesses provide future lines of work. While in the process of recruiting more patients and hospitals, efforts will we directed towards the automation of the preprocessing workflow and towards the combination with more complex analyses, like dimensionality reduction, network analysis, and clustering.
Notwithstanding these limitations, this works adds to the growing field of artificial intelligence in haematology and, specifically, in B-cell childhood Acute Lymphoblastic Leukaemia. We attempted to delineate differences in marker expression between patients who relapse from the disease and those that respond to treatment, obtaining results that are directly interpretable from the clinical point of view. The main result would be the underexpression of surface marker CD38 at diagnosis in patients experiencing relapse after the first-line chemotherapy treatment. This result emerged from an unspecific comparison of intensity of marker expression and not as a particular aim of the study, which favours its extension to any other disease that uses flow cytometry as a routinary clinical tool. We hope that this contribution will be found to be useful for the purpose of exploiting flow cytometry data and marker expression level for prognostic assessment in paediatric leukaemia and other malignancies.
Supplementary Materials: The following are available online at https://www.mdpi.com/2072-669 4/13/1/17/s1: Figure S1: Mean of immunophenotypic markers for relapse and non-relapse patients, Figure S2: Standard deviation of immunophenotypic markers for relapse and non-relapse patient, Figure S3: Markers with significant differences in parameters coming from intensity profiles fitting, Figure S4: Example of curve fitting of CD38 expression.