Prediction of Clinical Outcomes with Explainable Artificial Intelligence in Patients with Chronic Lymphocytic Leukemia

Background: The CLL International Prognostic Index (CLL-IPI) is used to predict the outcome of chronic lymphocytic leukemia (CLL) from five prognostic factors, including genetic analysis. We investigated whether multiparameter flow cytometry (MPFC) data of CLL samples could predict the outcome using methods of explainable artificial intelligence (XAI). In addition, the XAI was intended to explain its results in terms of distinctive cell populations in MPFC dot plots. Methods: We analyzed MPFC data from the peripheral blood of 157 patients with CLL. The ALPODS XAI algorithm was used to identify cell populations that were predictive of inferior outcomes (death, failure of first-line treatment). The diagnostic ability of each XAI population was evaluated with receiver operating characteristic (ROC) curves. Results: ALPODS defined 17 populations that, in combination, classified clinical outcomes better than the CLL-IPI (ROC area under the curve (AUC) 0.95 vs. 0.78). The best single classifier was an XAI population consisting of CD4+ T cells (AUC 0.78; 95% CI 0.70–0.86; p < 0.0001). Patients with low frequencies of CD4+ T cells had an inferior outcome. Adding the CD4+ T-cell population improved the predictive ability of the CLL-IPI (AUC 0.83; 95% CI 0.77–0.90; p < 0.0001). Conclusions: The ALPODS XAI algorithm detected highly predictive cell populations in CLL that may help refine conventional prognostic scores such as the CLL-IPI.


Introduction
Chronic lymphocytic leukemia (CLL) is the most common leukemia in Western countries [1]. The 2022 WHO Classification categorizes CLL as a mature B-cell neoplasm [2]. CLL is diagnosed by its characteristic immunophenotype in multiparameter flow cytometry (MPFC) B-cell panels [3,4]. The prognosis of CLL is heterogeneous: some patients never require treatment, others progress quickly, and some undergo transformation into high-grade lymphoma (Richter syndrome). Traditionally, the Rai and Binet classifications were used for clinical staging and to estimate prognosis, based primarily on leukemia burden [5,6]. In addition to disease burden, genetic factors such as TP53 mutation or 17p deletion, 11q deletion, and a complex karyotype indicate a poor prognosis, whereas deletion of 13q14 and trisomy 12 indicate a favorable prognosis [7][8][9]. Furthermore, mutations in the NOTCH1, SF3B1, and BIRC3 genes are associated with shorter survival [10][11][12][13]. Expression of CD38 and ZAP-70 by CLL cells has been associated with unmutated IGHV and higher levels of beta2-microglobulin, indicating an adverse prognosis [14][15][16][17][18][19][20].
However, the five parameters of the CLL-IPI (TP53 status, IGHV mutational status, serum beta2-microglobulin, clinical stage, and age) or single markers such as CD38 and ZAP-70 may not capture the genetic, pathophysiological, and prognostic heterogeneity of CLL at the individual level as well as patterns in large and complex datasets can.
MPFC of peripheral blood from patients with CLL generates individual, complex, high-dimensional data from malignant CLL cells and the surrounding non-malignant white blood cells. MPFC data from CLL patients are acquired routinely for diagnostic purposes. Dimensionality reduction techniques such as self-organizing maps (SOM) have already been applied to aid the interpretation of MPFC data [22,23]. Furthermore, the Citrus (cluster identification, characterization, and regression) algorithm can be applied to find cell-type-specific differences between groups in MPFC data [24,25].
Our group previously described the algorithmic population description approach (ALPODS), which is based on explainable artificial intelligence (XAI) and has been used for classification tasks. The XAI provides sample-based explanations of its decisions by visualizing the immune cell populations that distinguished bone marrow from peripheral blood [26,27]. ALPODS delivers disjoint cell populations that can be visualized in conventional two-dimensional flow cytometry dot plots of FCS data files. This enables human flow cytometry operators to classify these cell populations by conventional MPFC gating.
MPFC data from diagnostic B-cell panels may contain information about the individual CLL prognosis and outcome. Therefore, we used the ALPODS algorithm to identify the crucial cell populations that are over- or underrepresented in CLL patients who died or experienced failure of first-line systemic therapy [26].

Patients, Data Acquisition, and Processing
MPFC data of the peripheral blood from 157 unselected patients diagnosed with CLL were re-analyzed for this study and matched to clinical data. The MPFC data had been acquired for routine diagnostic analysis at the University Hospital Marburg from 2014 to 2020. The study was approved by the local ethics committee in Marburg. Clinical data included the CLL-IPI (i.e., TP53 mutation status, IGHV mutation status, age, Binet stage, and beta2-microglobulin), sex, ECOG performance status, Richter transformation, treatment, date of death, treatment failure, and last follow-up. If CLL-IPI parameters were incomplete or unknown, half of the score points of the missing parameters were assigned. The patients were divided into a group with an inferior outcome and a group with a superior outcome. Patients who died during follow-up and/or experienced failure of first-line systemic therapy were categorized as TTF 1 (time to first-line treatment failure), that is, inferior outcome. All other patients were classified as TTF 0, indicating a superior CLL outcome.
We used the ALPODS XAI algorithm [26] to identify cell populations in flow cytometry data that were over- or underrepresented in patients with inferior (TTF 1) or superior (TTF 0) outcomes. The predictive value of the XAI populations was compared with that of the frequency of CD38-positive CLL cells and the CLL-IPI using receiver operating characteristic (ROC) curves. Multiple logistic regression analysis with 10 repeated bootstrap trials was performed to combine more than one independent variable for predicting the dichotomous groups. For this purpose, three randomly selected patients from each group (TTF 0 and TTF 1) were left out 10 times to test whether the results generalize to other patients.
For each sample, 100 µL of prewashed peripheral blood was incubated for 15 min at room temperature in each of two 5 mL polystyrene FACS tubes containing a dried-down layer of fluorescent antibodies (DuraClone technology, Beckman Coulter, Krefeld, Germany). After antibody staining, red cells were lysed in 2 mL of VersaLyse™ (Beckman Coulter, Krefeld, Germany) for 10 min, washed with 3 mL of phosphate-buffered saline (PBS; Biochrom, Berlin, Germany), and centrifuged at 300 × g for 5 min. The cell pellet was resuspended in 500 µL PBS and measured on a Navios flow cytometer (Beckman Coulter, Krefeld, Germany). In total, up to 1 × 10⁵ cells were acquired.
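The half-point rule for missing CLL-IPI parameters and the TTF grouping described above can be sketched in Python as follows. The point weights are those of the published CLL-IPI (TP53 abnormality 4, unmutated IGHV 2, beta2-microglobulin above 3.5 mg/L 2, advanced clinical stage 1, age above 65 years 1); the dictionary keys and function names are hypothetical, and this is an illustration of the rule, not the authors' code.

```python
from typing import Dict, Optional

# Published CLL-IPI weights; the keys are hypothetical field names for this sketch.
CLL_IPI_POINTS: Dict[str, float] = {
    "tp53_abnormal": 4.0,   # del(17p) and/or TP53 mutation
    "ighv_unmutated": 2.0,  # unmutated IGHV
    "b2m_high": 2.0,        # serum beta2-microglobulin > 3.5 mg/L
    "stage_advanced": 1.0,  # Binet B/C (or Rai I-IV)
    "age_over_65": 1.0,     # age > 65 years
}


def cll_ipi_score(findings: Dict[str, Optional[bool]]) -> float:
    """Sum CLL-IPI points; a parameter that is None or absent earns half its points."""
    score = 0.0
    for name, points in CLL_IPI_POINTS.items():
        value = findings.get(name)
        if value is None:
            score += points / 2  # half-point rule for incomplete/unknown data
        elif value:
            score += points
    return score


def ttf_group(died: bool, first_line_failure: bool) -> int:
    """1 = inferior outcome (death and/or first-line failure), 0 = superior outcome."""
    return 1 if (died or first_line_failure) else 0


# Example: TP53 status unknown, unmutated IGHV, normal beta2-microglobulin, Binet A, age 70.
example = {"tp53_abnormal": None, "ighv_unmutated": True,
           "b2m_high": False, "stage_advanced": False, "age_over_65": True}
print(cll_ipi_score(example))  # 2.0 (half of 4) + 2.0 + 1.0 = 5.0
print(ttf_group(died=False, first_line_failure=True))  # 1
```

Because unknown parameters contribute fractional points, scores such as 1.5 or 4.5 can occur, which is one reason the half-point rule may misclassify patients near a risk-group boundary.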
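A minimal pure-Python sketch of the validation scheme follows. It assumes the scheme works as a repeated random hold-out: in each of 10 trials, 3 patients per outcome group are set aside, a multiple logistic regression is fitted on the remaining patients, and the held-out patients are scored by AUC. This is a hold-out scheme rather than a bootstrap resample. The function names, the gradient-ascent fitting, and the synthetic data are illustrative assumptions, not the authors' implementation, which would normally rely on a statistics package.

```python
import math
import random
from typing import List, Sequence


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def linear_score(w: Sequence[float], x: Sequence[float]) -> float:
    # w[0] is the intercept, w[1:] are the per-variable coefficients.
    return w[0] + sum(wj * xj for wj, xj in zip(w[1:], x))


def fit_logistic(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 1000) -> List[float]:
    """Multiple logistic regression fitted by batch gradient ascent on the log-likelihood."""
    w = [0.0] * (len(X[0]) + 1)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = yi - sigmoid(linear_score(w, xi))
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj + lr * gj / len(X) for wj, gj in zip(w, grad)]
    return w


def auc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """P(score of a TTF 1 patient > score of a TTF 0 patient); ties count 1/2."""
    pos = [s for s, lab in zip(scores, labels) if lab == 1]
    neg = [s for s, lab in zip(scores, labels) if lab == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def repeated_leave_out_auc(X, y, n_trials: int = 10, n_out: int = 3, seed: int = 0) -> List[float]:
    """Per trial: hold out n_out random patients per group, fit on the rest, score the held-out set."""
    rng = random.Random(seed)
    aucs = []
    for _ in range(n_trials):
        held = set()
        for group in (0, 1):
            members = [i for i, yi in enumerate(y) if yi == group]
            held.update(rng.sample(members, n_out))
        train = [i for i in range(len(y)) if i not in held]
        test = sorted(held)
        w = fit_logistic([X[i] for i in train], [y[i] for i in train])
        aucs.append(auc([linear_score(w, X[i]) for i in test], [y[i] for i in test]))
    return aucs


# Synthetic, perfectly separable toy data: one hypothetical predictor, 20 patients per group.
X = [[v / 10] for v in range(40)]
y = [0] * 20 + [1] * 20
aucs = repeated_leave_out_auc(X, y)
print(sum(aucs) / len(aucs))  # 1.0 on this separable toy data
```

The held-out patients are ranked by the linear predictor rather than the fitted probability. AUC depends only on ranks, and this choice avoids ties when the sigmoid saturates.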

Data Processing
The raw flow cytometry data were compensated and log-transformed. Events with very high side scatter (i.e., mainly granulocytes) were excluded to reduce the amount of data with little informative value. The data were then range-standardized to the interval from 0 to 6 using an adaptation of the Milligan and Cooper standardization [28,29]. From the total number of recorded cell events of each sample, a random 1% subsample was drawn. Using this 1% subsample for training ALPODS, a 1000-fold cross-validation was performed. The populations relevant for distinguishing TTF 1 from TTF 0 were selected from ALPODS, and the most important populations were filtered using Cohen's d effect size. The computed ABC analysis [30] selected optimal limits for subset division by exploiting the mathematical properties of the distribution of the analyzed items. ABC analysis divides the data into three disjoint subsets A, B, and C: subset A comprises the very profitable values, i.e., the largest data values ("the important few"); subset B comprises values for which the yield equals the effort required to obtain it; and subset C comprises the non-profitable values.
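Several of the preprocessing and filtering steps above can be sketched per channel in plain Python. The sketch makes three assumptions. The log transform is a plain log10 with a floor, whereas real pipelines often use logicle or arcsinh scaling instead. Range standardization is read as linear min-max scaling onto [0, 6]. The 1% subsample is drawn without replacement. Compensation, the side-scatter exclusion, the ALPODS population search itself, and the computed ABC analysis are not reproduced, and all function names are hypothetical.

```python
import math
import random
import statistics
from typing import List, Sequence


def log_transform(values: Sequence[float], floor: float = 1.0) -> List[float]:
    # Assumed plain log10 with a floor so that non-positive values stay finite.
    return [math.log10(max(v, floor)) for v in values]


def range_standardize(values: Sequence[float], scale: float = 6.0) -> List[float]:
    """Linear min-max scaling of one channel onto [0, scale] (assumed reading of the 0-6 range)."""
    lo, hi = min(values), max(values)
    return [scale * (v - lo) / (hi - lo) for v in values]


def subsample(events: Sequence, fraction: float = 0.01, seed: int = 0) -> list:
    # Draw a random fraction of the events without replacement (here 1%).
    rng = random.Random(seed)
    k = max(1, round(len(events) * fraction))
    return rng.sample(list(events), k)


def cohens_d(a: Sequence[float], b: Sequence[float]) -> float:
    """Difference of group means divided by the pooled sample standard deviation."""
    na, nb = len(a), len(b)
    pooled_var = ((na - 1) * statistics.variance(a)
                  + (nb - 1) * statistics.variance(b)) / (na + nb - 2)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(pooled_var)


print(range_standardize([1, 2, 3]))         # [0.0, 3.0, 6.0]
print(len(subsample(list(range(1000)))))    # 10
print(cohens_d([2, 3, 4], [1, 2, 3]))       # 1.0
```

For population filtering, `a` and `b` would hold a population's per-patient frequencies in the TTF 1 and TTF 0 groups, so a large absolute d marks a population whose frequency differs strongly between the outcome groups.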

Patient Characteristics
Of the 157 CLL patients, 42 had inferior outcomes (death and/or first-line treatment failure, TTF 1) and 115 did not reach the defined endpoints (superior outcome, TTF 0). The median age of the total cohort was 68 years (range 26–91 years); 62 (39.5%) patients were female and 95 (60.5%) were male. A total of 83 (52.9%) patients were diagnosed with Binet A, 24 (15.3%) with Binet B, and 12 (7.6%) with Binet C. Median follow-up was 31.5 months (interquartile range 9–65 months). Additional patient characteristics for the total cohort and for the TTF 1 and TTF 0 groups are given in Table 1.

Cell Populations Identified by ALPODS
Standardized flow cytometry data and outcome group (TTF 1 or TTF 0) were used as input for the ALPODS algorithm. ALPODS identified 17 distinctive cell populations in the MPFC data that were overrepresented (14/17) or underrepresented (3/17) in the TTF 1 cohort. Seven of the 17 populations were identified in the first tube (T1) of the diagnostic flow cytometry B-cell panel, and ten in the second tube (T2). The workflow is depicted in Figure 1.
Mann–Whitney U tests and ROC analyses were performed to detect the XAI populations most predictive of inferior outcomes. The results are listed in Table 2. XAI populations with a significant predictive ability for the outcome (TTF 1 vs. TTF 0) were then checked for their prognostic value in patients with high IPI (≥4) compared with patients with low IPI (≤1) (Supplementary Information, Table S2). XAI populations with a significant predictive value for both the outcome (TTF 1 vs. TTF 0) and the prognosis (IPI low vs. IPI high) were T1C0011, T1C0016, T2C0004, and T2C0018 (bold in Table 2). Among these populations, only T1C0016 had a higher frequency in patients with a good outcome (TTF 0; mean 13.51% vs. 4.91%; SE of difference 1.83) and a good prognosis (IPI ≤ 1; mean 12.50% vs. 5.37%; SE of difference 2.42).
Figure 1. Workflow and data processing. Flow cytometry raw data were standardized and assigned to the outcome group (TTF 1 inferior, TTF 0 superior). The ALPODS algorithm was used to identify distinctive cell populations with different frequencies in TTF 1 versus TTF 0. The most important populations for the determination were visualized in flow cytometry bivariate dot plots and assigned to their biological counterparts.
In combination, the 17 XAI populations had a predictive ability of 0.95 AUC (95% CI 0.91–0.98; p < 0.0001) for TTF in multiple logistic regression analysis (Figure 2A), which was significantly higher than that of the IPI (p = 0.0008; Hanley–McNeil test). Restricting the model to the four XAI populations whose prognostic value was verified against the IPI (T1C0011, T1C0016, T2C0004, and T2C0018) resulted in a lower diagnostic ability of 0.87 AUC (95% CI 0.80–0.93; p < 0.0001) (Figure 2B). This was still higher than the AUC of the conventional IPI in this patient cohort (0.87 vs. 0.78), although the difference did not reach statistical significance.
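The link between the Mann–Whitney U test and the ROC analysis above is that the AUC of a single population equals U / (n₁·n₂). This holds when U counts the pairs in which the group expected to score higher actually does, with ties counted as one half. The sketch below computes U from mid-ranks in plain Python. The function names and toy frequencies are hypothetical, and the p-value calculation is omitted.

```python
from typing import List, Sequence


def midranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def mann_whitney_u(x: Sequence[float], y: Sequence[float]) -> float:
    """U statistic of sample x against y: pairs with x > y, ties counted as 1/2."""
    ranks = midranks(list(x) + list(y))
    return sum(ranks[: len(x)]) - len(x) * (len(x) + 1) / 2


def auc_from_u(u: float, n_x: int, n_y: int) -> float:
    # AUC = U / (n_x * n_y) when x is the group expected to have higher values.
    return u / (n_x * n_y)


ttf1 = [3.0, 4.0, 5.0]  # toy population frequencies (%) in TTF 1 patients
ttf0 = [1.0, 2.0]       # toy population frequencies (%) in TTF 0 patients
u = mann_whitney_u(ttf1, ttf0)
print(u, auc_from_u(u, len(ttf1), len(ttf0)))  # 6.0 1.0
```

A population that is underrepresented in TTF 1, such as T1C0016, yields an AUC below 0.5 with this orientation. One would then pass the TTF 0 values as `x` or report 1 − AUC.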
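The AUC confidence intervals and the comparison of two AUCs above can be sketched with the Hanley and McNeil standard error, SE² = [A(1 − A) + (n₁ − 1)(Q₁ − A²) + (n₂ − 1)(Q₂ − A²)] / (n₁n₂), where Q₁ = A/(2 − A) and Q₂ = 2A²/(1 + A), combined with a z test. Two AUCs from the same patients are correlated. The 1983 Hanley and McNeil method reads that correlation r from a lookup table, which is not reproduced here, so r is a caller-supplied assumption. The function names are hypothetical.

```python
import math
from typing import Tuple


def hanley_mcneil_se(auc: float, n_pos: int, n_neg: int) -> float:
    """Standard error of an AUC (Hanley and McNeil, 1982)."""
    q1 = auc / (2 - auc)
    q2 = 2 * auc ** 2 / (1 + auc)
    var = (auc * (1 - auc)
           + (n_pos - 1) * (q1 - auc ** 2)
           + (n_neg - 1) * (q2 - auc ** 2)) / (n_pos * n_neg)
    return math.sqrt(var)


def auc_ci95(auc: float, n_pos: int, n_neg: int) -> Tuple[float, float]:
    # Normal-approximation 95% interval around the AUC.
    se = hanley_mcneil_se(auc, n_pos, n_neg)
    return auc - 1.96 * se, auc + 1.96 * se


def compare_aucs(auc1: float, se1: float, auc2: float, se2: float,
                 r: float = 0.0) -> Tuple[float, float]:
    """Two-sided z test for AUC1 vs. AUC2; r = correlation between the AUCs (assumed input)."""
    z = (auc1 - auc2) / math.sqrt(se1 ** 2 + se2 ** 2 - 2 * r * se1 * se2)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p value under N(0, 1)
    return z, p


print(auc_ci95(0.78, 42, 115))  # roughly (0.69, 0.87)
```

With AUC = 0.78, 42 TTF 1 patients and 115 TTF 0 patients, this gives an interval of roughly 0.69 to 0.87. That is close to, but not identical with, the reported 0.70–0.86, so the statistics software used in the study may apply a different estimator.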

Identification of the XAI-Populations
The ALPODS algorithm calculated FCS data files that can be depicted in conventional two-dimensional flow cytometry dot plots. Therefore, XAI populations can be gated and analyzed by a human flow cytometry expert. Population T1C0011 was located within the CLL cells, whereas T1C0016 consisted of CD4+ T cells (Figure 3A). T2C0004 represented nearly exclusively a subset of CLL cells (Figure 3B), and T2C0018 was a mixture of a CLL cell subset (higher fraction) and a T- and NK-cell subset (lower fraction).
Interestingly, the cell population most relevant for the outcome (T1C0016) was not part of the malignant cells but consisted of T helper cells, which were overrepresented in patients with a favorable outcome. This observation raised the question of whether increased CD8+ T cells were predictive of an inferior outcome. Indeed, the XAI population T1C0023 consisted of CD8+ T cells (Supplementary Information, Figure S1). T1C0023 was significantly more abundant in patients with a poor prognosis (IPI ≥ 4: mean 4.30% vs. IPI ≤ 1: mean 0.48%; SE of difference 0.77; p < 0.00461). However, in the ROC analysis, population T1C0023 alone could not discriminate between TTF 1 and TTF 0 (AUC 0.53; 95% CI 0.41–0.65; p = 0.5573).

Characterization of Predictive Subsets within CLL Cells
Apart from T1C0016 (CD4+ T cells) and T1C0023 (CD8+ T cells), most XAI populations were CLL subsets (Tube 1: T1C0011, T1C0012, T1C0017, T1C0019, and T1C0020; Tube 2: T2C0002, T2C0004, T2C0009, T2C0010, T2C0014, T2C0018, and T2C0020). The most crucial CLL subsets were T1C0011, T2C0004, and T2C0018, which are shown in Figure 4. Additionally, a heat map compared the median antigen expression and scatter height of the relevant CLL subpopulations with the median antigen expression of CLL cells from the average patient in the cohort (Figure 4A,B).
Notably, populations T1C0011 and T2C0002 showed lower forward scatter and lower antigen brightness than average CLL cells. In forward scatter plots, these populations were located partly in the region of dead and apoptotic CLL cells (Figure 4C). This finding suggests that a higher frequency of dead and apoptotic CLL cells is associated with a worse prognosis and outcome. In contrast, the CLL subsets T2C0004, T2C0014, T2C0018, and T2C0020 showed higher antigen expression and higher light-scatter values than average CLL cells. In summary, CLL cells with small cell volume (low forward scatter) were overrepresented in patients with a poor outcome, independent of the B-cell panel tube. Conversely, subsets of CLL cells with large cell volume (high forward scatter) also indicated a poor outcome.
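The heat-map comparison above can be sketched as a per-parameter difference of medians on the standardized scale. The sketch assumes a difference, rather than a ratio or z-score, as the heat-map value, and represents each event as a dictionary. The parameter names are placeholders rather than the panel's actual channels, and the function names are hypothetical.

```python
import statistics
from typing import Dict, Sequence

Event = Dict[str, float]  # one cell event: parameter name -> standardized intensity


def median_profile(events: Sequence[Event], params: Sequence[str]) -> Dict[str, float]:
    # Median of each parameter over all events of a population.
    return {p: statistics.median([e[p] for e in events]) for p in params}


def heatmap_row(subset: Sequence[Event], reference: Sequence[Event],
                params: Sequence[str]) -> Dict[str, float]:
    """Subset median minus reference median for each parameter.

    Positive values mean brighter staining or larger scatter than the reference CLL cells;
    negative values mean dimmer staining or smaller scatter."""
    sub = median_profile(subset, params)
    ref = median_profile(reference, params)
    return {p: sub[p] - ref[p] for p in params}


subset = [{"FSC": 1.0, "CD5": 2.0}, {"FSC": 3.0, "CD5": 4.0}]  # toy events of one subset
reference = [{"FSC": 4.0, "CD5": 3.0}]                         # toy reference CLL events
print(heatmap_row(subset, reference, ["FSC", "CD5"]))  # {'FSC': -2.0, 'CD5': 0.0}
```

In this reading, a subset such as T1C0011 would show a negative forward-scatter entry, and subsets such as T2C0004 would show positive entries.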

Clinical Significance
The XAI method ALPODS identified 17 cell populations that were effective at predicting outcome. However, correct manual gating of these populations without ALPODS is demanding, especially for the CLL subsets. The exceptions were populations T1C0016 and T1C0023, which comprised CD4+ T cells and CD8+ T cells, respectively. Both populations were easy to gate manually in flow cytometry dot plots. Therefore, we tested whether T1C0016 (CD4+ T cells) and T1C0023 (CD8+ T cells) add predictive value to the IPI and the frequency of CD38-positive CLL cells. Multiple logistic regression was performed for this four-factor model (Table 3). An odds ratio (OR) >1 favored an inferior outcome, and an OR <1 favored a superior outcome.
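The interpretation of the odds ratios above follows from the logistic regression coefficients. For a coefficient β with standard error SE, OR = exp(β), and a Wald 95% CI is exp(β ± 1.96·SE), per unit increase of the predictor. The outcome is coded TTF 1 = 1. The sketch below is hypothetical: its function names and the example coefficient are illustrative, not values from Table 3.

```python
import math
from typing import Tuple


def odds_ratio_ci(beta: float, se: float, z: float = 1.96) -> Tuple[float, float, float]:
    """OR = exp(beta) with Wald CI exp(beta +/- z*se), per unit increase of the predictor."""
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)


def direction(odds_ratio: float) -> str:
    # Outcome coded as TTF 1 = 1 (inferior) and TTF 0 = 0 (superior).
    if odds_ratio > 1:
        return "favors inferior outcome (TTF 1)"
    if odds_ratio < 1:
        return "favors superior outcome (TTF 0)"
    return "no association"


# Hypothetical coefficient of -0.1 per percentage point of CD4+ T cells.
or_value, lo, hi = odds_ratio_ci(-0.1, 0.05)
print(round(or_value, 3), direction(or_value))  # 0.905 favors superior outcome (TTF 0)
```

An OR of about 0.90 per percentage point would mean roughly 10% lower odds of an inferior outcome for each additional percentage point of the predictor. The units of each predictor therefore matter when comparing ORs across the four factors.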

Discussion
In this single-center study, we analyzed the immunophenotypes of 157 CLL patients using an explainable AI (XAI) algorithm. The XAI identified 17 cell populations in MPFC data that, in combination, predicted the clinical outcome of CLL better than the CLL-IPI or the frequency of CD38-positive CLL cells. Most of the 17 cell populations were located completely or partly within the abundant CLL population. However, some cell populations were non-malignant. For example, the T1C0016 population consisted entirely of CD4+ T cells and was underrepresented in patients with a poorer outcome. T1C0016 (CD4+ T cells) was the best single classifier for outcome among the 17 XAI-identified cell populations. In contrast, T1C0023 comprised CD8+ T cells that were overrepresented in patients with inferior outcomes.
T cells in CLL have been described as dysregulated. CD4+ and CD8+ T cells in patients with CLL differ from those in healthy individuals by an accumulation of memory T cells and a loss of naïve T cells, increased expression of immune checkpoint receptors (e.g., PD-1, TIGIT, CTLA-4), and increased activation [33][34][35][36][37][38]. Inversion of the CD4/CD8 ratio is typical for CLL [39][40][41][42]. Furthermore, Elston et al. showed that patients with a CD4/CD8 ratio >1 have better overall survival and progression-free survival [34]. This is in line with our findings that CD4+ T cells indicated a good outcome and CD8+ T cells an adverse outcome. Further studies should determine which subset of CD4+ T cells plays the most significant role in favorable outcomes of patients with CLL. In contrast to the XAI populations of CLL subsets, CD4+ and CD8+ T cells are simple to gate, and this gating is transferable to other flow cytometry panels that include antibodies against CD4 and CD8. For this reason, we developed a simplified approach to predict the outcome in CLL by combining the IPI with CD4+ T cells, or CD4+ T cells with CD8+ T cells. Both two-factor models discriminated between inferior and superior outcomes in more than 80% of the CLL cases.
Beyond diagnostic ability, the XAI populations provided insight into the immunopathology of CLL. For example, CLL subsets that predicted inferior outcomes included small apoptotic/dead CLL events with low forward scatter (T1C0011, T2C0002). These results are in line with Witkowska et al. and Jahrsdörfer et al., who described that spontaneous in vitro apoptosis of CLL cells correlated with disease progression and with cytogenetic features of worse prognosis [43,44].
On the other hand, CLL subsets with high forward and side scatter (T2C0004, T2C0014, T2C0018, and T2C0020) were also associated with adverse outcomes. Higher forward and side scatter reflect larger cell size and greater internal complexity, respectively, which suggests a prognostic role for prolymphocytes in CLL. Oscier et al. described that >10% prolymphocytes in CLL is associated with shorter overall survival (OS) and progression-free survival (PFS) [45].
This study has limitations. Some patients did not have complete data for all components of the IPI; in these cases, half the score points were assigned, which may have led to inaccurate categorization of some patients. A larger patient cohort is needed to validate our findings in multivariate Cox regression models, which would also clarify whether the results are independent of treatment strategy. Furthermore, an analysis of other prognostic markers for CLL, such as CD49d and ZAP-70, would strengthen the conclusions of the study [46]. Nevertheless, beyond the established markers with proven prognostic value in CLL, our XAI study provides new insights into prognostic factors related to the immunology of CLL and the non-malignant, reactive immune system.

Conclusions
The ALPODS XAI algorithm identified and described highly predictive immune cell populations related to outcomes in CLL. In particular, CD4+ T cells were the best single classifier and improved the predictive ability of the CLL-IPI. These findings should be further refined with a different immunophenotyping panel and validated in an independent patient cohort.