Integrative Statistics, Machine Learning and Artificial Intelligence Neural Network Analysis Correlated CSF1R with the Prognosis of Diffuse Large B-Cell Lymphoma

Abstract: Tumor-associated macrophages (TAMs) of the immune microenvironment play an important role in the pathogenesis of diffuse large B-cell lymphoma (DLBCL). This research aimed to characterize the expression of macrophage colony-stimulating factor 1 receptor (CSF1R) at the gene and protein level in correlation with survival. First, the immunohistochemical expression of CSF1R was analyzed in a series of 198 cases from Tokai University Hospital, and two patterns of histological expression were found: a TAMs pattern and a diffuse B-lymphocytes pattern. The clinicopathological correlations showed that the CSF1R+ TAMs pattern was associated with a poor progression-free survival of the patients, disease progression, higher MYC proto-oncogene expression, lower MDM2 expression, BCL2 translocation, and the MYD88 L265P mutation. Conversely, a diffuse CSF1R+ B-cells pattern was associated with a favorable progression-free survival. Second, the histological expression of CSF1R was also correlated with 10 CSF1R-related markers including CSF1, STAT3, NFKB1, Ki67, MYC, PD-L1, TNFAIP8, IKAROS, CD163, and CD68. CSF1R moderately correlated with STAT3, TNFAIP8, CD68, and CD163 in the cases with the CSF1R+ TAMs pattern. In addition, machine learning modeling predicted the CSF1R immunohistochemical expression with high accuracy using regression, generalized linear, artificial intelligence neural network (multilayer perceptron), and support vector machine (SVM) analyses. Finally, a multilayer perceptron analysis predicted the genes associated with the CSF1R gene expression using the GEO GSE10846 DLBCL series of the Lymphoma/Leukemia Molecular Profiling Project (LLMPP), with correlation to the whole set of 20,683 genes as well as with an immuno-oncology cancer panel of 1790 genes. In addition, CSF1R positively correlated with SIRPA and inversely with CD47.
In conclusion, the CSF1R histological pattern correlated with the progression-free survival of the patients of the Tokai series, and predictive analytics is a feasible strategy in DLBCL.


Introduction
Diffuse large B-cell lymphoma (DLBCL) is one of the most frequent histological subtypes of non-Hodgkin lymphoma (NHL) in Western countries, representing approximately 25% of the cases. DLBCL not-otherwise specified (NOS) is a heterogeneous disease in terms of its morphological characteristics, biological background, and genetic alterations [1]. In the current classification of the World Health Organization (WHO) [2], DLBCL has some separate diagnostic categories, including T-cell/histiocyte-rich large B-cell lymphoma, primary DLBCL of the mediastinum, and intravascular lymphoma, among others.
DLBCL can be cured in around 50% of the cases with current therapy, mainly based on R-CHOP (rituximab, cyclophosphamide, doxorubicin, vincristine, and prednisone). Due to the clinical heterogeneity, it is important to identify the patients with a worse outcome. The International Prognostic Index (IPI) and its derivatives are the main tools used to stratify the prognosis of the patients with DLBCL. The IPI includes the following variables: age, serum lactate dehydrogenase, Eastern Cooperative Oncology Group (ECOG) performance status, clinical stage, and the number of extranodal disease sites. The variants of the original IPI include the age-adjusted, the stage-adjusted, and the National Comprehensive Cancer Network International Prognostic Index (NCCN IPI) [3]. Gene expression profiling (GEP) classified the DLBCL patients according to the cell-of-origin as germinal centre B-cell-like (GCB), associated with a good prognosis, and as activated B-cell-like (ABC), associated with a poor prognosis [4-6]. Importantly, the role of the immune microenvironment was also highlighted [7].
The microenvironment comprises several immune cells including CD8+ cytotoxic T-lymphocytes, CD4+ helper T-lymphocytes, natural killer (NK) cells, FOXP3+ regulatory T-lymphocytes (Treg), and macrophages, among others [8]. The tumor-associated macrophages (TAMs) are of special interest because the ones with an M2-like phenotype have tumor-promoting capabilities [9], which involve tumor proliferation, invasion, angiogenesis, metastasis, and suppression of anti-tumor immunity [10,11]. In DLBCL, it has been reported that high numbers of TAMs are associated with a poor prognosis [9,12].
Macrophage colony-stimulating factor 1 receptor (CSF1R) is a tyrosine-protein kinase that functions as a cell-surface receptor for CSF1 and IL34 and regulates the survival, proliferation, and differentiation of macrophages [13]. Due to the association of TAMs with tumorigenesis and the suppression of anti-tumor immunity, CSF1R is of great interest as a target for cancer treatment using small-molecule CSF1R inhibitors [14,15]. In the case of Hodgkin lymphoma, an abstract report by Moskowitz CH et al. described a phase 2 single-agent clinical trial of a CSF1R inhibitor (PLX3397) in patients with relapsed or refractory disease. The report concluded that the efficacy of single-agent PLX3397 in that study population was modest, and that the manageable safety profile and evidence of target inhibition might warrant further testing in combination therapy trials. To the best of our knowledge, the use of CSF1R inhibitors in DLBCL has not been reported [16].
The purpose of this work was to analyze the expression of CSF1R in DLBCL. First, we analyzed the immunohistochemical protein expression of CSF1R in a series of 198 cases of DLBCL from Tokai University Hospital and performed several clinicopathological correlations. Then, we analyzed the gene expression of CSF1R in DLBCL using a robust series of 414 cases from Western countries from the Lymphoma/Leukemia Molecular Profiling Project (LLMPP), and we focused on the identification of genes associated with CSF1R as a dichotomic variable (high vs. low levels) and then with other relevant cancer-related genes.

Subjects of Study DLBCL Series from the Tokai University Hospital
For the immunohistochemical analysis of CSF1R, we used a Japanese series of 198 cases of DLBCL from the Tokai University Hospital. The complete clinicopathological characteristics of this series are shown in Table 1. In summary, this series has the characteristics of a conventional series of DLBCL not-otherwise specified. The disease location was nodal (+spleen) and Waldeyer's ring in around half of the cases. The treatment was R-CHOP (rituximab, cyclophosphamide, doxorubicin hydrochloride, vincristine, and prednisolone) or R-CHOP-like in 96% of the cases, and 75% had a clinical response to treatment. The immunophenotype showed CD10 positivity in 30% of the cases, CD5 positivity in 16%, MUM1 positivity in 74%, BCL2 positivity in 74%, and a non-GCB cell-of-origin, according to the Hans' classifier, in 64% of the cases. Epstein-Barr virus (EBER) was found in 9% of the cases. The clinicopathological variables were correlated with the overall survival and progression-free survival. In Table 2, the correlation with the overall survival is shown. Relevant variables that correlated with the overall survival were IPI, clinical response to treatment, some immunohistochemical markers (CD10, MUM1, BCL2, Ki67 and RGS1), cell-of-origin Hans' classification, and Epstein-Barr virus (EBER). The correlations with the progression-free survival were similar to those of the overall survival. Of note, the original series for the immunohistochemistry was around 130 cases. This is the reason why some variables, such as the fluorescence in situ hybridization (FISH), are only available in around 130 cases. Later, the series was expanded up to 198 to increase the statistical power. The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the Institutional Review Board and the Ethics Committee of Tokai University, School of Medicine (protocol code IRB14R-080 and IRB-156).
The CSF1R monoclonal antibody was developed by Dr. Juan Fernando Garcia (Department of Pathology, MD Anderson Cancer Center, Madrid, Spain) and created by Dr. Giovanna Roncador from the Monoclonal Antibodies Unit, Spanish National Centre for Cancer Research (Centro Nacional de Investigaciones Oncologicas, CNIO, Madrid, Spain). This mouse monoclonal primary antibody targets human CSF1R. The clone name is FER216, the isotype is IgG1, and the antigen used was the ecCSF1R-Fc-6His recombinant protein (84 kDa, extracellular portion). The FER216 mAb can detect human CSF1R protein by Western blotting, immunoprecipitation, immunocytochemistry, immunohistochemistry (frozen, paraffin, and immunofluorescence), and flow cytometry [23].
For the gene expression analysis of CSF1R, we used a robust and well characterized series of 414 cases of DLBCL from Western countries, the GSE10846 of the Lymphoma/Leukemia Molecular Profiling Project (LLMPP) [24,25].
The clinicopathological characteristics of this series are shown in Table 2. In summary, the age ranged from 14 to 92 years old, with a mean of 61 and a median of 62.5 years. The male/female ratio was 1.3 (224/172). The 1, 3, 5, and 10-year overall survival was 78%, 63%, 57%, and 47%, respectively. According to the National Comprehensive Cancer Network International Prognostic Index (NCCN IPI), low risk patients represented 16.8% of the series (54/321), low-intermediate 47.4%, high-intermediate 30.5%, and high 5.3%. According to the cell-of-origin molecular classification, 44.2% (183/414) were germinal center B-cell-like (GCB), 40.3% were activated B-cell-like (ABC), and 15.5% were unclassified. The variables age, LDH ratio, ECOG performance status, clinical stage, number of extranodal sites, NCCN IPI, and cell-of-origin molecular classification correlated with the overall survival of the patients. Therefore, this is a conventional series of DLBCL (Table 2).

Bioinformatics and Statistical Analysis
The GSE10846 data was downloaded from the National Center for Biotechnology Information (NCBI) Gene Expression Omnibus (GEO) public functional genomics data repository (https://www.ncbi.nlm.nih.gov/gds; accessed on 9 April 2021). The gene expression array used in this series is the GPL570, Affymetrix Human Genome U133 Plus 2.0 Array (HG-U133_Plus_2). The data was normalized and log2 transformed. The probes were collapsed to one expression value per gene using the maximum probe values. Therefore, the series comprised 414 cases and 20,684 genes.
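The probe-to-gene collapse can be illustrated with a minimal sketch. It assumes that "maximum probe values" means taking, for each sample, the highest value among the probes annotated to the same gene; the probe identifiers and expression values below are hypothetical and are not taken from GSE10846.

```python
def collapse_probes_to_genes(probe_values, probe_to_gene):
    """Collapse probe-level data to one value per gene and sample.

    probe_values: dict mapping probe id -> list of log2 values (one per sample).
    probe_to_gene: dict mapping probe id -> gene symbol.
    Assumption: for each sample, the maximum across the probes of a gene is kept.
    """
    genes = {}
    for probe, values in probe_values.items():
        gene = probe_to_gene.get(probe)
        if gene is None:
            continue  # skip probes without a gene annotation
        if gene not in genes:
            genes[gene] = list(values)
        else:
            # element-wise maximum across probes of the same gene
            genes[gene] = [max(a, b) for a, b in zip(genes[gene], values)]
    return genes


# Hypothetical probes and log2 values for three samples
probe_values = {
    "probe_A": [11.2, 10.8, 12.1],
    "probe_B": [9.7, 11.5, 11.9],
    "probe_C": [7.4, 7.9, 8.3],
}
probe_to_gene = {"probe_A": "CSF1R", "probe_B": "CSF1R", "probe_C": "CD47"}

gene_values = collapse_probes_to_genes(probe_values, probe_to_gene)
print(gene_values)
```

Other collapsing conventions exist (for example, keeping the single probe with the highest mean across samples); the choice affects the final gene-level matrix.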
All the analyses were performed using the following software according to the manufacturers' instructions: R programming language with R version 3.6.3 (https://www.r-project.org/; accessed on 9 April 2020) and R Studio (version 1.3.959; https://rstudio.com/; accessed on 9 April 2020), the Gene set enrichment analysis software (GSEA 4.
The gene expression values of CSF1R in the series were selected and an appropriate cut-off value for prediction of the overall survival was found. The series of cases was divided into two groups: high versus low CSF1R gene expression. Then, the genes associated with the high or low CSF1R groups were searched using the multilayer perceptron analysis. The multilayer perceptron analysis was performed as thoroughly described in our recent publications [26-28]. First, the 20,683 genes were ranked according to their normalized importance for their association with the high or low CSF1R expression groups. Second, a predefined set of 1825 genes was also ranked according to the association with the two groups. This defined set of 1825 is a cancer transcriptome atlas panel (LBL-10809-2) designed for comprehensive profiling of the tumor, microenvironment, and immune response. The genes are summarized as follows: adaptive and innate immunity, immune response, cell function, metabolism, physiology and disease, signaling pathways, tissue compartment (tumor, immune, and stroma), and biological categories (tumor biology, immune response, and microenvironment). For example, in the category apoptosis, the genes ACTB, AKT1, AKT2, APC, ATM, BAD, BCL2, etc. are found. The FOXP3 gene belongs to the Treg differentiation. PDCD1 (PD-1) belongs to the immune exhaustion and T-cell checkpoints. PAX5 belongs to the epigenetic modification. SETD2 belongs to amino acid synthesis and transport. Of note, one gene can be present in more than one category.
The criteria for overall survival and progression-free survival were standard values [29,30]. The survival was calculated with the Kaplan-Meier method with the log-rank (Mantel-Cox) test (in the calculation, the Breslow and Tarone-Ware tests were also included) and the Cox regression (enter method). Comparisons between groups were performed with nonparametric tests (independent samples, Mann-Whitney U-test for 2 samples, or Kruskal-Wallis one-way ANOVA for k samples, if necessary), and crosstabulations with Pearson Chi-Square, Likelihood Ratio, and Fisher's Exact Test. Bivariate correlations were performed by Pearson and Spearman correlations (2-tailed).
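The ranking by normalized importance can be sketched as follows. This is a minimal illustration, assuming that the normalized importance of a predictor is its raw importance divided by the largest raw importance in the model and expressed as a percentage; the gene names and importance values are placeholders, not results of this study.

```python
def normalized_importance(importance):
    """Scale raw importance values so that the top predictor equals 100%."""
    top = max(importance.values())
    if top <= 0:
        raise ValueError("the largest importance must be positive")
    return {gene: 100.0 * value / top for gene, value in importance.items()}


def rank_genes(importance, threshold=70.0):
    """Return (gene, normalized importance) pairs above the threshold,
    sorted from most to least important."""
    norm = normalized_importance(importance)
    ranked = sorted(norm.items(), key=lambda item: item[1], reverse=True)
    return [(gene, pct) for gene, pct in ranked if pct > threshold]


# Placeholder raw importance values from a hypothetical fitted model
raw_importance = {
    "GENE_A": 0.040,
    "GENE_B": 0.032,
    "GENE_C": 0.020,
    "GENE_D": 0.030,
}

selected = rank_genes(raw_importance, threshold=70.0)
for gene, pct in selected:
    print(f"{gene}: {pct:.1f}%")
```

The 70% threshold mirrors the selection criterion applied later to the ranked genes; the raw importance values themselves would come from the fitted neural network.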
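As an illustration of the survival estimate, the following is a minimal sketch of the Kaplan-Meier product-limit estimator. It is not the implementation of the statistical software used in this study, and the follow-up times and event indicators are hypothetical.

```python
def kaplan_meier(times, events):
    """Kaplan-Meier product-limit estimate.

    times: follow-up time of each patient.
    events: 1 if the event (e.g., death or progression) occurred, 0 if censored.
    Returns a list of (time, survival probability) at each distinct event time.
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        n_events = 0
        n_removed = 0
        # group all patients sharing the same follow-up time
        while i < len(data) and data[i][0] == t:
            n_events += data[i][1]
            n_removed += 1
            i += 1
        if n_events > 0:
            survival *= 1.0 - n_events / n_at_risk
            curve.append((t, survival))
        # events and censored cases both leave the risk set after time t
        n_at_risk -= n_removed
    return curve


# Hypothetical follow-up (years) and event indicators for six patients
times = [2, 3, 3, 5, 8, 8]
events = [1, 1, 0, 1, 0, 1]
curve = kaplan_meier(times, events)
for t, s in curve:
    print(f"t = {t}: S(t) = {s:.3f}")
```

Patients censored at a given time are counted as at risk for events occurring at that same time, which is the usual convention.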

Immunohistochemical Expression of CSF1R in Reactive Tonsils
The staining for CSF1R was performed in 10 reactive tonsils using an autostainer and, under the microscope, the slides showed that the CSF1R-positive cells had a macrophage morphology. CSF1R-positive cells were distributed both in the follicular and in the interfollicular compartments. In the germinal centers, the CSF1R-positive cells had a morphology compatible with tangible body macrophages. In the interfollicular area, the CSF1R-positive cells had a morphology compatible with macrophages/dendritic cells (Figure 1).

Correlation between the Immunohistochemical Expression of CSF1R and Prognosis of the Patients in the Tokai DLBCL Series
The CSF1R staining was performed in a series of 198 cases of DLBCL. The CSF1R-positive cells had a morphology of tumor-associated macrophages (TAMs). In addition, in some cases, the staining was diffuse (B-cell pattern).
The cases were initially evaluated as an ordinal variable as 0 (no staining, <5%), 1+ (an estimated 5-10% of CSF1R+ TAMs), 2+ (10-15%), 3+ (25-50%), and 4+ (diffuse pattern/B-cell pattern). In TAMs, the CSF1R staining highlighted macrophages with dendritiform-like elongations. Conversely, the B-cell pattern showed a diffuse staining of the B-lymphocytes of the DLBCL. After that, the slides were digitized and a representative area for each case was kept for digital image quantification using Fiji software. The CSF1R expression ranged from 0.37% to 87.45%, with a median of 19.9% and a mean of 29.4% ± 25.6%. The relationship between the ordinal evaluation and the digital image quantification is shown in Figure 2. At a 60% cut-off, the TAMs vs. the B-cell pattern could be differentiated [a receiver operating characteristic (ROC) analysis was not performed in this case].
Hemato 2021, 2, 188
Figure 1. Immunohistochemical expression of CSF1R in reactive tonsils. The CSF1R expression was characteristic of macrophages. Their distribution was mainly interfollicular. In the germinal centers, a weak expression could be found in the tangible body macrophages. The B-lymphocytes were negative for CSF1R.

The expression of CSF1R was correlated with several clinicopathological characteristics of the patients from the Tokai series of DLBCL. Using the cut-off of 60% that differentiates the TAMs from the B-cells pattern, two groups of patients with different progression-free survival could be identified. The patients with the CSF1R B-cells pattern were characterized by a more favorable progression-free survival (Cox regression, hazard ratio (HR) = 0.5, 95% confidence interval (CI) for HR 0.2-0.9, p = 0.049). Conversely, a CSF1R TAMs pattern was associated with an unfavorable progression-free survival (HR = 2.2, 95% CI for HR 1.0-4.8, p = 0.049) (Figure 3). Of note, when the group with the TAMs pattern was divided into two subgroups, high vs. low, the low CSF1R+ TAMs subgroup had a trend toward a more favorable progression-free survival than the high CSF1R+ TAMs subgroup. When a multivariate Cox regression analysis was performed including the histological pattern of CSF1R (TAMs vs. B-cells patterns) and the IPI (low + low-intermediate vs. high-intermediate + high), only the IPI kept the prognostic relevance for the progression-free survival.
The survival analysis was repeated by stratifying the cases according to the cell-of-origin molecular subtypes based on the Hans' classifier, both for the overall survival and progression-free survival. In the case of the overall survival, the CSF1R expression patterns did not correlate with the outcome. Conversely, the CSF1R patterns tended to keep the prognostic relevance for the progression-free survival in both the GCB and non-GCB subtypes, but this difference was not statistically significant (p = 0.079 and 0.148, respectively). Of note, our interpretation is that, in a larger series, if the proportion is kept, the difference would likely be significant because the groups are well separated in the graphs (Figure 3).
The CSF1R expression with the 60% cut-off was also correlated with the rest of the clinicopathological characteristics of the patients and the samples (Tables 3-7). High CSF1R expression (i.e., >60%, B-cells pattern) was associated with a lower MYC proto-oncogene immunohistochemical expression (p = 0.038), higher MDM2 immunohistochemical expression (p = 0.051), lower DNA-binding protein IKAROS immunohistochemical expression (p = 0.042), an absence of BCL2 translocation (p = 0.026), an absence of the MYD88 L265P mutation (p = 0.028), and lower disease progression (p = 0.028). No other correlations were found with the other variables, including the cell-of-origin classification (Hans' classifier).
The expression of CSF1R was correlated with other markers of the CSF1R pathway, including macrophage markers, in each of the two histological patterns: the TAMs and the B-cell patterns. In the TAMs pattern group (n = 162), CSF1R positively correlated with STAT3, TNFAIP8, CD163, and CD68. In the B-cell pattern group (n = 36), CSF1R inversely correlated with TNFAIP8 and CD163 (Table 6).
The same type of analysis was performed for each histological pattern using predictive analytics with 12 models including regression, generalized linear, KNN algorithm (nearest neighbor analysis, the number of nearest neighbors to examine is called k), linear-AS (namely, linear analytic server), LSVM (linear support vector machine), random trees, SVM (support vector machine), tree-AS, linear, CHAID (Chi-squared automatic interaction detection), C&R tree (classification and regression tree), and a neural network (Figures 4 and 5). We aimed to predict the CSF1R expression as a quantitative variable from the previous 10 markers (CSF1, STAT3, NFKB1, Ki67, MYC, PD-L1, TNFAIP8, IKAROS, CD163, and CD68).

Identification of the Genes Associated with CSF1R Expression Levels in the LLMPP DLBCL Series
The series of 414 cases was divided into two groups according to the CSF1R expression: ≤11.62 (n = 309, 74.6%) and ≥11.63 (n = 105, 25.4%). The cut-off was found using the "Transform" menu and its "Visual Binning" function of the SPSS software (version 26). When making the cutpoints, equal percentiles were used based on the scanned cases, and the intervals corresponded to the number of desired cutpoints. As a start, 3 cutpoints were set (25% for each interval). Then, the binned variable was subjected to overall survival analysis and a compromise between the statistical significance and a balanced distribution of the samples was found. A multilayer perceptron neural network analysis was performed to identify the most relevant genes associated with the CSF1R expression (Figures 6 and 7). Using this technique, all the genes of the array (n = 20,683) were ranked according to their normalized importance for predicting the CSF1R expression as a dichotomic variable (high vs. low, using the cut-off value of 11.63). The neural network performance was good, with an area under the curve of 0.92 and a model with only 12.2% of incorrect predictions in the training set and 14.9% in the testing set. In this model, the most relevant genes (with a normalized importance >70%) were as follows: AC067852.2, CD99P1, ACAN, SMYD3, MVB12A, NABP2, PRH1, C2orf74, RFX7, IKZF1, and CEBPD.
Figure 4. Prediction of the immunohistochemical expression of CSF1R in each of the histological expression groups by a set of 10 markers using Artificial Neural Network (Tokai series). In each of the CSF1R histological expression groups, the expression of CSF1R could be predicted by modeling using a multilayer perceptron analysis. According to their importance, the markers are ranked with the most important in the model at the top, MYC proto-oncogene and IKAROS (DNA-binding protein Ikaros), and the less important at the bottom (PD-L1, Programmed cell death 1 ligand 1). In order to understand how the different markers interact with each other, a protein-protein interaction analysis was also performed, using a basic (left) or an extended network (right).
Figure 5. Immunohistochemical expression of CSF1R and some of the CSF1R-related markers.
A logistic regression was performed to ascertain the effects of the most relevant genes, which were previously identified in the multilayer perceptron neural network analyses, on the likelihood that the patients have a high CSF1R expression. The genes with a normalized importance >70% were selected and the analysis included univariate and multivariate (backward conditional) tests.
In the multivariate analysis, increasing expression of CD99P1, MVB12A, IKZF1, and CEBPD was associated with an increased likelihood of exhibiting high CSF1R expression, but increasing PRH1 and C2orf74 was associated with a reduction in the likelihood of exhibiting high CSF1R expression (Table 7).
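The cut-off search used to define the high and low CSF1R groups in this section can be sketched as follows. This is a minimal illustration under stated assumptions: `statistics.quantiles` with its default method stands in for the equal-percentile binning of the statistical package (which may compute percentiles differently), the expression values are hypothetical, and the final cut-off of 11.63 is simply taken from the text.

```python
import statistics


def equal_percentile_cutpoints(values, n_cutpoints=3):
    """Cutpoints that split the values into n_cutpoints + 1 equal-percentile bins
    (3 cutpoints give four bins of about 25% of the cases each)."""
    return statistics.quantiles(values, n=n_cutpoints + 1)


def dichotomize(values, cutoff):
    """Label each case as 'high' (value >= cutoff) or 'low' (value < cutoff)."""
    return ["high" if v >= cutoff else "low" for v in values]


# Hypothetical log2 CSF1R expression values
expression = [10.9, 11.2, 11.62, 11.63, 12.4, 11.0, 11.9, 10.5]

cutpoints = equal_percentile_cutpoints(expression, n_cutpoints=3)
groups = dichotomize(expression, cutoff=11.63)

print("candidate cutpoints:", cutpoints)
print("groups:", groups)
```

In practice, each candidate cutpoint would then be tested against the overall survival, and the final cut-off chosen as a compromise between statistical significance and a balanced group size, as described above.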

Identification of the Genes of the Cancer Transcriptome Atlas Panel Associated with CSF1R Levels of the LLMPP DLBCL Series
A multilayer perceptron neural network analysis was performed to identify the most relevant genes of the transcriptome atlas panel associated with the CSF1R expression (Figures 8 and 9). Using this technique, all the genes of the panel (n = 1790) were ranked according to their normalized importance for predicting the CSF1R expression as a dichotomic variable (high vs. low, using the cut-off value of 11.63). The neural network performance was good, with an area under the curve of 0.99 and a model with only 3.3% of incorrect predictions in the training set and 11.2% in the testing set. In this model, 42 genes had a normalized importance >70%.
A logistic regression was performed to ascertain the effects of the most relevant genes of the cancer panel, which were previously highlighted in the multilayer perceptron neural network analyses, on the likelihood that the patients have a high CSF1R expression. The genes with a normalized importance >70% were selected and the analysis included univariate and multivariate (backward conditional) tests.
In the multivariate analysis, increasing expression of PLA2G4C, RIN1, NFATC2, and HSPB1 was associated with an increased likelihood of exhibiting high CSF1R expression, but increasing PIN1, TXN2, IL5RA, SPINK1, FOLH1, KRAS, ITGA6, PRKCE, and TAF3 was associated with a reduction in the likelihood of exhibiting high CSF1R expression (Table 8).
Figure 6. Identification of the genes associated with CSF1R expression levels (LLMPP data, all genes set). The use of artificial intelligence analysis, based on the multilayer perceptron analysis, made it possible to predict the genes associated with the CSF1R expression (high vs. low). The 20,683 genes of the array were ranked according to their normalized importance for predicting the CSF1R expression. The neural network performance was good, with an area under the curve of 0.92. CSF1R High, red color. CSF1R Low, blue color.
Figure 8. Identification of the genes of the cancer panel associated with CSF1R expression levels (LLMPP data). The multilayer perceptron neural network analysis was also performed using an immuno-oncology cancer panel of 1790 genes. In this analysis, the Receiver Operating Characteristic (ROC) area under the curve was 0.99. Therefore, these genes are highly associated and are capable of predicting the CSF1R expression with high accuracy.
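The two performance measures reported for the neural networks, the area under the ROC curve and the percentage of incorrect predictions, can be illustrated with a minimal sketch. The labels and predicted probabilities are hypothetical, and the 0.5 classification threshold is an assumption rather than a documented setting of the software used.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case has a higher score than a randomly chosen negative case
    (ties count as one half)."""
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("both classes are required")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def error_rate(labels, scores, threshold=0.5):
    """Fraction of cases whose predicted class differs from the true class."""
    predictions = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(p != label for p, label in zip(predictions, labels))
    return wrong / len(labels)


# Hypothetical true classes (1 = high CSF1R) and predicted probabilities
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

auc = roc_auc(labels, scores)
err = error_rate(labels, scores)
print(f"AUC = {auc:.4f}, incorrect predictions = {100 * err:.1f}%")
```

A gap between training and testing error, such as the one reported for the cancer panel model, is the usual indication to interpret the training performance with caution.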

Correlation between Expression Levels of CSF1R and CD47 in the LLMPP DLBCL Series
CD47 was one of the genes present in the transcriptome atlas panel set and belongs to the immune checkpoint pathway. It is related to the macrophage pathway and it is associated with the prognosis of DLBCL [31-34]. An immunohistochemical study showed that CD47 was expressed by the B-lymphocytes of DLBCL, while its receptor SIRPA (namely, Tyrosine-protein phosphatase non-receptor type substrate 1) was expressed by the tumor-associated macrophages (TAMs) [34]. SIRPA is a relevant immune checkpoint marker because it mediates negative regulation of phagocytosis [13]. In the 233 DLBCL cases of the LLMPP series with R-CHOP treatment, high gene expression of CD47 correlated with an unfavorable overall survival of the patients (cut-off value = 13.94, hazard ratio = 1.82, p = 0.021) (Figure 10). Conversely, high expression of SIRPA correlated with a favorable overall survival (cut-off value = 9.34, hazard ratio = 0.55, p = 0.02). Of note, CD47 and SIRPA gene expression levels were inversely correlated with each other (Pearson correlation = −0.3, p < 0.001). Both markers were correlated with the CSF1R expression (Figure 10). CSF1R inversely correlated with CD47 (Pearson correlation = −0.31, p < 0.001). Conversely, CSF1R strongly and positively correlated with SIRPA (Pearson correlation = 0.71, p < 0.001). Finally, in order to identify which genes of the transcriptome atlas panel were more associated with the expression of both CD47 and SIRPA, a multilayer perceptron artificial neural network analysis was performed (Figure 10). The most relevant markers were the following: PIK3CB, FADD, MLPH, PTPRC, AKT2, MUC1, SOX10, PLCB1, DMBT1, and FANCC. Of note, the predictive modeling by the neural network had a high efficiency, with an area under the curve (ROC) of 0.91 for both markers.
Figure 10. Gene expression analysis with CD47 and SIRPA in the LLMPP DLBCL series. The series of DLBCL of the LLMPP was used to analyze the gene expression of CD47 and SIRPA, and to correlate with CSF1R. In this analysis, only the cases treated with R-CHOP were selected (n = 233). These two markers belong to the immune checkpoint pathway, and mediate a negative regulation of phagocytosis. In DLBCL, CD47 is expressed by the B-lymphocytes and SIRPA is expressed by the tumor-associated macrophages (TAMs) [31-34]. We found that high CD47 expression was associated with a poor overall survival of the DLBCL patients. Conversely, high SIRPA was associated with a favorable overall survival. Of note, these two markers were inversely correlated with each other. When correlated with CSF1R, SIRPA positively correlated with CSF1R, and CD47 inversely. CSF1R moderately correlated with CD163 as well. Finally, the expression of both CD47 and SIRPA was predicted using a multilayer perceptron artificial neural network.
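The Pearson correlation coefficients reported in this section follow the standard definition, which can be sketched as below. The expression values are hypothetical and chosen only to show a perfectly positive and a perfectly inverse relationship; they are not data from the LLMPP series.

```python
import math


def pearson_r(x, y):
    """Pearson correlation coefficient between two equally long sequences."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


# Hypothetical expression values for five cases
marker_a = [1.0, 2.0, 3.0, 4.0, 5.0]
marker_b = [2.0, 4.0, 6.0, 8.0, 10.0]   # rises with marker_a
marker_c = [10.0, 8.0, 6.0, 4.0, 2.0]   # falls as marker_a rises

r_positive = pearson_r(marker_a, marker_b)
r_inverse = pearson_r(marker_a, marker_c)
print(r_positive, r_inverse)
```

Values near +1 or −1 indicate a strong linear relationship; a coefficient of 0.71, as reported for CSF1R and SIRPA, sits between the moderate and the strong range depending on the convention used.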

Discussion
Colony stimulating factor 1 receptor (CSF1R), also known as macrophage colony-stimulating factor receptor (M-CSFR) and CD115, is a cell surface protein that functions as a receptor for colony stimulating factor 1 (CSF1) and interleukin-34 (IL-34). CSF1R has a role in regulating the homeostatic survival of the tumor-associated macrophages (TAMs). TAMs are relevant because they promote tumorigenesis in many types of cancer, including non-Hodgkin lymphomas [23,35,36,37,38]. Therefore, CSF1R is a potentially relevant oncological target.
CSF1R expression was initially thought to be characteristic of myeloid cells, but recent research has shown that non-myeloid cells can also express CSF1R, including malignant B-lymphocytes and classical Hodgkin Lymphoma [23,35,36,37]. In this research on DLBCL, we found that the immunohistochemical expression of CSF1R was variable. The most characteristic pattern was that of TAMs, present in 82% of the cases. These CSF1R-positive TAMs had a morphology similar to that of M2-like TAMs, with a higher shape and dendritiform elongations, especially when their concentration in the tumor immune microenvironment was high. In addition, a CSF1R-positive B-cells pattern was seen in 18% of the cases. In this pattern, CSF1R was expressed by the B-lymphocytes of DLBCL and the expression was diffuse. The CSF1R pattern correlated with the prognosis of the patients. The CSF1R-positive TAMs pattern was associated with a poor progression-free survival. Conversely, the B-cells pattern correlated with a favorable progression-free survival. Interestingly, although the difference was not statistically significant, the cases with low CSF1R-positive TAMs had a better survival than the cases with high CSF1R-positive TAMs.
The starting point of this research was to check whether the gene expression of CSF1R correlated with the prognosis of the patients with DLBCL. We used the LLMPP series, which comprises 414 DLBCL cases. This series from western countries is robust and very well annotated. Using a cut-off, two groups with different overall survival could be found. The group with low CSF1R expression, ≤11.62 (n = 309, 64.7%), was associated with favorable survival. Conversely, the group with high CSF1R expression, ≥11.63 (n = 105, 46.7%), was associated with a poor outcome. We also correlated CSF1R with other markers, including CD163 and PD-L1, which are markers of M2-like TAMs. The correlation was moderate and positive. Therefore, the hypothesis was that CSF1R in DLBCL identified only TAMs. In DLBCL, high CD163-positive TAMs have been associated with poor prognosis of the patients [9,10,12], which is the same result seen by gene expression in the LLMPP series. Nevertheless, the presence of a B-cell pattern was not expected. This B-cell pattern was a new finding in the Tokai University Hospital series. Of note, CSF1R-positive B-cells were not identified in the reactive tonsils. Therefore, this B-cells pattern in DLBCL may be pathological, as seen in Hodgkin Lymphoma.
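The cut-off approach described above can be sketched as follows, as an illustration under stated assumptions. The cut-off value 11.62 is taken from the text, while the helper names `split_by_cutoff` and `kaplan_meier` and the eight toy cases are hypothetical. The Kaplan-Meier estimator is used here only as a standard way to compare survival between two groups and is not claimed to reproduce the exact procedure of this research.

```python
def split_by_cutoff(cases, cutoff):
    """Split (expression, time, event) tuples into low (<= cutoff) and high (> cutoff)."""
    low = [c for c in cases if c[0] <= cutoff]
    high = [c for c in cases if c[0] > cutoff]
    return low, high


def kaplan_meier(cases):
    """Return [(time, survival probability)] at each distinct event time."""
    curve = []
    survival = 1.0
    event_times = sorted({t for _, t, event in cases if event})
    for t in event_times:
        at_risk = sum(1 for _, ti, _ in cases if ti >= t)
        deaths = sum(1 for _, ti, event in cases if ti == t and event)
        survival *= 1 - deaths / at_risk
        curve.append((t, survival))
    return curve


# Hypothetical cases: (CSF1R expression, follow-up in years, event: 1 = death, 0 = censored).
cases = [
    (11.0, 5.0, 0), (11.3, 6.0, 0), (11.5, 3.0, 1), (11.6, 7.0, 0),
    (11.8, 1.0, 1), (12.0, 2.0, 1), (12.4, 4.0, 0), (12.9, 2.5, 1),
]

low, high = split_by_cutoff(cases, cutoff=11.62)  # cut-off value from the text
km_low = kaplan_meier(low)
km_high = kaplan_meier(high)
print("low CSF1R: ", km_low)
print("high CSF1R:", km_high)
```

In this toy example the high-expression group ends with a lower estimated survival than the low-expression group, which is the direction reported for the LLMPP series. A formal comparison would additionally require a test such as the log-rank test.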
In the Tokai series, the two CSF1R patterns were correlated with several clinicopathological characteristics. Initially, not many associations were found, but the B-cells pattern was associated with a lower MYC immunohistochemical expression, absence of BCL2 translocation, absence of the MYD88 L265P mutation, and higher MDM2 immunohistochemical expression. These characteristics point to a lower pathological background in this group of patients. Correlation with the clinical features of the patients also showed that the CSF1R + TAMs pattern was associated with a poor progression-free survival, disease progression, higher MYC expression, lower MDM2 expression, BCL2 translocation, and the MYD88 L265P mutation. In addition, the histological expression of CSF1R was also correlated with 10 CSF1R-related markers including CSF1, STAT3, NF-KB, Ki67, MYC, PD-L1, TNFAIP8, IKAROS, CD163, and CD68, and CSF1R could be predicted with high accuracy using regression, generalized linear, an artificial intelligence neural network (multilayer perceptron), and SVM analyses. Of note, CSF1R moderately correlated with STAT3, TNFAIP8, CD163, and CD68. Therefore, our results agree with those of groups that showed that, in DLBCL, high CD163-positive TAMs were associated with poor prognosis of the patients [9,10,12].
Finally, we used artificial intelligence analysis to identify the genes that predicted the CSF1R expression in the LLMPP series. Many data mining applications use neural networks because of their power, flexibility, and ease of use in situations where the underlying process is complex [26]. Among them, the multilayer perceptron analysis predicts one or more target variables based on the values of several predictors [26]. In this research, we performed two types of analysis. First, we used all the genes of the array, and the result ranked the genes according to their importance in predicting the CSF1R expression (high vs. low). This analysis was technically successful, as shown by the low percentage of incorrect predictions and the high area under the curve. Second, we used an immune-oncology cancer panel, and the multilayer perceptron predicted the CSF1R expression with even better performance. Therefore, it is expected that those genes are related not only to the CSF1R expression mechanisms but also to the prognosis of DLBCL.
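The two performance measures mentioned above, the percentage of incorrect predictions and the area under the ROC curve, can be illustrated with the following minimal sketch. The function names `roc_auc` and `error_rate`, the 0.5 decision threshold, and the eight labels and predicted probabilities are hypothetical. They are not outputs of the multilayer perceptron used in this research.

```python
def roc_auc(labels, scores):
    """Area under the ROC curve: the probability that a randomly chosen
    positive case scores higher than a randomly chosen negative case
    (ties count as one half)."""
    positives = [s for label, s in zip(labels, scores) if label == 1]
    negatives = [s for label, s in zip(labels, scores) if label == 0]
    if not positives or not negatives:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


def error_rate(labels, scores, threshold=0.5):
    """Fraction of cases whose thresholded prediction differs from the label."""
    predicted = [1 if s >= threshold else 0 for s in scores]
    wrong = sum(1 for label, p in zip(labels, predicted) if label != p)
    return wrong / len(labels)


# Hypothetical data: 1 = high CSF1R, 0 = low CSF1R, with predicted probabilities of "high".
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.92, 0.81, 0.64, 0.43, 0.55, 0.30, 0.22, 0.10]

auc = roc_auc(labels, scores)
err = error_rate(labels, scores)
print(f"AUC = {auc:.4f}, incorrect predictions = {err:.0%}")
```

An area under the curve close to 1 together with a low error rate is what is meant by a technically successful model in this context.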
When the CSF1R gene is queried in the cBioPortal for Cancer Genomics using a combined DLBCL study of 1295 samples, no alterations of this gene are found. We think that CSF1R is not relevant in DLBCL because of its mutational status or other genomic changes, but because of its association with the macrophage signature. Of note, the relevance of CD163 in DLBCL is well established as a marker of an inferior prognosis [9,10].
CSF1R may be relevant in other subtypes of cancer. According to the Human Protein Atlas (http://www.proteinatlas.org; accessed on 9 April 2021) [39], which used the TCGA dataset, the RNA expression of CSF1R shows low cancer specificity. Among the different types of tumors tested, glioma is the subtype that shows the highest CSF1R expression. High CSF1R expression correlated with a poor prognosis of renal and testis cancer. Nevertheless, no information is provided regarding lymphoma and CSF1R in the Human Protein Atlas. According to the Kaplan-Meier Plotter (http://kmplot.com/analysis/index.php?p=background; accessed on 9 April 2021) [40], high expression of CSF1R is associated with favorable overall survival of breast cancer and unfavorable overall survival of ovarian, lung, and gastric cancer. Therefore, CSF1R seems to be relevant in the pathogenesis of other subtypes of cancer as well. Due to the importance of CSF1R in cancer, several groups have used CSF1/CSF1R inhibitors as monotherapy in clinical development. For example, small molecules have been used in melanoma, prostate cancer with metastasis, glioblastoma multiforme, solid tumors, relapsed or refractory acute myeloid leukemia, and breast cancer. A review describing the use of CSF1R inhibitors in cancer therapy has recently been written by Cannarile MA et al. [41]. There are new discriminators in the literature that are worth mentioning. For example, CD47 is an immune checkpoint marker (the "don't eat me" signal) that is a potential negative regulator of the DLBCL treatment outcome. In a report by Bouwstra et al., CD47-positive DLBCL was characterized by worse overall survival when treated with R-CHOP [33]. Therefore, DLBCL patients of the non-GCB cell-of-origin subtype may benefit from CD47-targeted therapy in addition to rituximab and possibly in addition to macrophage-targeted therapy. In the last section of our research, we analyzed the CD47 and SIRPA expression in the LLMPP database.
We found that high CD47 correlated with a poor overall survival of the patients, and that high SIRPA (the receptor for CD47) correlated with good survival and with CSF1R (Figure 10). In addition, we also highlighted the genes of the cancer panel associated with the expression of these two markers. Therefore, CD47 is an interesting marker with complex relationships and will require further analysis. TAMs in DLBCL can also be targeted using a legumain inhibitor, which suppressed tumor progression in an OCI-Ly3 xenograft mouse model of DLBCL [42]. Wu ZL et al. reported that high nuclear expression of STAT3 was associated with an unfavorable prognosis of DLBCL [43]. In our research, we found that, in the CSF1R histological pattern of TAMs, which was associated with a worse progression-free survival, the CSF1R marker correlated with the STAT3 expression. Finally, high expression of PD-L1 has been associated with poor prognosis in DLBCL [44]. This result was also recently confirmed by our group [45], but, in this research, CSF1R did not correlate with the PD-L1 expression. Another marker that we have recently described is the apoptosis inhibitor TNFAIP8, which is associated with a poor prognosis of the patients [28]. In this research, we found that, in the TAMs histological pattern, CSF1R correlated with TNFAIP8.

Conclusions
In DLBCL, the expression of CSF1R shows two histological patterns that correlate with the progression-free survival of the patients. A pattern of CSF1R-positive TAMs correlates with poor progression-free survival. Conversely, a pattern of CSF1R-positive B-cells correlates with a favorable progression-free survival. Using multilayer perceptron artificial neural network analysis, the genes connected with the CSF1R expression could be highlighted. Therefore, CSF1R is a relevant marker in the pathogenesis of DLBCL.