Artificial Intelligence Analysis of Gene Expression Predicted the Overall Survival of Mantle Cell Lymphoma and a Large Pan-Cancer Series

Mantle cell lymphoma (MCL) is a subtype of mature B-cell non-Hodgkin lymphoma characterized by a poor prognosis. First, we analyzed a series of 123 cases (GSE93291). An algorithm using multilayer perceptron artificial neural network, radial basis function, gene set enrichment analysis (GSEA), and conventional statistics, correlated 20,862 genes with 28 MCL prognostic genes for dimensionality reduction, to predict the patients’ overall survival and highlight new markers. As a result, 58 genes predicted survival with high accuracy (area under the curve = 0.9). Further reduction identified 10 genes: KIF18A, YBX3, PEMT, GCNA, and POGLUT3 that associated with a poor survival; and SELENOP, AMOTL2, IGFBP7, KCTD12, and ADGRG2 with a favorable survival. Correlation with the proliferation index (Ki67) was also made. Interestingly, these genes, which were related to cell cycle, apoptosis, and metabolism, also predicted the survival of diffuse large B-cell lymphoma (GSE10846, n = 414), and a pan-cancer series of The Cancer Genome Atlas (TCGA, n = 7289), which included the most relevant cancers (lung, breast, colorectal, prostate, stomach, liver, etcetera). Secondly, survival was predicted using 10 oncology panels (transcriptome, cancer progression and pathways, metabolic pathways, immuno-oncology, and host response), and TYMS was highlighted. Finally, using machine learning, C5 tree and Bayesian network had the highest accuracy for prediction and correlation with the LLMPP MCL35 proliferation assay and RGS1 was made. In conclusion, artificial intelligence analysis predicted the overall survival of MCL with high accuracy, and highlighted genes that predicted the survival of a large pan-cancer series.


Introduction
Mantle cell lymphoma (MCL) is a hematological neoplasia derived from B-lymphocytes, and a subtype of non-Hodgkin lymphomas (NHL) [1]. MCL represents around 7% of adult NHL, and has an incidence of four to eight cases per million people per year [2][3][4][5][6]. MCL affects white men, with a median age at diagnosis of 65 years. The disease frequency increases with age [7], and the incidence of this disease is on the rise in Western and developed countries [7].

Predictive Genes and Artificial Neural Network Analysis 2.3.1. Gene Expression Series of Mantle Cell Lymphoma
The gene expression data of the MCL series GSE93291 were downloaded from the gene expression omnibus (GEO) database [50], which is located at the National Center for Biotechnology  The study involved retrospective gene expression profiling of samples from patients with MCL, confirmed by expert pathology consensus review. This series was created by the Lymphoma/Leukemia Molecular Profiling Project (LLMPP) [50]. These biopsies, with tumor content ≥ 60%, were obtained from untreated patients, with no history of previous lymphoma, who subsequently received a broad range of treatment regimens. The biopsies contributing to the set included 80 biopsies described in Rosenwald et al. [51] (classified based on established morphologic and immunophenotypic criteria, with overexpression of cyclin D1 (CCND1) mRNA (in most cases, immunohistochemistry demonstrated overexpression of cyclin D1 also on the protein level), 3.8 male/female ratio, median age of 62 years (range 38 to 93), multiagent treatment, and median survival 2.8 years) [51], along with additional biopsies gathered from the clinical sites of the LLMPP. The treatments of the patients was multiagent chemotherapy (R-CHOP, R-CHOP-like), six received no treatment, and no information on treatment was available for two patients.
The gene expression array used in this series was the HG-U133 plus 2 platform (GPL570, Affymetrix, Santa Clara, CA, USA). The GeneChip™ Human Genome U133 Plus 2.0 Array (#900466, ThermoFisher Scientific, Affymetrix Japan K.K., Tokyo, Japan), which is the first and most comprehensive whole human genome array. It has a complete coverage of the Human Genome U133 Set, plus 6500 additional genes for analysis of over 47,000 transcripts. The design and performance of the chip can be accessed at the following webpage: https://www.thermofisher.com/order/catalog/product/900466 (accessed on 29 December 2021).
Total RNA from MCL specimens of frozen samples from 123 patients had been extracted using the FastTrack kit from Invitrogen (Thermo Fisher Scientific Corp., Waltham, MA 02451, USA), and biotinylated cRNA had been prepared according to the standard Affymetrix protocol from 1 microg mRNA (Expression Analysis Technical Manual, 2001, Affymetrix). The Affymetrix hybridization protocol was used: following fragmentation, 15 micrograms of cRNA were hybridized for 16 h at 45 • C on arrays from Affymetrix. Arrays were washed and stained in the Affymetrix Fluidics Station 400. The Affymetrix scanning protocol was used and the scanning had been performed by the Affymetrix 3000 scanner. The data had been analyzed with Microarray Suite version 5.0 (MA S 5.0) using Affymetrix default analysis settings and global scaling as normalization method. The trimmed mean target intensity of each array was arbitrarily set to 500. The data was normalized and log2 transformed. The original series matrix files [50] provided by the LLMPP were used for the artificial neural network analysis. The gene expression values were collapsed to symbols applying the max probe values, using the GSEA software and the gene cluster text file (*.gct) [52,53].
The cut-offs were found using SPSS software on the collapsed to symbols gene expression values dataset (i.e., each gene had only one expression value). The visual binning function created new variables based on grouping contiguous values into a limited number of distinct categories. The cutpoints were created using equal percentiles, three cutpoints and a width of 25%. After visualization of the overall survival plots with the Kaplan-Meier and log-rank test, the most adequate cut-off value was identified. Then, the Cox regression calculated the hazard-risk (contrast: indicator; reference category: first). Based on the p values (Table 2), the most relevant predictors for overall survival were MKI67 (p = 6.6 × 10 −9 , hazard risk = 4.4), CDK4 (p = 3.2 × 10 −8 ; HR = 4.0), CHEK1 (p = 0.2 × 10 −5 , HR = 3.0), CCND1 (p = 0.4 × 10 −5 , HR = 3.1), and CDKN2C (p = 0.8 × 10 −5 , HR = 2.8). These genes belonged to the cell cycle and apoptosis pathways. From an initial set of 86 genes with known pathogenic role in MCL, a final set of 28 genes were selected because their predictive value for overall survival using a Kaplan-Meier and log-rank test in the GSE93291: P, p value; HR, hazard risk. The gene information is based on UniProt [54], and Genecards [55].

Description of the Basic Neural Network Architecture
The multilayer perceptron (MLP) analysis was performed as previously described [35][36][37]45,56,57]. The architectures are shown in Figures 1-3, and the analysis outline in Figure 4. The MLP procedure produces a predictive model for one or more dependent (target) variables based on the values of the predictor variables. The MLP is a feedforward architecture, the input layer contains the predictors (our gene expression data), the hidden layer contains unobservable nodes or units, and the output layer contains the target variables. The target variables were the overall survival outcome as dead vs. alive, and the gene expression of each prognostic and pathogenic gene as a categorical variable (high vs. low expression). Figure 5, on the top right side, shows the basic neural network architecture. Of note, the basic architecture of the radial basis function (RBF) is like the MLP, but only one hidden layer characterizes it. This research used a simple type of artificial neural network, but solid enough to provide a "basic analysis unit" that conforms a more complex analysis algorithm as shown in Figure 5. A thorough description is shown in our recent publication of artificial analysis of gene expression data of diffuse large b-cell lymphoma (DLBCL) and non-Hodgkin lymphomas [46,58].

Parameters of the Neural Network
A thorough description of the artificial neural network procedure is described in our recent publication [58]. The predictors (covariates) were the 20,862 genes of the array. The covariates were rescaled by default to improve network training. All rescaling was performed based on the training data, even if a testing or holdout sample is defined. The method for rescaling was the standardized (subtract the mean and divide by the standard deviation (x-mean/s)). Other available methods for rescaling were the normalized ((x − min)/(max − min)), adjusted normalized ([2 × (x − min)/(max − min)] − 1), or none. The cases were randomly assigned to the training set, testing set, and holdout according to the relative number of cases, being 70%, 30%, and 0%, respectively. To avoid bias, each individual neural network underwent a random assignation of the samples into the training and testing sets.
The "best" architecture design for the analysis was searched and finally selected [58,59]. The architecture can be selected automatically (with a minimum number of units in the hidden layer of 1 and a maximum of 50) or can be a custom architecture. A custom architecture selection provides control over the hidden and output layers and can be most useful when you know in advance what architecture you want or when you need to tweak the results of the automatic architecture selection.
In a custom architecture, the number of hidden layers could be one or two. The number of units of the hidden layer could be automatically computed or custom. The activation function of the hidden layers was the hyperbolic tangent (γ(c) = tanh(c) = (e c − e −c )/(e c + e −c )), or sigmoid (γ(c) = 1/(1 + e −c )).
Several types of training were available: the batch, online, and mini-batch. The optimization algorithm included the scaled conjugate gradient, and gradient descent. The training options were the following: initial lambda (0.0000005); initial sigma (0.00005); interval center (0); and interval offset (±0.5).
The output included the network structure and network performance. Several parameters displayed the network performance: model summary; classification results; receiver operating characteristic ROC curve; cumulative gains chart; lift chart; predicted by observed chart; and the independent variable importance analysis. ROC analysis displayed a curve for each categorical dependent variable and category and the area under each curve [35][36][37]45,46,56,57]. The predicting variables (predictors) were ranked according to their normalized importance for predicting the target (dependent) variable and for determining the neural network. This analysis performed a sensitivity analysis that is based on the combined training and testing samples or only on the training sample if there is no testing sample [32,33,60].
The predicted value or category and the predicted pseudo-probability for each dependent variable were saved. The synaptic weight estimates were exported to an XML file.
If it was necessary to replicate the results exactly, the same initialization value for the random number generator, data order, and variable order should be used, in addition to using the same procedure settings.
The setup of a radial basis function (RBF) is similar to the MLP. In a RBF, the activation function for hidden layer was normalized or ordinary radial basis function. Figures 1 and 2 show the general architecture for MLP and RBF [32,33,60]. Figure 3 shows the sensitivity analysis [32,33,60].

Gene Set Enrichment Analysis (GSEA)
GSEA is a method that determines whether a priori defined set of genes shows statistically concordant differences between two "biological" states (e.g., phenotypes) [48,49]. Three types of files were necessary to run the application: (1) the gene cluster text file (*.gct) with the GSE93291 gene expression dataset; (2) the phenotype data as a categorical class (e.g., dead/alive) file format (*.cls); and (3) the gene set database as a gene matrix file format (*.gmx). The GSEA parameters were the following [37]: number of permutations (1000); collapse to gene symbols; permutation type (phenotype); chip platform (GPL570, HG-U133 Plus 2); enrichment statistic (weighted); metric for ranking genes (signal2noise); gene list sorting mode (real); gene list ordering mode (descending); max size (500); and min size (15) [37]. General architecture for multilayer perceptron (MLP) networks. A neural network is a set of non-linear data modeling tools consisting of input layers plus one or two hidden layers. The multilayer perceptron procedure is a feedforward architecture. In comparison to RBF, the MLP con find more complex relationships but it is slower to compute. The MLP network is a function of one or more predictors (also called inputs or independent variables) that minimizes the prediction error of one or more target variables (also called outputs) [32,33,60].   . Summary of the analysis methodology. The analysis was comprised of two methods, one based on the analysis of 20,862 genes and a second based on 10 immuno-oncology panels. This research used artificial neural networks and several machine learning techniques to identify genes associated with the overall survival of the patients. Correlation with known MCL pathogenic genes and the LLMPP MCL35 proliferation assay was also made.

Summary of the Research Analysis Algorithm
The algorithms for the analysis of the gene expression data of MCL are shown in First, all the genes of the array were used as predictors (input layer) for the target variables (output layer) of overall survival (dead/alive) and for the 28 genes with prognostic value in MCL (high/low expression) using an artificial neural network. The neural network included both a multilayer perceptron and a radial basis function analysis for each target variable ( Figure 5). In the output of each individual neural network, all the genes of the array were ranked according to their normalized importance for predicting the target variable. Then, the genes with a normalized importance above 70% were selected. In addition, the normalized importance of all the neural networks were averaged, the genes ranked according to the averaged normalized importance for prediction, and the top 1% genes were selected. As a result, the initial set of 20,862 genes was reduced to a smaller number (n = 1394).
Next, an MLP was performed using the 1394 genes as predictors (input layer) of the overall survival outcome (dead/alive, output layer); this analysis was repeated 20 times, and the top 4 MLPs with higher area under the curves were selected. The normalized importance of each 1394 were averaged between the four results and ranked from higher to lower values. Then, using multiple MLP analysis, the minimum number of genes (starting from the one with higher normalized importance) that provided the highest area under the curve was found (n = 58) ( Figure 6).   Figure 4, the neural networks reduced the initial input of 20,862 genes to 58 predictive genes. Next, the overall survival outcome (dead/alive) was predicted using 58 genes and a neural network. Several parameters display the network performance: model summary; classification results; receiver operating characteristic ROC curve; cumulative gains chart; lift chart; predicted by observed chart; and the independent variable importance analysis. ROC analysis displays a curve for each categorical dependent variable and category and the area under each curve [34][35][36]44,45,55,56]. The genes were ranked according to their normalized importance for predicting the overall survival outcome as a dichotomic variables (dead vs. alive). A GSEA analysis confirmed the association toward a dead outcome. The characteristics of the network were as follows. Case processing: training n = 93 (76%); testing n = 30 (24%). Units n = 58. Rescaling = standardized. Hidden layer: number = 1; units = 2; activation function = hyperbolic tangent. Output layer: dependent variables = 1 (overall survival outcome dead/alive); units = 2, activation function = softmax, error function = cross-entropy. Model summary: training, cross-entropy error = 30.8, 14% of incorrect predictions; testing, cross-entropy error = 14.5, 23% of incorrect predictions. Classification: training, 86% overall correct (93.8% alive, 82% dead); testing, 77% overall correct (82% alive, 74% dead). Area under the curve = 0.9.   (Figures 4 and 5), a final set of 10 genes with overall survival relationship was highlighted. These genes not only correlated with the clinical outcome but also with the proliferation index, as expressed by MKI67. Of note, ki67 is a marker routinely used for prediction in mantle cell lymphoma, and the most relevant marker of the LLMPP MCL35 proliferation assay.

Figure 8.
Artificial neural network analysis for predicting of the overall survival of mantle cell lymphoma using several immune oncology panels (Method 2). Overall survival was predicted using 10 immuno-oncology panels. After several multilayer perceptron analyses, a set of 125 genes predicted the overall survival outcome (dead/alive) with high accuracy. Among the most relevant genes, TYMS was highlighted. GSEA analysis had a sinusoidal-like, with some genes enriched toward dead or alive survival outcomes.
Finally, a Cox regression for overall survival (backward conditional) reduced the list to 19 genes. From these 19 genes, additional analyses included Kaplan-Meier with log-rank test for overall survival using cutoffs (Figure 7), analysis of other types of cancer ("pan-cancer analysis") ( Figures 9 and 10), other machine learning (Figures 11-13), and immunohistochemistry for RGS1 ( Figure 14). Overall survival in a pan-cancer series. The multilayer perceptron using the 20,862 genes identified a final set of 19 genes with prognostic value in mantle cell lymphoma. As a start point of the gene expression of the set of 19 genes and using a risk-score formula [36,46], we confirmed that these genes also contributed to the overall survival of diffuse large B-cell lymphoma (DLBCL). Additionally, these genes could also predict the overall survival of a pan-cancer series of 7289 cases from The Cancer Genome Atlas (TCGA) program that included the most frequent human cancers. Of note, the weight and direction of the overall survival association was different in each subtype of neoplasia. Risk scores were calculated by multiplying the beta values of the multivariate Cox regression analysis for overall survival of each gene with the values of the corresponding gene expressions, as previously described [58].  . Bayesian network. A Bayesian network successfully modeled the overall survival outcome (dead/alive) using the 19 genes, previously identified in the neural network analysis ( Figure 5, Method 1). The Bayesian network enables you to build a probability model by combining observed and recorded evidence with "common-sense" real-world knowledge to establish the likelihood of occurrences by using seemingly unlinked attributes. The node focuses on Tree Augmented Naïve Bayes (TAN) and Markov Blanket networks that are primarily used for classification. This graphical model shows the variables (nodes) and the probabilistic, or conditional, independencies between them. The links of the network (arcs) may represent causal relationships, but the links do not necessary represent direct cause and effect. This Bayesian network is used to calculate the probability of a patient of being alive or dead, given the gene expression of 19 genes, if the probabilistic independencies between the gene expression and the overall survival outcome as displayed on the graph hold true. Bayesian networks are very robust in case of missing data.

Algorithm Based on the Input of 10 Immune Oncology Panels (Method 2)
In comparison to the first algorithm in which the whole genes of the array were used (n = 20,862), this second algorithm used 9 different immune oncology panels as input data (7817 genes in total) ( Figure 8). Nine individual MLP analysis for the prediction of overall survival outcome (dead/alive) were performed, and the genes with a normalized importance above 70% in each panel were pooled (n = 125). A GSEA analysis confirmed the association of these genes towards the dead or alive overall survival outcome (phenotype). Next, an additional MLP analysis confirmed the prediction of the overall survival outcome and ranked the 125 genes according to their normalized importance. The top genes were later tested for conventional overall survival analysis.

Conventional Statistical Analyses
Traditional statistics calculated the overall survival analyses. Overall survival was calculated from time of diagnosis to the last follow-up time, and recorded as alive or dead (event), following the criteria of Cheson B. D. [61,62]. Comparison between groups was performed using Kaplan-Meier analysis and the log-rank test. The Breslow and Tarone-Ware tests were also used. The Cox regression (with the method enter or backward conditional) was used to calculate the hazard-risks and the 95% confidence intervals. A p value less than 0.05 was considered statistically significant.
In case of a neural network analysis, poor prognosis/survival corresponds to the cases whose overall survival event was dead. In case of an overall survival analysis using the Kaplan-Meier test, poor prognosis corresponds to the group with lower cumulative survival proportion in the plot.  Figure 5, Method 1). This model uses the C5.0 algorithm to build either a decision tree or a rule set. A C5.0 model works by splitting the sample based on the field that provides the maximum information gain. Each subsample defined by the first split is then split again, usually based on a different field, and the process repeats until the subsamples cannot be split any further. Finally, the lowest-level splits are reexamined, and those that do not contribute significantly to the value are removed. In this model, the target field (variable) must be categorical (i.e., nominal or ordinal, such as de overall survival outcome as dead vs. alive). The input fields (predictors) can be of any type (in our analysis, the 19 genes were entered as quantitative gene expression). The C5.0 models are quite robust in the presence of problems such as missing data and large numbers of input fields. The C5.0 tree shows how using only the gene expression of 9 genes, the overall survival outcome as dead or alive can be predicted with high accuracy.

•
Using 20,862 genes as a start point (input layers) (Method 1), several neural network analyses correlated with the overall survival outcome and with known pathogenic genes of MCL (output layers), and a final set of 19 genes with predictive value was highlighted ( Figure 5); • This type of analysis was repeated focusing on 10 immune, cancer, and immunooncology panels (Method 2), and 15 genes were highlighted ( Figure 8); • Other machine learning techniques were used to predict the overall survival (Figures 11 and 12); • The highlighted genes also predicted the overall survival of a pan-cancer series (Figures 9, 10 and A1); • The combination of both Methods 1 (19 genes) and 2 (15 genes) with the LLMPP MCL35 assay (17) genes and analysis using several machine learning and neural networks techniques predicted the overall survival outcome (dead vs. alive) with high accuracy.

Prediction of Overall Survival Based on the 20,862 Genes of the Array (Method 1)
Dimensionality reduction refers to techniques for reducing the number of input variables in training data. Fewer input dimensions often mean correspondingly fewer parameters or a simpler architecture in the machine learning model, referred to as degrees of freedom [66]. The input layer of 20,862 predicted the overall survival of mantle cell lymphoma (MCL), using an analysis algorithm ( Figure 5). The output variables (targets) were the overall survival outcome as a dichotomous variable (dead/alive), and the 28 genes (high/low expression) with prognostic relevance for the overall survival were confirmed in the same series (Table 2). Tables A1 and A2 show the complete details of the artificial neural networks. The multilayer perceptron (MLP) technique had better performance than the radial basis function (RBF): comparing area under the curve, percentage of incorrect predictions (testing set), and overall percentage of correct classification (testing set), for MLP vs. RBF, the results were 0.85 ± 0.05 vs. 0.77 ± 0.09 (p = 0.000053), 15.3% ± 5.9 vs. 26.5% ± 10.2 (p = 0.000005), and 84.7% ± 5.9 vs. 73.5% ± 10.2 (p = 0.000005), respectively. CCND1 was the best predicted gene; in the MLP analysis CCND1 had a percentage of incorrect predictions in the testing set of 2.8%, the lowest value among all genes (Table A1).
From the initial 20,862 genes, the list was reduced to 1394 genes, and additional multilayer perceptron analyses led to a set of 58 genes ( Figure 6). The network performance of the MLP with the input of 58 genes was "good", with an area under the curve (AUC) of 0.9. The genes were ranked based on their normalized importance for prediction, and GSEA confirmed that most of these genes were associated with the death survival outcome ( Figure 6); the most relevant were KIF18A, FANCG, GCNA, YBX3, ZCCHC4, and DMTF1.
Based on the 58 genes, a subsequent multivariate Cox regression analysis, backward conditional, highlighted a set of 19 genes (Table A3), and a final set of 10 genes was found after using a cut-off and a Kaplan-Meier analysis for overall survival (Table 2). KIF18A, YBX3, PEMT, GCNA, and POGLUT3 were associated with an unfavorable overall survival, and SELENOP, AMOTL2, IGFBP7, KCTD12, and ADGRG2 to a favorable survival ( Figure 6). Finally, the 10 genes were correlated with the cell proliferation marker of MKI67, which is one of the most relevant genes in the pathogenesis of MCL (Table 3). The cases with low MKI67 were associated with high KCTD12, ADGRG2, SELENOP, and IGFBP7. However, high MKI67 associated with high YBX3. Table A4 shows a multivariate analysis for overall survival between MIK67 and the 10 genes using a Cox regression.
Therefore, the dimensionality/data reduction of the Methods 1 went from 20,862 initial genes, to 1394, 58, 19, and the final 10 most relevant prognostic genes for overall survival of MCL patients.

Prediction of Overall Survival Based on the Immuno-Oncology Panels (Method 2)
The prediction of the overall survival outcome was performed using another strategy, based on nine different immune oncology pathways, multilayer perceptron neural networks, GSEA, and Kapan-Meier analyses ( Figure 8).
The characteristics and performance parameters of the neural networks are shown in Table A5. The most predictive panels (pathways) were the autoimmune (AUC = 0.98), the pan cancer human IO360 (AUC = 0.94), human inflammation (AUC = 0.89), pan cancer (AUC = 0.89), and metabolic (AUC = 0.87). Interestingly, some pathways had a more predictive power toward the dead than the alive outcome.
After selecting the genes with a normalized importance above 70% and merging, a final set of 125 was identified. A GSEA on these 125 genes had a sinusoidal-like pattern, with some genes associated toward poor (dead) and others to favorable (alive) overall survival. The genes were ranked according to their normalized importance for prediction using a multilayer perceptron analysis, and the top 15 genes were CD8B, CEACAM6, FABP5, CFB, IL6ST, AHR, BST2, ROBO4, AR, ID1, PIK3CD, ITGAX, TYMS, CSF1, and PCK2 (normalized importance >0.68). Among them, TYMS was highlighted, and this gene by itself managed to predict the overall survival of the patients (Hazard risk (HR) = 3.2, 95% CI 2.0-5.0, p = 8.9 × 10 −7 ). Of note, high TYMS also correlated with high MIK67 expression (Fisher's exact test, p = 0.001).

Prediction of Overall Survival of a Pan-Cancer Series
The predictive value of the set of 19 genes, derived from neural network analysis and dimensional reduction of the initial 20,862 genes ( Figure 5, Method 1), was tested for the prediction of a pan cancer series of 7289 cases from The Cancer Genome Atlas (TCGA) database and GSE10846 dataset for diffuse large B-cell lymphoma (DLBCL). Using a riskscore formula [36,46], a different overall survival of the patients was found, confirming the pathological role of these genes in cancer (Figures 9 and 10, Table A6, Figure A1). In overall high-risk versus low-risk cases, Cox regression hazard risk = 3.3 (95% CI 2.9-3.6), p < 0.0001.

Prediction of Overall Survival Outcome Using other Machine Learning Techniques
The predictive value of the set of 19 genes (Method 1) as quantitative variables for the overall survival outcome was modeled using other machine-learning techniques, including logistic regression, Bayesian network, discriminant analysis, KNN algorithm, LSVM, tree-AS, C5, CHAID, Quest, random, and C&R trees. Among them, the highest overall accuracy for prediction was achieved by the C5 tree (95%, 9 genes used), and Bayesian network (85%, 19 genes, Figures 11 and 12).

Combination of Method 1, Method 2, and the LLMPP MCL35 Prognostic Gene Signature
A machine learning and neural network modeling was performed using the highlighted genes of both Methods 1 (19 genes) and Methods 2 (15) with the previously identified prognostic genes of MCL of the LLMPP, the MCL35 signature [50,[67][68][69]. All the available artificial intelligence methods were tested, and high overall accuracy for predicting was found for logistic regression (100%), Bayesian network (92%), discriminant analysis (86%), CHAID (85%), C&R tree (85%), and SVM (81%) (Table 4, Figure 13). A total of 13 models were selected and ranked according to their overall accuracy for predicting the overall survival. In the modeling, every possible combination of options was tested, and the best models were saved. Of note, in the final models not all the genes were necessary or contributed to the model, and only the best combinations were selected (e.g., 50 genes in the Bayesian network but only 6 in the CHAID tree).
The RGS1 protein expression was evaluated as low and high, and correlated with the overall survival of the patients (p = 0.048) ( Figure 10). Nevertheless, no correlation was found between RGS1 and the other clinicopathological characteristics.

Discussion
Mantle cell lymphoma is a hematological neoplasia that belongs to the group of non-Hodgkin lymphomas (NHL) and it is derived from mature B-lymphocytes [16].
The postulated cell of origin in most of the cases is a naïve pregerminal center Bcell of the mantle zone [1,9,16,17,46], because of the absence of somatic mutations in the variable region of the heavy chain of immunoglobulin genes (IgVH). IgVH somatic mutational status is a marker of the transition of a B-lymphocyte through a follicular germinal center [70]. However, in 20-30% of the cases somatic hypermutation is found, which suggests a postgerminal origin (marginal zone) [71], and these cases are associated with a better prognosis [72]. Because of the aggressive clinical behavior of mantle cell lymphoma, it is critical to find prognostic makers that will allow identifying the patients who should receive more aggressive therapy.
Mantle cell lymphoma is characterized by increased cell division and replication, decreased response to DNA damage, and enhanced cell survival (impaired apoptosis) [16]. Some of these pathways and genes correlate with prognosis. For instance, TP53 and NOTCH1 mutations, overexpression of SOX11, and high proliferation index (Ki67 staining) associate with a poor prognosis.
This research identified new prognostic markers using gene expression data. Dimensionality reduction refers to techniques for reducing the number of input variables in training data. Fewer input dimensions often mean correspondingly fewer parameters or a simpler architecture in the machine learning model, referred to as degrees of freedom [66]. A neural network analysis correlated the 20,862 genes of the array with the overall survival outcome (dead/alive), and ranked the genes according to their normalized importance for prediction. Additionally, the analysis was enriched with the inclusion of 28 prognostic genes, which were identified from the literature and later confirmed to have prognostic relevance in this series (Table 1). Therefore, the input data of the neural network were solid and resulted in the identification of potentially relevant new prognostic markers. Additionally, the second type of neural network analysis was performed using several immune oncology pathways, which provided a more supervised training and analysis. The fact that we found a correlation of some of the highlighted genes with the expression of MKI67, a marker of proliferation known to be critical in mantle cell lymphoma pathogenesis, suggests that the identified new markers are also potentially relevant.
The highlighted genes influence apoptosis, angiogenesis, cell proliferation, and metabolic processes. They contribute to hematological neoplasia or cancer (Table 5). Therefore, it is expected that these genes also affect the progression of the pan cancer series. Table 5. Function and association of the highlighted genes in neoplasia.

KIF18A
Microtubule motor activity, role in mitosis Overexpressed in various types of cancer; inhibitors are available [73] YBX3 Translation repression, negative regulation of intrinsic apoptosis signaling Related to myelodysplastic syndromes and acute myeloid leukemia [74] PEMT Negative regulation of cell proliferation, positive regulation of lipoprotein metabolic process Critical role in breast cancer progression [75] GCNA Acidic repeat-containing protein, expressed in germ cells (testis) Regulate genome stability [76,77] POGLUT3 Protein glucosyltransferase, specifically targets extracellular EGF repeats of proteins (NOTCH1 and NOTCH3) Related to glioblastoma multiforme tumorigenesis [78] SELENOP Transport of selenium, response to oxidative stress Prostate cancer recurrence [79] AMOTL2 Actin cytoskeleton organization, angiogenesis, cell migration, Wnt-signaling pathway Angiogenesis in pancreatic, and proliferation in lung cancer [80,81] IGFBP7 Cell adhesion, metabolic process (retinoic acid, cortisol), regulation of cell growth Prognosis of acute lymphoblastic leukemia [82] KCTD12 GABA-B receptors auxiliary subunit Proliferation in breast cancer [83] ADGRG2 G protein-coupled receptor signaling pathway Tumor suppressor in endometrial cancer [84] TYMS Regulation of mitotic cell cycle (G1/S transition) Association with non-Hodgkin lymphomas, prognosis of pancreatic cancer [85,86] The gene information is based on UniProt [54], and Genecards [55]. TYMs was highlighted in Method 2; the rest of genes in Method 1.
It is important to point out that one could also use background information (e.g., patient age, sex, comorbidities, etc.) into the artificial neural network analyses. Incorporating such information would have a large impact on the results. In this research, the target was the prediction of the overall survival of patients based on the gene expression data as proof of concept. In future analyses, background information will be incorporated in MCL analysis, in a similar way as we have recently done in diffuse large b-cell lymphoma (DLBCL) [35].
In addition to neural networks, other machine learning techniques were tested, and the C5 tree and Bayesian networks had the best accuracy for predicting the overall survival outcome. Of note, the type of analyses used do not necessarily represent direct cause and effect, but the probabilistic or conditional independencies between the markers.
The recent advances in machine learning have led to many artificial intelligence (AI) applications, which will produce autonomous systems. However, the effectiveness of these systems is limited by the machine's current inability to explain their decision and actions to human users [87]. Therefore, explainable AI (XAI) will be essential to understand, trust, and effectively managed AI machine partners [87]. In this research, the artificial neural networks highlighted the most relevant genes according to their normalized importance for predicting the overall survival of the patients. To make the results more explainable, we performed serval additional machine learning techniques and conventional statistics to understand the results. For future work, the explanation of algorithms will be developed. Of note, in medicine, AI technologies can be clinically validated even when their function cannot be understood by their operators [88].
Future research directions will be the validation of the methodology and highlighted genes in other series of mantle cell lymphoma and non-Hodgkin lymphomas.

Conclusions
This research combined artificial neural networks, machine learning, and conventional statistics to model the overall survival of mantle cell lymphoma and highlight pathogenic genes. Artificial intelligence is a promising field in the understanding of hematological neoplasia, and other types of cancer. Informed Consent Statement: Informed consent was obtained from all subjects involved in the study, according to a protocol approved by the National Cancer Institute institutional review board.

Data Availability Statement:
The gene expression data (GEO data sets) were obtained from the publicly available database of the NCBI resources webpage, located at https://www.ncbi.nlm.nih. gov/gds (accessed on 15 August 2021).
Acknowledgments: I would like to thank all the researchers and colleagues that contributed to the generation of the GSE93291, GSE10846, and The Cancer Genome Atlas (TCGA) program.

Conflicts of Interest:
The authors declare no conflict of interest.  Input layer: standardized rescaling method for covariates. Hidden layer: softmax activation function. Output layer: identity activation function, sum of squares error function. Model summary, testing, sum of square error (the number of hidden units is determined by the testing data criterion: The "best" number of hidden units is the one that yields the smallest error in the testing data).     Figure A1. Differential gene expression of the set of 19 genes per cancer subtype. Based on a riskscore formula and the gene expression of 19 genes, the overall survival for each risk-group could be calculated. The contribution in the prognosis for each gene is shown on the right. This Figure is complementary to Figure 9.