Identification of Altered Primary Immunodeficiency-Associated Genes and Their Implications in Pediatric Cancers

Simple Summary In children, cancer remains the most common cause of disease-related mortality and is responsible for more deaths from infancy through adolescence than any other disease. Malignancies are observed more frequently in individuals with primary immunodeficiencies (PID), and cancer is one of the most common causes of death in patients with PIDs. However, the molecular mechanisms that link the immune function to malignancy development remain poorly understood. The primary aim of this project was to identify and highlight the molecular mechanisms by which PID-related genes may lead to the development of pediatric cancers and was completed using a novel bioinformatics framework. This study highlighted multiple PID-related genes for further investigation regarding their implications in PIDs and pediatric cancer mechanisms which may lead to the identification of new therapeutic targets. Abstract Background: Cancer is the leading cause of disease-related mortality in children and malignancies are more frequently observed in individuals with primary immunodeficiencies (PIDs). This study aimed to identify and highlight the molecular mechanisms, such as oncogenesis and immune evasion, by which PID-related genes may lead to the development of pediatric cancers. Method: We implemented a novel bioinformatics framework using patient data from the TARGET database and performed a comparative transcriptome analysis of PID-related genes in pediatric cancers between normal and cancer tissues, gene ontology enrichment, and protein–protein interaction analyses, and determined the prognostic impacts of commonly mutated and differentially expressed PID-related genes. Results: From the Fulgent Genetics Comprehensive Primary Immunodeficiency panel of 472 PID-related genes, 89 genes were significantly differentially expressed between normal and cancer tissues, and 20 genes were mutated in two or more patients. Enrichment analysis highlighted many immune system processes as well as additional pathways in the mutated PID-related genes related to oncogenesis. Survival outcomes for patients with altered PID-related genes were significantly different for 75 of the 89 DEGs, often resulting in a poorer prognosis. Conclusions: Overall, multiple PID-related genes demonstrated the connection between PIDs and cancer development and should be studied further, with hopes of identifying new therapeutic targets.


Introduction
Cancer remains the most common cause of disease-related mortality in children, and it is responsible for more deaths from infancy through adolescence than any other

Data Collection and Processing
The TCGAbiolinks R package was used to download and process publicly available genomic data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database within the Genomic Data Commons (GDC) Data Portal for six

Data Collection and Processing
The TCGAbiolinks R package was used to download and process publicly available genomic data from the Therapeutically Applicable Research to Generate Effective Treatments (TARGET) database within the Genomic Data Commons (GDC) Data Portal for six different pediatric tumors [9,10]-acute myeloid leukemia (AML; n = 2245), acute lymphocytic leukemia (ALL; n = 463), neuroblastoma (NBL; n = 153), Wilms' tumor (WT; n = 124), rhabdoid tumor (RT; n = 63), clear cell sarcoma of the kidney (CCSK; n = 13). Each available tumor sample of each cancer type was analyzed separately. Two adult healthy tissue samples were used as control (normal tissue comparison) from the Genotype-Tissue Expression (GTEx) Project. The Recount2 pipeline was used to compare expression among tumor samples [9,11]. GTEx blood normal tissue samples were used as control for ALL and AML whereas GTEx kidney normal tissue samples were used as control for CCSK, NBL, RT, and WT. The integration of the GTEx adult normal tissue samples with the TARGET tissue samples have been justified due to samples being processed in the same manner as highlighted by previously published work [12,13]. TARGET tumor samples were preprocessed and normalized for GC content and quantile filtered using the TCGAanalyze_Preprocessing, TCGAanalyze_Normalization, and TCGAanalyze_Filtering functions from the TCGAbiolinks package [9]. Batch correction was performed on the TARGET tumor and the GTEx healthy tissue samples using the TCGAbiolinks TCGAbatch_Correction function, and then was normalized for GC content and filtered by quantile [9].

Differential Gene Expression Analysis of Unified TARGET-GTEx Datasets
Differential gene expression analysis (DEA) was performed using the TCGAana-lyze_DEA function in the TCGAbiolinks package, using both the limma-voom and edgeR pipelines which have been used previously to analyze genes that were differentially expressed in tumor samples in comparison to healthy tissue samples [9,14,15]. Log fold change (logFC) > |±2.0| and false discovery rate adjusted p-value (FDR adj. p.) < 10 × 10 −16 were defined as cut-off values to retain significant differentially expressed genes (DEGs) for the analysis. Differentially expressed genes (DEGs) were then compared to a subset of 472 PID-related genes curated from the Fulgent Genetics Comprehensive Primary Immunodeficiency (NGS) panel [16]. The Fulgent Genetics NGS panel contained the greatest number of PID-related genes in comparison to other publicly available gene panels for PID-related genes [16]. The subset of DEGs that were noted to be PID-related genes was visualized using the EnhancedVolcano R package [14]. DEA results from all the pediatric tumor samples for each pipeline were summarized and visualized. For this study, trends in AML and ALL cancer types included the results from the TB (primary blood-derived cancer peripheral blood tissue) and TBM (primary blood-derived cancer bone marrow tissue) tissues in the TARGET database. For all other cancers, the TP (primary tumor tissue) sample type was available and used. Multiple ALL phases were also available, however, only ALL-P2 produced significant results with the limma-voom pipeline and only ALL-P1 produced significant results with the edgeR pipeline, and these were grouped when looking at overall DEA trends. Genes that were commonly significantly altered were further investigated to assess their biological and clinical significance in pediatric cancers.

Mutational and Copy Number Variation Analysis
Mutational analysis was performed in the five available TARGET pediatric cancer cohorts from the cBioPortal [17]. The cancers included in the mutational analysis were ALL, AML, NBL, RT, and WT, and the genes queried included the panel of 472 PID-related genes. Genes that were mutated in at least two of the 2087 patient profiles were analyzed further regarding the types of mutations and visualized using the ggplot2 package in R as well as investigated regarding their clinical significance. It is important to note that none of the genes were mutated in more than 1% of total patients which may be partially due to the limited data available in pediatric cancer cohorts, and therefore, our analysis is limited. Protein changes in all cancer types were extracted from cBioPortal [17]. Copy number variant analysis was performed by obtaining the copy number variation data from the four TARGET pediatric cancer cohort projects with this data available including ALL, AML, NBL, and WT on cBioPortal [17].

Functional Enrichment Analysis
Genes that were significantly altered in the DEA were analyzed further through a Gene Ontology (GO) enrichment analysis using the topGO package in R to assess the altered molecular pathways for the DEGs [18]. GO enrichment analysis regarding biological process (BP) terms used Kolmogorov-Smirnov (KS) statistical testing with KS < 0.05 and enrichment ≥ 2. GO enrichment analysis results were visualized using the ggplot2 package in R. STRING (v11) was used for pathway analysis for the 89 significantly DEGs and 20 genes mutated in two or more patients to determine protein interactions important in the cancer profiles [19]. The enrichment p-values and BP GO terms were assessed to identify and validate pathways found in the earlier GO analysis.

Survival Analysis
Using cBioPortal, survival analysis was performed by obtaining the survival data from the following available pediatric tissue samples: ALL, AML, NBL, RT, and WT [17]. The altered group for this analysis is defined as harboring somatic mutations from whole genome or whole exome sequencing. Kaplan-Meier survival curves and associated logrank test p-values were generated from the analysis. Survival analysis was performed using two different gene queries: the panel of 472 PID-related genes, and the 20 PID-related genes mutated in two or more patients. Significant survival difference was determined using a p-value of <0.05. Following this, survival analysis using overall survival as an outcome was completed for each DEG for each pediatric cancer type with TARGET data publicly available. Using an R script, Kaplan-Meier survival curves were generated to assess the clinical impact of significant PID-related DEGs in pediatric cancers, and those with significant Cox regression results were identified (p < 0.05). The cut-off points for highand low-expressing groups were determined by tertial splitting (retaining the samples with expression levels above the 67th percentile or below the 33rd percentile, respectively).

Differential Expression of PID-Related Genes in Pediatric Cancer Datasets
Limma-voom ( Figure S1) and edgeR ( Figure S2) pipelines were individually analyzed for each pediatric cancer type, and the significantly altered genes were extracted. Overall, 89 of the PID-related genes were differentially expressed in at least one pipeline. Of the 472 PID-related genes, 88 were differentially expressed in at least one pediatric cancer type when using the limma-voom pipeline ( Figure S3). Furthermore, 23 PID-related genes were differentially expressed in at least two different pediatric cancer types when using the limma-voom pipeline (Figure 2A). The three most upregulated genes were AICDA, TNFRSF13B, and CR2 (Table S1). The three most downregulated genes were SEMA3A, GATA2, and CFTR (Table S1). The three genes that were most commonly altered were CR2, MS4A1, and SEMA3A. (Figure 2A). CR2 and MS4A1 were both differentially expressed in NBL, AML, and ALL, whereas SEMA3A was differentially expressed in RT, NBL, and WT. SEMA3A was downregulated in all the cancers in which it was differentially expressed, while the regulation of CR2 and MS4A1 varied according to cancer type ( Figure 2A). MS4A1 was upregulated in AML and ALL (TBM) samples and downregulated in NBL and ALL (TB) samples (Table S1). CR2 was upregulated in AML and downregulated in both NBL and ALL (Table S1).  (Figure 2A). CR2 and MS4A1 were both differentially expressed in NBL, AML, and ALL, whereas SEMA3A was differentially expressed in RT, NBL, and WT. SEMA3A was downregulated in all the cancers in which it was differentially expressed, while the regulation of CR2 and MS4A1 varied according to cancer type ( Figure 2A). MS4A1 was upregulated in AML and ALL (TBM) samples and downregulated in NBL and ALL (TB) samples (Table S1). CR2 was upregulated in AML and downregulated in both NBL and ALL (Table S1). Using the edgeR pipeline, only three PID genes were differentially expressed: ACTB, IGLL1, and RAG1 ( Figure 2B). IGLL1 and RAG1 were upregulated in ALL, whereas ACTB was downregulated in AML (Table S1). RAG1 and IGLL1 were the only two genes that were differentially expressed when using both the limma-voom and egdeR pipelines, indicating that additional genes may be strong candidates for further exploration due to their consistent significant differential expression. Using the edgeR pipeline, only three PID genes were differentially expressed: ACTB, IGLL1, and RAG1 ( Figure 2B). IGLL1 and RAG1 were upregulated in ALL, whereas ACTB was downregulated in AML (Table S1). RAG1 and IGLL1 were the only two genes that were differentially expressed when using both the limma-voom and egdeR pipelines, indicating that additional genes may be strong candidates for further exploration due to their consistent significant differential expression.

PID-Related DEGs Are Associated with Various Immunological and Oncologic Pathways
GO enrichment analysis of DEGs overall (Table S2) and the subset of the PID-related DEGs (Table S3) identified multiple molecular mechanisms that were significantly altered. The identified individual GO terms were grouped into overarching themes. For all the genes that were differentially expressed, metabolic molecular mechanisms were most commonly altered in pediatric cancers, followed by immune system function, developmental processes, vesicle and ion transport, and transcription regulation. For the subset of differentially expressed PID-related genes, immune system functioning was the only common theme identified, as would be expected, owing to the impact of these genes on PIDs. Innate immunity was significantly impacted in all the pediatric cancers studied. Adaptive immune responses and development were identified in ALL and AML, respectively. The only cancers that had significant PID-related gene enrichment analysis results were AML and ALL (Table S3).
STRING was used to assess the gene interaction networks of the 89 and 20 genes highlighted by the DEA and mutational analysis, respectively. A significant enrichment p-value was identified for the gene lists of both the DEGs and genes mutated in two or more patients, demonstrating that the proteins encoded by these genes have significant co-interactions ( Figures S6 and S7 and Tables S7 and S8). Network analyses of the DE genes mutated in two or more patients produced similar results using STRING when assessing the identified GO enrichment of biological processes (Tables S7 and S8). Interestingly, these results identified GO terms similar to those identified in the earlier GO analysis of the DEGs when observing each cancer type individually (Table S3). Most of the terms were related to the immune system, particularly immune system processes (GO:0002376), responses to stimuli or other organisms (GO:0051707/GO:0050896), and immune system responses (GO:0006955; Tables S7 and S8). For the mutated genes, pathways related to proliferation, differentiation, signaling, and cell motility were also commonly observed, highlighting an additional mechanism impacted by these gene mutations beyond what is impacted by the differentially expressed genes (Table S8).

Pediatric Cancers Harbor Mutations in PID-Related Genes
Mutational analysis of the five pediatric tissues using mutation data from the TARGET database revealed that out of the 472 PID-related genes, 64 were mutated in at least one of the 2087 patients, and 20 were mutated in two or more patients, resulting in a total of 115 different mutations observed in genes that were mutated in at least two patients ( Figure 3). The top four mutated genes in pediatric cancers were NRAS (1.8%), KRAS (0.9%), JAK2 (0.3%), and CREBBP (0.3%), highlighting that these genes should be analyzed further to assess the clinical significance of these mutations ( Figure 3). co-interactions ( Figure S6,S7 and Table S7,S8). Network analyses of the DE genes mutated in two or more patients produced similar results using STRING when assessing the identified GO enrichment of biological processes (Table S7,S8). Interestingly, these results identified GO terms similar to those identified in the earlier GO analysis of the DEGs when observing each cancer type individually (Table S3). Most of the terms were related to the immune system, particularly immune system processes (GO:0002376), responses to stimuli or other organisms (GO:0051707/GO:0050896), and immune system responses (GO:0006955; Table S7-S8). For the mutated genes, pathways related to proliferation, differentiation, signaling, and cell motility were also commonly observed, highlighting an additional mechanism impacted by these gene mutations beyond what is impacted by the differentially expressed genes (Table S8).

Pediatric Cancers Harbor Mutations in PID-Related Genes
Mutational analysis of the five pediatric tissues using mutation data from the TAR-GET database revealed that out of the 472 PID-related genes, 64 were mutated in at least one of the 2087 patients, and 20 were mutated in two or more patients, resulting in a total of 115 different mutations observed in genes that were mutated in at least two patients ( Figure 3). The top four mutated genes in pediatric cancers were NRAS (1.8%), KRAS (0.9%), JAK2 (0.3%), and CREBBP (0.3%), highlighting that these genes should be analyzed further to assess the clinical significance of these mutations ( Figure 3). The most common mutations observed were missense mutations (88.7%), followed by truncating mutations (9.6%) and in-frame deletion mutations (1.7%; Figure 3). Genes that included multiple patients with the same mutation were CREBBP (R1446C), CSF3R (T618I), JAK2 (R683S and D873N), KRAS (G12D, G13D, and G12V), and NRAS (Q61K, Q61R, G12D, G13D, G12S, and G12A; Table 1). These mutations highlight the need to further assess their clinical significance in pediatric cancers.

Copy Number Alterations in PID-Related Genes among Pediatric Cancers
Copy number variation (CNV) analysis of four TARGET cancer cohorts (ALL, AML, NBL, and WT) resulted in the identification of 4865 copy number alterations (amplified/deleted regions) across 1105 patients (Table S4). Significant focal amplifications in the genes PNP, FCGR3A, and CFHR1 were found in a small fraction of the patients ( Figure 4A), while deep deletions were observed in ETV6, CFHR1, and RPS15, especially for pediatric ALL and AML ( Figure 4A). Moreover, pediatric ALL and AML were most affected by deep deletions in PID-related genes, with an overall alteration frequency of 66% and 57% across patients in their respective cancer cohorts and deep deletions making up 52% and 44% of the overall frequency, respectively ( Figure 4B). Pediatric WT had an overall alteration frequency of 64%, the majority of which were gene amplifications (42%; Figure 4B). In addition, 61% of pediatric NBL patients harbored copy number alterations, where 39% were amplifications and 22% were deep deletions ( Figure 4B). The overall alteration frequencies were determined by dividing the number of PID-related genes altered by the total number of profiled patients in each cancer cohort. and 44% of the overall frequency, respectively ( Figure 4B). Pediatric WT had an overall alteration frequency of 64%, the majority of which were gene amplifications (42%; Figure  4B). In addition, 61% of pediatric NBL patients harbored copy number alterations, where 39% were amplifications and 22% were deep deletions ( Figure 4B). The overall alteration frequencies were determined by dividing the number of PID-related genes altered by the total number of profiled patients in each cancer cohort.

Altered PID-Related Genes Affect Overall Survival in Pediatric Cancer Cohorts
Survival analysis of the panel of PID-related genes in all TARGET cohorts identified a significant difference in overall survival, with the altered group having significantly lower survival (p = 0.0250; Figure S4). Survival analysis of genes mutated in two or more patients also revealed significant differences in overall survival, with the altered group having significantly decreased survival (p = 1.724 × 10 −6 ; Figure S4).
To better assess the implications of the direction of differential expression on prognosis, a survival analysis was also conducted on each significant PID-related DEG for each cancer type. Significant differences in overall survival were determined using Cox regression and Kaplan-Meier survival analysis. A total of 79 genes were found to have significant effects on prognosis based on their expression (Table S5), and 23 genes had significant impacts on prognosis in at least two different cancer types, including C1R, RPL5, and TERT. Higher expressions of C1R and RPL5 both had significant impacts on overall survival for ALL, AML, and RT, whereas high expression of TERT was significant for survival in ALL, AML, and NBL (Table S5). The results from STRING when assessing the identified GO enrichment of biological processes of the genes are described ( Figure S5 and Table S6).
For all cancer types, upregulation of the C1R gene was associated with poor prognosis in pediatric cancer patients ( Figure 5), especially in the ALL cohort ( Figure 5A). For ALL, upregulation of the RPL5 gene was associated with decreased overall survival (Figure 5). Note that ALL produced the most significant results from this analysis ( Figure  5D,E). Kaplan-Meier curves generated for TERT demonstrate that for ALL, AML, and NBL cancer types, upregulation of TERT results in poorer prognoses ( Figure 5). Interestingly, ALL and NBL had the most significant differences in overall survival based on the increased expression of TERT ( Figure 5F,H).

Altered PID-Related Genes Affect Overall Survival in Pediatric Cancer Cohorts
Survival analysis of the panel of PID-related genes in all TARGET cohorts identified a significant difference in overall survival, with the altered group having significantly lower survival (p = 0.0250; Figure S4). Survival analysis of genes mutated in two or more patients also revealed significant differences in overall survival, with the altered group having significantly decreased survival (p = 1.724 × 10 −6 ; Figure S4).
To better assess the implications of the direction of differential expression on prognosis, a survival analysis was also conducted on each significant PID-related DEG for each cancer type. Significant differences in overall survival were determined using Cox regression and Kaplan-Meier survival analysis. A total of 79 genes were found to have significant effects on prognosis based on their expression (Table S5), and 23 genes had significant impacts on prognosis in at least two different cancer types, including C1R, RPL5, and TERT. Higher expressions of C1R and RPL5 both had significant impacts on overall survival for ALL, AML, and RT, whereas high expression of TERT was significant for survival in ALL, AML, and NBL (Table S5). The results from STRING when assessing the identified GO enrichment of biological processes of the genes are described ( Figure S5 and Table S6).
For all cancer types, upregulation of the C1R gene was associated with poor prognosis in pediatric cancer patients ( Figure 5), especially in the ALL cohort ( Figure 5A). For ALL, upregulation of the RPL5 gene was associated with decreased overall survival ( Figure 5). Note that ALL produced the most significant results from this analysis ( Figure 5D,E). Kaplan-Meier curves generated for TERT demonstrate that for ALL, AML, and NBL cancer types, upregulation of TERT results in poorer prognoses ( Figure 5). Interestingly, ALL and NBL had the most significant differences in overall survival based on the increased expression of TERT ( Figure 5F,H). Cox proportional hazard ratios (HR) and 95% CI are described. p-value is based on log-rank test and 95% CI is represented by the shaded region.

Discussion
Differential expression analysis with the limma-voom pipeline highlighted multiple genes, CR2, MS4A1, and SEMA3A, which should be further analyzed, given their prevalence among multiple cancer types. CR2 encodes complement receptor 2 (CD21) and is a co-receptor on B cells, which play a central role in autoimmunity, such that patients with systemic lupus erythematosus show reduced CR2 levels [20]. CR2 also plays a role in recognizing foreign DNA during immune defense responses, which may indicate that CR2 contributes to the immune surveillance of tumor cells [3,20]. MS4A1 encodes CD20, which is known to be induced by chemokine signaling and has been shown to interact with multiple B-cell surface proteins [21]. Interestingly, CD20 is expressed on the surface of both normal and malignant B-cells and has been a target for therapies using anti-CD20 monoclonal antibodies, highlighting that these therapies may also be effective for pediatric cancers [21]. Patients with Kallmann syndrome who experience gonadotropin-releasing hormone deficiency have been found to exhibit mutations in SEMA3A; however, regarding immune function, SEMA3A specifically has been found to act as an immune cell overactivation suppressor and is downregulated in some autoimmune diseases [22,23]. The aforementioned genes highlight molecular mechanisms, including B-cell receptor Cox proportional hazard ratios (HR) and 95% CI are described. p-value is based on log-rank test and 95% CI is represented by the shaded region.

Discussion
Differential expression analysis with the limma-voom pipeline highlighted multiple genes, CR2, MS4A1, and SEMA3A, which should be further analyzed, given their prevalence among multiple cancer types. CR2 encodes complement receptor 2 (CD21) and is a co-receptor on B cells, which play a central role in autoimmunity, such that patients with systemic lupus erythematosus show reduced CR2 levels [20]. CR2 also plays a role in recognizing foreign DNA during immune defense responses, which may indicate that CR2 contributes to the immune surveillance of tumor cells [3,20]. MS4A1 encodes CD20, which is known to be induced by chemokine signaling and has been shown to interact with multiple B-cell surface proteins [21]. Interestingly, CD20 is expressed on the surface of both normal and malignant B-cells and has been a target for therapies using anti-CD20 monoclonal antibodies, highlighting that these therapies may also be effective for pediatric cancers [21]. Patients with Kallmann syndrome who experience gonadotropin-releasing hormone deficiency have been found to exhibit mutations in SEMA3A; however, regarding immune function, SEMA3A specifically has been found to act as an immune cell overactivation suppressor and is downregulated in some autoimmune diseases [22,23]. The aforementioned genes highlight molecular mechanisms, including B-cell receptor signaling and the suppression of immune cell activation, which are important immune functions that should be studied further to better understand the pathways between PIDs and the development of pediatric cancers.
Differential expression analysis highlighted two additional genes, RAG1 and IGLL1, for further study due to their identification in both DEA pipelines. RAG1 is responsible for the initiation of V(D)J recombination, which diversifies and strengthens the antigenspecific receptors of T-cells and B-cells, and mutations in the gene have been associated with immunodeficient and autoimmune clinical phenotypes [24]. Recent studies have shown that the high expression of RAG1 is a potential driver of oncogenesis in B-cell ALL, and the aberrant activity of RAG1/2 is associated with the promotion of lymphocytic malignancies due to chromosomal translocations and DNA deletions in cancer genes [25,26]. Mutations in IGLL1 have been observed to lead to defects in B-cell function, as it belongs to the immunoglobulin gene superfamily and is important in humoral immunity [27]. Genes that are most commonly differentially expressed between normal and cancer tissues are often involved in B-cell receptor regulation or signaling, indicating an important molecular mechanism requiring further exploration to increase our understanding of the relationship between PIDs and pediatric cancers.
Protein-protein network interactions identified through STRING analysis revealed several biological connections between the proteins translated by PID-related DEGs and genes mutated in two or more pediatric cancer patients. Given our developing understanding of the impact of PID-related genes on malignancies, the common theme of immune system processes being widely involved was expected. However, enrichment analysis has identified additional pathways in the mutated PID-related genes that could demonstrate the potential of these genes to alter cell cycle progression and metastasis, which are hallmarks of cancer, due to the enrichment of proliferative and motility processes [3][4][5]. Further analysis is necessary to outline how these mutations can lead to the development and progression of pediatric cancers.
Leukemia and PIDs are both disorders of the hematopoietic system [3,28]. Therefore, our findings were consistent with our expectations: PID-related genes were more commonly differentially expressed in pediatric leukemias and produced meaningful enrichment analysis results because similar immune components were likely involved. To better understand the molecular mechanisms linking PIDs to pediatric leukemias, differentially expressed PID-related genes and altered pathways in pediatric leukemias should be further analyzed.
The mutational analysis highlighted the likely importance of NRAS, KRAS, JAK2, and CREBBP in pediatric cancer development because these genes are the most commonly mutated genes in pediatric cancer patients. Mutations in the RAS genes, including NRAS and KRAS, have been observed in many tumors, with NRAS mutations being more common in patients with AML [29]. The promotion of oncogenesis has been observed as a result of mutations in Ras proteins, highlighting the importance of further studying these genes, given their role in pediatric cancer molecular mechanisms and development [30]. JAK2 mutations have been observed in malignancies and hematologic diseases [31]. JAK2 encodes a cytoplasmic tyrosine kinase that is important in the signal transduction of hematopoietic growth factors, which is necessary to maintain the integrity of cellular viability [31,32]. CREBBP is involved in histone modification [33]. Interestingly, alterations in CREBBP among pediatric patients with ALL who do not relapse are rare, but CREBBP mutations are common in childhood ALL cases that do relapse, highlighting the importance of further elucidating their role in pediatric cancers [33].
The mutational analysis also highlighted specific mutations found in multiple pediatric cancer patients. The CREBBP R1446 position is located in the histone acetyltransferase domain, which is involved in substrate binding [33]. The R1446 location is noted to be a hotspot for driver mutations associated with pediatric ALL, with missense mutations at R1446C, R1446H, and others being common [33,34]. CSF3R mutations were previously identified as oncogenic, and the T618I mutation has been identified to truncate the cytoplasmic tail of CSF3R, which is similar to CSF3R mutations that have been observed to progress congenital neutropenia to AML [35]. Mutations involved in the amino acid residue R683 of JAK2 are commonly found in childhood cancers and impact the auto-regulation of JAK2 activity [36]. Specifically, R683S has been identified in ALL, and it may upregulate phosphorylation levels and the activation of the MEK/ERK pathway [36,37]. The MEK/ERK pathway is necessary to break B-cell tolerance, indicating a possible mechanism that can explain how the R683S mutation may lead to the development of PIDs and pediatric cancers [38]. Mutation D873N of JAK2 has also been linked to ALL [37]. JAK2 is a signaling molecule for cytokines; thus, when mutated, it may indicate a mechanism explaining altered immune surveillance in pediatric cancers [39]. KRAS mutations at the amino acid location G12 have been associated with more aggressive cancer types and a poorer prognosis [40]. KRAS G13 mutations have also been identified in cancers and have been potential therapeutic targets for epidermal growth factor receptor (EGFR) inhibitors [41]. All NRAS mutations at amino acid locations G12, G13, and Q61 have been identified in cancers and upregulate NRAS cycling; however, they impact NRAS activity differently [42,43]. The incidence of these mutations varies among cancer types such that NRAS mutations in melanomas are almost entirely at the Q61 position [43]. All mutations observed in at least two different patient profiles had previously been identified in the literature on cancer; however, they have been found to alter different mechanisms. In addition, the extent of knowledge of the aforementioned mutations varies. Therefore, the pathways that connect these genes and the specific molecular mechanisms impacted must be further analyzed to elucidate the relationship between PID-related genes and pediatric cancers. Despite the specific mutations that differ between PIDs and pediatric cancers, there are common PID-related genes that are mutated in pediatric cancers, highlighting their likely relationship.
CNV analysis revealed that a significant proportion of PID-related genes were altered across the four pediatric cancer cohorts examined. Deep deletion of ETV6, a transcriptionregulating gene, has been previously implicated in leukemias and may play a role in oncogenesis, treatment response, and outcome in pediatric ALL patients [44][45][46][47]. Furthermore, the analysis identified CNVs in the ribosomal protein (RP) gene family, specifically RPS15, with relation to deep deletion in pediatric ALL and AML. In a study of adult cancers, a significant prevalence of alteration in RP genes was observed, with 82% of tumors harboring single/double deletions within these genes [48].
CNVs in the complement factor H (CFH) gene cluster were commonly identified in the CNV analysis, with CFHR1 being notably deep deleted in the pediatric ALL and AML samples. Interestingly, a high prevalence of deletions in CFH-related genes 3 and 1 (CFHR3-CFHR1) have been detected in pediatric cancer patients post-hematopoietic stem cell transplant (HSCT) [49]. Heterozygous deletions in CFHR3-CFHR1 genes have also been linked to atypical hemolytic uremic syndrome in pediatric ALL patients and transplantassociated thrombotic microangiopathy in neuroblastoma patients [50,51]. As such, the role of CNVs in PID-related genes and their significance in the development, maintenance, and outcomes of pediatric cancers should be thoroughly considered.
Only three genes-ACTB, CSF3R, and GATA2-were identified in both the DEA and mutational analysis, which highlights their possible significance in cancer and disease development. Mutations in ACTB have been reported in congenital developmental disorders and adult lymphoid hematological cancers as well as pericytomas when an ACTB gene fusion is present [52][53][54]. However, little has been reported on its role in pediatric cancers. CSF3R encodes the receptor for colony-stimulating factor 3, which is a member of the IL-6 superfamily of cytokines, and mutations in the gene have led to the development of AML [35]. GATA2 variants carry a known predisposition to pediatric cancers due to the disruption of the maintenance of hematopoietic stem cells [55]. The three commonly altered genes play roles in important mechanisms regarding pediatric cancer development. However, the interplay between them should be further analyzed to better understand the implications when they are dysregulated. For instance, these variants of interest may be investigated for their impact on oncogenesis and immune evasion through comparative screening of critical oncogenic and immune-related markers in altered and unaltered cells, or qualitative analysis of tissue for such markers in affected and unaffected patients.
Using cBioPortal, significant differences in prognosis were identified for the panel of PID-related genes and genes mutated in two or more patients. Therefore, variants in PID-related genes may have an impact on the prognosis of those with pediatric cancers, and the mechanisms behind how the mutations lead to poorer outcomes should be studied. Furthermore, this also highlights the poorer prognosis for individuals with variant expression in the commonly mutated genes, which may indicate that alternative clinical approaches are needed for these patients to improve their prognosis. Continued analysis regarding survival outcomes of patients based on the direction of differential expression should be conducted to better understand the implications of altered regulations of all the identified genes.
Survival analysis of each significant PID-related DEG in each cancer type identified significant differences in prognosis based on the regulation of many genes. C1R, which was identified to decrease overall survival for patients with ALL, AML, and RT when upregulated, has been reported to promote the progression of squamous cell carcinoma when elevated [56]. The findings suggested that C1R may be a common prognostic indicator for a variety of cancer types. RPL5 has been noted to be a tumor suppressor gene, and downregulation of the gene has been associated with cancer recurrence and poor prognosis [57,58]. This highlights the likely reason for the downregulation of RPL5 resulting in poorer prognosis for RT patients but not the reason for the upregulation of RPL5 resulting in poorer prognosis for both ALL and AML patients. Further analysis is necessary to understand the differences in RPL5 mechanisms of action in leukemias compared to other cancer types regarding its role in prognosis.
Upregulation of TERT was determined to worsen the patient prognosis for multiple cancer types, which may be expected based on reports that telomerase activity increases with upregulation of TERT [59]. Increased telomerase activity allows for greater proliferative potential and immortality of cancerous cells, which represent hallmarks of cancer [4,5]. Upregulation of TERT resulting in increased telomerase activity may, in part, explain the poor prognoses identified in this study and previously in breast cancer [59]. Identification of these genes that resulted in a poor prognosis for multiple cancer types indicated potential biomarkers that suggest the need to adapt treatment strategies from those currently used to improve patient outcomes.
Interestingly, the common DEGs had cancer-type selective results on prognosis. MS4A1, CR2, IGLL1, and RAG1 each significantly impacted prognosis in ALL, and SEMA3A significantly impacted survival in AML. Upregulation of MS4A1, IGLL1, RAG1, and SEMA3A resulted in significantly decreased survival in pediatric cancer patients. Of particular note, SEMA3A was under-expressed in all identified cancer types compared to their healthy tissue samples, which may suggest that reactivation of SEMA3A is critical for cancer progression. In fact, a previous study indicated that SEMA3A upregulation promotes tumorigenesis and leads to poorer overall prognosis in adult cancers [60]. Considering that upregulation of MS4A1, IGLL1, and RAG1 results in decreased overall survival, further analysis to better understand the implications of upregulation could highlight potential patterns for variations in prognosis. In contrast to the four genes highlighted through DEA, CR2 was under-expressed in ALL in the DEA and resulted in a poorer prognosis in ALL. Therefore, further analysis of CR2 regarding the mechanisms impacted by its downregulation should be performed to better determine its prognostic impact and may suggest the need for alternative therapeutic approaches to be pursued for individuals with pediatric ALL and CR2 downregulation.
Multiple significant DEGs were observed, including CR2, MS4A1, IGLL1, RAG1, and SEMA3. Many DEGs were involved in B-cell receptor regulation and signaling, highlighting pathways that must be further explored to assess their significance in pediatric cancer development and outcomes. Enrichment analysis indicated that pediatric leukemias were most commonly involved in significantly altered molecular pathways and PID-related gene expression. While this may be expected based on the hematological relationship between leukemia and PIDs, the mechanisms behind this relationship should be further explored.
Protein-protein interactions and GO pathway analysis further confirmed these findings, in addition to highlighting biological processes related to hallmarks of cancer and may suggest some of the mechanisms of action by which the identified PID-related genes impact cancer development. The mutational analysis determined that genes previously identified in cancers were most commonly observed in the profiled patients with mutations in NRAS, KRAS, JAK2, and CREBBP. Additionally, CNV analysis identified amplification of PNP, FCGR3A, and CFHR1 as well as deep deletion of ETV6, CFHR1, and RPS15 as the most frequently detected copy number altered PID-related genes across the four available pediatric cancer cohorts. Multiple PID-related genes appear to have significant implications in pediatric cancers; however, the distinct molecular mechanisms between PIDs and pediatric cancers in relation to these genes must be further explored.
Overall survival was decreased for the PID-related and commonly mutated genes, which highlights the need for further investigation into these genes to better understand the mechanisms of action being altered and their potential clinical significance when treating patients. Furthermore, survival analysis of each significant PID-related DEG determined that C1R, RPL5, and TERT may be indicative of patient prognosis when differentially expressed and should be studied further to better understand how treatment alterations regarding these genes may improve patient outcomes.

Conclusions
Future directions from this project include investigating differences in expression and patient prognosis based on smaller subsets of age groups including children, youth, and young adults as well as additional cancer types. This study highlighted multiple PIDrelated genes for further investigation regarding their implications in PIDs and pediatric cancer mechanisms, which may lead to the identification of new therapeutic targets. samples against the GTEx-blood or kidney normal tissue tissues depending on cancer type. This analysis was performed with the TCGAbiolinks R package using the edgeR pipeline. Figure S3: Venn diagram summary of DEA results for PID-related genes that are significantly differentially expressed in pediatric cancer types including AML, ALL, WT, RT, and NBL. The different gene names are listed in each of the cancer types in which they were significantly differentially expressed in. The colour of the gene represents the differential expression of each gene either upregulated (red), downregulated (blue), or mixed (black) if the gene had mixed regulation in different cancer types. The analyses have been performed using the TCGAbiolinks R package and the limma-voom pipeline. Figure S4: Kaplan-Meier survival curve summary for (A) 472 PID-related genes (p = 0.025) and (B) 20 genes mutated in >0.1% of patients (p = 1.724 × 10 −6 ). The altered group is defined as harboring somatic mutations from whole genome or whole exome sequencing. Only patients with mutations data were used from the portal in this analysis. p < 0.05 was considered to indicate statistical significance. Analysis was done using cBioPortal including the following TARGET projects: ALL-P2, AML, NBL, RT, and WT. Figure S5: STRING network visualization retrieved for the 23 PID-related genes with significant effects on overall survival in at least two different pediatric cancer types. STRING (v11) online platform was used for analysis. Figure S6: STRING network visualization retrieved for the 89 PID-related differentially expressed genes in pediatric cancers. STRING (v11) online platform was used for analysis. Figure S7: STRING network visualization retrieved for the 20 PID-related genes mutated in two or more pediatric cancer patients. STRING (v11) online platform was used for analysis. Table S1: Summary of DEA results for PID-related genes that are significantly differentially expressed in pediatric cancer types including AML, ALL, WT, RT, and NBL. Analyses were performed using the TCGAbiolinks R package and the limma-voom (use adj. p-value) or edgeR (use FDR) pipeline. Table S2: GO ID and terms from the biological process GO enrichment analysis of all differentially expressed genes TARGET tumor samples against the GTEx normal tissue samples. This analysis was performed with the topGO R package. Table S3: GO ID and terms from the biological process GO enrichment analysis of all PID-related differentially expressed genes TARGET tumor samples against the GTEx normal tissue samples. This analysis was performed with the topGO R package. Table S4: Summary copy number variant analysis for 472 PID-related genes in all the following cancer types: ALL, AML, NBL and WT. Attached file: Supplementary_Data_S1.xlsx. Table S5: Summary survival analysis results for 89 differentially expressed genes in the following cancer types: ALL, AML, CCSK, NBL, RT, and WT. This analysis was done using Cox regression analysis with a generated R script. Attached file: Supplementary_Data_S2.xlsx. Table S6: Summary results from the biological process GO enrichment analysis of the 23 PID-related genes with significant effects on overall survival in at least two different pediatric cancer types. This analysis was performed using STRING (v11). Table S7: Summary results from the biological process GO enrichment analysis of the 89 PID-related differentially expressed genes. Only the top 40 terms are included in this table. This analysis was performed using STRING (v11). Table S8: Summary results from the biological process GO enrichment analysis of the 20 PIDrelated genes mutated in two or more pediatric cancer patients. Only the top 40 terms are included in this table. This analysis was done using STRING (v11).