Investigation of Rare Non-Coding Variants in Familial Multiple Myeloma

Multiple myeloma (MM) is a plasma cell malignancy whereby a single clone of plasma cells over-propagates in the bone marrow, resulting in the increased production of monoclonal immunoglobulin. While the complex genetic architecture of MM is well characterized, much less is known about germline variants predisposing to MM. Genome-wide sequencing approaches in MM families have started to identify rare high-penetrance coding risk alleles. In addition, genome-wide association studies have discovered several common low-penetrance risk alleles, which are mainly located in the non-coding genome. Here, we further explored the genetic basis in familial MM within the non-coding genome in whole-genome sequencing data. We prioritized and characterized 150 upstream, 5′ untranslated region (UTR) and 3′ UTR variants from 14 MM families, including 20 top-scoring variants. These variants confirmed previously implicated biological pathways in MM development. Most importantly, protein network and pathway enrichment analyses also identified 10 genes involved in mitogen-activated protein kinase (MAPK) signaling pathways, which have previously been established as important MM pathways.


Introduction
Multiple myeloma (MM) is a malignancy of plasma cells that are specialized and terminally differentiated B cells. Plasma cells synthesize and secrete antibodies to maintain humoral immunity. MM is characterized by the expanded proliferation of a single clone of plasma cells in the bone marrow, leading to the enhanced production of monoclonal immunoglobulin, also called M protein. The presence of M protein is an important diagnostic criterion for MM, along with the "CRAB" features, which is a mnemonic for calcium levels, renal failure, anemia and bone lesions, which have recently been extended [1]. In most cases, patients diagnosed with MM have one of the two precursor conditions, monoclonal gammopathy of unknown significance (MGUS) or smoldering multiple myeloma (SMM) [1].
MM is the second most common hematological malignancy, responsible for 1% of overall cancer-related deaths [2]. Although a relatively uncommon global disease, it is prevalent in countries with high socioeconomic status [3]. The genetic architecture of multiple myeloma is very complex. It consists of primary and secondary genetic events, including, but not limited to, chromosomal translocations, regional gains and deletions, hyperdiploidies, gene mutations and copy number variations (CNVs) [1]. In addition, high-risk, rare, high-penetrance germline variants have been discovered through wholeexome (WES) and whole-genome sequencing (WGS) in MM families [4][5][6]. Genomewide association studies (GWASs) have also helped to discover several common and low-penetrance risk loci [7,8].
Inherited predisposition to MM is evident among the first-degree relatives of MM patients, who are at a 2-4 times higher risk of developing this disease when compared to the general population [9]. In our previous study, we investigated 21 MM/MGUS families to identify germline predisposition genes through WGS and WES. Several pathogenic coding variants, including missense, loss-of-function (LoF) and CNVs, were identified. These variants were in genes functionally related to previously suggested MM susceptibility, immune process, tumor-related and MM somatic driver genes [6].
To further explore the basis of MM predisposition in MM families, we focused on the non-coding region of the genome in the present study. The non-coding region makes up 98% of the total human genome. Moreover, non-coding variants are gaining importance in the understanding of inherited cancer susceptibility [10]. Non-coding variants, e.g., the 5 untranslated region (UTR) and 3 UTR, due to their location upstream of the transcription start and downstream of the transcription end site, respectively, can bring about changes in transcription and posttranscriptional regulation. Considering the meaningful regulatory potential of these variants, we examined and prioritized non-coding variants from the WGS data of 14 MM families from Germany and the Netherlands. Prioritization was carried out using our internally established Familial Cancer Variant Prioritization Pipeline (FCVPPv2) [11] and other non-coding variant prioritization tools, such as Combined Annotation Dependent Depletion v1.6 (CADD) and SNPnexus [12,13].

Multiple Myeloma Families and Whole-Genome Sequencing
Samples from the patients and their healthy family members, as well as other familial and clinical information, were obtained after informed written consent. The study was carried out according to the rules of the Declaration of Helsinki, after the approval of the ethics committee of the Medical Faculty of the University of Heidelberg. All the patients, from the University Medical Center Groningen (UMCG), Netherlands, were briefed and signed consent was obtained for WGS to identify the cause of cancer predisposition in their families. These patients were enrolled as part of the Groningen-Heidelberg-Stettin EU TRANSCAN familial cancer whole-genome sequencing project because of their family history of cancer. They were referred to UMCG clinically for diagnostics and counseling because of their cancer family history. Clinical requirements for their testing and WGS did not indicate any further need for the approval of the ethics review board of the UMCG.
In total, 14 families with 31 cases and 16 unaffected individuals (controls or possible carriers) participated in this study (Supplementary Figure S1). Among these, 12 families were recruited from Heidelberg, Germany, and two from the Netherlands. At least two cases were enrolled from each family. These individuals were diagnosed either with MM and its precursors, MGUS and SMM, or with AL amyloidosis. Participating unaffected family members recruited in Heidelberg were analyzed for the following parameters: blood count, creatinine, glomerular filtration rate, calcium, immunoglobulin levels, free light chains and their ratios, protein electrophoresis and immunofixation in serum and urine to exclude undetected MM or its precursor stages [14]. Only individuals with negative immunofixation in serum and urine were considered unaffected.
Sequencing of all the samples was carried out at the core facility of DKFZ in Heidelberg. DNA was extracted from peripheral blood using the QIAamp ® DNA Mini Kit (40724 Hilden, Germany). Paired-end sequencing with a 150 bp read length was performed on the Illumina X10 platform (10785 Berlin, Germnay), followed by sequence mapping to the reference human genome (build GRC37, assembly hs37d5) using BWA mem (version 0.7.15, with parameters: -T 0) [15] and the removal of duplicates via Sambamba (version 0.6.5, with parameters: t 1 -l 0 -hash-table-size = 2000000 -overflow-list-size = 1000000 -io-buffer-size = 64) [16].
Variant calling for single-nucleotide variants (SNVs) and indels was carried out using Platypus (version 0.8.1) [17]. The variants were annotated with Gencode (v19) gene definitions in a multistep process using the following tools: ANNOVAR [18], 1000 Genomes phase III [19], dbSNP [20], dbNSFP v2.9 [21] and ExAC [22] at a read depth of >10. A minor allele frequency threshold of 0.001 was used for gnomAD exome and genome data [23] and a variant frequency of 2% from the local set to remove common variants and technical artifacts, respectively. A pairwise comparison of the variants in the cohort was performed to confirm family relatedness and exclude sample mix-ups.

Prioritization through FCVPPv2
We used our in-house variant filtering pipeline, the Familial Cancer Variant Prioritization Pipeline (FCVPP) version 2, developed by Kumar et al., for the pedigree-based prioritization of the variants [11]. Pedigree segregation meant that variants were selected if they were present in all the cases of a family and absent from all the healthy family members. The possible carriers could show either the presence or absence of the variant of interest. Family members were considered as cases if they were diagnosed with MM, MGUS or AL amyloidosis. Those detected with plasma cell dyscrasias, solitary plasmacytomas and aberrant plasma cell clones were termed "possible carriers". Healthy family members without these two parameters (MM, MGUS or AL amyloidosis diagnosis and plasma cell anomalies) were considered non-carriers. The exceptions to the above rule were the healthy family members who were more than 10 years younger than the earliest age of diagnosis in the family; these were treated as "possible carriers". Using the CADD tool v1.3, a filter of ≥15 was applied after pedigree segregation to obtain the top 1.5% deleterious variants in the human genome. In addition, another web-based annotation tool, SNPnexus [13], was used to check for different non-coding scores, such as EIGEN [24], Funseq2 [25], FATHMM [26], ReMM [27] and Deep-SEA [28].
After these filtering steps, non-coding variants were selected for further evaluation; these included 5 UTR, 3 UTR, upstream variants and variants that were labeled upstream and downstream. The variants were visually inspected, using the Integrative Genomics Viewer (IGV; version 2.4.10) [29], within WGS data for cases and controls, as an added measure to minimize the possibility of false-positive results and to enhance the confidence of variant calls.

Conservation
The selected non-coding variants were then prioritized based on their conserved locations using three different evolutionary conservation scores; these included Genomic Evolutionary Rate Profiling (GERP) score > 2 [30], vertebrate PhastCons ≥ 0.3 [31] and vertebrate Phylogenetic P-value (PhyloP) ≥ 3.0 [32]. Variants were additionally assigned a score of 1-3 depending upon how many out of the three conservation scores were positive.

Analysis of Upstream and 5 UTR Variants
The 5 UTR and upstream variants were investigated according to the following steps. At first, the variants were intersected with the human promotor database downloaded from FANTOM 5 [33] using bedtools. CADD v1.6 s web-based interface gave information about the percentage of GC content, presence of CpG islands, transcription factor binding sites (TFBSs) and chromatin states in 127 cell lines and histone marks in 14 cell lines and tissues for the loci that our variants were present in.

TFs/TF Binding Sites
Prioritized upstream and 5 UTR variants were further assessed based on their location at TFBSs. Publicly available TF ChIP-seq data were obtained from ENCODE for the GM12878 cell line [34]. These data were compared with previously published TF enrichment data for MM [7]. To investigate the effect of a variant on TF binding, short FASTA mutated and wild-type sequences having variant points with 10 bp upstream and 10 bp downstream were uploaded on JASPAR for the above-mentioned best-performing variants [35].

Graphic Visualization
To obtain a visual representation of 5 UTR and upstream variants along with the different regulatory elements, variant maps were created using the UCSC genome browser [36].

Analysis of 3 UTR Variants
The 3 UTR variants were further investigated for being located at putative miRNA target sites. For this purpose, the entire human miRNA target atlas was downloaded from TargetScan (Release 7.0) [37] and matched against the filtered 3 UTR variants using bedtools' intersect function to obtain miRNA matches along with a context++ percentile score. A context++ score percentile of 90 or above was considered to be a significant score. Using CADD v1.6. [12], ChromHmm chromatin states (from 127 cell lines from the NIH roadmap epigenomics mapping consortium) [38], the Segway chromatin pattern [39] and the mirSVR score were extracted. Variants were marked positively if they had a mirSVR score of less than −0.1, as sites with mirSVR scores lower than −0.1 are generally considered good miRNA target sites with a high probability of downregulation of gene expression [40].

Biological Function and Pathway Enrichment Analysis
All the respective genes from the pipeline surviving variants were used for protein interaction network analysis using STRING v10 [41] and for pathway enrichment analysis using Reactome [42]. Biological function information for both sets of variant genes was collected through UniProtKB/Swiss-Prot [43].
A sequential flow chart of all the above prioritization tools with the filtered number of variants at each step is shown in Figure 1.

Results
WGS on 14 MM families identified 928,170 rare variants (MAF < 0.1%); these included variants annotated by ANNOVAR as exonic, intronic, intergenic, splicing, upstream, downstream, upstream; downstream, 3′ UTR, 5′ UTR and 3′ UTR; 5′ UTR ( Figure 1). Among these annotations, the focus of this present work was on the 3′ UTR, 5′ UTR and upstream variants, which amounted to 20,445. After pedigree segregation in the next step, this number was reduced to 2682. Further pruning was performed when the CADD score of ≥ 15 was applied, resulting in 150 variants. As most of the non-coding scores extracted using SNPnexus were high after filtering for CADD ≥ 15 (data not shown), these were not used for the prioritization of the variants. Out of these pipeline-surviving variants, 51 were 5′ UTR, 53 were upstream or upstream; downstream and 46 were 3′ UTR variants.
Through the in silico functional analysis of the 104 5′ UTR and upstream variants with CADD v1.6., a conservation score, the presence in the promotor region of the respective gene and within a CpG island, as well as the chromatin state, histone marks and TFBSs on the location of each variant were compiled, as shown in Supplementary Table S1, and all TFs binding to the variant positions according to the ENCODE data are shown in supplementary Table S2. The variants with positive scores of the selected annotations in CADD v1.6. were shortlisted as the 14 top variants (Table 1). Genes identified through the

Results
WGS on 14 MM families identified 928,170 rare variants (MAF < 0.1%); these included variants annotated by ANNOVAR as exonic, intronic, intergenic, splicing, upstream, downstream, upstream; downstream, 3 UTR, 5 UTR and 3 UTR; 5 UTR ( Figure 1). Among these annotations, the focus of this present work was on the 3 UTR, 5 UTR and upstream variants, which amounted to 20,445. After pedigree segregation in the next step, this number was reduced to 2682. Further pruning was performed when the CADD score of ≥15 was applied, resulting in 150 variants. As most of the non-coding scores extracted using SNPnexus were high after filtering for CADD ≥ 15 (data not shown), these were not used for the prioritization of the variants. Out of these pipeline-surviving variants, 51 were 5 UTR, 53 were upstream or upstream; downstream and 46 were 3 UTR variants.
Through the in silico functional analysis of the 104 5 UTR and upstream variants with CADD v1.6., a conservation score, the presence in the promotor region of the respective gene and within a CpG island, as well as the chromatin state, histone marks and TFBSs on the location of each variant were compiled, as shown in Supplementary Table S1, and all TFs binding to the variant positions according to the ENCODE data are shown in Supplementary Table S2. The variants with positive scores of the selected annotations in CADD v1.6. were shortlisted as the 14 top variants (Table 1). Genes identified through the top variants were SP5 (transcription factor Sp5), FNDC3B (fibronectin type III domain (v-erb-b2 erythroblastic leukemia viral oncogene homolog 3 (avian)), PSMC6 (proteasome (prosome, macropain) 26S subunit, ATPase, 6), CAMK1 (calcium/calmodulin-dependent protein kinase I), PLEKHG1 (pleckstrin homology domain containing, family G (with RhoGef domain) member 1) and DLG1 (discs, large homolog 1 (Drosophila)). All these top variants were annotated to be in the promoters of the corresponding genes, except the variant in DLG1, which was annotated to the promoter of DLG1-AS1. All were also mapped to CpG islands and they were located within binding sites for many important TFs, as shown in Table 1 and Supplementary Tables S1 and S2, extracted through ENCODE [34]. A FASTA sequence search around the variant through Jaspar [35] also showed the changes in binding sites due to these variants (Supplementary Table S3). Limited consensus was observed between TFBSs from ENCODE and Jaspar; in most cases, both highlighted the same TF families. We here only show the TFBS differences caused by our variants between the wild-type and mutated sequence in the Jaspar tables. Segway classification, chromatin state and histone mark evaluation supported their locations in active transcription start sites and in promotor or enhancer regions (Table 1, Supplementary Table S1).
The in silico functional scores for the 46 3 UTR variants are described in Supplementary Table S4, including conservation scores, miRNA binding sites, mirSVR scores and chromatin states. Supplementary Table S5 presents all miRNAs binding to the positions of these variants. The variants with positive scores in all aspects of functional analysis were shortlisted as six 3 UTR top variants. Among these top shortlisted variants, we identified genes such as LONRF1 (LON peptidase N-terminal domain and ring Finger 1), SGSM2 (small G protein signaling modulator 2), SLC35A1 (solute carrier family 35 member A1), B4GALT5 (beta 1,4-galactosyltransferase, polypeptide 5), MARCHF8 (membrane-associated ring finger 8, E3 ubiquitin-protein ligase) and FAM76B (family with sequence similarity 76, member B). All six variants fulfilled the conservation criteria, all had good mirSVR scores (<−0.1), and Segway and chromatin marks confirmed the location at the gene end (Table 2). miRNA matches were found for all of the selected variants, and, except for miRNAs in B4GALT5 and MARCHF8, all others had very high context++ scores (>90).
In a couple of instances, two different variants of the same gene were prioritized in unrelated families. One of these genes was CAMK1 (calcium/calmodulin-dependent protein kinase I). Its variants were identified in two families, i.e., in family 15, a 5 UTR variant (3_9811535_G_A), and in family 2, a 3 'UTR variant (3_9799045_C_G). The other was ZNF236, which had a 3 'UTR variant in family 6 and a 5 UTR variant in family 2.
The prioritized variants located in the 5 UTR and upstream or 3 UTR were grouped according to their biological functions obtained through Uniprot ( Figure 2). Common pathways between both sets of genes included transcription, signal transduction, cell cycle and differentiation, bone development, metabolism and growth, chromatin organization, cell adhesion, immune system and protein transport.   Supplementary Tables), Segway, chromatin states in 127 cell lines (those states are shown that are present in 20% or more cell lines) and mirSVR scores (−0.1 is considered significant) are extracted from annotation data on CADD website, miRNA binding and context ++ score (90 or above is considered high percentile score with high chance of miRNA affecting the mRNA) information is taken from Target Scan's human miRNA atlas; overall function of the genes is from Uniprot/Swissprot.  Independent protein interaction network and pathway enrichment analyses for the proteins corresponding to the genes with all the 150 upstream, 5′ UTR and 3′ UTR variants were performed with STRING and Reactome pathway analyses, respectively, and they gave similar results. Figure 3 shows the STRING protein interaction network for proteins of the upstream and 5′ UTR variant genes, highlighting proteins that belong to the mitogen-activated protein kinase (MAPK) and ErbB pathways in blue and green colors, respectively. These included PIK3R1 (phosphatidylinositol 3-kinase regulatory subunit alpha/P85-ALPHA), DLG1 (Discs large MAGUK scaffold protein 1), SPTB (ppectrin beta chain, erythrocytic (Beta-I spectrin)), APBB1IP (amyloid beta A4 precursor protein-binding family B member 1-interacting protein), CAMK2D (calcium/calmodulin-dependent protein kinase type II subunit delta), ERBB3 (Erb-B2 receptor tyrosine kinase 3), ERBB4 (receptor tyrosine-protein kinase erbB-4), PSMC6 (proteasome 26S subunit, ATPase 6) and PTK2/FAK1 (protein tyrosine kinase 2/Focal adhesion kinase 1). Functional details of all these pathway variants are also included in Table 1. Among these nine genes, ERBB3, PSMC6 and DLG1 were already among the pipeline's prioritized top variants. a b Independent protein interaction network and pathway enrichment analyses for the proteins corresponding to the genes with all the 150 upstream, 5 UTR and 3 UTR variants were performed with STRING and Reactome pathway analyses, respectively, and they gave similar results. Figure 3 shows the STRING protein interaction network for proteins of the upstream and 5 UTR variant genes, highlighting proteins that belong to the mitogen-activated protein kinase (MAPK) and ErbB pathways in blue and green colors, respectively. These included PIK3R1 (phosphatidylinositol 3-kinase regulatory subunit alpha/P85-ALPHA), DLG1 (Discs large MAGUK scaffold protein 1), SPTB (ppectrin beta chain, erythrocytic (Beta-I spectrin)), APBB1IP (amyloid beta A4 precursor protein-binding family B member 1-interacting protein), CAMK2D (calcium/calmodulin-dependent protein kinase type II subunit delta), ERBB3 (Erb-B2 receptor tyrosine kinase 3), ERBB4 (receptor tyrosine-protein kinase erbB-4), PSMC6 (proteasome 26S subunit, ATPase 6) and PTK2/FAK1 (protein tyrosine kinase 2/Focal adhesion kinase 1). Functional details of all these pathway variants are also included in Table 1. Among these nine genes, ERBB3, PSMC6 and DLG1 were already among the pipeline's prioritized top variants. Figure 4 depicts the protein interaction network for proteins corresponding to the 3 UTR variant genes and showing no pathway enrichment. Combining both of the abovementioned sets of proteins confirmed their involvement in the MAPK and ErbB pathways ( Figure 5). These included six genes that were already identified to be involved in the MAPK pathway in the 5 UTR analysis and only one 3 UTR variant gene, i.e., FGFR1 (fibroblast growth factor receptor 1), in the main core of the network.
Reactome confirmed the involvement of the MAPK and ErbB pathways (Table 3). It can be seen from the table that Reactome excludes STAT5A from the RAF/MAP kinase cascade and includes it in ERBB4 signaling but with a high false discovery rate (FDR). Additional pathway enrichment on STRING was performed by combining our gene set with the genes previously identified in our MM family study of missense and LoF variants [6] (Supplementary Figure S2). This did not highlight any new pathways, and the MAPK pathway remained the only considerably enriched pathway.
ing family B member 1-interacting protein), CAMK2D (calcium/calmodulin-dependent protein kinase type II subunit delta), ERBB3 (Erb-B2 receptor tyrosine kinase 3), ERBB4 (receptor tyrosine-protein kinase erbB-4), PSMC6 (proteasome 26S subunit, ATPase 6) and PTK2/FAK1 (protein tyrosine kinase 2/Focal adhesion kinase 1). Functional details of all these pathway variants are also included in Table 1. Among these nine genes, ERBB3, PSMC6 and DLG1 were already among the pipeline's prioritized top variants.  Figure 4 depicts the protein interaction network for proteins corresponding to the 3′ UTR variant genes and showing no pathway enrichment. Combining both of the abovementioned sets of proteins confirmed their involvement in the MAPK and ErbB pathways ( Figure 5). These included six genes that were already identified to be involved in the MAPK pathway in the 5′ UTR analysis and only one 3′ UTR variant gene, i.e., FGFR1 (fibroblast growth factor receptor 1), in the main core of the network.    Protein interaction network of genes with 3′ UTR variants generated by STRING. No pathway enrichment was found here. Due to the lack of pathway detection, the color of the nodes was randomly selected by the software and serves only for differentiation purposes.

Figure 5.
Protein interaction network of a combination of genes with upstream, 5 and 3 UTR variants generated by STRING. Proteins surrounded by violet circles represent genes with 5 UTR variants and those surrounded by red circles represent genes with 3 UTR variants; proteins with red filling belong to the MAPK pathway and those with blue filling belong to the ErbB pathway; some proteins, due to their involvement in both pathways, are filled with both red and blue. The density of the connecting lines between the protein nodes in the figure represents the interaction score, highlighting the importance of the MAPK and ErbB pathway-related proteins. Visual maps of the genetic and regulatory environment of the pipeline-prioritized top upstream and 5 UTR variants, as well as those identified by pathway enrichment analysis from Table 1, were created using the UCSC genome browser. These maps show variation sites relative to their positions in the gene (within gene promoter or enhancer/strong or weak promoter or enhancer), the number of CpG islands, conserved sequences, histone methylation marks and TFBSs in the GM12878 cell line with the help of UCSC annotation tracks (Supplementary Figures S3-S21).

Discussion
The genetic architecture of familial MM, despite progress in global MM research, remains largely elusive. Previous GWASs on MM point towards the non-coding part of the genome influencing the gene function through regulation [7,44]. With the study of non-coding variants in the WGS data from 14 MM families, we have tried to bridge a part of the knowledge gap in MM research. We identified 150 non-coding variants including 5 UTR, upstream and 3 UTR variants that segregated with MM cases among families and passed the CADD ≥ 15 criterion. These variants, when grouped based on the biological function of the corresponding genes and proteins, highlighted similar pathways that have previously been implicated by risk loci in MM GWASs [7] and familial rare germline variant investigation [6]. The highlighted pathways included the immune system, chromatin remodeling, cell cycle regulation, signal transduction and autophagy (Figure 2a,b). Further protein interaction network and pathway enrichment analyses highlighted the importance of the MAPK and ErbB pathways in the germline genetics of MM.
The prioritization of the variants was based on the segregation of the variants with MM in the families, followed by the well-established pipeline to further prioritize the variants based on several in silico prediction tools. For 5 UTR and upstream variants, several tools with different aspects of potential regulatory effects are available; however, for 3 UTR variants, most tools concentrate on the effects due to changes in miRNA binding sites. Thus, we were not able to evaluate other factors, such as the effects of the variants on the stabilization of the termination codons or enzymatic cleavage sites. Although most of the presented candidate genes are unlikely to have a causal relationship with MM, we are convinced that our data could be a valuable contribution to forthcoming, pooled sequencing efforts. Below, we discuss potential mechanisms explaining how the identified variants and genes may predispose to MM.
The importance of signal transduction pathways in MM has already been demonstrated [45]. These pathways play an important role in the interaction between MM cells and other cellular components, such as osteoclasts, osteoblasts, dendritic cells and endothelial cells in the bone marrow microenvironment [46]. Multiple signaling cascades are activated by a vital myeloma growth factor, interleukin-6 (IL-6), including the Jak-Stat, the Ras/Raf/Mek/Erk and the PI-3 kinase/Akt pathways [47]. Ras/Raf proteins also regulate the MYC gene promoter through the Raf/MAPK/MEK pathway. The involvement of the MAPK pathway in MM pathogenicity is reaffirmed in this study through functional protein interaction network and pathway enrichment analyses of variant genes on the STRING database and Reactome. A set of 150 proteins corresponding to the upstream, 5 and 3 UTR variant genes highlighted RAF/MAPK and its upstream ErbB signaling pathways. The resulting network contained 10 of~300 genes that are involved in the RAF/MAPK pathways, and 6 of 83 genes in ErbB signaling pathways. The RAF/MAPK pathway genes identified in our network were PIK3R1, DLG1, FGFR1, SPTB, APBB1IP, CAMK2D, ERBB3, ERBB4, PSMC6 and PTK2/FAK1 ( Figure 5, Table 3). Incidentally, three of these genes, ERBB3, PSMC6 and DLG1, were also among the top upstream and 5' UTR variant-related genes selected independently due to the all-round best performance in the different in silico functional analysis tools employed.
The ErbB family of proteins are receptor tyrosine kinases (RTKs). ErbB RTKs dimerize after the binding of ligands to their extracellular domains, leading to auto-phosphorylation, followed by the downstream signaling cascades [48]. One of the major signaling cascades of the ErbB family is the Ras/Raf/MAPK pathway [49]. A recent study evaluated the role of ERBB3/ERBB4 in signal transduction in mouse cells that expressed only ERBB3 and ERBB4. Upon enrichment analysis of regulated phosphoproteins with KEGG pathways, it was revealed that ErbB signaling, focal adhesion and MAPK signaling were among the top enriched pathways [50]. Signaling pathways downstream of RTKs have long been identified as therapeutic targets in different cancers [51]. MAPK activation through ERBB signaling controls key processes such as cellular growth, proliferation, differentiation, migration and apoptosis [52]. MAPK pathways mediate the signals that either promote or suppress the growth of malignant cells, and their critical role in the development of hematological malignancies, including multiple myeloma, has been demonstrated previously [47].
PSMC6 is a 26S proteasome subunit. Proteasome inhibition is an important part of therapy in MM patients since the efficacy of bortezomib was discovered some 20 years ago [53]. However, the development of resistance to bortezomib is common and it is now found that the downregulation of PSMC6 is one of the most common and validated reasons for conferring bortezomib resistance [54]. DLG1 is a multidomain scaffolding protein that plays a part in fundamental cellular pathways [55]. It also promotes the growth and survival of myeloma cells in bone marrow-independent niches by facilitating the interaction between CD28 and CD86 molecules on the cell surface. This allows the MM cells to be independent of the bone marrow microenvironment, resulting in extramedullary multiple myeloma (EMM) [56]. It is interesting to note that two other MAPK pathway-enriched genes, SPTB and PTK2/FAK1, also play a role in the development of an aggressive and rare form of EMM called plasma cell leukemia [57].
Fibroblast growth factor receptors, including FGFR1, are also members of the RTK family of receptors that play an important role in cell survival, differentiation, migration and proliferation. They have high homology with each other and bind to fibroblast growth factors (FGFs) [58]. Previously, we identified a CNV affecting genes FGFBP1 (fibroblast growth factor binding protein 1) and FGFBP2 (fibroblast growth factor binding protein 2) in a study of coding variants in MM families [6]. FGFBP1 and FGFBP2 are involved in FGF bioactivation and may affect cell proliferation and the bone microenvironment in MM. FGFR1 is the only 3 UTR variant gene that was highlighted in the MAPK pathway. FGFR mutations in different malignancies make them attractive targets for therapy. Recent studies show that the development of resistance to FGFR inhibitors is achieved through the activation of ERBB2 and ERBB3 [59]. An indirect adaptor-mediated interaction between FGFR1 and PIK3R1 (P85) also results in the activation-dependent regulation of extracellularsignal-regulated kinase (ERK) in MM cells [60].
CAMK2D is one of the isoforms of calcium2+/calmodulin dependent protein kinases. Calcium, as a second messenger, plays an important role in the development of B cells. Out of the four isoforms, alpha, beta, gamma and delta, the latter three are more widely expressed in the body, especially in lymphoid tissues, including bone marrow, and are involved as mediators in MAPK-dependent apoptosis pathways activated by calcium flux [61].
Other upstream and 5 UTR variants that were among the all-round top candidates included variants in genes related to transcription regulation, such as SP5, MDFIC, FOXJ2 and NRBF2 [62][63][64][65]. Elevated expression of SP5 has been detected in different human cancers [63] and can downregulate many WNT target genes, resulting in a decreased transcription response [66]. NRBF2 is involved in autophagy [65]. PLEKHG1 is also a top variant gene related to cell signaling [67]. Another cell signaling-related gene was HMGXB4, which is also involved in the Wnt/β-catenin signaling pathway [68]. The HMGXB4-TOM1 locus has been suggested as a myeloma risk locus at 22q13 [69].
The remaining four genes in the upstream and 5 UTR top variant gene lists were TBC1D4, ING2, AGFG1 and FNDC3B, related to protein transport, mRNA transport and adipogenesis, respectively [43].
Among the top shortlisted variants in 3 UTR, we identified genes such as LONRF1 and SGSM2, involved in protein ubiquitination and transport, respectively [70,71]. SLC35A1 and B4GALT5 are related to metabolism [72] and MARCHF8 is associated with the immune system [73]. A variant in FAM76B is also among the best-performing variants; however, the function of this gene is unknown. A search for previously recognized MM-related miRNAs [74] in our list did not prove fruitful; however, common miRNAs were present for different gene variants in different families.
All variants were specific to each family; however, for two genes, CAMK1 and ZNF236, two different variants, one in the 3 UTR and one in the 5 UTR, were prioritized in two unrelated families. CAMK1 plays a role in the G1 phase of the cell cycle, where it regulates the assembly of the cyclin D1/cdk4 complex [75]. Amplification of the cyclin D1 gene has not only been associated with multi-drug resistance in MM [76] but a polymorphism in the gene is a risk factor for t(11;14)(q13;q32) MM [77]. Regarding ZNF236, in our previous study of rare germline variants in familial MM, a missense variant in the gene was found [6]. Because of the limited knowledge of the function of this gene, it is difficult to link the potential pathogenicity of these variants to MM. The gene is believed to play a role in transcription regulation [43]. Recently, miRNA regulation for this gene was observed in a cleft palate-associated gene study, where ZNF236 overexpression was linked to cell proliferation [78].
In conclusion, we have identified new non-coding gene variants conferring a predisposition to MM in familial cases. Many of these variants are found in pathways and genes previously implicated in MM risk, and thus reaffirm the involvement of the ErbB and MAPK signaling pathways in MM pathogenicity. These results also highlight the importance and potential of the non-coding genome in the underlying mechanisms of different diseases.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/cells12010096/s1, Figure S1: Family pedigrees of the 14 families investigated in the study. Figure S2: Protein interaction network of a combination of genes with 5 and 3 UTR variants and genes with missense and loss-of-function variants generated by STRING. Proteins surrounded by green halo are missense and those surrounded by magenta are loss-of-function (LoF) variant related proteins. 5 UTR and 3 UTR variant proteins are surrounded by indigo and light red haloes respectively, Figures S3-S21: UCSC plots of top 5'UTR and MAPK pathway variants, Table S1: Summary of 5 UTR variants surviving the prioritization pipeline. Table S2: All Encode transcription factor binding at variant sites. Table S3: Jaspar TFBS difference between wild type and mutant sequence.

Data Availability Statement:
The data presented in this study are available on request from the corresponding authors. The data are not publicly available due to privacy reasons.