Rare CNVs and Known Genes Linked to Macrocephaly: Review of Genomic Loci and Promising Candidate Genes

Macrocephaly frequently occurs in single-gene disorders affecting the PI3K-AKT-MTOR pathway; however, epigenetic mutations, mosaicism, and copy number variations (CNVs) are emerging as relevant causative factors, revealing a higher genetic heterogeneity than previously expected. The aim of this study was to investigate the role of rare CNVs in patients with macrocephaly and to review genomic loci and known genes. We retrieved from the DECIPHER database de novo <500 kb CNVs reported in patients with macrocephaly; in four cases, a candidate gene for macrocephaly could be pinpointed: a known microcephaly gene (TRAPPC9) and three genes selected on the basis of their functional roles (RALGAPB, RBMS3, and ZDHHC14). From the literature review, 28 pathogenic CNV genomic loci and over 300 known genes linked to macrocephaly were gathered. Among the genomic regions, 17 CNV loci (~61%) exhibited mirror phenotypes, that is, deletions and duplications having opposite effects on head size. Identifying structural variants affecting head size can be a valuable source of information about pathways underlying brain development. In this study, we reviewed these genes and recurrent CNV loci associated with macrocephaly and suggested novel potential candidate genes deserving further studies to confirm their involvement in this phenotype.


Introduction
Macrocephaly, defined as an occipitofrontal circumference (OFC) at least two standard deviations (SD) above the mean for a given age, sex, and ethnicity [1], affects about 2% of the general population and up to 5% of children [2][3][4]. This phenotype may be driven by the expansion of the brain parenchyma, a subgroup called megalencephaly, or be related to other conditions such as hydrocephalus or thickening of the frontal skull bone (cranial hyperostosis), unconnected to a primary brain development defect [5]. It can be present at birth (congenital) or arise postnatally during the growth period (acquired), occurring either as an isolated feature (non-syndromic) or in association with other clinical signs (syndromic), including intellectual disability (ID)/neurodevelopmental delay, obesity, and overgrowth.
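The OFC criterion above can be illustrated with a minimal sketch. The function names and the reference mean and SD used in the example are hypothetical; in practice the reference values would come from growth charts matched for age, sex, and ethnicity.

```python
def ofc_z_score(ofc_cm: float, ref_mean_cm: float, ref_sd_cm: float) -> float:
    """Number of SDs the measured OFC lies above (+) or below (-) the reference mean."""
    if ref_sd_cm <= 0:
        raise ValueError("reference SD must be positive")
    return (ofc_cm - ref_mean_cm) / ref_sd_cm


def is_macrocephaly(ofc_cm: float, ref_mean_cm: float, ref_sd_cm: float,
                    threshold_sd: float = 2.0) -> bool:
    """True when the OFC is at least `threshold_sd` SDs above the reference mean."""
    return ofc_z_score(ofc_cm, ref_mean_cm, ref_sd_cm) >= threshold_sd


# Hypothetical reference values (mean 51.0 cm, SD 1.5 cm), for illustration only.
print(is_macrocephaly(54.5, 51.0, 1.5))  # z is about 2.33, so this prints True
```

The threshold is kept as a parameter because some studies use stricter cut-offs; the value 2.0 reflects the definition given in the text.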
Because macrocephaly may indicate an underlying disorder, imaging exams, such as computerized tomography scan, head ultrasound, and magnetic resonance imaging, can help to narrow the diagnosis, even in utero [6,7]. However, the etiology of most cases remains unknown, and frequently there are no other significant clinical findings that could help unravel the origin of this phenotype [4]. Nonetheless, it is known that several genetic syndromes have macrocephaly as a main feature, originating from de novo or inherited mutations, such as in Sotos and fragile X syndromes, respectively [2,3].
Most cases of macrocephaly with a known etiology are due to single-gene disorders affecting the PI3K-AKT-MTOR pathway, which directly acts on the process of brain development, including the maintenance, differentiation, and migration of neuronal progenitors, synaptogenesis, and regulation of protein translation. Other commonly affected pathways [...] other previously identified pathogenic variants. A CNV size of 500 kb was established as the threshold to reduce the number of genes to be evaluated as candidates for the macrocephaly phenotype within each variant. Genetic and clinical data of five of the recovered DECIPHER patients were previously published [30][31][32][33][34][35]. CNVs were classified following the joint consensus recommendation of the American College of Medical Genetics and Genomics (ACMG) and the Clinical Genome Resource (ClinGen) [36].
Genes mapped to the genomic regions of the <500 kb de novo CNVs were evaluated regarding their potential to contribute to the macrocephaly phenotype by searching PubMed (https://pubmed.ncbi.nlm.nih.gov/) for a previous association with macrocephaly or an involvement in cellular processes whose imbalance could lead to an abnormal head size, such as cellular proliferation. The Human Protein Atlas (https://www.proteinatlas.org/) was also consulted to confirm protein expression in the brain.
Genes and CNV syndromes associated with this phenotype were filtered among the records of the OMIM database (up to 13 June 2022) according to the following criteria: (a) containing the term "macrocephaly" or "megalencephaly", with a known molecular basis and phenotype description, or (b) containing the term "macrocephaly" or "megalencephaly", and using the phenotype mapping key "4", which identifies chromosomal duplication or deletion syndromes.
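The two OMIM inclusion criteria can be expressed as a simple predicate. This is only a sketch: the record layout (`clinical_text`, `molecular_basis_known`, `has_phenotype_description`, `phenotype_mapping_key`) and the example entries are hypothetical and do not correspond to the actual OMIM export format or to real OMIM records.

```python
SEARCH_TERMS = ("macrocephaly", "megalencephaly")


def mentions_term(text: str) -> bool:
    """Case-insensitive check for either search term."""
    lowered = text.lower()
    return any(term in lowered for term in SEARCH_TERMS)


def passes_omim_filter(entry: dict) -> bool:
    """Apply criterion (a) or (b) to one hypothetical OMIM-like record."""
    if not mentions_term(entry["clinical_text"]):
        return False
    # (a) known molecular basis together with a phenotype description
    criterion_a = entry["molecular_basis_known"] and entry["has_phenotype_description"]
    # (b) phenotype mapping key 4: chromosomal deletion or duplication syndrome
    criterion_b = entry["phenotype_mapping_key"] == 4
    return criterion_a or criterion_b


# Invented example entries, for illustration only.
entries = [
    {"id": "A", "clinical_text": "Macrocephaly and overgrowth",
     "molecular_basis_known": True, "has_phenotype_description": True,
     "phenotype_mapping_key": 3},
    {"id": "B", "clinical_text": "Megalencephaly, developmental delay",
     "molecular_basis_known": False, "has_phenotype_description": True,
     "phenotype_mapping_key": 4},
    {"id": "C", "clinical_text": "Short stature",
     "molecular_basis_known": True, "has_phenotype_description": True,
     "phenotype_mapping_key": 3},
    {"id": "D", "clinical_text": "Macrocephaly",
     "molecular_basis_known": False, "has_phenotype_description": True,
     "phenotype_mapping_key": 2},
]

kept = [e["id"] for e in entries if passes_omim_filter(e)]
print(kept)  # ['A', 'B']
```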
In the ClinGen and DECIPHER databases, we inspected all the curated recurrent CNVs recognized as dosage sensitive regions associated with clinical phenotypes [37,38], retrieving those in which macrocephaly is a typical clinical sign.
Further, we explored the PubMed repository to retrieve articles by using the terms "macrocephaly" or "megalencephaly", aiming to complement the list of genes and CNV syndromes. This analysis was based on the evaluation of articles published in the last five years, describing genetic findings of series of patients with macrocephaly.
The collected known genes were uploaded to the WebGestalt website (http://www.webgestalt.org/) to explore the biological pathways enriched in this set (Homo sapiens; genome protein-coding genes reference list, over-representation analysis, pathway as the functional database, cross-referencing data with the Reactome database).
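Over-representation analysis is commonly based on a one-sided hypergeometric test: given a reference of N genes of which K belong to a pathway, it asks how likely it is to see at least k pathway genes in a list of n genes by chance. The sketch below illustrates that calculation with toy numbers; it is not WebGestalt's implementation, and whether the tool's defaults (for example, its multiple-testing correction) match this sketch is an assumption.

```python
from math import comb


def ora_p_value(N: int, K: int, n: int, k: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N, K, n).

    N: genes in the reference set
    K: reference genes annotated to the pathway
    n: genes in the submitted list
    k: submitted genes annotated to the pathway
    """
    upper = min(K, n)
    total = comb(N, n)
    # Sum the upper tail; comb() returns 0 when a term is impossible.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / total


# Toy example: 10 reference genes, 4 in the pathway, 3 submitted, 2 of them in the pathway.
print(ora_p_value(10, 4, 3, 2))  # 40/120, i.e. about 0.333
```

With real data, one such p-value is computed per pathway and the resulting set is then corrected for multiple testing.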

CNVs Associated with Macrocephaly in the DECIPHER Database
We retrieved from DECIPHER information on 1033 macrocephalic patients with CNV sizes ranging from 1.72 kb to 248.96 Mb (aneuploidy), with a mean size of 7.67 Mb and a median size of 1.32 Mb. For further analysis, we selected only the 29 patients with de novo <500 kb CNVs (Figure 1).
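The selection step described above (de novo inheritance and a size below 500 kb) can be sketched as follows. The records and field names are invented for illustration and do not reflect the DECIPHER export schema; CNV size is computed here as end minus start, which is an assumption about the coordinate convention.

```python
from statistics import mean, median

SIZE_THRESHOLD_BP = 500_000  # 500 kb cut-off stated in the text

# Hypothetical CNV records, for illustration only.
cnvs = [
    {"patient": "P1", "start": 1_000_000, "end": 1_120_000, "inheritance": "de novo"},
    {"patient": "P2", "start": 5_000_000, "end": 7_500_000, "inheritance": "de novo"},
    {"patient": "P3", "start": 200_000, "end": 450_000, "inheritance": "inherited"},
    {"patient": "P4", "start": 3_000_000, "end": 3_400_000, "inheritance": "de novo"},
]


def cnv_size_bp(cnv: dict) -> int:
    """CNV length in base pairs (end minus start)."""
    return cnv["end"] - cnv["start"]


def select_small_de_novo(records, max_size_bp=SIZE_THRESHOLD_BP):
    """Keep de novo CNVs strictly smaller than the size threshold."""
    return [r for r in records
            if r["inheritance"] == "de novo" and cnv_size_bp(r) < max_size_bp]


selected = select_small_de_novo(cnvs)
all_sizes = [cnv_size_bp(r) for r in cnvs]

print([r["patient"] for r in selected])   # ['P1', 'P4']
print(mean(all_sizes), median(all_sizes))  # summary statistics over all records
```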
We also proposed potential candidate genes in four cases (~14%) without a known macrocephaly gene within the CNV (Table 2): in one case, the detected CNV encompassed a known microcephaly gene (TRAPPC9), and in three others, the RALGAPB, RBMS3, and ZDHHC14 genes were highlighted, mainly based on an in silico analysis of their functional roles, as described in Table 2.

Literature Review of Macrocephaly Genes and Associated CNV Syndromes
We assembled a list of 341 bona fide genes whose association with macrocephaly has been previously corroborated, either as part of a recognizable syndrome or when more than one case was reported with a gene mutation linked to macrocephaly (Table S1). Aiming to unveil the main biological processes enriched for this set of genes, we performed an analysis on WebGestalt. As expected, the set of macrocephaly genes was enriched for development of the head, skull, and central nervous system and for processes related to the cell cycle. In relation to neurogenesis, enriched processes included generation of new neurons, neuron projection, and gliogenesis (Figure S1). Furthermore, we compiled 28 genomic loci with recurrent CNV syndromes that include macrocephaly among their clinical findings: 15 deletions, 11 duplications, one triplication, and one recurrent region [del/dup], as presented in Table 3. Eighteen CNV regions had an OMIM entry, four regions were exclusively described in the ClinGen or DECIPHER databases, and five regions were retrieved from the scientific literature. Seven loci were exclusively associated with macrocephaly: 1p32p31 deletion, 3q13.31 deletion, 4q32.1q32.2 triplication, 5p13 duplication, distal 7q (7q32-qter) duplication, 14q11.2 deletion, and Xq22.3 telomeric deletion. Six loci were reported to cause macrocephaly or microcephaly with the same CNV type: 2q31.2 deletion, 10p15.3 deletion, 11q deletion, 15q11q13 deletion, 17q11.2 recurrent region (del/dup), and distal 22q11.2 duplication. A particularly interesting fact is that 17 loci (~61%) exhibited mirror phenotypes, that is, reciprocal deletions and duplications known to have opposite effects on head size: 1q21.1, 4pter, 5q35, 7p22.1, 7q11.23, 8p23.1, 10q22.3q23.2, 13q31.3, 15q11q13, 15q26qter, 16p11.2, 17p13.1, 17q11.2, 17q12, 17q21.31, 19p13.13, and 22q11.2.
Table 1. Description of the CNV data from the 29 macrocephalic patients with de novo <500 kb CNVs reported in the DECIPHER database and encompassed genes (in bold, known macrocephaly genes; ↓ known microcephaly genes; in red, potential candidate genes for macrocephaly).

Discussion
Understanding the mechanisms of brain growth and development underlying macrocephaly can shed light on the complex process of neurodevelopment [65]. CNVs affect up to 10% of the human genome and are mostly not deleterious [66,67]. Nonetheless, in neuropsychiatric disorders such as autism and intellectual disability, which are commonly associated with alterations in head size, there is a notable enrichment of recurrent typical CNVs, resulting from nonallelic homologous recombination of hotspots flanked by paired low copy repeats [23,68]. In fact, copy number changes can often lead to protein imbalance of the affected genes, resulting in a pathogenic phenotype in case of dosage sensitivity [23,28,66,68]. As expected from theoretical predictions and observations in model organisms, deletions (haploinsufficiency) are more common and penetrant than duplications (triplosensitivity) for extreme developmental phenotypes [28,69]; in the investigated DECIPHER cohort, over 75% of the CNVs that met our criteria were deletions. The present study and our previous review of CNVs in microcephaly [70] generated a map of CNV loci associated with alterations in head size (Figure 2). A total of 67 loci were gathered, harboring 77 CNVs (58 deletions and 19 duplications), reinforcing the relevance of CNVs, mostly deletions, in neurodevelopmental phenotypes [28,69].
The phenotypes presented by reciprocal CNVs can be allocated to four major categories: mirrored (when deletions and duplications have opposite effects), identical (both deletion and duplication result in the same phenotype spectrum), overlapping (some clinical features are present in both types of CNVs), and unique (exclusive to the deletion or the duplication) [68]. Although many cases have major driver genes responsible for the main clinical features, the large size of the CNVs means that several genes can be affected, possibly contributing to the variability of the phenotype through synergistic or additive epistatic effects [66,68].
Mirror phenotypes are not universal, but they are frequent at reciprocal CNVs when head size is involved; one example is the chromosome 5q35 region. The deletion of this region, including the NSD1 gene, results in macrocephaly, one of the phenotypes of Sotos syndrome, while the duplication leads to a microcephalic phenotype, likely due to a gene dosage effect [71]. Identical phenotypes are probably the result of a disruption in the same developmental pathways, with either LoF mutations or overexpression and enhanced gene activity leading to similar clinical features due to downstream alterations [66,68].
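For a single trait such as head size, the categories for reciprocal CNVs can be sketched as a small classifier. The function name and labels below are hypothetical; because only one trait is considered, the "overlapping" category (which requires comparing several clinical features) cannot arise in this simplified version. The 5q35 example uses the effects stated in the text.

```python
from typing import Optional

OPPOSITE = {"macrocephaly": "microcephaly", "microcephaly": "macrocephaly"}


def head_size_category(deletion_effect: Optional[str],
                       duplication_effect: Optional[str]) -> str:
    """Classify a reciprocal CNV locus by the head-size effect of each CNV type.

    Each argument is "macrocephaly", "microcephaly", or None (no reported effect).
    """
    if deletion_effect is None and duplication_effect is None:
        return "no head-size phenotype"
    if deletion_effect is None or duplication_effect is None:
        return "unique"      # only one CNV type affects head size
    if deletion_effect == duplication_effect:
        return "identical"   # same effect for deletion and duplication
    if OPPOSITE.get(deletion_effect) == duplication_effect:
        return "mirrored"    # opposite effects
    return "unclassified"


# 5q35: the deletion (including NSD1) gives macrocephaly, the duplication microcephaly.
print(head_size_category("macrocephaly", "microcephaly"))  # mirrored
```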
Revisiting the list of genomic loci linked to macrocephaly, compiled through examination of the scientific literature available at PubMed and the other aforementioned public databases, three of these categories were discernible: (a) reciprocal CNVs leading to a mirror phenotype: 15 out of the 28 (~53%) known recurrent CNVs identified in this study present opposite head sizes depending on the CNV type; (b) CNVs associated exclusively with macrocephaly, constituting about 25% of the syndromes identified (seven cases); and (c) the same CNV type resulting in either macrocephaly or microcephaly, as presented in the seven remaining cases (25%). The latter category can be illustrated by patient 412759, who carried a pathogenic ADNP intragenic deletion leading to LoF, which has been previously associated with the ADNP syndrome, with "large head" amongst its clinical findings, as described by Li et al. (2017) [72] and Gozes (2020) [73]. On the other hand, studies using animal models and mutant embryonic stem cells were able to correlate ADNP deficiency with downregulation of the homeobox gene PAX6, which has a crucial role in the migration and differentiation of neuronal progenitor cells in the developing brain and has already been described in association with microcephaly [74,75].
Through examination of the DECIPHER cohort, we observed that 14 of the 29 patients (~48%) presented a CNV encompassing a known macrocephaly gene that could explain the phenotype. In particular, the gene NFIA was found to be affected by a heterozygous intragenic deletion in two patients; its haploinsufficiency is considered a main driver of the phenotypes resulting from the chromosome 1p32-p31 deletion syndrome (OMIM #613735), especially macrocephaly and intellectual disability [76].
We found a remarkable case of a conflicting phenotype in DECIPHER patient 269967, who presented a complete deletion of the RPL11 gene. This gene encodes a protein that is part of the large ribosomal subunit, and its haploinsufficiency (along with that of other ribosomal proteins, such as RPL5 and RPL26, which impairs the processing of pre-RNAs and the subsequent maturation of the ribosomal subunits) is the most common causative mechanism of the autosomal dominant disorder Diamond-Blackfan anemia (OMIM #612562). Almost one third of the individuals with Diamond-Blackfan anemia show a degree of growth deficiency, and microcephaly is one of the craniofacial features [77]. This is no surprise, considering that RPL11 is one of the ribosomal proteins that can interfere with the TP53 signaling pathway when deficient, leading to suppression of cell cycle progression and apoptosis due to a nucleolar stress response [78]. There is a subgroup of patients carrying chromosomal rearrangements and large deletions of another gene (RPS19) who present macrocephaly instead of microcephaly as one of the reported craniofacial abnormalities [79]; however, we found no description of RPL11 LoF with overgrowth, as presented by the patient discussed here. Considering that RPL11 is the only affected gene in the deleted segment, further genetic analysis would be needed to determine whether this is the only pathogenic variant carried by this patient and to establish whether this variant is indeed the cause of the macrocephaly.
For the assessment of potential new candidate genes for macrocephaly, we evaluated the genomic content of the remaining 14 patients who did not carry CNVs encompassing known macrocephaly genes, but still presented protein-coding genes within the affected region. From this group, we were able to further cluster them in two categories based on similarities between them.
Eight DECIPHER patients presented a CNV that encompassed a known microcephaly gene; one of these patients (DECIPHER 339955) carried a partial duplication of TRAPPC9. Although there are duplications reported in this region in the normal population (DGV database), they do not completely overlap the distal sequence of TRAPPC9 that is duplicated in this patient, mainly exon 7. TRAPPC9 LoF is associated with a rare recessive neurodevelopmental syndrome with obesity and postnatal microcephaly as the most prominent signs [80,81], the latter likely due to the role of this protein in postmitotic neurons. It acts in vesicular protein trafficking between the Golgi apparatus and the endoplasmic reticulum, and like several genes of this group, it is related to the proper development of the nervous system. It may also play a role in NF-κB signaling, a crucial pathway for neuronal cell differentiation and myelin formation [82,83]. Another interesting aspect is its parent-of-origin expression bias in the brain, being predominantly expressed from the maternal allele [83]. It is important to mention that the CNV data deposited in DECIPHER are mainly based on chromosomal microarray analysis, which hampers structural evaluation of the copy number alteration. Therefore, in the case of duplications, it is not possible to determine whether an intragenic or partial duplication is located in tandem or elsewhere in the genome. However, it is plausible to argue that, if the duplication is in tandem, in a direct or inverted orientation, a LoF effect would be expected, though further analysis is needed to corroborate this hypothesis.
Three patients harbored a CNV affecting non-OMIM genes (RALGAPB, RBMS3 and ZDHHC14), whose functions, when disturbed, could potentially lead to abnormal head size. One of them carried a partial deletion of RALGAPB, a known tumor suppressor [84], which inhibits cell proliferation and tumor growth. Studies using animal models also emphasized the importance of the RalGAP complex for neuronal development and differentiation [85], and both knockdown and overexpression of RALGAPB in mammalian cells lead to an increase in mTORC1 activity [86]. This gene has little evidence for haploinsufficiency [87], but the inactivation of the multiprotein RalGAP complex has been proposed as a causal factor for microcephaly [88]. More studies are necessary to support RALGAPB as a possible candidate for macrocephaly. We observed a DECIPHER patient carrying an RBMS3 partial duplication, similar to the TRAPPC9 case previously mentioned. In vitro and in vivo studies demonstrated that rbms3 inhibits cell proliferation and promotes apoptosis through regulation of gene transcription or RNA metabolism, and its expression is reduced in several cancers [89,90]. Defects in RNA-binding proteins, such as RBMS3, may lead to craniofacial abnormalities. Jayasena & Bronner (2012) [39] analyzed the consequences of rbms3 LoF during zebrafish development; the mutants had a variety of abnormalities when compared to the wild type, including smaller body size and craniofacial defects due to improper cartilage formation. The reported association of rbms3 with craniofacial abnormalities in animal models and the RBMS3 pHaplo score of 0.89 (pHaplo being the output of an ensemble machine-learning model, designed by Collins (2022) [27], that reflects the probability of haploinsufficiency for autosomal genes) indicate that this gene could be a strong candidate for the macrocephaly phenotype, even though further analysis is required to validate this assumption.
ZDHHC14 expression is also reduced in several cancer types, including brain tumors; induced in vitro overexpression in gastric cancer cell lines promoted cancer cell migration and cell attachment, in addition to stimulating cell invasion [91]. In vitro studies by Yeste-Velasco et al. (2014) [92] demonstrated that whereas overexpression promoted apoptosis through activation of the classic caspase-dependent pathway, heterozygous deletion increased colony formation. Therefore, RBMS3 and ZDHHC14 are both classified as tumor suppressor genes, and the CNVs identified in these DECIPHER patients have a probable LoF effect (intragenic duplication and entire gene deletion, respectively). Considering that the reduced expression of these two genes is reported to increase cell proliferation and/or inhibit apoptosis, they are interesting candidates, and additional studies are required to provide functional support for their potential causal relationship with macrocephaly.
Finally, we found two CNVs encompassing non-coding genes that are worth mentioning. One of them was a deletion including only part of the sequence of the long intergenic non-protein coding RNA 1162 (LINC01162), mapped to 7p15.3 (patient 288535); however, despite being a validated lncRNA, there is no information regarding its function. The second CNV was an 8p23.1 duplication (patient 314265), which encompassed three protein coding genes, including part of the known microcephaly gene MCPH1, and the full sequence of its antisense lncRNA (MCPH1-AS1), a validated gene with high expression in brain tissues [93]. It is difficult to anticipate the potential impact of CNVs harboring lncRNAs; both variants were classified as likely benign according to the clinical guidelines.
Studies focusing on pathogenic variants disrupting the mechanisms that control head size are an extremely important source of information about biological pathways underlying these processes. This study reviewed the genes and CNV loci previously associated with macrocephaly in the literature as well as suggested novel potential candidate genes deserving further evaluation.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/genes13122285/s1, Table S1: Genes with a recognizable association with macrocephaly, retrieved from OMIM and the scientific literature; Figure S1: Enriched biological processes of the gene set related to macrocephaly.

Data Availability Statement:
This study makes use of data generated by the DECIPHER community. A full list of centers who contributed to the generation of the data is available from https://deciphergenomics.org/about/stats and via email from contact@deciphergenomics.org. Funding for the DECIPHER project was provided by Wellcome. Those who carried out the original analysis and collection of the data bear no responsibility for the further analysis or interpretation of them.