The Polygenic Nature and Complex Genetic Architecture of Specific Learning Disorder

Specific Learning Disorder (SLD) is a multifactorial neurodevelopmental disorder that may involve persistent difficulties in reading (dyslexia), written expression and/or mathematics. Dyslexia is characterized by difficulties with speed and accuracy of word reading, deficient decoding abilities, and poor spelling. Several studies from different, but complementary, scientific disciplines have investigated possible causal/risk factors for SLD. Biological, neurological, hereditary, cognitive, linguistic-phonological, developmental and environmental factors have been implicated. Despite worldwide agreement that SLD is highly heritable, its exact biological basis remains elusive. We herein present: (a) an update of studies that have shaped our current knowledge on the disorder’s genetic architecture; (b) a discussion on whether this genetic architecture is ‘unique’ to SLD or, alternatively, whether there is an underlying common genetic background with other neurodevelopmental disorders; and (c) a brief discussion on whether we are in a position to generate meaningful correlations between genetic findings and anatomical data from neuroimaging studies or specific molecular/cellular pathways. We conclude with open research questions that could drive future research directions.


Introduction
Specific Learning Disorder (SLD) is a complex disorder that is present worldwide, with varying manifestations and considerable inter-individual differences. According to DSM-5 and the National Joint Committee on Learning Disabilities (NJCLD), SLD is a general term that refers to a group of disorders [1][2][3], which may involve difficulties in reading (dyslexia), written expression (dysgraphia) and/or mathematics (dyscalculia) that are not accounted for by low intelligence (IQ), sensory acuity (visual problems), poor learning opportunities, or developmental delay (e.g., intellectual disability). Learning disabilities may co-occur with the aforementioned impairments, but are not the result of these conditions [1,4].
The prevalence of SLD varies between 3-12% among the general population, depending on factors such as stringency of measurement cut-offs used for identification [5][6][7], country and level of phonological transparency of the spoken language, sex (male:female ratio 2-3.7:1) [8][9][10], age of assessment, different theoretical perspectives as regards causality, and assessment tools criteria used [6,11]. DSM-5 describes SLD as a neurodevelopmental disorder with a biological origin, which includes an interaction of genetic, epigenetic, and environmental factors. SLD is readily apparent in the early school years in most individuals; symptoms are usually detected when students show a learning profile which is qualitatively lower than their chronological and mental age. However, in some cases, difficulties may become obvious at a later age, when the academic demands rise and exceed
Table 1. Earlier studies presenting evidence for association of genomic loci with SLD and/or related traits.

Linkage Screens in Pedigrees
A significant number of dyslexia candidate genes were identified through linkage studies in pedigrees. Reports on familial aggregation of dyslexia, characterized by an autosomal dominant inheritance pattern, continue to be published. These newer reports combine traditional chromosomal mapping, using dense SNP-based (rather than microsatellite-based) genome-wide genotyping and linkage analysis, with genome-wide gene expression and NGS technologies.
For instance, in 2017, Einarsdottir et al. reported the identification of NCAN (19p13), a putative novel dyslexia susceptibility gene. This example highlights that, with the advent of new technologies of greater analytical potential, previously reported families with a clearly defined phenotype, but without a specific genetic diagnosis, can be revisited to yield novel findings. This dyslexia pedigree of Finnish origin, with eight affected cases across three generations [55], was reanalyzed using genotyping and linkage methods, in concert with next-generation whole-exome sequencing (WES) [69]. NCAN is expressed in several tissues, including several brain areas (Figure 1); its expression was significantly correlated with that of two other dyslexia candidate genes, namely GRIN2B and KIAA0319 (Table 2).
An impressive three-generation pedigree of Indian origin was reported by Naskar et al. in 2018; all living individuals from generation II (n = 3), all of their offspring in generation III (n = 7) and all but two of the offspring in generation IV (n = 7) were affected with dyslexia in a pattern compatible with autosomal dominant inheritance. Genome-wide SNP genotyping combined with WES revealed several variants in the protocadherin gamma (PCDHG) gene cluster (5q31.3), which encodes alternative PCDHG transcripts owing to a large number of alternative 5′ exons. All identified variants clustered in the variable 5′ exons, which encode the extracellular protocadherin domain. Protocadherins are predominantly expressed in the developing human brain and are known to play a role in neuronal connectivity, thus ensuring formation and maintenance of neural circuits [70].
One of the latest reports of this kind is by Grimm et al., who identified a novel dyslexia-associated gene, namely SPRY1 (4q28), after studying six out of nine affected individuals across a four-generation pedigree of German origin [71]. SPRY proteins are negative regulators of the Ras/MAPK/ERK pathway but, even though the authors showed that SPRY1 is expressed in all brain regions, it was not possible to explore the status of mutant SPRY1 expression in affected cases [71].

Candidate Gene Association Studies
There are two types of candidate gene association studies that have been published during the last two decades regarding genes that underlie genetic susceptibility to reading and mathematical abilities and disabilities. The first approach aims to explore established SLD genes in case-control cohorts of various ethnic origins, typically of Caucasian ancestry. The other approach aims to examine whether genes previously associated with reading and/or mathematical abilities in the general population can be valid in the context of an SLD diagnosis. Table 2 provides an updated list of past and recent publications that followed this study design, while summarizing their major findings.
For the purposes of this review, it is worth highlighting relatively recent studies that employed large sample sizes or were carried out by multicentered cross-linguistic initiatives. For instance, the European consortium NeuroDys performed a cross-linguistic case-control association study of dyslexia with data from more than 950 dyslexic individuals using targeted genotyping of selected markers in DYX1C1, DCDC2, KIAA0319, and the MRPL19/C2orf3 locus [72]. No SNP or haplotype surpassed statistical significance level, and none was associated with dyslexia in samples from more than one population, including populations speaking the same language (e.g., German). This may be potentially explained by differences in diagnosis between countries, genetic architecture heterogeneity among different populations, missing analysis of relevant traits, insufficient power due to the phenotypic heterogeneity of the disease, or combinations of the above [72].
In 2016, Müller and coworkers analyzed 16 SNPs in five genes affecting reading and spelling in the general population, in a German dyslexia case-control cohort [38]. On a single-marker level, no associations survived correction for multiple testing, but the observed risk alleles in KIAA0319 were in agreement with associations from both the general population, as well as other dyslexia studies [72]. No gene-specific haplotypes were associated with dyslexia in KIAA0319, DYX1C1, or DCDC2. When performing polygenic analysis, an increased number of risk alleles was observed within dyslexic cases compared to controls. The authors also demonstrated by in silico analyses on publicly available eQTL data that the SNPs residing in DCDC2, KIAA0319 and DYX1C1 affect the corresponding genes' expression, as well as the expression of a gene in the vicinity of DCDC2, namely MRS2 [38].
Brain Sci. 2021, 11, 631
In the study of Sánchez-Morán et al., the authors explored associations of three variants, one in each of three established dyslexia genes in 286 dyslexic children versus 1197 controls. Again, no single-marker association reached statistical significance, but pairwise SNP interaction between rs2274305 in DCDC2 and rs4504469 in KIAA0319 showed significant association with dyslexia as well as with dyslexia plus comorbid ADHD. In addition to the case-control design, these candidate SNPs were also associated with cognitive traits in the general population (n = 3357): rs2274305-DCDC2 with phoneme awareness (PA) and rapid automatized naming (RAN), DYX1C1 with word-reading and RAN, and rs4504469-KIAA0319 with word-reading, RAN, and syllable discrimination [73]. DCDC2 and KIAA0319 reside on the same locus, yet they are not in linkage disequilibrium; this points to independent, but synergistic, association, since a DCDC2 risk haplotype interacts synergistically with a KIAA0319 haplotype, conferring higher risk in reading disability when both risk haplotypes occur together rather than separately [74].
A large cohort of more than 1500 unimpaired individuals was recently analyzed for genetic variants across 14 genes previously associated with dyslexia. Doust et al. performed gene-set-based analysis for reading impairment candidate genes and for the Gene Ontology biological pathway genes for 'axon guidance' and 'neuron migration'. The lack of replication of previous associations in this carefully characterized, yet unselected for SLD/dyslexia, cohort could be true or could be attributed to a number of other reasons: lack of statistical power to detect variants of small effect size, despite being one of the largest cohorts analyzed for reading abilities thus far, or sampling bias owing to participants' recruitment from a twin registry [75].
The abovementioned studies are used as examples to illustrate that despite their undoubtedly careful design, statistically significant associations were still not reached or were, at best, nominal. Improvements, such as the incorporation of much larger numbers than past candidate gene studies and the recruitment of extremely carefully scrutinized participants across a number of reading and mathematical traits, left much of the genetic susceptibility puzzle of a common disease-common variant hypothesis unanswered. The field had to move on to hypothesis-free approaches; this advancement is reviewed in the following section.

Genome-Wide Association Studies (GWAS) and Polygenic Risk Scores (PRSs)
GWA studies are not hypothesis-driven, unlike candidate gene association studies that are designed with specific questions in mind, interrogating particular genes or genomic loci implicated in specific molecular pathways or biological processes hypothesized to be involved. Nevertheless, GWAS proved less successful than originally expected in helping to pinpoint SLD susceptibility loci, partly owing to the heterogeneous dyslexia phenotype and diagnostic/recruitment criteria used or to the small sample numbers analyzed compared to other neurodevelopmental/psychiatric phenotypes. Small sample sizes confer low detection power for common variants with small effect sizes, especially considering the stringent statistical correction for multiple testing over hundreds of thousands or millions of variants that needs to be taken into account. To compensate, genome-wide screening of the general population for DNA variants associated with reading, arithmetic and language abilities as heritable traits attracted intense research interest; these were viewed as "intermediate phenotypes", or quantitative traits acting as endophenotypes, determined by a genetic background that potentially also underlies SLD etiology.
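To make the multiple-testing burden described above concrete, the following minimal Python sketch assumes a simple Bonferroni correction at a family-wise error rate of 0.05. This is an illustrative assumption; the studies reviewed here do not necessarily apply this exact procedure, and the function name `bonferroni_threshold` is hypothetical.

```python
def bonferroni_threshold(alpha: float, n_tests: int) -> float:
    """Per-test p-value threshold that keeps the family-wise error rate at alpha.

    Assumes a plain Bonferroni correction (illustrative choice, not taken
    from any specific study discussed in the text).
    """
    if n_tests <= 0:
        raise ValueError("n_tests must be a positive integer")
    return alpha / n_tests


# Array densities of the order mentioned in this review (~100 k, ~300 k)
# and the "millions of variants" scale of imputed data sets.
for n_variants in (100_000, 300_000, 1_000_000):
    threshold = bonferroni_threshold(0.05, n_variants)
    print(f"{n_variants:>9,} tests -> per-variant threshold p < {threshold:.1e}")
```

For one million tests the threshold is 5 × 10^-8, the conventional genome-wide significance level, which illustrates why small cohorts rarely have the power to detect common variants of small effect.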
Reading skill as a quantitative trait was explored for the first time by applying a GWAS approach using the extremes of its continuous distribution. Two groups, low versus high reading ability, comprising a total sample of 1500 children, were genotyped using a low-density SNP microarray (~100 k). Top candidate SNPs showing the largest allele frequency differences between extreme-ends groups were validated in an independent sample of 900 age-matched children. Of those, ten SNPs showed nominally significant association with continuous variation in reading ability [106]. Since this seminal effort, a significant number of studies have been conducted, several of which focused on variants with pleiotropic effects in both reading and language traits (Table 3) [107][108][109]. We believe that the most recent one deserves highlighting for two reasons. First, the authors studied reading disability predictors, namely RAN and rapid alternating stimulus, in a sample of more than 1300 Hispanic-American and African-American young individuals. Second, they found, for the first time in a GWAS design, genome-wide significance for a variant located on the upstream region of a long non-coding RNA (lncRNA) gene, namely RPL7P34, 30kb upstream of RNLS (10q23.31). It was suggested that this variant resides on an enhancer element that potentially interacts with an active RNLS transcription start site in the hippocampus, owing to chromatin's three-dimensional structure. The variant was further associated with structural variation (cortical volume) in the right inferior parietal lobule of an independent multi-ethnic sample [110]. Currently, it remains largely unknown how non-coding regions of the genome may impact reading traits; the identification of variants in gene regulatory regions, as recently demonstrated for ARHGEF39 in SLI [111], or the role of post-transcriptional (e.g., miRNA-based) regulation of gene expression, is undoubtedly an exciting new field of research.
Coming to the context of dyslexia, one of the first GWAS, albeit of a very small scale in comparison to current standards (200 cases for discovery and 186 for replication, tested for a limited number of markers, ~300 k), identified rs4234898 on chromosome 4 as a trans-acting regulatory variant for SLC2A3, which resides on chromosome 12. SLC2A3 codes for a glucose transporter in neurons, and its reduced expression in lymphoblastoid cell lines was shown to be significantly associated with the minor rs4234898 allele. It was suggested that SLC2A3 might act as a susceptibility gene for an electrophysiological endophenotype in dyslexic children with glucose transport deficits, namely mismatch negativity (MMN) or mismatch response. MMN serves as a measure of speech perception and automatic speech deviance detection, which has been found to be impaired in dyslexic children [97]. This mismatch response endophenotype was later shown to associate with common variants in DYX1C1 [112], unlike common variants in DCDC2 and KIAA0319 [113].
The largest GWAS for dyslexia-specific traits was recently published, with data generated for almost 3500 reading-impaired and typically developing children of European ancestry from nine countries speaking six different languages. Genome-wide significance was observed with RAN for four variants on 18q12.2, within MIR924HG (rs17663182), and a suggestive association on 8q12.3 within NKAIN3. It is of note that MIR924 is predicted to regulate candidate dyslexia susceptibility genes like MRPL19 and KIAA0319L, as observed via in silico analysis of putative miR-924 binding sites [114]. The same group performed a polygenic risk score (PRS) analysis between eight reading traits and different neuropsychiatric disorders (ADHD, ASD, major depressive disorder and schizophrenia), educational attainment, and neuroimaging phenotypes (seven brain areas) and found a significant genetic overlap between some of these reading traits and educational attainment and, to a lesser extent, with ADHD [114]. This initiative led to an even larger dyslexia case-control GWAS of almost 2300 cases and 6300 controls, a subset of which overlapped with the same authors' 2019 paper [26]. No novel genome-wide significant associations emerged at single-marker level; gene-based analysis from the top SNP association signals revealed VEPH1 (3q25) as a top candidate gene, but no specific pathways showed significant enrichment [26].
The first study assessing the reading ability of non-dyslexic children and adolescents with the use of PRS analysis was in fact published in 2017. The authors utilized GWAS data from >5800 individuals and used educational attainment (years of education completed) to predict reading performance in English. They calculated a PRS-heritability estimate of reading ability of almost 5%, based only on common variants. This estimate represents approximately 7% of the total heritability for reading ability (h² = 70%; 5%/70%) evaluated through twin studies [115]. However, if the PRS-heritability estimate is calculated relative to an SNP-heritability estimate, which was shown to account for 22% of the total genetic variance [116], then it explains a significant 23% (5%/22%) of the genetic variance observed for reading ability, an estimate that remained significant after accounting for age-specific cognitive ability and family socioeconomic status [115].
The use of PRSs is a rather young addition to the arsenal of (statistical) tools to evaluate the genetic component of complex traits, even more so for complex cognitive skills like reading performance; yet, we can already foresee its potential. Given its inherent nature (as DNA variants do not change with age), knowing the individual genetic differences in reading ability may prove useful in the early prediction of reading problems like dyslexia. This will require large multicentered initiatives of tens of thousands of participants. However, because language transparency is an important issue in assessing dyslexia, perhaps large GWAS with participants using the same language would be powerful enough to explore the applicability of PRS further, an approach already tested by Gialluisi et al. in their 2019 analysis [114].
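The two ratios quoted above can be reproduced with a short Python sketch. It only restates the arithmetic of the text: the values 5%, 70% and 22% are taken from the paragraph, while the function and constant names are hypothetical and chosen for illustration.

```python
def share_explained(prs_variance: float, heritability: float) -> float:
    """Fraction of a heritability estimate captured by a polygenic score."""
    if not 0 < heritability <= 1:
        raise ValueError("heritability must lie in (0, 1]")
    return prs_variance / heritability


PRS_VARIANCE = 0.05  # variance in reading ability predicted by the PRS (~5%)
TWIN_H2 = 0.70       # twin-based heritability of reading ability
SNP_H2 = 0.22        # SNP-based estimate used as the second denominator

print(f"Share of twin heritability: {share_explained(PRS_VARIANCE, TWIN_H2):.1%}")
print(f"Share of SNP heritability:  {share_explained(PRS_VARIANCE, SNP_H2):.1%}")
```

The two printed shares, roughly 7% and 23%, differ only in the denominator, which is why the same polygenic score can look either modest or substantial depending on the heritability estimate used as reference.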
The first GWAS study conducted to exclusively assess mathematical ability and disability was published ten years ago; two groups of children from the Twins Early Development Study, with high versus low mathematical ability (600 individuals per group), served as the discovery cohort, and 2356 individuals, spanning the entire distribution of mathematical ability, were used for validation purposes. Out of 10 top candidate SNPs, rs11225308 (MMP7), rs363449 (GRIK1), and rs17278234 (DNAH5) were the variants most significantly associated with mathematical ability. Because the effect sizes of these 10 SNPs were small, the authors created an 'SNP-set score' for each of the 2356 individuals, which accounted for 2.9% of the variance in their sample [68]. In fact, by using this SNP-set score, it was shown that one third of children who harbored ≥50% of the identified risk alleles were nearly twice as likely to be in the lowest-performing 15% of the mathematical ability distribution [68]. This score was later correlated with certain environmental factors, demonstrating likely gene × environment interactions [117].
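The idea behind such an SNP-set score can be sketched in a few lines of Python. This is a minimal illustration that assumes an unweighted count of risk alleles (0, 1 or 2 per SNP); the cited study may have constructed its score differently, and the genotypes and function names below are hypothetical, not real data.

```python
from typing import Sequence


def snp_set_score(risk_allele_counts: Sequence[int]) -> int:
    """Unweighted SNP-set score: total number of risk alleles carried."""
    for count in risk_allele_counts:
        if count not in (0, 1, 2):
            raise ValueError("each genotype must carry 0, 1 or 2 risk alleles")
    return sum(risk_allele_counts)


def risk_allele_fraction(risk_allele_counts: Sequence[int]) -> float:
    """Share of the maximum possible number of risk alleles (two per SNP)."""
    return snp_set_score(risk_allele_counts) / (2 * len(risk_allele_counts))


# Hypothetical genotypes for two children at 10 SNPs (illustrative only).
child_a = [1, 0, 2, 1, 1, 0, 2, 1, 1, 1]
child_b = [0, 0, 1, 0, 1, 0, 0, 1, 0, 0]

for name, genotypes in (("child_a", child_a), ("child_b", child_b)):
    fraction = risk_allele_fraction(genotypes)
    flag = "at or above" if fraction >= 0.5 else "below"
    print(f"{name}: score={snp_set_score(genotypes)}, "
          f"fraction={fraction:.2f} ({flag} the 50% mark)")
```

With 10 SNPs the maximum score is 20, so the ≥50% group mentioned above corresponds to carrying at least 10 risk alleles under this unweighted assumption.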
Subsequently, in a sample of almost 700 dyslexic cases and more than 1400 controls, available GWAS data were reanalyzed to associate genetic variation specifically with dyscalculia. The authors found rs133885 in MYO18B to be strongly correlated with mathematical abilities in the dyslexia sample and, to a lesser extent, the general population. A significantly lower depth of the right intraparietal sulcus, an anatomical brain region involved in numerical processing in humans, was associated with rs133885 [118]. However, this association was not supported in the subsequent analysis of a much larger collection of 5144 individuals from four cohorts of European ancestry, 329 of which were diagnosed with dyslexia [119]. A third GWAS aiming to explore the genetic contributions to mathematical ability was conducted in a general population sample of 602 adolescents/young adults with excellent verbal ability but either high or low mathematical ability. The marker with the largest effect size was rs789859, located in the promoter of FAM43A and in high linkage disequilibrium with two SNPs in the adjacent LSG1 gene (3q29), a region previously linked to learning difficulties and autism [120]. Although the encoded protein's function remains obscure, FAM43A was found expressed in the brain, cerebellum and spinal cord [120].
In 2017, one GWAS was conducted with the sole purpose of assessing mathematical ability in the general population of Chinese elementary school students. Two discovery groups and one replication group were used, totaling almost 1600 individuals. Sample meta-analysis revealed four linked SNPs in SPOCK1 associated, at a genome-wide significance level, with a decrease in math scores in two examination periods [121]. Interestingly, mutations in SPOCK1, which encodes the extracellular proteoglycan testican-1, have been associated with ID and microcephaly in humans, whereas Spock1 mouse models have demonstrated strong gene expression in the brain as well as a role in neurogenesis [121].
By now, it has become clear that because GWAS are designed to target common variants, often in non-coding, regulatory or even intergenic regions, they do not necessarily directly reveal the true effect of likely pathogenic variants, as would be expected in the case of rare coding variants. Moreover, initial genome-wide genotyping platforms were designed based on Caucasian genome frequencies, and most of what we currently know about reading and mathematical abilities and disabilities originates from studies of individuals of Caucasian ancestry, despite the fact that SLD affects populations globally and irrespective of language. Thus, we are largely unaware of the genetic architecture of SLD across populations and ethnic ancestries. GWAS, despite setting the grounds for unbiased genome-wide interrogations, have more often than not returned results that could hardly be replicated. This has been attributed to the small effect sizes of common variants, especially for quantitative traits such as reading-associated traits, to sample sizes too small to reveal statistically powerful associations, or even to a lack of consensus in SLD diagnosis. Hence, alternative yet complementary methods, such as those described in the next paragraphs, have contributed significantly to the delineation of the genetic architecture of SLD in recent years.

Copy-Number Variants (CNVs)
Part of the missing heritability of SLD may also be caused by structural variants. CNVs have been extensively explored in other neurodevelopmental disorders, such as ASD, ID [122][123][124], Tourette Syndrome [125,126], and SLI [127]; results for SLD have been inconclusive. On the one hand, recent analyses of dyslexia cohorts indicate that rare, large CNVs may not confer a significant burden [122,128]. On the other hand, rare de novo or inherited deletions or duplications, such as those involving the Xq21.3 region bearing PCDH11X [129], 17q21.31 harboring NSF [130], and 15q11.2(BP1-BP2) harboring four highly conserved genes (Table 3) [43,44], have been reported in cases with SLD. Earlier, a father and his three affected sons were found to carry a submicroscopic deletion (at least ~176 kb) on 21q22.3, encompassing the 3′ region of PCNT, the genes DIP2A and S100B, and the 5′ upstream sequence of PRMT2. The deletion perfectly segregated with dyslexia and standard scores for phonological decoding and single-word reading of below −1.5 to −2 standard deviations [65]. As described later (Section 3.3), a non-coding variant in S100B was also associated with spelling performance in a German family set [102].
Different loci have been found to harbor deletions and duplications in patients with various clinical presentations and comorbid math comprehension difficulties. Children with the 22q11.2 deletion syndrome show considerable difficulties in procedural calculation and word problem solving due to difficulties in understanding and representing numerical quantities, despite relatively normal reading performance [131]. A 22q11.2 deletion spanning the LCR22-4 to LCR22-5 interval was found in an 11-year-old girl with normal intelligence, number sense deficit, normal results in spelling and reading tests and social contact difficulties [132]. A severely affected girl with X-linked myotubular myopathy and math difficulties was found to carry an inherited 661 kb Xq28 microduplication with a skewed X chromosome inactivation pattern [133]. If we exclude syndromic cases, reports on individuals presenting exclusively with mathematical impairments who bear rare or novel de novo or inherited CNVs are truly scarce. An increase in CNVs of the Olduvai protein domain on 1q21 (NBPF15), previously known as DUF1220, appears to be involved in human brain size and evolution and may determine mathematical aptitude in both sexes [134]. This genetic locus is highly expressed in brain regions with high cognitive function [135], but it has not been studied in the context of mathematical disabilities.
Last but not least, a recent study from the Icelandic population investigated the effect of the 15q11.2(BP1-BP2) deletion on cognitive, structural and functional correlates of dyslexia and mathematical disabilities. This CNV was previously associated with cognition deficits in non-neuropsychiatric cases with a history of SLD [43]. Later, Ulfarsson et al. showed that the deletion conferred high risk for either dyslexia or dyscalculia, but the risk was even higher for the combined dyslexia plus dyscalculia phenotype; all deletion carriers performed worse on a battery of tests assessing reading and mathematical abilities. In the same sample, structural magnetic resonance imaging (sMRI) and functional MRI (fMRI) were performed, demonstrating that a smaller left fusiform gyrus and altered activation in the left fusiform and left angular gyrus were also associated with the 15q11.2 deletion [44]. These brain areas are involved in the retrieval of mathematical facts, the usage of learned facts and the performance of arithmetic operations [136][137][138]. This anatomical and functional brain differentiation could be one cause of the greater risk observed for the combined phenotype in deletion carriers.
Either de novo or transmitted, these structural variations may produce a yet unknown spectrum of disturbances on genomic, transcriptomic and proteomic level, for instance haploinsufficiency in the case of deletion or overexpression in the case of duplication [139,140], consequently also affecting subsequent protein-protein interactions; these are hypotheses that warrant further investigation. Interestingly, the 15q11.2(BP1-BP2) duplication carriers do not show significant cognitive impairments, compared to 15q11.2(BP1-BP2) deletion carriers, and are comparable to no-CNV controls [44]. This fact supports the role of haploinsufficiency for the genes mapped on this region, particularly CYFIP1, which was shown to be involved in neuronal development [141].

Next-Generation Sequencing
It is unclear how much of the missing heritability of SLD could be attributed to rare or de novo variants of moderate or high effect, even though this issue has been extensively studied with respect to ID, ASD and developmental delay [142][143][144]. With the emergence of NGS technology, the identification of rare variants could help fill in some of the missing pieces of the puzzle. Sequencing data have only recently begun to emerge for SLD, supporting the influence of certain genomic regions on reading performance and related disabilities. As expected, the first efforts and resources were concentrated on the validation of previously established or suspected dyslexia genes in various populations.
Originally mapped through a submicroscopic deletion on 21q22.3 in a dyslexia family [65], S100B was one of 11 genes to be scrutinized for rare variants using targeted NGS in more than 900 dyslexia cases from Finland and Germany; a 3′ UTR variant (rs9722), located on or adjacent to in silico predicted miRNA target sites, was associated with spelling performance in the German family set. Moreover, a nonsynonymous variant in DCDC2 (rs2274305) was associated with severe spelling deficiency in the same sample set [102]. A similar approach was applied in a subsequent next-generation targeted sequencing effort by Adams et al., who selected dyslexia-associated candidate genes to be screened in 96 affected, unrelated subjects of European ancestry from the Colorado Learning Disability Research Center (CLDRC). These cases were selected based on a CLDRC-derived discriminant score indicating impairment in reading ability [145]. The authors searched for rare, likely disrupting, variants and calculated a statistically significant increase in the frequency of observed mutations in dyslexia cases, compared to data from the 1000 Genomes Project, in two loci: 7q32.1 harboring the adjacent genes CCDC136 and FLNC (19 missense variants) and 6p22 harboring DCDC2 and KIAA0319 (74 missense variants). The data indicate that these regions must have an influence on reading performance, even though not all of the above-mentioned genes show detectable expression in the brain (Figure 1) [145].
The first whole-exome sequencing (WES) study was published in 2015 by Einarsdottir et al. in an effort to identify the genetic basis of a familial form of dyslexia with likely complete penetrance in an extended three-generation pedigree with 12 confirmed dyslexic and four uncertain cases. Through several filtering steps on WES data, a small heterozygous in/del variant was identified in CEP63, namely c.686-687delGCinsTT; its transmission was compatible with autosomal dominant inheritance. This rare variant codes for a nonsynonymous change in a highly evolutionarily conserved amino acid (p.R229L), which was in silico predicted to alter the protein's tertiary structure [146]. As discussed later (Section 6), CEP63 is a centrosomal protein involved in microtubule organization and, even though it is ubiquitously expressed (Figure 1), brain-specific isoforms may be affected by such rare variants. It still remains to be seen whether CEP63 variants are linked to dyslexia in additional cases.
Several other reports have also demonstrated that dyslexia-associated genes encode proteins with structural and functional roles in cilia (Section 6) [147][148][149][150][151][152][153]. Recently, rare variants were identified in two genes related to motile cilia structure and function, namely dynein axonemal heavy chain 5 (DNAH5) and dynein axonemal heavy chain 11 (DNAH11). This represents the first whole-genome sequencing (WGS) analysis in literature of two unrelated dyslexia cases, with situs inversus and ADHD symptomatology [154]. Even though direct links between visceral and functional brain asymmetry are lacking, visceral asymmetry (e.g., situs inversus) is comorbid, at least in some cases, with psychiatric and neurodevelopmental disorders [155]. Although it could not be proven unequivocally that the identified variants in DNAH5 and DNAH11 cause susceptibility to dyslexia, these two genes represent good candidates for further studies.
Overall, the most recent studies that have used state-of-the-art methodology to look for either likely pathogenic CNVs or rare variants in isolated families have provided clues for the implication of novel genes. Family-based studies continue to be a powerful method to unravel the genetic basis of dyslexia [146]. However, variations in reported loci so far explain only a small percentage of the genetic component of SLD. Consequently, much of the heritability of learning-related disorders remains unaccounted for. Perhaps the answer is not "hiding" exclusively in single, rare variants that remain yet to be identified, but also in gene × gene and higher-order chromatin interactions or epigenetic regulatory mechanisms and ways that the environment can determine the (epi)genome [156]. It is of note that epigenome-wide association studies have not been reported yet.
Trait | Gene or locus | Study design | Reference
Reading ability (word reading) | LINC00935 and CCNT1 | GWAS (case-control) | [157]
Mathematical abilities | MYO18B | GWAS (case-control) | [118]
Mathematical abilities | rs789859 intergenic to LSG1 and FAM43A (3q29) | GWAS (high versus low mathematical ability) | [120]
Mathematical abilities | SPOCK1 | GWAS (meta-analysis) | [121]
SLI: specific language impairment, GWAS: Genome-Wide Association Study, WES: whole exome sequencing, CNV: copy number variant, SNP: single nucleotide polymorphism.

Comorbidity and Genetic Correlation with Other Neurodevelopmental Phenotypes
Since the "generalist genes" hypothesis was proposed [41], it has become widely accepted, and recent evidence also supports, that neurodevelopmental disorders share, to a certain extent, a common genetic background. High-impact studies support the pleiotropic or even antagonistic actions of genes and their variation on complex phenotypes, with a particular focus on psychiatric disorders [158,159]. Cross-disorder analyses aim at identifying transdiagnostic variants that could eventually point toward common underlying traits (e.g., cognitive, imaging), molecular pathways, and even symptoms or environmental risk factors [160]. Pleiotropy is mainly manifested via loci harboring genes that show brain-specific expression; thus, these genes are expected to be particularly important in neuronal development, with potential implications for better disease classification and management or future treatment interventions. Prominent examples in the field include schizophrenia and bipolar disorder [161], ASD and ADHD [162,163], Tourette Syndrome (TS) and Obsessive-Compulsive Disorder (OCD) [164,165], and, more recently, OCD and anorexia [166], or TS and ADHD/ASD [167].
As highlighted in the introduction, individuals with SLD show symptoms of ADHD, SLI, or other conditions, but it remains unclear whether these are comorbid with SLD or are secondary problems deriving from the impairments caused by SLD. Reading and language are both viewed as highly heritable traits that are likely to share common genetic and/or neurobiological influences [168]. Shared genetic contributions between reading and language performance have been explored in several studies using candidate gene association analyses or GWAS meta-analysis [101,103,108,109]. For instance, Luciano et al. found strong associations with variants in 21q11.2 (ABCC13 pseudogene), 19p13.3 (DAZAP1), 1p36.33 (CDK11B, CDK11A) and 1p36.11 (RCAN3) [108]. Gialluisi et al. identified suggestive associations in 7q32.1 (CCDC136/FLNC) and 22q12.3 (RBFOX2) [109]. Others failed to find supportive evidence [103].
As mentioned earlier, in their latest report, Gialluisi et al. interrogated GWAS data from a very large sample of dyslexic cases and controls. Apart from identifying VEPH1 (3q25) as the top candidate gene, their analysis highlighted the association of dyslexia with ADHD, and an even stronger association with intelligence, bipolar disorder and schizophrenia [26], further supporting the notion of cross-disorder susceptibility between psychiatric and neurodevelopmental phenotypes. Of course, the hypothesis of a shared genetic background between dyslexia and ADHD, which occurs in approximately 25-40% of dyslexic individuals [169], has been a subject of extensive study. Comorbid cases exhibit more extensive and severe neuropsychological weaknesses and symptom manifestation [170,171]. It was also shown that the heritability of reading disabilities was significantly higher in dyslexic individuals who also met criteria for ADHD [171]. Numerous recent studies support the SLD-ADHD common etiology hypothesis. Field et al. reported common loci implicated in both dyslexia and ADHD [67]. Mascheretti et al. found evidence that a DCDC2 SNP (rs793862), via gene × gene interaction with KIAA0319, is associated with hyperactivity/impulsivity, a finding replicated in two independent samples [172]; a similar association was reported soon after for the inattentive subphenotype [73].
Taking a step further, Verhoef et al. interrogated ADHD-related PRSs in relation to reading-related abilities in a large sample of children (~6000 individuals) from the UK Avon Longitudinal Study of Parents and Children (ALSPAC) in an effort to find evidence for shared genetic factors between ADHD and reading. Notably, polygenic ADHD risk was associated not only with reading but also with language-related abilities, further strengthening the hypothesis of shared genetic etiology between reading, language and ADHD [173]. In a GWAS of ~2300 dyslexia cases and ~6300 controls, PRS analysis highlighted anew the correlation of ADHD with dyslexia and an even stronger association of dyslexia with two psychiatric disorders (schizophrenia and bipolar disorder) [26]. Price et al. performed a similar analysis starting from a GWAS on two children's cohorts (~5250 individuals) aiming to explore the genetic architecture of reading; they used PRS from publicly available datasets on neurodevelopmental and psychiatric disorders and found a statistically significant association between ADHD and reading, as well as an overlap of 22 reading-associated genes previously implicated in ASD [157]. In fact, the relationship between dyslexia and ASD has not been extensively studied and data on the prevalence of ASD in cohorts ascertained for reading disabilities are most likely nonexistent [174].
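To make the polygenic risk score (PRS) concept used in these studies more concrete, the following minimal Python sketch computes a score as the weighted sum of risk-allele dosages, with weights taken from a discovery GWAS. It is only an illustration under stated assumptions: the SNP names, effect sizes and genotypes are invented, and the function `polygenic_score` is hypothetical. Real analyses typically add steps such as clumping, p-value thresholding and standardization, which are not shown.

```python
# Hypothetical per-SNP effect sizes (log odds ratios) from a discovery GWAS.
# SNP names and values are invented for illustration only.
effect_sizes = {"snp_a": 0.10, "snp_b": -0.05, "snp_c": 0.20}


def polygenic_score(dosages, weights):
    """Weighted sum of risk-allele dosages (0, 1 or 2 copies per SNP).

    SNPs that are missing from an individual's genotypes are skipped.
    """
    return sum(weights[snp] * dosages[snp] for snp in weights if snp in dosages)


# Hypothetical genotypes for two individuals in the target sample.
individuals = {
    "child_1": {"snp_a": 2, "snp_b": 0, "snp_c": 1},
    "child_2": {"snp_a": 0, "snp_b": 2, "snp_c": 0},
}

scores = {name: polygenic_score(g, effect_sizes) for name, g in individuals.items()}
```

In designs such as those described above, scores of this kind, built from ADHD or psychiatric GWAS weights, would then be tested for association with reading or language measures in an independent cohort.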
Despite preliminary evidence, however, it is too soon to say whether the observed shared genetic susceptibility between dyslexia and ADHD is also reflected in the brain's disease-related anatomical structures and functional alterations. In two recent sMRI meta-analyses on grey matter differences in isolated ADHD versus dyslexia, no shared neural correlates were found [175,176]. On the other hand, when ADHD and dyslexia coexist, alterations (decreased cortical thickness) can be observed in brain regions relevant for both disorders, supporting the common etiology hypothesis; the same can be said for comorbid cases who exhibit reduced brain activity (during fMRI tasks) in regions associated with deficits in either isolated ADHD or dyslexia [176].
In Table 5 we provide the updated list of genes that have been, so far, implicated in different SLD domains, along with basic information on their biological role (Section 6). In parallel, we indicate which candidate SLD genes have shown association with other neurodevelopmental disorders, as curated in public databases (e.g., SFARI Gene database; [177]) and in literature.

Emerging Data from Neuroimaging Genetic Studies
Brain scans using modern technologies have provided ground-breaking insights into the workings of the human brain. Various MRI techniques have been most popularly used to visualize and explore: (a) structural abnormalities [e.g., cortical surface area (cSA) and cortical thickness; grey matter (GM) and white matter (WM) density and volumes] (sMRI), (b) alterations in structural connectivity between brain areas (DTI), and (c) functional abnormalities either in resting state or while performing (a) task(s) (reading-related, phonological, auditory, semantic, working-memory, visual-spatial, attentional, mixed) (fMRI).
Dyslexia has been associated with various anatomical and functional changes in the brain. In brief, total brain volume, GM and WM volume, total intracranial volume, cortical thickness and cSA, global and local brain asymmetries, level of gyrification, and to a lesser extent sulci configuration, have been under intensive research, not necessarily reaching an agreement regarding how these global brain measures are affected in dyslexia [15]. Regarding brain activity alterations, fMRI analyses show that cerebral hypoactivation seems to prevail over hyperactivity [37,178].
Interestingly, alterations seen in pre-reading children at risk for dyslexia are in agreement with results from children diagnosed with dyslexia [37]. This favors the idea that atypical brain development likely associated with dyslexia could be present within the first years of life and that dyslexia deficits may result from altered structural connectivity [179]. Moreover, faster WM development was observed in good versus poor readers from prereading to beginning-to-read and to fluent-reading stages, as well as a positive association between WM maturation and reading development [180]. Such data from neuroimaging studies in infants and pre-reading children, in concert with the high heritability estimates for reading abilities and disabilities, could suggest that dyslexia susceptibility genes may be involved in atypical neural migration and/or axonal growth during early (even in utero) brain development.
In the recently published, massive neuroimaging genetics meta-analysis study of the ENIGMA Consortium, it was shown that general cognitive function and educational attainment are the two cognitive traits that exhibit the most significant positive genetic correlation with cSA. According to the radial unit hypothesis, the expansion of cSA is driven by the proliferation of neural progenitor cells. Common variants explained 34% of the variation in total cSA; importantly, these variants have been associated with altered gene regulatory activity in neural progenitor cells during fetal development [181]. However, no GWAS and sMRI data from learning (dis)abilities and/or dyslexia studies were used in this meta-analysis, presumably because ENIGMA does not host an SLD working group.
Nevertheless, an extremely informative review on the neuroimaging genetics of dyslexia was published in 2017 by Mascheretti and co-workers; therein, the authors have done meticulous work to compile all available information from neuroimaging genetic association studies in established and candidate dyslexia genes, either in dyslexic cases or in the general population, covering studies published between 2010 and 2016 [37]. Thus, it is beyond the scope and the allocated space of the present article to review all dyslexia neuroimaging genetic studies anew. Instead, we have summarized findings published only in the last five years, with a focus on dyslexia and reading abilities (Table 4).
Among the most recent studies that led to the identification of novel dyslexia candidate genes, it is interesting to highlight that an intronic SNP located in CEP63 was associated with WM volume in both right and left hemispheres of healthy individuals, as well as with reading comprehension scores [146]. The cluster of significant effect overlapped with a brain region previously found to be significant for SNPs within DYX1C1 and KIAA0319 [182]. Moreover, the right temporoparietal region associated with rs1064395 in NCAN also overlapped with a region previously associated with the dyslexia susceptibility genes KIAA0319, DYX1C1 and MRPL19, as well as CEP63 [69,183]. The 15q11.2(BP1-BP2) deletion CNV, previously associated with a larger corpus callosum [43], was also associated with a smaller left fusiform gyrus as well as with altered activation; decreased activation was also observed for the left angular gyri, regions shown to associate with language and arithmetic tasks (Table 4) [44].
Pinel and Dehaene used fMRI to investigate heritability for brain activation while participants performed mental calculations. Posterior superior parietal lobules (SPL), right intraparietal sulcus (IPS), a left superior frontal region and left inferior parietal cortex (IPC) were under genetic influence [184]. Regarding dyscalculia, it was shown that dyscalculic children have decreased GM and WM volumes in the frontoparietal network, which might be associated with impaired arithmetic processing skills, whereas the WM volume decrease in parahippocampal areas may have an influence on fact retrieval and spatial memory processing [185,186]. Brain activation patterns of children with dyslexia, dyscalculia and comorbid dyslexia/dyscalculia were highly similar in how they deviated from neural activation patterns in control children when performing arithmetic tasks while undergoing fMRI [187]. Bulthe et al. recently revealed a significant deficit in number representations in temporal, parietal and frontal regions and a hyper-connectivity in visual brain regions in adults with dyscalculia [188].
Despite the progress in unravelling the polygenic nature of SLD, even with the latest molecular genomics approaches, combined with unprecedented technological advances in neuroimaging, we still lack a comprehensive and unified understanding of SLD, and the field of neuroimaging genetics is still in its infancy. One proposal to utilize neuroimaging genetics to identify biological causes of dyslexia would be to perform MRI before the onset of reading acquisition, ideally in populations enriched with children at risk of dyslexia (due to family history, e.g., parents or siblings with dyslexia). Given that an individual's genetic makeup does not change over a lifetime, a longitudinal design that would allow neuroimaging follow-up of these at-risk children until their reading (dis)abilities become manifest could be ideal in determining both the predictive role of brain scanning and the causal role of genetics. We reproduce this idea by Ramus et al. and expand it by adding genetics into the picture, yet we must emphasize the increased demands and challenges such a study design would impose [15]. However, it is equally crucial to comprehend more deeply the neurobiology underlying these complex phenotypes and how established and emerging genes, and their variation, determine and affect neuronal development, respectively; we briefly touch on this subject in the following section.

Table 4. Recent (2015-present) neuroimaging genetic studies reporting associations between genes and genomic loci associated with reading and mathematical (dis)abilities. The list is ordered based on evidence of association for genomic loci previously associated with SLD (that is, from replicated associations to newer evidence).

A Glimpse on the Biological Background of SLD
The polygenic nature of SLD points to the existence of multiple causal pathways, much like most other neurodevelopmental disorders, where each variant contributes by a small effect to the total phenotypic variation. As observed via electrophysiological and neuroimaging studies in infants and pre-reading children, brain alterations predate reading ability or reading impairment, supporting the hypothesis that variants functioning in dyslexia susceptibility genes lead to atypical neural migration and/or axonal growth during early, most likely in utero, brain development [193,194].
However, the underlying neurodevelopmental causes of dyslexia are not fully understood. Original post-mortem neuroanatomical studies on dyslexia cases, conducted almost 35 years ago, were later followed by neuroimaging studies in humans and functional (knock-down and knock-out) animal studies. These studies lend support to the hypothesis that neuronal migration disturbances during development lead to misplacement of neurons, likely resulting in changes in white and grey matter [35,195]. The pathways that have emerged by now are relevant to neuronal migration and positioning, axon guidance regulating brain connectivity, dendritic growth, synaptic plasticity/transmission, cell adhesion, and sex hormone biology (Table 5) [36]. The ROBO1, KIAA0319, DCDC2 and DYX1C1 gene products are mostly implicated in neurite outgrowth, neural connectivity, migration and development (Figure 2).
Although prior evidence from functional studies lends support to the idea that abnormal neuronal migration constitutes the neurobiological basis of dyslexia, which largely explains why this has been the most often cited hypothesis, in their recent review Guidi et al. advocate otherwise. The authors critically evaluated the hypothesis of neuronal migration and concluded that the evidence from histopathological and imaging studies in humans and functional studies in animal models is not robust enough to support it. Readers are encouraged to consult Table 1 from Guidi et al. for a thorough review of functional studies on key dyslexia genes conducted in several animal species and cell lines; therein, the authors have compiled data from reports in favor of the neuronal migration hypothesis as well as from studies refuting it [196].
Original studies also failed to find associations supporting the neuronal migration, axon guidance or steroid hormone-related pathways [75,104,109,128]. Thus, it emerges that although researchers have been keen to place many of the dyslexia candidate genes in a theoretical molecular/cellular model network involved in neuronal migration and neurite outgrowth, it seems unlikely that there is just a single explanatory model that connects all dyslexia-associated proteins on the molecular level. Rather, several etiological cascades contributing to dyslexia are likely to exist [35].
In fact, several reports have demonstrated that many dyslexia candidate genes, such as DYX1C1 and DCDC2, have a reported structural or functional role in cilia [147,149,197]. Loss-of-function mutations in DYX1C1 and DCDC2 have been found in patients with ciliopathies: DYX1C1 in cases of primary ciliary dyskinesia, with ciliary defects also confirmed in mouse and zebrafish models [148], and DCDC2 in patients with nephronophthisis-related ciliopathy, inherited deafness and neonatal sclerosing cholangitis [150][151][152][153]. Conversely, we are unaware whether patients with such ciliopathies, caused by DYX1C1 and DCDC2 mutations, show symptoms of SLD or other cognitive impairments.
Other dyslexia candidate genes, such as PCNT, CEP63 and TUBGCP5, are involved in centrosome and basal body biology (Table 5) [65,146,198]. TUBGCP5, PCNT and CEP63 are three of many centrosomal proteins involved in microtubule organization and, even though they are ubiquitously expressed, brain-specific isoforms may be affected by rare variants. Centrosomal proteins are important for proper cell cycle progression; PCNT and CEP63 deficiencies were separately shown to cause microcephalic primordial dwarfism in humans [199,200], and a Seckel syndrome-like phenotype in mice, characterized by mitotic errors leading to p53-dependent neuronal progenitor cell death [201]. Bieder et al. used human iPSCs to derive a neuroepithelial stem cell line and showed that genes related to cilia were significantly enriched among genes upregulated during neuronal differentiation; importantly, a significant number of dyslexia-associated genes were detected by RNA sequencing, of which seven, including DYX1C1, were upregulated, adding further support to the hypothesis of cilia dysregulation [202].
Left-right brain asymmetry defects have been proposed as an anatomical basis to neurodevelopmental disorders, such as ASD and dyslexia, possibly mediated by ciliary dysfunction [155]. Although the aforementioned proteins are associated structurally or functionally with primary cilia, microtubules and centrosomes, it remains unclear by which molecular mechanisms aberrations in their expression can lead to cognitive impairment. For an extensive presentation on the role of genes associated with cilia homeostasis/function and neurodevelopment/brain development, the readers are referred to excellent past reviews on the subject [36,155,203].
A gene's expression or protein function is subject to genetic variation, and current methodologies allow us to observe this level of complexity with unprecedented detail by using genome-wide approaches. Still, genes do not act alone; they form pathways that intertwine, creating higher-order networks and determining biological processes that are difficult to disentangle, especially in the case of complex traits lacking clear-cut diagnostic definitions, like dyslexia. Looking expectantly into the future, the ultimate goal for unravelling the biological mechanisms that contribute to and/or define SLD is the presymptomatic identification and development of age-adjusted precision intervention strategies, tailored to each individual's language, educational demands and other social and psychological factors [110].

Table 5. Expression status in brain, cellular localization, and biological role of established and suspected genes associated with SLD susceptibility; the list is sorted by chromosome.
3 Information on association with other neurodevelopmental disorders was obtained from SFARI Gene Database for ASD [177], taking into consideration all scoring levels (from high confidence to suggestive evidence), and ADHDgene database [227], which was last updated in February 2014.
4 Information for brain expression status, subcellular localization and biological role retrieved from 'The Human Protein Atlas' [228] and UniProt [229]. For annotation please refer to 'The Human Protein Atlas'.
5 Reference provided in addition to information retrieved from 'The Human Protein Atlas'.
[Figure caption fragment] … Table 5, generated in GTEx portal for a multi-gene query in seven brain areas (basal ganglia and hypothalamus are excluded) [204]. SLC2A3 on chromosome 12 was included as an indirectly associated gene (potentially being trans-regulated by a directly associated variant on chromosome 4) (see text in Section 3). PCDHG represents a whole gene cluster, thus excluded from the query. TPM: Transcripts per kilobase million (expresses RNA-sequencing reads normalized for gene length and sequencing depth).

Future Research Directions and Open Questions
It is of interest that increased frequency of sex chromosome aneuploidies (SCA) was reported among SLI and SLD individuals, albeit not statistically significant for the SLD group compared to the general population [230]. Individuals with SLI and SLD do not routinely undergo cytogenetic analysis, so their karyotype remains unknown. On the other hand, it is well-established that individuals with SCAs often show cognitive impairments, including speech and language, learning and mathematical disabilities [231,232]. At this point, it remains unclear whether the underlying biological defect for learning impairment in SCA cases is the deviation from X or Y chromosome gene(s) dosage alone, the coinheritance of additional structural variations, such as CNVs [233], specific changes in brain anatomy affecting cognition [234], or a combination of those. Overall, these data highlight the importance of combinatorial evaluation of such neurodevelopmental phenotypes that can benefit from early detection and appropriate management, especially considering the large proportion of cases with SCAs that remain undiagnosed [235].
Another open question is why SLD seems to be more prevalent in males than in females worldwide [236][237][238][239][240], as is also observed for other neurodevelopmental disorders [241][242][243]. SLD male-to-female ratios range from about 1.5:1 to 3.3:1 in epidemiological samples and from 3:1 to 5:1 in referred samples [239]. If this universal sex bias cannot be attributed to factors such as ascertainment bias, definitional or measurement variation, severity of disability, language transparency and alphabet, educational practices or unequal opportunities, race, or socioeconomic status [1,9,244], then what is the remaining underlying causal factor? Arnett et al. suggested that it could be partially explained by cognitive correlates emerging prior to schooling, such as reading ability (slower processing speed in males), which could serve as a proxy for the sex difference in brain development [239]. From the biological perspective, however, convincing genetic evidence to explain the sex bias observed in SLD is still lacking or is at least contradictory [15]. In one twin study, males had greater heritability estimates (h²) than females in word recognition deficit [245], whereas in another the sex-specific h² estimates did not reach statistical significance [246]. In a Chinese cohort of dyslexic children and adolescents analyzed for CNTNAP2, two common variants were found to confer protection against dyslexia in females; one of these variants was marginally associated with the environmental factor of scheduled reading time in female homozygotes showing lower risk for dyslexia [247]. This type of association will require extensive genome-wide approaches before we begin to speculate which molecular mechanisms underlie sex-specific brain functions.
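For readers less familiar with how twin designs yield heritability estimates, the classical Falconer approximation offers a simple illustration: heritability is estimated as twice the difference between the monozygotic (MZ) and dizygotic (DZ) twin correlations. The Python sketch below applies it separately by sex. Note that this is a swapped-in textbook formula, not necessarily the method used in the twin studies cited above [245,246], and that the twin correlations are invented for illustration.

```python
def falconer_h2(r_mz, r_dz):
    """Falconer's approximation of heritability: h^2 = 2 * (r_MZ - r_DZ)."""
    return 2.0 * (r_mz - r_dz)


# Hypothetical twin correlations for a word-reading measure, split by sex.
h2_males = falconer_h2(r_mz=0.80, r_dz=0.50)
h2_females = falconer_h2(r_mz=0.75, r_dz=0.50)

# A sex difference in h^2 is only meaningful if it exceeds sampling error,
# which requires confidence intervals that this sketch does not compute.
sex_difference = h2_males - h2_females
```

With these invented inputs the male estimate is larger than the female one, but, as in the second study cited above, such a difference may well fail to reach statistical significance in a real sample.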
According to the liability threshold model [248], females who meet a diagnostic threshold for ASD or ADHD are expected to carry a higher genetic burden than males, and male relatives of females with ASD or ADHD are also more likely to be affected than relatives of affected males [241,249]. To date, although this has been the subject of great debate, it remains unclear whether hormonal, genetic, epigenetic, cognitive, neurological, anatomical or environmental factors or combinations of the above contribute to sex-biased susceptibility to any of the aforementioned disorders [250], including SLD.
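The prediction of the liability threshold model can be illustrated numerically. The Python sketch below assumes, as a simplification, that liability follows the same standard normal distribution in both sexes and that the sexes differ only in their diagnostic threshold, set here by hypothetical prevalences of 4% in males and 1% in females (illustrative numbers, not estimates from the cited studies). The mean liability of affected individuals is then φ(t)/(1 − Φ(t)), where t is the threshold and φ and Φ are the standard normal density and distribution functions.

```python
from statistics import NormalDist

STD_NORMAL = NormalDist()  # liability assumed ~ N(0, 1) in both sexes


def threshold(prevalence):
    """Liability value above which a fraction `prevalence` is affected."""
    return STD_NORMAL.inv_cdf(1.0 - prevalence)


def mean_liability_of_affected(prevalence):
    """Mean of a standard normal truncated below at the threshold."""
    t = threshold(prevalence)
    return STD_NORMAL.pdf(t) / prevalence


# Hypothetical sex-specific prevalences (assumptions for illustration).
male_mean = mean_liability_of_affected(0.04)
female_mean = mean_liability_of_affected(0.01)
```

Under these assumptions, affected females have a higher mean liability than affected males because they must cross a higher threshold. Liability here combines genetic and environmental contributions, so the sketch illustrates the direction of the expected difference in genetic burden rather than its size.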

Conclusions
In the quest for unraveling the genetic architecture of as complex a phenotype as SLD, various methodological approaches have been applied since the first dyslexia-associated genes were identified back in the 1990s (Table 1). In the time that has elapsed since, classical linkage studies in unique, large pedigrees (segregating rare, private mutations), studies of chromosomal aberrations, genetic association studies, and lately large-scale high-throughput genome-wide genotyping and sequencing studies (Tables 2 and 3) have continued to shape our understanding of this highly complex disorder. The list is continuously populated with novel gene associations whose protein products participate in a variety of biological processes (Table 5). Whether their relevance to SLD manifests via alterations in brain anatomy, connectivity and function (assessed via neuroimaging techniques; Table 4) or via perturbed cellular mechanisms (assessed via functional studies) remains an open question, and more research is needed before we can be confident that these associations hold true. The nature of SLD, unique to humankind and to properties of the human brain, renders in vivo experimentation in other species suboptimal. With new technologies and analytical tools, including fourth-generation sequencing and neuroimaging, we will continue to search for the missing heritability with the ultimate hope that at least some genetic findings will translate into predictive and/or preventive measures. To do so, we will need to bridge the knowledge gaps between genomics, molecular pathways, cellular communication, neuronal circuits and neuroimaging data on the one hand, and human cognition and brain function on the other. This is a long but intriguing path to take for scientists approaching SLD from different scientific disciplines, yet 'intriguing' has always been the driving force.