Prevalence of Developmental Dyslexia in Primary School Children: A Systematic Review and Meta-Analysis

Background: Developmental dyslexia (DD) is a specific learning disorder affecting reading acquisition that may have a lifelong negative impact on individuals. A reliable estimate of the prevalence of DD provides the basis for diagnosis, intervention, evidence-based allocation of health resources, and policy-making. The present meta-analysis therefore aimed to generate a reliable estimate of the worldwide prevalence of DD in primary school children and to explore variables potentially related to that prevalence. Methods: Studies from the 1950s to June 2021 were collated using a combination of search terms related to DD and prevalence. Study quality was assessed with the STROBE guidelines according to study design, heterogeneity was assessed with the I² statistic, and random-effects meta-analyses were conducted. Variation in the prevalence of DD across subgroups was assessed via subgroup meta-analysis and meta-regression. Results: The pooled prevalence of DD was 7.10% (95% CI: 6.27–7.97%). The prevalence in boys was significantly higher than that in girls (boys: 9.22%, 95% CI: 8.07–10.44%; girls: 4.66%, 95% CI: 3.84–5.54%; p < 0.001), but no significant difference was found across writing systems (alphabetic scripts: 7.26%, 95% CI: 5.94–8.71%; logographic scripts: 6.97%, 95% CI: 5.86–8.16%; p > 0.05) or across orthographic depths (shallow: 7.13%, 95% CI: 5.23–9.30%; deep: 7.55%, 95% CI: 4.66–11.04%; p > 0.05). Notably, most articles had small sample sizes and diverse operational definitions, which makes comparisons challenging. Conclusions: This study provides an estimate of the worldwide prevalence of DD in primary school children. The prevalence was higher in boys than in girls but did not differ significantly across writing systems.


Introduction
Developmental dyslexia (DD) is a specific impairment characterized by severe and persistent problems in the acquisition of reading skills; these problems are not caused by mental age, visual acuity problems, or inadequate schooling [1,2]. DD, also referred to as specific reading disability or specific reading disorder, is by far the most common type of learning disability, accounting for approximately 80% of all learning disabilities [3]. Due to their frustration with reading, a great number of dyslexic children are also at increased risk

Overall Pooled Prevalence of DD
Before pooling, the variance of the raw prevalence from each included study was stabilized using the Freeman-Tukey double arcsine transformation [40]. All estimates are presented after back-transformation. We assessed the heterogeneity of prevalence estimates among studies using the Cochran Q test and the I² index [41,42]. For the Cochran Q test, p < 0.05 indicated significant heterogeneity. For the I² index, values of 25% or lower indicated low heterogeneity, values of 26% to 50% moderate heterogeneity, and values greater than 50% high heterogeneity [41,42].
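The transformation and heterogeneity statistics above can be sketched in Python as follows. This is a minimal illustration, not the authors' R code. It assumes the half-scaled form of the Freeman-Tukey transformation, y = ½[arcsin√(x/(n+1)) + arcsin√((x+1)/(n+1))], with sampling variance 1/(4n + 2), and it uses Miller's (1978) formula for the back-transformation. The function names and study counts are hypothetical.

```python
import math


def ft_transform(cases: int, n: int) -> tuple[float, float]:
    """Half-scaled Freeman-Tukey double arcsine transform of cases / n.

    Returns the transformed value and its approximate sampling variance.
    """
    y = 0.5 * (math.asin(math.sqrt(cases / (n + 1)))
               + math.asin(math.sqrt((cases + 1) / (n + 1))))
    return y, 1.0 / (4 * n + 2)


def ft_back_transform(y: float, n: float) -> float:
    """Miller (1978) inverse of the half-scaled transform.

    For a pooled estimate, n is commonly the harmonic mean of the study sizes.
    """
    s = math.sin(2 * y)
    inner = 1 - (s + (s - 1 / s) / n) ** 2
    sign = 1.0 if math.cos(2 * y) >= 0 else -1.0
    return 0.5 * (1 - sign * math.sqrt(max(inner, 0.0)))


def cochran_q_i2(ys: list[float], vs: list[float]) -> tuple[float, float]:
    """Cochran's Q around the inverse-variance mean, and I^2 in percent."""
    w = [1 / v for v in vs]
    mean_fe = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - mean_fe) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return q, i2


def heterogeneity_level(i2: float) -> str:
    """Thresholds as stated in the Methods: <=25 low, 26-50 moderate, >50 high."""
    if i2 <= 25:
        return "low"
    if i2 <= 50:
        return "moderate"
    return "high"


# Hypothetical (cases, sample size) pairs, for illustration only.
studies = [(12, 300), (40, 450), (9, 500), (55, 600)]
ys, vs = zip(*(ft_transform(x, n) for x, n in studies))
q, i2 = cochran_q_i2(list(ys), list(vs))
print(f"Q = {q:.2f}, I2 = {i2:.1f}% ({heterogeneity_level(i2)})")
```

The transformation keeps each study's variance dependent only on its sample size, which is why it is often preferred for pooling proportions close to 0.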
Because heterogeneity was high (as expected and observed), a random-effects meta-analysis (DerSimonian and Laird method) was used throughout this study to calculate the overall pooled prevalence of DD with 95% CIs [40]. To examine whether any single study had a disproportionate influence, we applied a "leave-1-out" sensitivity analysis to each meta-analysis [43]. Publication bias was assessed qualitatively by visual inspection of funnel plots. When more than 10 estimates were available in a single analysis, it was also assessed quantitatively with the Egger linear regression test and the Begg rank correlation test [44][45][46].
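Below is a minimal Python sketch of DerSimonian-Laird pooling and the "leave-1-out" procedure. It works on values already on the transformed scale, so the pooled value would then be back-transformed as described above. The 95% CI uses the normal quantile 1.96. The function names and inputs are hypothetical; the authors ran their own analysis in R.

```python
import math


def dersimonian_laird(ys: list[float], vs: list[float]) -> tuple[float, float, float]:
    """Random-effects pooled estimate, its standard error, and tau^2 (DL moment estimator)."""
    k = len(ys)
    if k < 2:
        raise ValueError("need at least two estimates")
    w = [1 / v for v in vs]
    sw = sum(w)
    fixed = sum(wi * yi for wi, yi in zip(w, ys)) / sw
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, ys))
    c = sw - sum(wi ** 2 for wi in w) / sw
    tau2 = max(0.0, (q - (k - 1)) / c)  # between-study variance, truncated at 0
    w_re = [1 / (v + tau2) for v in vs]  # random-effects weights
    pooled = sum(wi * yi for wi, yi in zip(w_re, ys)) / sum(w_re)
    return pooled, math.sqrt(1 / sum(w_re)), tau2


def ci95(pooled: float, se: float) -> tuple[float, float]:
    """Normal-approximation 95% confidence interval."""
    return pooled - 1.96 * se, pooled + 1.96 * se


def leave_one_out(ys: list[float], vs: list[float]) -> list[float]:
    """Pooled estimate after omitting each study in turn."""
    return [dersimonian_laird(ys[:i] + ys[i + 1:], vs[:i] + vs[i + 1:])[0]
            for i in range(len(ys))]


# Hypothetical transformed prevalences and their variances.
ys = [0.20, 0.26, 0.13, 0.30, 0.22]
vs = [0.0008, 0.0006, 0.0005, 0.0004, 0.0010]
pooled, se, tau2 = dersimonian_laird(ys, vs)
low, high = ci95(pooled, se)
loo = leave_one_out(ys, vs)
print(f"pooled {pooled:.3f} [{low:.3f}, {high:.3f}], tau2 {tau2:.4f}")
print(f"leave-1-out range: {min(loo):.3f} to {max(loo):.3f}")
```

The robustness check reported in the Results compares the full pooled estimate with the narrow range of leave-1-out estimates.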

Subgroup Meta-Analysis and Meta-Regression of DD Prevalence
We conducted subgroup meta-analyses to identify potential sources of heterogeneity. As a rule, at least three studies were required per subgroup.
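The Methods do not name the test used to compare subgroups. As an illustration only, the sketch below assumes a standard Wald-type test for two subgroups. Its z² equals the between-subgroup Q statistic with one degree of freedom. The sketch also enforces the three-studies-per-subgroup rule. Estimates should be on the transformed scale; the names and numbers are hypothetical.

```python
import math

MIN_STUDIES_PER_SUBGROUP = 3  # rule stated in the Methods


def subgroup_difference(est_a: float, se_a: float, k_a: int,
                        est_b: float, se_b: float, k_b: int) -> tuple[float, float]:
    """Q_between (1 df) and two-sided p-value for two subgroup pooled estimates."""
    if min(k_a, k_b) < MIN_STUDIES_PER_SUBGROUP:
        raise ValueError("each subgroup needs at least three studies")
    z = (est_a - est_b) / math.sqrt(se_a ** 2 + se_b ** 2)
    p = math.erfc(abs(z) / math.sqrt(2))  # P(|Z| > |z|) for a standard normal Z
    return z ** 2, p


# Hypothetical pooled estimates (transformed scale) for two subgroups.
q_between, p = subgroup_difference(0.31, 0.012, 35, 0.22, 0.014, 35)
print(f"Q_between = {q_between:.2f}, p = {p:.4g}")
```

With more than two subgroups, the same idea generalizes to a Q statistic with (number of subgroups - 1) degrees of freedom.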
Individual studies often reported multiple data points. To assess the associations between sample characteristics and the prevalence of DD, we first conducted univariable meta-regressions where possible, followed by a multivariable meta-regression [47]. As a rule, at least 10 data points were required for each variable in univariable meta-regression, and 20 in multivariable meta-regression [48,49]. Data were analyzed in R (R Foundation for Statistical Computing) using RStudio, version 2021.09.1-372.
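To show how a proportion of explained heterogeneity (such as the R² reported for sample size in the Discussion) can be obtained, here is a minimal sketch for a single moderator. It assumes the method-of-moments estimator of τ². The paper does not state which estimator was used, and R packages often default to REML. R² is defined here as the proportional reduction in τ² after adding the moderator, and the 10-data-point rule is enforced. All names and data are hypothetical.

```python
def tau2_intercept_only(y: list[float], v: list[float]) -> float:
    """Method-of-moments (DerSimonian-Laird) tau^2 with no moderator."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    mu = sum(wi * yi for wi, yi in zip(w, y)) / sw
    q = sum(wi * (yi - mu) ** 2 for wi, yi in zip(w, y))
    c = sw - sum(wi ** 2 for wi in w) / sw
    return max(0.0, (q - (len(y) - 1)) / c)


def tau2_one_moderator(x: list[float], y: list[float], v: list[float]) -> float:
    """Method-of-moments residual tau^2 for the model y = b0 + b1 * x."""
    w = [1 / vi for vi in v]
    sw = sum(w)
    swx = sum(wi * xi for wi, xi in zip(w, x))
    swxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    swy = sum(wi * yi for wi, yi in zip(w, y))
    swxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    det = sw * swxx - swx ** 2
    b1 = (sw * swxy - swx * swy) / det  # weighted least-squares slope
    b0 = (swy - b1 * swx) / sw
    q_resid = sum(wi * (yi - b0 - b1 * xi) ** 2 for wi, xi, yi in zip(w, x, y))
    # trace of P = W - W X (X'WX)^-1 X'W, written out for the 2x2 case
    s2 = sum(wi ** 2 for wi in w)
    s2x = sum(wi ** 2 * xi for wi, xi in zip(w, x))
    s2xx = sum(wi ** 2 * xi * xi for wi, xi in zip(w, x))
    trace_p = sw - (swxx * s2 - 2 * swx * s2x + sw * s2xx) / det
    return max(0.0, (q_resid - (len(y) - 2)) / trace_p)


def r_squared(x: list[float], y: list[float], v: list[float]) -> float:
    """Percentage of between-study variance explained by the moderator."""
    if len(y) < 10:
        raise ValueError("univariable meta-regression needs at least 10 data points")
    total = tau2_intercept_only(y, v)
    if total == 0:
        return 0.0
    resid = tau2_one_moderator(x, y, v)
    return max(0.0, (total - resid) / total) * 100


# Hypothetical moderator (e.g., log sample size) and transformed prevalences.
x = [5.8, 6.2, 6.9, 7.1, 7.4, 7.9, 8.3, 8.8, 9.4, 9.9, 10.5]
y = [0.31, 0.29, 0.30, 0.27, 0.26, 0.27, 0.24, 0.23, 0.24, 0.21, 0.20]
v = [0.0009, 0.0008, 0.0007, 0.0006, 0.0006, 0.0005,
     0.0004, 0.0004, 0.0003, 0.0003, 0.0002]
print(f"R2 = {r_squared(x, y, v):.1f}%")
```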

Study Selection and Characteristics
As outlined in Figure 1, our initial literature search identified a total of 6564 records. After the eligibility criteria were applied, a final set of 56 articles, comprising 58 studies, was included in the quantitative synthesis. A list of the 56 included articles is given in Table A3.
The detailed characteristics of the included articles can be found in Table A3. In all, 41 of the 58 studies (70.69%) reported prevalence data for both boys and girls. Of the 58 studies, 27 (46.55%) were conducted among children using alphabetic scripts, while 31 (53.45%) were conducted among children using logographic scripts. In addition, grade 3 was the most-studied grade (21, 36.21%) and random sampling was the most-used method (37, 63.79%), while only four studies (6.90%) had a sample size greater than 10,000. Moreover, more than half of the 58 studies were conducted in the Western Pacific region (33, 56.90%) and in middle-income countries (40, 68.97%). Table 1 presents the results of the overall and subgroup meta-analyses. The pooled prevalence of DD, estimated by random-effects meta-analysis, was 7.10% (95% CI: 6.27-7.97%) (Figure 2).

Sensitivity Analysis and Publication Bias
The "leave-1-out" sensitivity analysis showed that the pooled prevalence of DD ranged from 6.93% (95% CI: 6.13-7.78%) to 7.21% (95% CI: 6.38-8.09%) when each study was removed in turn (Figure A1). This indicates that no individual study substantially influenced the overall pooled prevalence. Publication bias was indicated by the funnel plot (Figure A2), the Egger test (t = 6.25, p < 0.001), and the Begg test (z = 1.96, p = 0.05). Table 1 and Figure 3 show the prevalence of DD by gender, writing system, operational definition, grade, sample size, sampling method, sub-deficit, WHO region, and WB region, together with forest plots of the differences across these factors.
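The Egger test reported above can be sketched as the classic regression of the standardized effect (estimate divided by its standard error) on precision (the reciprocal of the standard error). An intercept far from zero signals funnel-plot asymmetry. This minimal illustration assumes that form. It returns the t statistic with k - 2 degrees of freedom rather than a p-value, and it enforces the more-than-10-estimates rule from the Methods. The names and data are hypothetical.

```python
import math


def egger_test(ys: list[float], ses: list[float]) -> tuple[float, float, int]:
    """Egger regression of y/se on 1/se; returns (intercept, t, df)."""
    k = len(ys)
    if k <= 10:
        raise ValueError("the Methods apply the test only with more than 10 estimates")
    z = [y / s for y, s in zip(ys, ses)]  # standardized effects
    prec = [1 / s for s in ses]           # precisions
    mx, mz = sum(prec) / k, sum(z) / k
    sxx = sum((p - mx) ** 2 for p in prec)
    sxz = sum((p - mx) * (zi - mz) for p, zi in zip(prec, z))
    slope = sxz / sxx
    intercept = mz - slope * mx
    rss = sum((zi - intercept - slope * p) ** 2 for p, zi in zip(prec, z))
    sigma2 = rss / (k - 2)  # residual variance of the ordinary least-squares fit
    se_intercept = math.sqrt(sigma2 * (1 / k + mx ** 2 / sxx))
    return intercept, intercept / se_intercept, k - 2


# Hypothetical estimates in which smaller studies (larger SE) report higher values.
ses = [0.010, 0.012, 0.015, 0.018, 0.020, 0.024,
       0.028, 0.030, 0.035, 0.040, 0.045, 0.050]
ys = [0.21, 0.22, 0.21, 0.23, 0.24, 0.23, 0.25, 0.26, 0.25, 0.27, 0.28, 0.29]
intercept, t, df = egger_test(ys, ses)
print(f"intercept = {intercept:.2f}, t = {t:.2f} on {df} df")
```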

Discussion
This systematic review and meta-analysis estimated the worldwide prevalence of DD in primary school children at 7.10% (95% CI: 6.27-7.97%). There was a significant gender difference, with a boy-to-girl ratio of about 2:1. However, prevalence did not differ across writing systems or orthographic depths. In addition, prevalence was influenced by operational definition and sample size, but not by sub-deficit, grade, sampling method, WHO region, or WB region. To the best of our knowledge, this is the first quantitative synthesis of the prevalence of DD.
The pooled prevalence of 7.10% (95% CI: 6.27-7.97%) estimated in the present study falls within the range reported by previous selective reviews, which suggested a prevalence of DD of 5-17.5% [14,15]. This agreement is likely due to the similar diagnostic criteria used in most previous studies, in which DD was mainly defined as the low end of a normal distribution of word-reading ability [50]. Many disorders are not discrete categories. Rather, they are the extremes of a continuous distribution ranging from optimal to poor outcomes, with similar underlying causal mechanisms across the whole distribution. Most behaviorally defined disorders, including DD, are continuous in this sense. By pooling the available evidence on the prevalence of DD in children, our systematic review and meta-analysis provides a more comprehensive estimate of the prevalence of DD.
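When DD is defined as the low end of a normal distribution, the share of children below a single SD cut-off follows directly from the normal cumulative distribution function, Φ(-k) = ½ erfc(k/√2). This minimal sketch assumes a single, exactly normally distributed reading score and no additional criteria (such as IQ thresholds or exclusions). It therefore only illustrates why stricter cut-offs capture fewer children; it does not model any included study.

```python
import math


def share_below_cutoff(sd_below_mean: float) -> float:
    """Proportion of a normal distribution lying more than sd_below_mean SDs below the mean."""
    return 0.5 * math.erfc(sd_below_mean / math.sqrt(2))


for k in (1.0, 1.5, 2.0):
    print(f"{k} SD below the mean: {share_below_cutoff(k):.2%}")
# 1.0 SD below the mean: 15.87%
# 1.5 SD below the mean: 6.68%
# 2.0 SD below the mean: 2.28%
```

Observed prevalences under a nominal 1.5 SD or 2 SD cut-off need not match these figures, because real studies combine several tests and criteria.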
Our estimated ratio of boys to girls with DD was about 2:1 (boys: 9.22%, 95% CI: 8.07-10.44%; girls: 4.66%, 95% CI: 3.84-5.54%; p < 0.001). This result is consistent with previous studies reporting a higher prevalence of DD in boys than in girls [31,35,51]. One explanation for this gender difference is that teachers may be more likely to refer boys for assessment, because boys are often perceived as more disruptive than girls [52]. However, Rutter and colleagues (2007) focused on large-scale epidemiological studies that were not based on school-referred samples. They also found that boys were more likely than girls to have a reading disability, indicating that teacher bias cannot fully account for the gender difference [53]. A similar difference has been reported for logographic writing systems [54,55]. Other explanations draw on biological and environmental hypotheses, including genetic causes [56,57], immunological factors, perinatal complications, differences in brain functioning due to differential exposure or sensitivity to androgens [58], and differential resilience to neural insult [59]. The current study does not provide enough evidence to support or reject any of these hypotheses, so more studies of DD in both boys and girls are needed. At the same time, the current findings suggest that teachers may need to pay closer attention to boys who exhibit reading difficulties or disorders.
Another important finding is that the prevalence of DD did not differ significantly by writing system (alphabetic scripts: 7.26%, 95% CI: 5.94-8.71%; logographic scripts: 6.97%, 95% CI: 5.86-8.16%; p = 0.74). This result is unexpected. Logographic scripts differ markedly from alphabetic scripts (for example, in the largely arbitrary mapping between the written and spoken forms of words), and some experts have therefore suggested that DD may be absent or rare among readers of logographic scripts [26]. Research on DD was initially, and has mainly been, conducted among users of alphabetic scripts. In the 1980s, researchers examined large samples of fifth-grade children in Japan, Taiwan, and the United States using a reading test and a battery of 10 cognitive tasks. The prevalence of DD in Japan, Taiwan, and the United States was 5.4%, 7.5%, and 6.3%, respectively, suggesting no significant difference in DD prevalence among writing systems [27]. One explanation for that result and for our findings is that the similar prevalence across writing systems reflects cross-cultural universality in the neurobiological and neurocognitive underpinnings of DD [15]. Some Western researchers and writers have believed that Chinese characters are derived from pictographs; however, Chinese orthography is not primarily pictographic [27].
In addition, DD prevalence did not differ between languages with different orthographic depths (shallow: 7.13%, 95% CI: 5.23-9.30%; deep: 7.55%, 95% CI: 4.66-11.04%; p > 0.05). These findings support the psycholinguistic grain size theory rather than the orthographic depth hypothesis [28,29]. When the orthography of a language is relatively shallow, readers can focus exclusively on the small psycholinguistic grain size of the phoneme. When it is deep, readers must additionally learn correspondences for larger orthographic units, such as syllables, rhymes, or whole words. Therefore, the prevalence of DD is similar in consistent and inconsistent orthographies, but its manifestations may vary with orthographic depth.
Remarkably, operational definitions significantly affected the prevalence of DD: studies with stricter operational definitions reported lower prevalence. Specifically, DD prevalence was significantly lower with cut-offs of 1.5 SD or 2 SD than in studies that did not report an SD cut-off (1.5 SD: 5.36%, 95% CI: 4.28-6.55%; 2 SD: 5.32%, 95% CI: 4.56-6.13%; SD not reported: 9.10%, 95% CI: 7.18-11.21%; both p < 0.05, FDR-corrected). This finding is consistent with a recent selective review suggesting that prevalence depends on the severity of the reading problem, with lower rates for more severe problems [16]. Although DD was first recognized over a century ago, no consensus has been reached on its diagnostic criteria. Consequently, studies use a wide variety of cut-offs, often chosen for convenience and applied to different materials, such as scores below 20% [60] or in the bottom 10% [61]. Because behaviorally defined disorders, including DD, are continuous, the operational definitions encountered in the current study were inconsistent and difficult to compare. A consensus definition may not yet be attainable, but continued theoretical and empirical research may eventually yield a more appropriate operational definition of DD.
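The Results report FDR-corrected comparisons without naming the procedure. Assuming the widely used Benjamini-Hochberg step-up method, adjusted p-values can be computed as in the minimal sketch below. The input p-values are hypothetical.

```python
def benjamini_hochberg(p_values: list[float]) -> list[float]:
    """Benjamini-Hochberg adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # indices by ascending p
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity and a cap of 1.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical raw p-values from several subgroup contrasts.
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.005]))
```

A contrast is declared significant at an FDR of 5% when its adjusted value is below 0.05.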
It is worth noting that studies with more than 10,000 participants reported a lower average prevalence of DD than studies with 500-1000 or 1000-1500 participants. On review, these large studies shared a common feature: relatively strict diagnostic criteria. Only students who scored 1.5 or even 2 SD below the mean on diagnostic tests were diagnosed with DD [35,62,63]. Because of these strict criteria, their prevalence was significantly lower than that of the other subgroups [18,20]. Interestingly, epidemiological studies of other disorders, such as Tourette's syndrome, have also shown that studies with larger samples tend to report lower prevalence [64,65], although the reason is unclear.
DD prevalence did not differ by grade. In the literature, the association between grade and DD prevalence remains unclear. Some studies reported lower DD prevalence in higher grades than in lower grades [66] and attributed this to the improvement of DD symptoms through systematic learning [14]. Several studies, however, have shown a higher DD prevalence in higher grades than in lower grades [67], and most studies have reported no difference in DD prevalence across grades [68][69][70]. Studies have also shown that reading ability in the first few years of school tends to persist in later years, and that DD prevalence does not change greatly during schooling [20,37]. Most previous studies examined DD prevalence only in specific grades, mainly grades 3 to 5, which makes it difficult to address this issue directly and empirically [55,70,71]. To examine whether and how DD prevalence changes across grades, future studies need to include all grades of elementary school with sufficiently representative samples. Prevalence also did not differ across sub-deficits, suggesting that the choice of test and indicator had no detectable effect on the prevalence rate. One interpretation is that a problem with reading accuracy usually co-occurs with problems in fluency or comprehension, so dyslexia shows no clear differentiation into sub-deficits.
As expected, we found significant heterogeneity when pooling the prevalence rates of DD, and we therefore performed sensitivity analyses, subgroup analyses, and meta-regression on many variables. The leave-1-out analysis omitted each study in turn and showed that the pooled prevalence of DD was robust and consistent. In other words, no single study exerted a disproportionate influence on the overall results. We further explored the patterns of effect sizes and heterogeneity with graphic display of heterogeneity (GOSH) plots [72], which showed low effect sizes and high heterogeneity across the included studies (Figure A3). This result was consistent with the subgroup analyses, in which every subgroup showed high heterogeneity (Table 1). In the meta-regression, only sample size reached significance, explaining 39.56% of the heterogeneity (R² = 39.56%). This indicates that the large variation in sample size among studies may be an important source of heterogeneity. Another source may be that the children were drawn from studies performed in a wide variety of countries with differing cultural, ethnic, social, and economic characteristics.
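GOSH plots [72] refit a meta-analytic model to many subsets of the studies and plot each subset's pooled estimate against its I². With 58 studies, enumerating all 2^58 - 1 subsets is infeasible. This minimal sketch therefore draws random subsets, including each study with probability 0.5 and keeping only subsets of at least two studies. It fits a fixed-effect inverse-variance model to each subset, a common choice for GOSH plots. The function names, seed, and data are hypothetical, and plotting is omitted.

```python
import random


def fixed_effect_and_i2(ys: list[float], vs: list[float]) -> tuple[float, float]:
    """Inverse-variance pooled estimate and I^2 (percent) for one subset."""
    w = [1 / v for v in vs]
    est = sum(wi * yi for wi, yi in zip(w, ys)) / sum(w)
    q = sum(wi * (yi - est) ** 2 for wi, yi in zip(w, ys))
    df = len(ys) - 1
    i2 = max(0.0, (q - df) / q) * 100 if q > 0 else 0.0
    return est, i2


def gosh_points(ys: list[float], vs: list[float],
                n_subsets: int = 1000, seed: int = 1) -> list[tuple[float, float]]:
    """(estimate, I^2) pairs for randomly sampled subsets of at least two studies."""
    if len(ys) < 2:
        raise ValueError("need at least two studies")
    rng = random.Random(seed)
    points = []
    while len(points) < n_subsets:
        idx = [i for i in range(len(ys)) if rng.random() < 0.5]
        if len(idx) < 2:
            continue  # I^2 is undefined for a single study
        points.append(fixed_effect_and_i2([ys[i] for i in idx], [vs[i] for i in idx]))
    return points


# Hypothetical transformed prevalences and variances for eight studies.
ys = [0.20, 0.26, 0.13, 0.30, 0.22, 0.18, 0.35, 0.24]
vs = [0.0008, 0.0006, 0.0005, 0.0004, 0.0010, 0.0007, 0.0009, 0.0003]
points = gosh_points(ys, vs, n_subsets=500)
i2_values = sorted(i2 for _, i2 in points)
print(f"median subset I2: {i2_values[len(i2_values) // 2]:.1f}%")
```

Clusters in the resulting scatter of (estimate, I²) points can flag groups of studies that drive heterogeneity. A single diffuse cloud is consistent with the pattern the Discussion describes.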
Such high heterogeneity is not unexpected in epidemiological meta-analyses; nevertheless, the results of this study should be interpreted with caution.
The strengths of this study include the comprehensive search strategy, a double review process, and stringent selection criteria. We included only studies conducted in standard primary schools, which supports the generalizability of our results to the general primary school population. Moreover, by pooling the available evidence on the prevalence of DD, our systematic review and meta-analysis covers a broad scope regarding the prevalence of childhood DD.
Several intrinsic limitations of this study should also be acknowledged. First, the pooled prevalence of DD might be affected by publication bias. We tried to minimize publication bias by searching non-English literature and conference abstracts, but because of the observational nature of our study we could not rule it out completely. Second, pooling prevalence reports from disparate studies has inherent disadvantages. Sufficient data were available to pool the overall prevalence estimates of DD. However, our subgroup analyses by grade, region, and income group were based on only a limited number of studies that reported the corresponding prevalence figures. Third, ten variables across the included studies were systematically assessed, and only studies with a large sample size were identified as showing a lower prevalence of DD. Previous studies [73,74] have suggested that socioeconomic factors are likely to contribute to disparities in DD prevalence across subgroups. However, only high- and middle-income countries were represented in the current study. Future studies are needed to explain the heterogeneity. More high-quality epidemiological investigations of DD also appear necessary, especially across different grades and in low-income countries.

Conclusions
This systematic review and meta-analysis is the first study to estimate the worldwide prevalence of DD. The results suggest that DD represents a considerable public health challenge worldwide (prevalence 7.10%, 95% CI: 6.27-7.97%) and that boys seem to be more affected than girls. There was no significant difference in the prevalence of DD between logographic and alphabetic writing systems or between alphabetic scripts with different orthographic depths. A clear operational definition of DD is urgently needed for its diagnosis.
Author Contributions: L.Y. and J.Z. conceived and designed the protocol. L.Y. drafted the protocol manuscript. C.L., X.W. and J.Z. critically revised the manuscript for methodological and intellectual content. X.L., M.Z., Q.A. and Y.Z. participated in the development of the search strategy and data analysis. All authors have read and agreed to the published version of the manuscript.

Conflicts of Interest:
The authors declare no conflict of interest.
Appendix A Table A1. Search strategy.

Database | Search strategy
Springerlink | TI("dyslexia" OR "reading disabilit*" OR "reading disorder*" OR "word blindness" OR "specific reading retardation" OR "backward reading" OR "reading difficult*" OR "learning disabilit*") AND AB("prevalence" OR "detectable rate" OR "incidence rate" OR "epidemiology")
EMBASE | ((dyslexia OR 'reading disabilit*' OR 'reading disorder*' OR 'word blindness' OR 'specific reading retardation' OR 'backward reading' OR 'reading difficult*' OR 'learning disabilit*'):ti) AND ((prevalence OR 'detectable rate' OR 'incidence rate' OR epidemiology):ab)
"*" was used to replace zero, single, or multiple characters.

(2) Those children whose residuals were ≥ 1.0 SD were described as "underachievers".
(1) A score of PRS < 65; (2) the Chinese score lagged behind the average score of the same class by more than 1 SD, with LD lasting more than one year, and it was difficult to complete classwork and homework independently; (3) the reading test score was less than 2 SD of the mean of group test scores.
According to ICD-10: the total score of PRS was less than 60, or the score of verbal type (factors A and B) was less than 20, or the score of non-verbal type (factors C, D, and E) was less than 40.
(1) The total score of DCCC was more than 2 SD higher than the mean score; (2) a score of PRS < 65; (3) academic achievement was in the bottom 10% of the class; (4) IQ > 80.
(1) An analysis of Persian reading ability (APRA); (2) Wechsler intelligence scale for children-third edition (WISC-III).
(1) IQ ≥ 85; (2) reading scores in three trimesters of one academic year were more than 1.5 SD below that expected from their math scores; (3) no history of brain damage, hearing problems, or visual problems.
(1) A result at or below the 5th percentile on the TIL; (2) a result below the PRP mastery criteria; (3) normal IQ; (4) a phonological awareness score significantly lower than that of the control groups.
(1) A score of PRS < 65; (2) Chinese scores were in the bottom 10 of the class, and, according to the head teacher's evaluation, learning difficulties had lasted more than one year, with difficulty completing classwork and homework independently; (3) IQ > 80; (4) the converted T-score of DCCC was lower than the mean plus 2 SD.
(2) The score on the BCL scale was greater than or equal to 18; (3) IQ ≥ 85; (4) subjects performed 1 SD lower than the average level of the same grade on the one-minute word reading task, Chinese word reading task, literacy task, and fast naming task.
(2) The 10 students tested the self-compiled "One-minute Chinese Word Reading Test", and then selected children whose scores were lower than the percentile grade corresponding to 1.5 SD from the average score of the grade norm.
(1) The score of DCCC was 2 SD higher than the mean score of all the students in the same grade; (2) a score of PRS < 65.
(1) A questionnaire derived from the validated questionnaire "RSR-DSA"; (2) a 4th-grade dictation task; (3) the DDE-2 battery (battery for the assessment of developmental dyslexia and dysorthographia-2); (4) the Wechsler intelligence scale for children (WISC-III); (5) battery for the evaluation of developmental dyslexia and dysorthography-2 (DDE-2); (6) the MT battery (prove di lettura MT per la scuola elementare-2); (7) Raven's progressive matrices (PM47); (8) a strengths and difficulties questionnaire (SDQ).
(1) The total score was > 85% or the score on two subgroups of questions specifically addressing dyslexia > 90%; (2) children scoring ≥ 90% in the dictation task; (3) children failed at least one of four scores in DDE-2; (4) WISC-III weighted score > 7.
(1) The scores of the last three Chinese mid-term and final exams were lower than the grade average and math scores were normal; (2) the evaluation by Chinese teachers of students' Chinese reading performance; (3) no brain damage or intellectual, visual, or hearing impairment.
(2) The pupil rating scale-revised screening for learning disabilities (PRS).
(1) No brain diseases such as visual and hearing impairment, brain trauma, epilepsy, etc.; (2) the Chinese score was in the last 10% of the class; (3) one subscale or total score in the DCCC was 2 SD higher than that of children of the same age; (4) the score of the PRS was < 65.
(2) The pupil rating scale-revised screening for learning disabilities (PRS).
(1) No brain diseases such as visual and hearing impairment, brain trauma, epilepsy, etc.; (2) the Chinese score was in the last 10% of the class; (3) one subscale or total score in the DCCC was 2 SD higher than that of children of the same age; (4) score of the PRS < 65.
(2) Raven's intelligence test.
(1) Students whose reading level was considered by the teacher to be in the bottom 25% of the class; (2) the score on the "one-minute word reading test" was 1 SD lower than the grade average.

Figure A3. GOSH plot.