Do Girls Have an Advantage Compared to Boys When Their Motor Skills Are Tested Using the Movement Assessment Battery for Children, 2nd Edition?

This study aims to investigate sex-related differences in raw item scores on the Movement Assessment Battery for Children, 2nd Edition (MABC-2) in a large data set collected in different regions across the world, seeking to unravel whether there is an interaction effect between sex and the origin of the sample (European versus African). In this retrospective study, a secondary analysis was performed on anonymized data of 7654 children with a mean age of 8.6 (range 3 to 16; SD: 3.4), 50.0% of whom were boys. Since country-specific norms were not available for all samples, the raw scores per age band (AB) were used for analysis. Our results clearly show that in all age bands sex-related differences are present. In AB1 and AB2, girls score better on most manual dexterity and balance items, but not aiming and catching items, whereas in AB3 the differences seem to diminish. Especially in the European sample, girls outperform boys in manual dexterity and balance items, whereas in the African sample these differences are less marked. In conclusion, separate norms for boys and girls are needed in addition to separate norms for geographical regions.


Introduction
The Movement Assessment Battery for Children, Second Edition (MABC-2) is a norm-referenced measurement tool designed to assess motor competence in children between the ages of three and sixteen, inclusive [1,2]. More specifically, the primary function of the test is to help identify children "at risk of" or presenting with a definite motor impairment [1,2]. Since the test covers a large age range, it comprises three age bands (age 3-6, age 7-10 and age 11-16). Within each age band a range of both gross and fine age-specific motor skills is assessed. These are grouped under three headings: manual dexterity (fine motor, three items), aiming and catching (gross motor, two items) and balance (gross motor, three items) [1]. The test is used in many countries around the world for many different purposes and is recommended in the international guidelines on Developmental Coordination Disorder (DCD) as one of the tests for use as part of the diagnostic process [3].
The development of any assessment instrument is an ongoing process, with updating of norms recommended every 10 to 15 years. Once a test has been published, further development takes many forms. For example, there will always be aspects of the psychometric properties of a test relating to reliability and validity that can be further explored. A good example of this is the question of test-retest reliability, which is often only evaluated over a small range of time intervals. Similarly, assessing validity is not an all-or-nothing process.
[...] explained by the onset of puberty, which occurs earlier in girls compared to boys [43], as it is known to trigger improvements in neurological, muscular, skeletal, and endocrine systems [44]. The differences shown in Table 1, however, mainly concern preschoolers (age 3-6), whereas early onset of puberty occurs well beyond this age [44]. This indicates that other environmental factors may be in play, such as family, school context and cultural expectations (religion and local sports), that may be different for boys and girls [45]. For instance, even though, worldwide, boys are better at ball skills, even as young as 15-23 months old [46], Aboriginal girls throw better than children from other cultures, which can be explained by their cultural belief that throwing for hunting and defense is important for both sexes [47]. Another aspect that has been suggested to impact motor skill development is socioeconomic status (SES). Several authors indicate that low SES has a negative impact on motor skills in children [20,48,49], whereas others do not report differences [50]. Whether or not SES has a negative impact on skills seems to depend upon the region and the type of skills being assessed [13,51-53], but the interaction effect with sex remains under debate [49,53].

This study therefore aims to investigate sex-related differences in raw MABC-2 item scores in a large data set collected in different regions across the world. As such, we seek to unravel whether there is an interaction effect between sex and the origin of the sample (European versus African). Answers will be sought to the following research questions:
2. How do children on different continents perform, and are there sex-related differences in raw MABC-2 item scores on different continents?
3. Are the sex-related differences in raw MABC-2 item scores age-dependent?
Since the domain scores are the scaled sums of the standardized item scores, sex-related differences at an item level should be explored first. We hypothesize that, overall, girls will outperform boys on the manual dexterity and balance items, and that boys will be better at aiming and catching than girls. Because of the daily activities of the children and their sports participation, we expect that these differences will be clearly present in the European sample.

Participants
In this retrospective study, a secondary analysis was performed on anonymized data collected during several previous projects [4,54,55]. The registered data were delivered anonymized by Pearson and the co-authors to the first author (BSE) and could not be linked to the participants in any of the countries. The sample consisted of 7654 children with a mean age of 8.6 (range 3 to 16; SD: 3.4), 50.0% of whom were boys. Some of the participants formed parts of carefully stratified samples of children involved in country-specific test standardizations, whereas other participants were randomly chosen from larger samples to explore test validity.
The characteristics of the entire sample are summarized in Table 2. As the table shows, most children (75.2%) were European (mean (SD) age: 8.1 (3.5)). Half of this group came from the Netherlands (50.5%), 20.1% from the UK, 19.2% from Belgium and 10.2% from Czechia. The remaining children (24.8%) lived in Africa; the youngest age group (3-5) was missing from this sample, making the African sample significantly older (mean (SD) age: 10.0 (2.8)). The African subsample consisted of children from South Africa (70.9%), Ghana (20.6%) and Nigeria (8.5%). Both sexes were equally represented in all subsamples (Table 2). The distribution across the age bands within each subsample is depicted in Table 2.

Movement Assessment Battery for Children, Second Edition
The original and Dutch versions of the MABC-2 were used in the present study [1,56]. In content, the UK and Dutch versions are identical. The test consists of three age bands (AB1 for 3- to 6-year-olds, AB2 for 7- to 10-year-olds, and AB3 for 11- to 16-year-old children), for which eight age-specific items have been defined as representative of three domains: manual dexterity (MD; 3 items), aiming and catching (A and C; 2 items) and balance (B; 3 items). In all cases, test administration followed the standardized procedure in the manual. Since country-specific norms were not available for all samples, for the purpose of this study, the raw scores were used for analysis.
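The age-band structure described above can be written down as a small lookup. This is only an illustrative sketch: the function name `age_band` and the `DOMAIN_ITEMS` mapping are hypothetical, and it assumes age is recorded in completed years.

```python
# Number of items per domain, as described in the text (8 items per age band).
DOMAIN_ITEMS = {"manual dexterity": 3, "aiming and catching": 2, "balance": 3}


def age_band(age_years: int) -> str:
    """Map an age in completed years to an MABC-2 age band label."""
    if 3 <= age_years <= 6:
        return "AB1"
    if 7 <= age_years <= 10:
        return "AB2"
    if 11 <= age_years <= 16:
        return "AB3"
    # Ages outside 3-16 are not covered by the test.
    raise ValueError(f"age {age_years} is outside the 3-16 range")


if __name__ == "__main__":
    for age in (3, 6, 7, 10, 11, 16):
        print(age, age_band(age))
    print("items per age band:", sum(DOMAIN_ITEMS.values()))
```

Grouping raw scores by this label before any comparison mirrors the per-age-band analysis used in the study, since the items differ between bands.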

Statistical Analysis
Statistical analyses were performed with SPSS 28.0 for Windows. The sample was described using demographic data (age, sex) and the distribution across the MABC-2 age bands and continents (Europe versus Africa) from which the children were recruited. The Shapiro-Wilk test was used to check for normality. The data for the total sample (Figure 1) and for the European and African subsamples were extremely skewed for items 2, 3, 4, 6, 7 and 8. To compare the children's age between the subsamples (European versus African), an independent Student's t-test was applied. To compare the sex distribution and age-band distribution across the subsamples, a Chi-squared test was used. Subsample differences (boys versus girls or European versus African subsamples) within each age band were explored with the Mann-Whitney U test. Subsets were composed for each age band to explore sex differences within each subsample. For analyses comparing performances between European and African subsets, only the 6-year-old children were selected, as the AB1 African subsample consisted exclusively of children aged 6 years old.
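For readers who want to see what a Mann-Whitney U comparison of two groups involves, a minimal plain-Python sketch follows. It is not the authors' procedure: the study used SPSS, the function name `mann_whitney_u` and the completion times are invented for illustration, and the p-value is assumed to come from the normal approximation with a tie correction (no continuity correction), which may differ from the method SPSS applied.

```python
import math
from collections import Counter


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test (normal approximation, tie-corrected).

    Returns (U for x, U for y, z statistic, two-sided p-value).
    """
    x, y = list(x), list(y)
    n1, n2 = len(x), len(y)
    pooled = sorted(x + y)

    # Assign midranks: tied values share the average of their rank positions.
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # mean of ranks i+1 .. j
        i = j

    rank_sum_x = sum(ranks[v] for v in x)
    u1 = rank_sum_x - n1 * (n1 + 1) / 2
    u2 = n1 * n2 - u1

    # Variance of U under the null hypothesis, corrected for ties.
    n = n1 + n2
    tie_term = sum(t**3 - t for t in Counter(pooled).values())
    var = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var <= 0:  # all observations identical: no evidence of a difference
        return u1, u2, 0.0, 1.0

    z = (u1 - n1 * n2 / 2) / math.sqrt(var)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided normal tail probability
    return u1, u2, z, p


if __name__ == "__main__":
    # Hypothetical completion times in seconds (lower is better); not study data.
    girls = [31, 29, 35, 28, 30, 33, 27, 32]
    boys = [36, 34, 38, 31, 40, 35, 37, 33]
    u_girls, u_boys, z, p = mann_whitney_u(girls, boys)
    print(f"U(girls)={u_girls}, U(boys)={u_boys}, z={z:.2f}, p={p:.4f}")
```

A rank-based test of this kind suits the strongly skewed item scores reported here, because it compares the ordering of observations rather than assuming a bell-shaped distribution.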

Age Band 1
Sex differences were present for all items, except for posting coins with the preferred (p = 0.374) and non-preferred (p = 0.627) hand (Table 3). The girls were better at threading beads (p < 0.001), drawing a trail (p < 0.001), standing on one leg on the preferred and non-preferred leg (p < 0.001), walking with heels raised (p < 0.001) and jumping on mats (p < 0.001) than the boys. The boys outperformed the girls on catching (p = 0.025) and throwing a bean bag (p < 0.001). Table 4 shows the differences between the sexes within each subsample. Across all subsamples, girls were better at drawing a trail and jumping on mats. The other items differed depending on the subgroup. The girls usually outperformed the boys in the European subsample, except for the aiming and catching items, where they performed similarly. In the African sample, no differences were found between the sexes for most items, except for drawing a trail (girls better than boys), throwing a bean bag (boys better than girls) and jumping on mats (girls better than boys). Details on item performances for the subsamples are provided in Table A1.


Age Band 2
As shown in Table 3, boys and girls performed significantly differently on all items (p < 0.001). The girls were better than the boys at all manual dexterity and balance items (p < 0.001). The boys outperformed the girls on the aiming and catching items (p < 0.001). Within the European subsample identical results were found to those for the entire group (Table 4). In the African subsample the results deviated from the entire group, as there were no differences between boys and girls for placing pegs with the non-preferred hand (p = 0.186), drawing a trail (p = 0.426) and walking heel-to-toe forward (p = 0.711) (Table 4). Details for the differences between the European and African subsamples are provided in Appendix A.


Age Band 3
In this age band the sex differences were less marked (Table 3). The girls were better at turning pegs with their preferred hand (p < 0.001) and drawing a trail (p < 0.001). The boys outperformed the girls on catching a ball with their preferred and non-preferred hand (p < 0.001), throwing a ball at a wall-mounted target (p < 0.001) and walking heel-to-toe backwards (p = 0.013). In the subsamples, manual dexterity and aiming and catching were similar in the European sample to the entire group, whereas drawing a trail was not significantly different between boys and girls in the African subsample (p = 0.663). For the balance subscale, the sex differences between the European and African subsamples were most divergent. In the European subsample, girls outperformed boys in the two-board balance task (p = 0.023) and in zig-zag hopping on the non-preferred leg (p = 0.044), whereas in the African subsample, boys were better than girls at the two-board balance task (p = 0.031) and walking heel-to-toe backwards (p < 0.001).

Discussion
The aim of this study was to unravel whether there is an interaction effect between sex and the origin of the sample. As hypothesized, our results clearly show that in all age bands, sex-related differences are present. In AB1 and AB2, girls are superior on most items, with the exception of A and C, whereas in AB3 the differences seem to diminish. Within the subsamples, these differences are not as straightforward, but are still present. Especially in the European sample, girls outperform boys in manual dexterity and balance items, whereas in the African sample these differences are less marked.
Since the international guidelines on DCD recommend the test's use in the diagnostic process [3], sound normative data are a cornerstone, as the result plays a crucial role in determining whether or not a child receives additional support (e.g., at school) or even treatment (e.g., physiotherapy or occupational therapy). However, our results clearly show that the normative raw data are extremely skewed (Figure 1), indicating that children either can or cannot perform the tasks, so the use of standard scores following a bell-shaped distribution is highly questionable. For example, one mistake more or less on the drawing trail item can have a huge impact on the meaning of the result. This lack of spread in the data not only adversely affects the diagnostic accuracy of a test, but also impacts the test's ability to detect changes. Furthermore, similar to what has been reported in the literature in the form of domain and total scores (summarized in Table 1) [7,11,12,19,39,40], our results reveal a marked difference at the item level in raw scores between boys and girls, indicating that separate sex-specific norms are imperative. When boys systematically perform worse compared to girls and combined normative data are applied, the chance of boys being identified as performing below the norm is higher compared to girls, with a potential risk of false positive identification of motor-skill deficits. On the other hand, girls are at risk of remaining unidentified, especially when the motor difficulties are subtler. If so many items favor girls, total scores will also be misleading, since no corrections for sex differences have been implemented so far. Either the items in the MABC-2 are less suited for boys or too much linked to skills in which girls tend to be superior, which directly reflects the test's content validity.
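The effect of applying combined norms to two groups with different average performance can be illustrated with a short calculation. The sketch below rests entirely on assumptions made for illustration: scores are treated as normally distributed (which the skewed raw data in this study do not support), higher scores mean better performance, girls are given a hypothetical advantage of 0.3 standard deviations, boys and girls are equally represented, and the cut-off is set at a hypothetical 5% of the pooled group. None of these numbers come from the study.

```python
from statistics import NormalDist

# Hypothetical score distributions (higher = better); not estimated from data.
girls = NormalDist(mu=0.3, sigma=1.0)
boys = NormalDist(mu=0.0, sigma=1.0)


def pooled_cdf(score: float) -> float:
    """Share of the combined (50% girls, 50% boys) group scoring below `score`."""
    return 0.5 * girls.cdf(score) + 0.5 * boys.cdf(score)


def pooled_cutoff(target: float = 0.05, lo: float = -10.0, hi: float = 10.0,
                  tol: float = 1e-10) -> float:
    """Find by bisection the score below which `target` of the pooled group falls."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if pooled_cdf(mid) < target:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


cutoff = pooled_cutoff()
share_boys_flagged = boys.cdf(cutoff)
share_girls_flagged = girls.cdf(cutoff)

if __name__ == "__main__":
    print(f"pooled cut-off score: {cutoff:.3f}")
    print(f"boys below cut-off:  {share_boys_flagged:.1%}")
    print(f"girls below cut-off: {share_girls_flagged:.1%}")
```

Under these assumptions, more than 5% of boys and fewer than 5% of girls fall below the pooled cut-off, whereas sex-specific norms would flag 5% of each group by construction.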
This does raise questions about the test's composition and, therefore, the item choices, besides building on earlier versions of the test (TOMI). For example, why was it decided that jumping in a square, as in hopscotch, is a more important task to include than jumping over a ditch (as in a long jump) or jumping towards a hoop (closer to a vertical jump); why did we choose threading beads and not pressing phone keys or building a tower of small blocks; why was aiming chosen and not throwing; and why is there no item intended to measure agility or items close to having the skills needed for personal care? These choices have very important consequences, since the types of tasks are prone to sex-related differences [13,14,57-59].
There seems to be a consensus about what comprises "fundamental motor skills," and that there are three categories of items that need to be included (locomotor, ball skills and balance). For example, regardless of their age, boys usually perform better on the object control skills in the second edition of the Test of Gross Motor Development (TGMD-2) [13,14], and girls perform better on the TGMD-2's locomotor skills [14]. When children are assessed using the Athletic Skills Track (AST), where they have to balance, hopscotch, do traveling jumps, slalom, roll, run, alligator crawl and clamber as quickly as possible, requiring good physical fitness, boys outperform girls, regardless of age [60,61]. When children are asked to maintain a posture for a predefined time period (e.g., standing on one leg for 30 s), girls tend to outperform boys, whereas boys seem to do better in more dynamic situations such as balance during walking or performing reaching tasks [59]. Interestingly, most items of the MABC-2 emphasize accuracy and precision, which require both motor control and sustained attention, or enforce an accuracy-speed tradeoff for the manual dexterity items. Girls are more likely to have superior manual control abilities for performing novel tasks [62] and overall better inhibition control, which increases their selective attention, whereas boys are usually faster [63]. As such, the MABC-2 items seem easier for girls to perform, as they tap into their strengths. This also raises the question of whether a comparable percentage of boys would be diagnosed with DCD if there were more emphasis on gross motor skills and actual performance rather than accuracy (e.g., running, picking up an object and sprinting back, or a high or broad jump instead of an accuracy jump), or whether there would be a higher prevalence of DCD in girls than in boys.
One of the key environmental factors in the emergence of such differences is the sex stereotypes or gender role models that influence motor development from toddlerhood onwards [58]. Girls and boys are often encouraged to practice different types of sports, spend their leisure time differently, and perform different types of activities [45]. A first step towards a more accurate formal assessment using the MABC-2 would be to establish sex-specific norms. Yet, the sex differences in our sample were quite distinct depending on the child's background (European versus African), which emphasizes the important role of the environment. It seems that the emphasis in European culture is more on fine motor play for girls, such as coloring and fine motor games, which may increase the (natural?) differences between genders. Hence, to apply motor tests in an environmentally valid way, it would make more sense to either develop contemporary regional gender-specific MABC-2 norms, or to incorporate tasks that are closer to children's actual daily activities.
Our results make clear that when we state that a child has "poor motor skills," this assumption depends upon the items in the test (some having items that boys excel in and others having more items that favor girls) and secondly on the samples used for the norms. Motor competence is defined as the ability to perform a wide range of motor skills. However, what should be in this "wide range" is less obvious. In Europe, hitting a ball with a baseball bat or eating with chopsticks would not be seen as culturally appropriate test items to evaluate motor skills, as they do not reflect children's daily activities. On the other hand, how many boys spend time playing with beads or ministacks? Items in the MABC-2 were chosen to be as culturally independent as possible, which to a large extent was successful, by avoiding sports-related skills (jumping or throwing for distance). Given the fact that recent training paradigms are task-oriented and focus on the identification of activities children struggle with, standardized tests should also consider implementing those elements that are relevant to a child's wide range of daily tasks. For instance, tasks that require running fast without falling over or stepping on an object can be considered motor skills that belong in this wide range. Given the decreasing level of daily physical activity, such items should get a higher priority and should be integrated into future norms.
Norm-referenced tests are calibrated carefully on the representative sample for which the norms are intended, which was done optimally for the UK, the Netherlands and Flanders samples in our study [1,56]. Even between very similar societies, differences were found with the UK sample, warranting separate norms for the Netherlands [4]. It is clear that motor development and competency depend on many factors (SES, cultural beliefs, exposure, availability). Moreover, the association between SES and functional motor-skill levels seems to be culturally dependent; children from lower SES levels may participate more in active transportation and develop better locomotor skills but may have less PE, less well-equipped sports facilities and less formal sports participation, and thus be less skilled in sport-related movement skills. In developed countries, motor competence is related to SES starting from preschool age [20,48,49,64], while in developing countries, such as South Africa, results are diverging and sometimes in favor of children with low SES [65-67].

Limitations of the Study
Many factors known to influence motor development have not been reviewed for this study because data were anonymized and only country, gender and age were available for all children in the sample.

Conclusions
Our results show that when researchers and practitioners choose a measurement instrument to evaluate motor performance, it is important that they consider possible cultural and gender bias of the items included in that test to measure the ability to perform a wide range of motor skills, as well as the cultural background of the normative sample to which they are comparing the tested children. Clinicians should also be aware that the MABC-2 contains more items that focus on assessing motor skills in which girls tend to be superior, and as such may lead to overrepresentation of boys with lower motor competency even when separate norms are available.

Informed Consent Statement:
For each of the original studies, ethical clearance was gained from the local ethical committees. The data used in this study are anonymized and therefore no longer require ethical approval, as the data cannot be linked to any of the participants.

Data Availability Statement:
The data were collected in previous studies commissioned by Pearson and remain their property. As such, the data cannot be made available by the authors of this paper.

Acknowledgments:
We would like to thank Pearson for providing us with the anonymized data from the norm sampling.

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the analyses, the interpretation of data, the writing of the manuscript, or the decision to publish the results.