Evaluation of the Psychometric Properties of the Movement Assessment Battery for Children Second Edition (M-ABC2): A Cross-Sectional Study

This cross-sectional study assessed the psychometric properties of the movement assessment battery for children second edition (M-ABC2) in Italian children with typical development, focusing on reliability and percentile rank establishment. The M-ABC2 is widely utilized but lacks validation in Italian populations. One hundred and eight children were recruited. Test-retest reliability was evaluated using intra-class correlation coefficients (ICCs), indicating good to excellent reliability across age groups, albeit with outliers requiring further scrutiny. Standard scores and percentile ranks were established, revealing age-related variations in motor performance. Noteworthy differences in completion times and error rates were observed among the age groups, highlighting the dynamic nature of motor development. While the findings provide valuable insights for clinicians and researchers, limitations such as sample size and geographic representation should be addressed in future studies. This study underscores the importance of employing reliable assessment tools for comprehensive understanding and effective intervention in pediatric motor development.


Introduction
Movement skills are essential for children's daily activities. They are fundamental for interacting with friends or participating in school activities [1].
During the early stages of development, motor skills are necessary for exploring the surrounding environment, while later in life, both gross and fine motor skills are crucial for performing various tasks, including self-help skills like dressing or writing [2].
Children with motor impairments may have difficulty performing activities of daily living (ADL), such as academic, social, and household activities. Low self-esteem, low self-confidence, anxiety, and social isolation are some of the problems that children with atypical motor development may experience [3].
Different tools to assess movement performance in childhood and adolescence are available [4][5][6].
To discriminate between the motor performance of typically and atypically developing children, it is necessary to choose a test that evaluates motor skills adequately and fairly, based on its validity and reliability. Validated measurement tools are essential in the assessment stage to establish the child's initial status, build an evidence-based diagnosis, determine the goals and treatment plan, and, finally, measure treatment effectiveness. Additionally, standardized and validated instruments make it possible to compare the developmental level of the population with neurodevelopmental disorders with that of the general population. Validating a tool on a healthy population before using it on an affected population is crucial for several reasons. First, it establishes the stability and reliability of the scale by assessing how well it measures the intended abilities or characteristics in individuals without health conditions [7]. Second, data collected from healthy subjects provide reference values that aid the interpretation of results from the affected population [8]. Third, evaluating the acceptability and practicality of the tool in healthy individuals helps determine its usability in clinical or research settings [9]. Lastly, validating the scale on a healthy population before using it on an affected population minimizes the risk of bias, ensuring that the results are accurate and reliable [10]. Together, these steps ensure that assessment tools are valid and reliable before they inform clinical or diagnostic decisions.
The movement assessment battery for children (M-ABC) is one of the most widely studied assessment scales and has been validated in several languages [11][12][13][14][15][16]. The M-ABC2, developed by Henderson et al. as a revision of the previous version and standardized on a normative sample in the UK, provides a comprehensive measure of motor skills [17]. It has undergone numerous validation studies in various populations and cultural contexts. The M-ABC2 has been translated into many languages and is widely used in clinical and research studies worldwide, making it one of the most reliable and valid scales for assessing motor skills in children. It has also been studied in various populations of children without motor issues. For instance, its validity and reliability have been investigated in Iranian children [12]. The tool was also assessed in a cohort of children in the Netherlands, indicating that the updated assessment can effectively evaluate motor performance in typically developing three-year-olds [18]. Jaikaew et al. established the typical scores achieved by German children on the M-ABC, providing professionals with a basis for comparison when interpreting results and making motor diagnoses in German children [14]. A recent study also analyzed the applicability of the tool in children aged 3-6 years in Taiwan, comparing the results with those of the standardization sample from the United Kingdom [11].
Validation and cultural adaptation studies have shown that the tool is valid and reliable in all languages into which it has been translated; however, item-level performance varies by country. For example, the administration time differed by ten seconds per item between Chinese and Greek children. Test-retest reliability was assessed in typically developing children in both populations [19,20]. The results indicated a difference of 1 standard deviation, even though the children were able to complete the tests. It is, therefore, critical to verify performance against normative data [21].
These studies demonstrate the usefulness of the M-ABC in assessing motor skills in children with typical development and providing valuable baseline data for interpreting scale results in different populations.
Currently, the M-ABC2 is available in Italian, but there are no validation studies of the psychometric properties in children with typical development.
For this reason, this cross-sectional study aimed to assess the psychometric properties of the M-ABC2 in a population of Italian children with typical development by measuring test-retest reliability and establishing percentile ranks for the different subscales. Reference percentiles were calculated for each M-ABC2 subscale to provide a normative context for evaluating children's motor performance: using the data collected from the reference population, the scores corresponding to specific percentiles were determined, enabling an age-based comparison of children's motor skills.
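As an illustration of how reference values at specific percentiles can be derived from a normative sample, the Python sketch below uses the standard-library `statistics.quantiles` function with `method="exclusive"`, which places the p-th cut point at rank p(n + 1)/100. The study's analyses were run in SPSS, and whether SPSS applied exactly this interpolation rule is an assumption; the helper name `percentile_cutoffs`, the default percentiles, and the example times are hypothetical.

```python
import statistics


def percentile_cutoffs(scores, percentiles=(5, 15, 25, 50, 75, 95)):
    """Return {percentile: reference value} for one subscale in one age band.

    Hypothetical helper: `scores` are the raw scores of the reference children;
    the default percentiles are illustrative, not those reported in Table 7.
    """
    if any(not 1 <= p <= 99 for p in percentiles):
        raise ValueError("percentiles must lie between 1 and 99")
    # 99 cut points split the sample into 100 equal-frequency groups;
    # "exclusive" interpolates the p-th cut point at rank p * (n + 1) / 100.
    cuts = statistics.quantiles(scores, n=100, method="exclusive")
    return {p: cuts[p - 1] for p in percentiles}


# Made-up completion times (seconds), for illustration only.
times = [18.0, 22.0, 25.5, 28.1, 30.0, 33.4, 41.0, 47.2, 55.0, 62.3]
print(percentile_cutoffs(times, percentiles=(25, 50, 75)))
```

For completion times and error counts, lower raw values mean better performance, so a low percentile on these items corresponds to a fast or accurate child and the direction must be reversed when interpreting the cut-offs.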

Participants
The participants were enrolled at the Human Neurosciences department of Sapienza Università di Roma from May 2023 to October 2023. The inclusion criteria were age between 3 and 16 years, the absence of known medical conditions or disorders, and written informed consent from the children's parents. All children were assessed individually according to the test rules. The parents or guardians provided written informed consent, indicating that the children participated voluntarily and could withdraw at any time without giving a reason. The consent form also stated that anonymity was guaranteed and that the data would be protected. The participants were treated according to the principles of the Declaration of Helsinki, ensuring ethical conduct throughout the study.
In the literature, sample size recommendations range from 2 to 20 subjects per item [22]. Among the studies analyzed in a systematic review of the sample sizes used to validate scales, the mean subject-to-item ratio was 28, with a minimum of 1 and a maximum of 527 [22]. Furthermore, Shoukri et al. [23] reported, "However, in many cases, values of the reliability coefficient under the null and alternative hypotheses may be difficult to specify. Under such circumstances, one can safely recommend only two or three replications per subject". Consistent with previous studies of the M-ABC2 and the recommendations available in the literature, the authors of this study considered a minimum sample size of 30 subjects adequate. The sample size was estimated with reference to validations of the M-ABC2 in other languages [11,12,14,16,19-21].
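The subject-to-item guideline above is simple arithmetic; the sketch below checks it for the eight M-ABC2 items using the whole sample (108 children) and the two age-band sizes reported in the Results (17 and 20 children). The helper names are hypothetical, and the 2-20 bounds are those cited from [22].

```python
def subjects_per_item(n_subjects: int, n_items: int) -> float:
    """Subject-to-item ratio used in scale-validation sample size guidelines."""
    if n_items <= 0:
        raise ValueError("n_items must be positive")
    return n_subjects / n_items


def within_guideline(ratio: float, low: float = 2.0, high: float = 20.0) -> bool:
    """True if the ratio falls in the 2-20 subjects-per-item range cited in [22]."""
    return low <= ratio <= high


M_ABC2_ITEMS = 8  # eight items are administered in each age band

# Whole sample and the two age-band sizes stated in the Results section.
for label, n in [("whole sample", 108), ("age band 2", 17), ("age band 3", 20)]:
    ratio = subjects_per_item(n, M_ABC2_ITEMS)
    print(f"{label}: {ratio:.2f} subjects per item, within 2-20: {within_guideline(ratio)}")
```

Because items are analyzed separately within each age band, the per-band ratios are arguably the more relevant figures for the item-level reliability estimates.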

Movement Assessment Battery for Children Second Edition (M-ABC2)
The movement ABC test is designed to be administered individually within a setting with specific characteristics [17].
The assessment room should measure at least 6 m × 4 m and have a smooth, white wall. Part of the floor surface should be relatively hard and smooth, and the space should also be equipped with at least one table and two chairs to adequately perform manual dexterity tasks.
It is recommended that appropriate physical education clothing be used so that the movements are not impaired and are easily observed.
The test aims to classify children aged between 3 years and 16 years 11 months according to the degree of motor impairment. When the M-ABC2 is administered, the child or young person is asked to perform a series of eight standardized motor tasks. The battery consists of three subtests, each comprising several items, which measure three different areas:
- Manual dexterity (three items);
- Aiming and grasping (two items);
- Balance (three items).
The item scores should be transformed into standard scores (mean = 10).Some tasks must be performed first with the preferred hand/leg and then with the non-preferred one.The score for these tasks is calculated by averaging the two attempts.
The total test score (TTS) is calculated by summing the standard scores of the eight items (range = 8-152). The test manual provides age-adjusted standard scores (mean = 10; SD = 3) and the corresponding percentiles for each of the three sections of the test and for the total score, broken down by year of age from 5 to 16 years and by six-month intervals from 3 to 4.11 years.
It also provides quantitative and qualitative information on how the child or young person copes with and performs these tasks and, thus, on movement skills.
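The scoring rules described above (averaging the two sides for tasks performed with the preferred and non-preferred hand or leg, then summing the eight item standard scores) can be sketched as follows. This is a hypothetical illustration: the function names are invented, the 1-19 item range is inferred from the stated 8-152 TTS range, and averaging without rounding is an assumption, since the manual may specify its own rounding rule.

```python
def item_standard_score(preferred, non_preferred=None):
    """Item standard score; two-sided tasks average the preferred and non-preferred sides."""
    if non_preferred is None:
        return preferred
    return (preferred + non_preferred) / 2  # no rounding applied (assumption)


def total_test_score(item_scores):
    """Total test score (TTS): sum of the eight item standard scores (range 8-152)."""
    if len(item_scores) != 8:
        raise ValueError("the M-ABC2 total uses exactly eight item scores")
    if not all(1 <= s <= 19 for s in item_scores):
        raise ValueError("item standard scores are expected in the range 1-19")
    return sum(item_scores)


# Hypothetical child: in this example, items 1 and 6 are scored on both sides.
items = [item_standard_score(12, 10), 9, 11, 10, 8, item_standard_score(13, 11), 10, 12]
print(total_test_score(items))
```

The TTS is then converted into an age-adjusted total standard score and percentile using the manual's tables, which this sketch does not reproduce.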
The test is divided into three age bands:
- Age band 1: 3 to 6 years old;
- Age band 2: 7 to 10 years old;
- Age band 3: 11 to 16 years old.
The administration time of the eight tasks varies from 20 to 40 min, depending on the subject's age, the degree of difficulty, and the examiner's experience.

Procedures
The initial administration was conducted by two independent operators who were not authors of this study; this made it possible to confirm that the scale is independent of the rater and that differences in administration did not influence the obtained scores. The scale's reliability and validity were evaluated using the "Consensus-Based Standards for the Selection of Health Status Measurement Instruments" (COSMIN) [24] checklist.
Test-retest reliability was evaluated by measuring the stability of each item across two administrations (test and retest), after which the intra-class correlation coefficient (ICC) was calculated. A time interval of 48 h was considered appropriate for the current population, in accordance with previous validation and cultural adaptation studies of the same test. Based on the 95% confidence interval of the ICC estimate, values less than 0.5, between 0.5 and 0.75, between 0.75 and 0.9, and greater than 0.90 were taken to indicate poor, moderate, good, and excellent reliability, respectively [25,26].
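The text does not state which ICC model was computed. The sketch below assumes the common two-way random-effects, absolute-agreement, single-measurement form, ICC(2,1), computed from the ANOVA mean squares, and then applies the reliability thresholds listed above to a single value (the study applied them to the 95% confidence interval). The function names and the example scores are hypothetical.

```python
def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single measurement.

    `ratings` is a list of rows, one per child, each holding the k scores
    obtained on the k occasions (here k = 2: test and retest).
    """
    n, k = len(ratings), len(ratings[0])
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)  # between children
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)  # between occasions
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))
    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


def reliability_label(icc):
    """Thresholds from the Methods: <0.5 poor, 0.5-0.75 moderate, 0.75-0.9 good, >0.9 excellent."""
    if icc < 0.5:
        return "poor"
    if icc < 0.75:
        return "moderate"
    if icc <= 0.9:
        return "good"
    return "excellent"


# Hypothetical test and retest raw scores for five children on one item.
scores = [[10, 11], [14, 13], [7, 8], [12, 12], [9, 10]]
value = icc_2_1(scores)
print(f"ICC(2,1) = {value:.3f} ({reliability_label(value)})")
```

The absolute-agreement form penalizes systematic shifts between test and retest (for example, a practice effect that raises every child's retest score), whereas a consistency form would not; which form the authors used should be checked against their SPSS output.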
Standard scores express an individual's relative position within a reference distribution by rescaling scores to a fixed mean and standard deviation, allowing uniform comparison of results across individuals or groups. Many psychometric instruments use a mean of 100 and a standard deviation of 15 or 16; the M-ABC2 item, section, and total standard scores instead use a mean of 10 and a standard deviation of 3. Percentiles describe the position of a particular value within a dataset relative to the rest of the values. They divide a distribution into one hundred equal parts and show how an individual's performance compares with that of others in the same population.
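The two concepts above can be illustrated with a short sketch. The M-ABC2 manual reads standard scores from empirically derived norm tables, so the linear z-score rescaling below (mean 10, SD 3, clipped to the 1-19 range implied by the 8-152 TTS range) is only an approximation for illustration; the function names and reference scores are hypothetical. The percentile rank counts the share of reference scores below a value, with ties counted as half.

```python
from statistics import mean, stdev


def approx_standard_score(raw, ref_scores, lower_is_better=False):
    """Linear approximation of a scaled score (mean 10, SD 3), clipped to 1-19.

    Illustrative only: the M-ABC2 manual uses norm tables, not this formula.
    """
    z = (raw - mean(ref_scores)) / stdev(ref_scores)
    if lower_is_better:  # completion times and error counts: lower raw = better
        z = -z
    return max(1, min(19, round(10 + 3 * z)))


def percentile_rank(score, ref_scores):
    """Percentage of reference scores below `score`, counting ties as half."""
    below = sum(s < score for s in ref_scores)
    ties = sum(s == score for s in ref_scores)
    return 100 * (below + 0.5 * ties) / len(ref_scores)


reference = [10, 20, 30, 40, 50]  # hypothetical reference scores
print(approx_standard_score(50, reference), percentile_rank(50, reference))
```

As with the standard score, the percentile rank of a time or error count must be read in reverse: a high percentile rank on these items means a slower or less accurate performance.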
All statistical analyses were performed using IBM SPSS Statistics for Windows (version 23.0;IBM Corp., Armonk, NY, USA).

Results
As reported in Table 1, a cohort of 108 participants, including 69 males, was recruited. The sample mirrored the Italian population with respect to key sociodemographic traits such as ethnicity and parental education level. However, the geographical distribution may not be fully representative, because recruitment was confined to two territorial services within the Rome region. The test-retest reliability results are reported separately for each age group, allowing measurement consistency to be examined across age spans. As indicated in Table 2, intra-rater test-retest reliability was good to excellent in the 3 to 6.11 age group (ICC range: 0.823 to 0.975) and in the 7 to 10.11 age group (ICC range: 0.839 to 0.940), except for the outliers identified in the latter group for items 3 (ICC = 0.111) and 6 (ICC = 0.039). In the third age band, the coefficients ranged from moderate to good (0.517 to 0.887), except for items 3 and 8, which yielded correlations of 0.233 and 0.140, respectively.
Table 3 illustrates the inter-rater reliability for each band, which was excellent: the ICC ranged from 0.964 to 1 in band 1, from 0.902 to 1 in band 2, and from 0.943 to 1 in band 3.
Table 4 presents the results for the first age group (3-6.11 years). The maximum time to complete the first task was 30 s with the preferred hand and 32 s with the non-preferred hand. For the second item (a bimanual manual dexterity activity), completion times in this age group ranged from a minimum of 16.9 s to a maximum of 150 s, and almost half of the children completed the task in less than a minute. On the third manual dexterity item, half of the children made fewer than five errors, with a maximum of seventeen errors recorded. For the "Aiming and Grasping" subscale, the minimum attributable score was two, as 10% of children scored zero on both items in the domain. Finally, for the "Balance" subscale, the scores for item 6 show no difference between the preferred and non-preferred leg, while for item 8, 40% of the children achieved the maximum score of five.
Table 5 presents the standard scores for the second age group (7-10.11 years), which includes 17 children. In contrast to Band 1, the first item of the "Manual dexterity" subscale shows, at the minimum score of one, a noticeable difference in completion time between the preferred and non-preferred hand: a child aged between 7 and 10.11 years who took 50 s with the preferred hand received a score of one, whereas the same time with the non-preferred hand yielded a score of eight. Similarly to Band 1, 45% of children in the second age group made fewer than five errors. For the "Aiming and Grasping" subscale, as in the first age group, the minimum attributable score was two, as only 10% of the children scored zero. In the "Balance" subscale, a score lower than two could not be assigned for the final item, as no child scored zero.
Table 6 presents the results of the third age group, comprising 20 children aged between 11 and 16.11 years. In this group, no marked difference in completion times between the preferred and non-preferred hands was observed. On the third manual dexterity item, 11 of the 20 children in the reference population made no errors. For the "Aiming and Grasping" subscale, the minimum attributable score for item 4 performed with the preferred hand was four, as the lowest score obtained was five, achieved by 4 of the 20 children. Conversely, the minimum attributable score for the non-preferred hand was one, as 10% of the children obtained a score of three. In the "Balance" subscale, three-quarters of the children scored five on item 8 with the preferred leg, corresponding to a standard score of nineteen; this score differs by 10% when the item is performed with the non-preferred leg.
Table 7 shows the scores grouped by percentiles.

Discussion
This cross-sectional study aimed to assess the psychometric properties of the M-ABC2 in a population of Italian children with typical development, measuring test-retest reliability and establishing percentile ranks for the different subscales.
The study demonstrates promising test-retest reliability, with intra-class correlation coefficients (ICCs) ranging from 0.823 to 0.975 in the first age band and from 0.839 to 0.940 in the second. These findings suggest consistent measurement outcomes over time, supporting the reliability of the assessment tool. However, the outliers observed for specific items call for further investigation into potential sources of variability and the validity of these measures. These outliers, which indicate lower test-retest reliability for individual items, were observed in age groups two and three. They may stem from the limited sample size or from healthy children improving their performance immediately when repeating the same task, a practice effect commonly observed in psychometric assessments [27,28]. Previous research has also suggested that individual differences in motor development and learning capabilities may contribute to variations in test-retest reliability across age groups [29]. Further investigation is warranted to fully elucidate the factors underlying these outlying coefficients within specific age cohorts.
While the M-ABC2 is used worldwide, psychometric issues have been identified in several countries. For example, in Germany, Wagner et al. [30] found problematic models for four motor tasks, raising concerns about discriminant and convergent validity. Similarly, a study in China concluded that the reproducibility and validity of age band 1 of the M-ABC2 were poor and suggested adjustments to improve the test's psychometric properties [31]. In Brazil, an analysis of M-ABC2 multidimensionality for children aged 7-10 years indicated that three subtests had to be excluded to achieve a better-fitting model [15]. However, there is a gap in the literature, particularly in Brazil, regarding a detailed adequacy analysis of the M-ABC2 for children aged 3-5 years. This gap was emphasized by Brown [16], who highlighted issues regarding context, item transition, and the assessment of one age group at a time.
The analysis reveals age-related differences in performance across various tasks, emphasizing the importance of considering developmental trajectories in assessing motor skills and dexterity.Differences in completion times and error rates were observed across different age bands, underscoring the dynamic nature of motor development during childhood and adolescence.The interpretation of the outcomes must consider the study limitations related to the small sample size and inadequate sample stratification within each age group.The findings suggest the need for further refinement of assessment tools to accommodate a broader range of skill levels.
Future research endeavors could explore additional psychometric properties of the M-ABC2, such as internal consistency and concurrent validity.Longitudinal studies tracking motor development across multiple age points could provide valuable insights into the trajectory of motor skill acquisition and its relationship with other developmental domains.Additionally, efforts should be made to recruit larger and more diverse samples to enhance the generalizability of the findings and address potential psychometric limitations.

Conclusions
The standardized scores offer valuable insights for clinicians and researchers engaged in pediatric assessment. By understanding normative performance levels within specific age ranges, professionals can identify developmental delays or atypical performance, tailor intervention strategies, and monitor progress over time. Moreover, the ceiling effects observed in certain items underscore the need for further refinement of assessment tools to accommodate a broader range of skill levels. While the sample characteristics align with key sociodemographic traits of the Italian population, its limited geographical distribution may restrict the generalizability of the findings. Future studies should prioritize recruiting participants from diverse geographical regions to ensure broader representativeness and enhance the external validity of the results.
Expanding upon the current findings, future research endeavors could delve into additional psychometric properties of the assessment tool, such as internal consistency and concurrent validity.Additionally, longitudinal studies tracking motor development across multiple age points could yield valuable insights into the trajectory of motor skill acquisition and its interplay with other developmental domains.
In summary, the findings presented in this study significantly contribute to our understanding of motor development and emphasize the critical importance of employing reliable and valid assessment tools in both pediatric practice and research settings.

Table 1. Demographic characteristics of the included participants.
Tables 4-6 show the standard scores for each age group.