Criterion Validity and Applicability of Motor Screening Instruments in Children Aged 5–6 Years: A Systematic Review

The detection of motor developmental problems, especially developmental coordination disorder, at age 5–6 contributes to early interventions. Here, we summarize evidence on (1) the criterion validity of screening instruments for motor developmental problems at age 5–6, and (2) their applicability. We systematically searched seven databases for studies assessing the criterion validity of these screening instruments using the M-ABC as reference standard. We applied COSMIN criteria for systematic reviews of screening instruments to describe the correlation between the tests and the M-ABC. We extracted information on correlation coefficients or area under the receiver operating characteristic curve (AUC), sensitivity and specificity, and applicability in practice. We included eleven studies, assessing eight instruments: three performance-based tests (MAND, MOT4–6, BFMT) and five questionnaires (DCD-Q, PQ, ASQ-3, MOQ-T-FI, M-ABC-2-C). The quality of seven studies was fair, one was good, and three were excellent. Seven studies reported low correlation coefficients or AUC (<0.70); four did not report these. Sensitivities ranged from 21% to 87% and specificities from 50% to 96%, with the MOT4–6 showing the most favorable combination of sensitivity and specificity. The DCD-Q, PQ, ASQ-3, MOQ-T-FI, and M-ABC-2-C scored highest on applicability. In conclusion, none of the instruments was sufficiently valid for motor screening at age 5–6. More research is needed on screening instruments for motor delay at age 5–6.


Introduction
Motor developmental problems in children have a rather high prevalence, with several underlying causes. One of the most prevalent causes is developmental coordination disorder (DCD), with prevalence ranging from 5% to 15% [1]. Other causes of motor developmental problems are cerebral palsy and neuromuscular diseases [2], autism, attention deficit hyperactivity disorder, intellectual and learning disabilities, and anxiety disorders [3]. Finally, an increasingly common cause of a motor developmental problem is a lack of opportunities to learn or practice motor skills at home or at school due to restrictive environmental factors [4].
Motor developmental problems can have detrimental consequences, and their early detection and treatment can counteract these consequences. Detrimental consequences may be academic, emotional, and behavioral problems, such as anxiety, depression, low self-perception, and low self-perceived motor competence [5]. Motor developmental problems can also lead to an inactive lifestyle, thereby decreasing the level of physical fitness and increasing the risk of overweight [3,5–7]. These problems may in turn interfere with participation in play and sport, leading to further deterioration of motor skills. The prognosis of motor problems in children is rather poor [8–10], but timely interventions have been shown to improve motor performance and limit adverse consequences [11–13]. Early detection and treatment are thus warranted.
The early assessment of motor developmental problems is particularly relevant at the age of 5-6 years. It is the youngest age at which DCD can be diagnosed reliably based on its diagnostic criteria [1]. Moreover, at this age, motor developmental problems may become more urgent because motor skills are necessary for participation in motor activities, such as sports. From this age onward, a positive school environment can provide opportunities to stimulate children with a motor developmental problem and so prevent further deterioration. This is relevant both for children with a motor developmental problem due to a biological or physical cause and for children who have fallen behind because they have been understimulated at home. Cumulative effects may arise especially after this age, because a self-reinforcing cycle of understimulation and lower involvement in physical activities can increase the deviation from normal motor development at later ages [14].
Early detection of motor developmental problems requires a screening instrument with sufficient sensitivity and specificity, as well as practical applicability, that is validated by comparison to a standard diagnostic instrument. These standard instruments are usually motor performance-based tests, such as the Movement Assessment Battery for Children (first or second version, M-ABC(-2)) or the Bruininks-Oseretsky Test of Motor Proficiency-2 (BOT-2). Internationally, the M-ABC(-2) is the most commonly used performance-based diagnostic test in both clinical and research settings. Moreover, the M-ABC(-2) has been studied more extensively than the BOT-2 [3,15–18]. However, both tests are too time-consuming for routine early screening in the general population. Screening tests should have adequate criterion validity (i.e., the degree to which the scores of screening instruments are an adequate reflection of a reference standard [19]) and be easily applicable in routine community-based practice, i.e., administration time and costs for materials and required training of professionals should be low. Unfortunately, a summary of the evidence on the criterion validity of screening instruments for motor developmental problems in children aged 5-6 years is lacking. Therefore, we conducted a systematic review to summarize (1) the evidence on the criterion validity of screening instruments for motor developmental problems in 5-6-year-olds with the M-ABC(-2) as reference, and (2) the applicability of these instruments in community-based settings.

Materials and Methods
To summarize the evidence on criterion validity, we followed the protocol for systematic reviews of measurement properties from the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) [19], as described below. The review was registered in PROSPERO (ID 302069).

Key Elements of the Research Question
Our research question had four key elements. First, the construct of interest, i.e., motor developmental problems. Second, the population of interest, i.e., children aged 5-6 years from a community-based population. Third, the type of measurement instrument of interest, i.e., all possible screening instruments to assess motor development, be it questionnaires or performance-based tests. Fourth, the measurement properties on which the review focuses, i.e., criterion validity and applicability.

Search Strategy
The key elements of our research question formed the basis of our search strategy. We systematically searched all published literature up to June 2021, using the following databases: Embase, MEDLINE (Ovid), Web of Science, PsycINFO-Ovid, CINAHL EBSCO, Cochrane, and Google Scholar. We used EndNote X9 (Clarivate, Philadelphia, PA, USA) for reference management.

Eligibility Criteria
Studies were included if they met the following criteria: (1) the screening instrument was used to measure both gross and fine motor development; (2) the instrument was cross-sectionally compared to the M-ABC or M-ABC-2 test as reference standard; (3) the study reported or enabled us to determine (a) correlation coefficient or AUC, and (b) sensitivity and specificity; (4) the results were reported for children born in high-income countries (according to the definition by Statistics Netherlands [20]) aged 5-6 years and from a community-based population; and (5) the studies were published in English or Dutch. We excluded studies described in textbooks, studies with indirect evidence of measurement properties (such as randomized controlled trials), conference abstracts, and unpublished dissertations.

Data Extraction
For data extraction, we used the form for validity studies as proposed by the COSMIN group, which includes descriptive characteristics of the study population, methods, and reported statistical findings of all investigated measurement properties [21]. We extracted information on two aspects of criterion validity, using the criteria proposed by Terwee et al. [22] and modified by Prinsen et al. [23]. The first aspect concerns information on correlation coefficients or AUC. The second aspect concerns information on sensitivity and specificity, which is needed to study the effects of screening tests at the population level [19,22,23]. For the data synthesis, we used the system originally developed as a guideline for systematic reviews of trials in the Cochrane Collaboration Back Review Group [24] and adapted by Terwee et al. for use in systematic reviews of measurement properties [19]. Three authors (J.d.B., M.H., N.v.D.) independently applied the criteria to the findings reported in each study and to the data synthesis, as described under data analysis and reporting. In case of disagreement, discussion with the last author (M.d.K.) followed until consensus was obtained. To assess the applicability of the screening instruments, we obtained information on each screening instrument regarding (1) the number of items, (2) administration time, and (3) costs (i.e., required training for professionals and materials). This information was derived from the identified studies and, if needed, from other sources (i.e., manuals).
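As a minimal sketch of how sensitivity and specificity follow from cross-tabulating a screening instrument against the reference standard, and of how they translate to the population level, consider the Python example below. The counts, the class and function names, and the 10% prevalence are hypothetical values chosen for illustration only; they are not taken from any included study.

```python
from dataclasses import dataclass


@dataclass
class TwoByTwo:
    """Screening result cross-tabulated against the reference standard."""
    true_pos: int   # screen positive, reference positive
    false_neg: int  # screen negative, reference positive
    false_pos: int  # screen positive, reference negative
    true_neg: int   # screen negative, reference negative

    def sensitivity(self) -> float:
        return self.true_pos / (self.true_pos + self.false_neg)

    def specificity(self) -> float:
        return self.true_neg / (self.true_neg + self.false_pos)


def positive_predictive_value(sens: float, spec: float, prevalence: float) -> float:
    """Share of screen-positive children who are reference-positive (Bayes' rule)."""
    true_pos_share = sens * prevalence
    false_pos_share = (1.0 - spec) * (1.0 - prevalence)
    return true_pos_share / (true_pos_share + false_pos_share)


# Hypothetical counts: 200 children, 20 of whom are positive on the reference standard.
table = TwoByTwo(true_pos=16, false_neg=4, false_pos=18, true_neg=162)
sens = table.sensitivity()   # 16 / 20
spec = table.specificity()   # 162 / 180
ppv = positive_predictive_value(sens, spec, prevalence=0.10)
print(f"sensitivity={sens:.2f} specificity={spec:.2f} PPV={ppv:.2f}")
```

With these hypothetical numbers, fewer than half of the screen-positive children would be confirmed by the reference standard, which illustrates why both measures matter when a test is applied to a whole population.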

Quality Assessment of Studies
We assessed the quality of each study using the COSMIN checklist developed by Terwee et al. [25]. This checklist includes items related to design, methods, and reporting. Each item can be rated on a 4-point scale from poor to excellent [25,26]. In Table 1, we show this checklist as used in our study. Three authors (J.d.B., M.H., N.v.D.) independently applied this checklist. In case of disagreement, discussion with the last author (M.d.K.) followed until consensus was reached.

Data Analysis and Reporting
First, we described the study flow and characteristics of the included studies. Second, we reported their methodological quality. Third, we summarized the criterion validity and applicability of all studies and addressed their overall suitability. We considered an instrument sufficiently suitable for screening purposes if the methodological quality of included studies was strong (Table 2) and the criterion validity was high (i.e., a correlation coefficient of the test with the criterion >0.70 as well as appropriate sensitivity and specificity for screening [19]). According to the norms of the American Psychological Association, a sensitivity of at least 80% and a specificity of at least 90% is preferable [27].

Table 2. Data synthesis based on (a) the methodological quality of the study and (b) the statistical evidence on the concurrent validity for measurement instruments, according to the COSMIN criteria [25,26].
Methodological Quality of Studies on One Instrument | Rating | Criteria
Strong | +++ or − − − | …
… | … | Only studies of poor methodological quality

Criterion Validity, Applicability, and Overall Suitability
In the following sections, we report on the key elements regarding criterion validity (in Table 4) and applicability (in Table 5) of included performance-based tests and questionnaires aimed at screening for motor developmental problems.
Figure 1. Flow of studies. Reasons for exclusion: review or not original research (n = 7); full text not available (n = 1); purpose not screening of gross and/or fine motor development (n = 7); no correlation/AUC and sensitivity/specificity (n = 12); no reference standard or reference standard not the Movement-ABC(-2) (n = 38); no Western, community-based population aged 5-6 (n = 20); no specific results reported for the community-based population (n = 7).

Criterion Validity
Table 4 shows that the criterion validity was poor for seven of the eleven included studies because the correlation with the reference standard was smaller than 0.70 [28,30,33,34,36–38]. For the other four studies, the statistical findings were indeterminate [29,31,32,35]. Six of the eleven studies reported the AUC, which ranged from 0.59 for the DCD-Q [29] to 0.91 for the BFMT [28]. Table 4 also shows that the sensitivities and specificities of the screening instruments varied widely: at the applied cut-off points, sensitivities ranged from 21% for the DCD-Q [31] and ASQ-3 [38] to 87% for the MOT4-6 [34], and specificities ranged from 50% for the MOQ-T-FI [37] to 96% for the ASQ-3 [38]. Only the MOT4-6 met the American Psychological Association's requirements for diagnostic accuracy based on the reported study results, with a sensitivity of 87% and a specificity of 90%, but it had a correlation of <0.70 [34].
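The criteria described under Data Analysis and Reporting can be applied mechanically to reported values. The Python sketch below encodes the thresholds given in the text (correlation or AUC of 0.70, sensitivity of at least 80%, specificity of at least 90%) and applies them to the MOT4-6 values reported above. The function names are hypothetical, treating a value of exactly 0.70 as high is an assumption, and the sketch deliberately ignores the methodological quality of the studies, which also entered the overall judgment.

```python
from typing import Optional

CORRELATION_THRESHOLD = 0.70  # correlation coefficient or AUC
MIN_SENSITIVITY = 80.0        # percent
MIN_SPECIFICITY = 90.0        # percent


def rate_correlation(coefficient: Optional[float]) -> str:
    """Return '+', '-', or '?' for a reported correlation coefficient or AUC."""
    if coefficient is None:
        return "?"  # nothing reported: indeterminate
    # Assumption: exactly 0.70 is counted as high.
    return "+" if coefficient >= CORRELATION_THRESHOLD else "-"


def meets_accuracy_norms(sensitivity: float, specificity: float) -> bool:
    """Check the preferred sensitivity and specificity levels (in percent)."""
    return sensitivity >= MIN_SENSITIVITY and specificity >= MIN_SPECIFICITY


def sufficiently_valid(rating: str, sensitivity: float, specificity: float) -> bool:
    """Criterion validity is high only if both conditions hold."""
    return rating == "+" and meets_accuracy_norms(sensitivity, specificity)


# MOT4-6 as reported above: sensitivity 87%, specificity 90%, correlation rated low.
mot_accuracy_ok = meets_accuracy_norms(87.0, 90.0)
mot_valid = sufficiently_valid("-", 87.0, 90.0)
print(mot_accuracy_ok, mot_valid)
```

The example reproduces the pattern described in the text: the accuracy norms are met, but the low correlation rating prevents a positive overall verdict.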

Applicability
Regarding applicability, questionnaires scored better than performance-based tests (Table 5). The screening instruments differed considerably with respect to number of items, administration time, and costs (i.e., required training for professionals and materials). The number of items of the screening instruments varied from six (PQ) to thirty (ASQ-3, with 12 questions about motor function). Administration time varied from about 10 min for the M-ABC-2-C and BFMT to about 25 min for the MAND. Notably, the DCD-Q and M-ABC-2-C provide information about the child's participation in daily life, academics, and sport, while the other screening instruments do not. Regarding material costs, the MAND, MOT4-6, ASQ-3, M-ABC-2-C, and the BFMT had to be purchased, while the DCD-Q and the MOQ-T-FI were freely available (online). The PQ was used in the Danish National Birth Cohort Study and was freely available upon request from the authors.
* Correlation: a positive or high (+) rating means that (1) the reported correlation coefficient with the reference standard is high (≥0.70) and (2) there are convincing arguments that the reference standard is indeed a true reference standard (which is the case for the M-ABC(-2)). A negative (−) rating means that the reported correlation coefficient with the reference standard is low (<0.70) and/or the reference standard cannot be considered a gold standard. An indeterminate (?) rating means that no correlation with, or AUC related to, the reference standard was reported.

Discussion
To our knowledge, our study offers the first summary of the evidence regarding criterion validity and applicability in community-based settings of screening instruments for motor developmental problems in children aged 5-6 years. We identified eleven relevant studies, which investigated eight different screening instruments that met our inclusion criteria. We found the validity of the identified instruments to be insufficient, being either poor [28,30,33,34,36–38] or indeterminate [29,31,32,35], with widely varying sensitivities and specificities. With respect to applicability, we found a large variation between the screening instruments. The overall quality of studies varied considerably, from fair to excellent. From the data synthesis on the methodological quality and the statistical findings of the studies, we determined that none of the screening instruments has been proven suitable for screening purposes.
We found criterion validity to be poor; none of the instruments performed well. This may be partly due to differences between the included screening instruments and the reference standard in the balance between subitems representing different motor skills (e.g., fine versus gross motor skills). For example, in the MAND, half (5 out of 10) of the tests are related to fine motor skills [33], whereas in the MOQ-T-FI, this only applies to 22% (4 out of 18) [36]. Because deficits in motor impairment may vary between domains, a balance between subitems that differs from that of the reference standard may influence validity outcomes [39]. These findings underpin the conclusion of Fransen et al. [40] that different motor tests should be used depending on the specific aims, especially whether the focus is on gross or on fine motor skills. Differences in the share of fine and gross motor test items between the tests and the M-ABC(-2), and differences in motor construct between the tests, may account for the poor findings regarding criterion validity [28].
We found that the sensitivity and specificity at the currently predefined cut-off points varied widely, with the most favorable results for the performance-based tests. The better sensitivities and specificities of performance-based tests may be due to performance giving a more objective impression of the children's skills than parent reports on questionnaires, as has also been reported for language development in children [41]. Another explanation may be that the cut-off points chosen for the instruments were not optimal, as the reported AUCs of five instruments varied much less (i.e., from 0.59 to 0.67) than the sensitivities and specificities of these instruments (i.e., from 21% to 67%, and from 54% to 93%, respectively). For the BFMT, the cut-offs were not pre-set but were determined so as to obtain optimal sensitivity and specificity [28]. An optimization of the cut-offs of the various motor screening instruments for use in community-based settings should therefore be considered.
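One common way to choose a cut-off is to maximize Youden's J (sensitivity + specificity − 1) over all candidate thresholds. The Python sketch below illustrates this on made-up scores. It is not the procedure used in the BFMT study or in any other included study, and the scores, labels, and function name are hypothetical.

```python
from typing import List, Tuple


def best_cutoff(scores: List[float], has_problem: List[bool]) -> Tuple[float, float, float]:
    """Return (cut-off, sensitivity, specificity) maximizing Youden's J.

    A child screens positive when the score is at or below the cut-off,
    i.e. lower scores are assumed to indicate poorer motor performance.
    """
    n_pos = sum(has_problem)
    n_neg = len(has_problem) - n_pos
    best = (float("nan"), 0.0, 0.0)
    best_j = -1.0
    for cutoff in sorted(set(scores)):
        # Count correct classifications at this candidate cut-off.
        tp = sum(1 for s, p in zip(scores, has_problem) if p and s <= cutoff)
        tn = sum(1 for s, p in zip(scores, has_problem) if not p and s > cutoff)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1.0
        if j > best_j:
            best_j, best = j, (cutoff, sens, spec)
    return best


# Hypothetical screening scores; True marks children positive on the reference standard.
scores = [3, 4, 5, 6, 7, 8, 9, 10, 11, 12]
has_problem = [True, True, True, False, True, False, False, False, False, False]
cutoff, sens, spec = best_cutoff(scores, has_problem)
print(cutoff, round(sens, 2), round(spec, 2))
```

A cut-off optimized on the same sample in which it is evaluated tends to look better than it would in a new sample, so an optimized cut-off would still need confirmation in an independent community-based population.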
Overall, the MOT4-6 [33] and the BFMT [28] showed the most favorable measurement properties, i.e., rather high correlations and favorable balances between sensitivity and specificity (after optimizing the cut-offs of the BFMT), almost meeting the levels required for screening [27]. However, the quality of the only study on the MOT4-6 instrument was low (e.g., a very small sample size, n = 48), implying that the study results on the MOT4-6 must be considered with caution [34]. On the other hand, it should be noted that the defined criteria for validity, i.e., high specificity and corresponding sensitivity as proposed by the American Psychological Association [27] and high AUC or correlations of >0.70, may be too strict when assessing screening instruments for preventive child healthcare. Unlike population-based screening for diseases such as cancer, screening for motor developmental problems may be an ongoing process, with professionals following up on abnormal test scores with an extra consultation. Multiple consultations with repeated screening tests may enhance validity.
In summary, regarding sensitivity and specificity, the included performance-based tests perform better than the included questionnaires. Performance-based tests may provide a more objective view of motor developmental problems. Nonetheless, the evidence regarding screening with performance-based tests is also limited, so high-quality research is needed on these tests, especially the MOT4-6 and the BFMT, to further validate the results in population-based screening, and, additionally, to assess the added value of repeated measurements in clinical practice.
With respect to applicability, we found a large variation in relevant aspects of the screening instruments, such as administration time and costs (i.e., required training for professionals and materials). Performance-based tests generally scored worse on all aspects of applicability than questionnaires. This is largely because performance-based tests require more administration time by a (trained) professional and their test kits are more expensive than questionnaires. Performance-based motor screening tests may thus perform better than questionnaires, but at higher costs. However, one could also argue that performance-based tests have the disadvantage that they often only provide information at one time point, whereas parents or school teachers can monitor the child continuously.
The quality of studies was sufficient to high for four studies (BFMT [28], DCD-Q [30], ASQ-3 [38], M-ABC-2-C [36]). For these studies, results can be interpreted with a fair amount of certainty. For the other studies, the quality was low, mostly due to a lack of clarity in reporting or small methodological issues; results of these studies should be interpreted with caution. Future research should apply high quality standards to all screening instruments for motor developmental problems. Future research may also incorporate populations from developing countries, as these underserved populations may have different economic, social, and educational conditions affecting a child's development.

Strengths and Limitations
Important strengths of this systematic review are that we systematically searched a broad range of databases, that we used the COSMIN checklist to assess the methodological quality of studies on criterion validity, and that we used predefined criteria for rating the statistical findings. We further included both performance-based tests and questionnaires, enabling care providers to make a well-considered choice between these types of tests. A limitation of our study might be that we only searched for studies published in English and Dutch. We may have missed additional studies in other languages that met our other inclusion criteria, though these are likely rare. Finally, the COSMIN criteria have been suggested to be somewhat too strict, to the detriment of some medium-level evidence.

Conclusions
We conclude that the included studies provide insufficient evidence that the screening instruments are sufficiently valid for detecting motor developmental problems in children aged 5-6 years. Higher-quality studies are therefore needed; these may point to the need for better screening instruments for the timely identification of motor developmental problems, or for the application of repeated measurements to achieve sufficient sensitivity and specificity. Given the urgency of identifying motor developmental problems at age 5-6, especially DCD, we advise continued screening using the current best options.

Data Availability Statement:
No new data were created or analyzed in this study. Data sharing is not applicable to this article.