Efficiency and Validity of the AI-Based rGMFM-66 in Assessing Gross Motor Function in Children with Cerebral Palsy

Stefanie Steven; Karoline Spiess; Leonie Schafmeyer; Jonathan Buggisch; Eckhard Schoenau; Kerstin Luedtke; Ibrahim Duran

doi:10.3390/app15126527

¹ Center of Prevention and Rehabilitation, Medical Faculty and University Hospital, University of Cologne, 50931 Cologne, Germany

² Department of Pediatrics, Medical Faculty and University Hospital, University of Cologne, 50937 Cologne, Germany

³ Institute of Health Sciences, Department of Physiotherapy, Pain and Exercise Research Luebeck (P.E.R.L.), Universität zu Lübeck, 23562 Lübeck, Germany

* Author to whom correspondence should be addressed.

Appl. Sci. 2025, 15(12), 6527; https://doi.org/10.3390/app15126527

This article belongs to the Special Issue Artificial Intelligence for Pediatric Monitoring, Diagnosis, and Treatment

Featured Application

This study demonstrates that the rGMFM-66 can be used as a reliable and time-saving alternative to the original GMFM-66 for assessing gross motor function in children with cerebral palsy. Its use in clinical settings may significantly reduce assessment time while maintaining scoring accuracy. Additionally, this study confirms the feasibility and validity of retrospectively applying the rGMFM-66.

Abstract

Assessing gross motor function in children with cerebral palsy (CP) is critical for monitoring progress and planning interventions. While the Gross Motor Function Measurement (GMFM-66) is an established tool, its application can be time-intensive. This study aimed to evaluate the efficiency and validity of a reduced version, the reduced Gross Motor Function Measurement (rGMFM-66). In a single-center study, 52 children with cerebral palsy were assessed using both the GMFM-66 and rGMFM-66 within a 1 to 2-day interval. The rGMFM-66 utilized an artificial intelligence algorithm to streamline the assessment process. Assessment duration and scores were compared, and agreement between the two measures was evaluated using intraclass correlation coefficients (ICC) and Bland–Altman analysis. The rGMFM-66 significantly reduced assessment time (mean 17.2 vs. 38.5 min; p < 0.001; effect size = 1.9). It showed excellent agreement with the GMFM-66 (ICC = 0.970; 95% CI: 0.942–0.983). Scores were slightly higher for the rGMFM-66 (mean 52.2 vs. 50.7; p = 0.008), which might be influenced by factors such as test familiarity or order of administration. In conclusion, the rGMFM-66 is a valid and time-efficient alternative for evaluating gross motor abilities in children with CP. Its application may support more efficient assessments in clinical settings without compromising accuracy. Further external validation in larger and more diverse populations is recommended to confirm these findings.

Keywords:

cerebral palsy; GMFM-66; rGMFM-66; artificial intelligence; motor assessment; time efficiency

1. Introduction

Cerebral Palsy (CP) is a neurological disorder characterized by sensorimotor movement issues, affecting posture and movement control [1]. It encompasses various motor impairments that result in permanent, yet modifiable, dysfunctions. With an incidence of 2–3 per 1000 live births, CP is one of the leading causes of physical disabilities in children globally [2].

Early therapeutic interventions during childhood development are crucial for minimizing impairments and preventing secondary complications [3]. Lifelong therapy is necessary to reduce the impact of these impairments and to improve quality of life. This therapy should be regularly evaluated and adjusted to align with individual therapeutic goals [4]. Therapy planning should be based on diagnoses and resources, considering feasibility and cost-effectiveness [5]. Regular motor assessments, such as the Gross Motor Function Measurement (GMFM), are vital for tracking progress and refining therapy strategies. The focus of targeted therapy is on promoting independence and social participation. In order to be able to adequately assess the effectiveness of the therapy, it is essential to discuss the individual therapy goals beforehand [4].

The GMFM-66 is recognized as the reference standard for assessing motor abilities in children with CP due to its high reliability and validity [6]. It is a shortened version of the 88-item GMFM-88, derived through Rasch analysis to reduce the number of items while preserving the core measurement properties and clinical interpretability. The GMFM-66 includes 66 items and was specifically developed and standardized for children with CP. Administration of this assessment typically takes 45 to 60 min [7]. During the assessment, children’s motor abilities are evaluated across five dimensions (lying and rolling; sitting; crawling and kneeling; standing; walking, running, and jumping), and a total score is calculated using specialized scoring software [6]. This score reflects the children’s motor abilities and can be used both to evaluate progress in motor function and to measure the success of therapeutic intervention [8]. In recent years, digital innovations such as tablet-based motion capture systems, wearable inertial sensors, and computer vision-based scoring techniques have been explored to further reduce clinician workload and enhance objectivity in motor assessments [9]. These technologies aim to complement traditional assessments like the GMFM-66 while preserving clinical relevance and ensuring efficient data collection in various settings.

The use of AI in healthcare, particularly for the treatment of children with developmental and neuro-motor conditions, is becoming increasingly common, with AI technologies supporting diagnostics and assessments [9]. The integration of AI-based systems into motor assessments reduces the workload for healthcare professionals and helps minimize the duration and intensity of testing procedures [10]. This is particularly beneficial for children with CP, who exhibit significantly greater fatigue during submaximal activities compared to typically developing peers. This increased fatigue has been attributed to a reduced ability to develop neurophysiological compensation mechanisms [11]. This approach not only enhances the precision of diagnostics but also contributes to more efficient resource allocation in the management of conditions like CP [12].

In this context, in order to increase the efficiency of the application of the GMFM-66, the reduced GMFM-66 (rGMFM-66) was developed. The rGMFM-66 utilizes AI technology to reduce the number of test items required. A previous study based on retrospective data demonstrated that the rGMFM-66 can predict the GMFM-66 very precisely in a study population of 365 children with CP (mean age: 8 years 9 months, SD: 3 years 10 months), with an intraclass correlation coefficient (ICC) of 0.997 (95% CI: 0.996–0.997). An average of only 34 test items was required to achieve these results. Since it was a retrospective analysis, the exact time savings could not be calculated with the available data [13].

This single-center, prospective validation study aims to evaluate the time efficiency of the rGMFM-66 and its agreement with the GMFM-66 in children with CP [13]. By comparing both assessment tools, the study seeks to determine whether the AI-based rGMFM-66 can provide a time-saving alternative without compromising the validity of motor function scores. The findings are intended to inform clinical workflows and support more efficient yet accurate therapy planning.

2. Materials and Methods

2.1. Study Design

The study was conducted at a single center and followed a prospective observational design. The data were collected from children with CP between February and November 2024. All children in the study population participated in the rehabilitation program called ‘Auf die Beine’ (‘on your feet’) at the Center of Prevention and Rehabilitation (University of Cologne, Germany). Participation in the program was funded by the German public health system.

The rehabilitation program ‘Auf die Beine’ includes therapy sessions using whole body vibration training (WBV), along with additional individualized physiotherapy. At the beginning of the program, all children are assessed for their motor abilities using the GMFM-66 to evaluate potential therapy outcomes. Data collection for the present study was conducted at the start of the rehabilitation program.

The inclusion criteria for this study were a diagnosis of CP [14], written consent provided by the patient’s caregivers, and an age between 3 and 18 years. At the start of their rehabilitation stay, children participating in the ‘Auf die Beine’ therapy program at UniReha GmbH Cologne underwent a GMFM-66 assessment conducted by rehabilitation professionals. The rGMFM-66 was administered 1 to 2 days later. This brief interval between tests was designed to provide sufficient recovery time while ensuring that any therapeutic interventions during the rehabilitation stay did not immediately influence the rGMFM-66 results. The time required to complete both the GMFM-66 and rGMFM-66 assessments was recorded to facilitate a direct comparison of the time efficiency of the two testing protocols. The study was partially blinded. The GMFM-66 assessments were conducted by multiple physiotherapists and sports scientists, all of whom received the same internal training to ensure consistency. These therapists were blinded to whether the individuals they were testing were study participants. This blinding was implemented to reduce potential bias and ensure that the results were not influenced by the therapists’ knowledge of the study. In contrast, the rGMFM-66 was conducted by a single physiotherapist who received the same internal training as those performing the GMFM-66 assessments. Because of the method, this physiotherapist could not be blinded and was aware of the general purpose of the study, including the evaluation of time efficiency and score agreement. A possible limitation is that this physiotherapist may have made an effort to perform the rGMFM-66 more quickly.

All assessments were conducted by therapists with a minimum of one year of professional experience in the treatment of children with cerebral palsy. Prior to participation in the study, all assessors underwent standardized internal training aligned with the official GMFM-66 manual. This training included joint assessment sessions, video-based practice tasks, and interrater reliability exercises to ensure consistency in scoring and standardized administration procedures.

In the literature, a median sample size of n = 42 is specified for studies on the estimation of intraclass correlation coefficients (ICC) [15]. In addition, a sample size estimation was carried out using Bonett’s formula in order to determine the ICC with a predefined accuracy [16]. This resulted in a required sample size of 61 participants (with α = 0.05, two raters, an expected ICC value of 0.99 based on preliminary investigations, and a desired confidence interval width of 0.01 for the 95% CI). Taking this preliminary information and practical considerations into account, it was decided to include a total of n = 50 children in the study.
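The sample size estimate can be roughly checked by evaluating Bonett's approximation directly. The sketch below assumes the commonly cited form n ≈ 8z²(1 − ρ)²(1 + (k − 1)ρ)² / (k(k − 1)w²) + 1, where ρ is the expected ICC, k the number of raters, and w the desired total width of the confidence interval. The function name and the separate reporting of the leading term are illustrative choices; the exact rounding used by the authors is not stated in the text.

```python
import math
from statistics import NormalDist

def bonett_icc_sample_size(rho, k, width, alpha=0.05):
    """Return (leading_term, leading_term + 1) of Bonett's approximation
    for the number of subjects needed to estimate an ICC with a
    confidence interval of the given total width."""
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    leading = (8 * z**2 * (1 - rho)**2 * (1 + (k - 1) * rho)**2
               / (k * (k - 1) * width**2))
    return leading, leading + 1

# Values taken from the text: expected ICC 0.99, two raters, CI width 0.01.
leading, with_correction = bonett_icc_sample_size(rho=0.99, k=2, width=0.01)
print(round(leading, 2), math.ceil(leading), math.ceil(with_correction))
```

With these inputs the leading term comes out just under 61, which is in line with the reported requirement of 61 participants; including the +1 term and rounding up would give 62.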

2.2. Study Implementation

Both tests, the GMFM-66 and the rGMFM-66, were conducted under consistent conditions to ensure comparable testing environments. The assessments took place in similar rooms and at similar times of day to minimize the influence of external factors such as fatigue and exhaustion. All assessments were conducted at the Center of Prevention and Rehabilitation of the University of Cologne. Additionally, the testing process followed the same internal standardized manual, with therapists using identical verbal cues and instructions as outlined in the manual. This adherence to standardized procedures ensured that both assessments were conducted under equal conditions, minimizing variability and allowing for an accurate comparison of the two test protocols.

As a first step, the rGMFM-66 uses an AI-driven algorithm to classify children’s motor abilities into three groups (low, medium, and high) by analyzing 7 pre-defined test items. Depending on the group, the AI algorithm selects 15 to 33 relevant test items from the GMFM-66, ensuring that children with lower abilities are tested with fewer, focused items, while those with higher abilities complete a broader range. Within each group, the same predefined set of items is used for all children, rather than selecting individual items for each child. The rGMFM-66 generates a score directly comparable to the GMFM-66. The prediction of group membership (low, medium, high GMFM-66 score) based on the 7 pre-defined test items is performed using an artificial neural network, and the prediction of the rGMFM-66 score is based on a support vector machine model.

In summary, the complete data set of the previous publication describing the generation of the rGMFM-66 algorithm (1217 GMFM-66 assessments) was split into a training set (≈two-thirds) and a validation set [13,17]. The GMFM-66 score, based on all 66 items, was used to divide the training data into three groups (low, medium, and high GMFM-66 scores). A random forest (RF) model was applied within each group to assess the importance of individual GMFM-66 test items for predicting the overall score. Iteratively, the least important item was removed, and the model’s mean squared error (MSE) was recalculated, repeating this process until only 16 test items remained. This allowed us to identify the maximum number of items that could be excluded without significantly increasing the MSE, resulting in one reduced item set per group (low, medium, high). Since the correct group was unknown prior to assessment, the intersection of all three reduced sets (7 items in total) was used to predict the likely group using three machine learning models (random forest, support vector machine, and feed-forward neural network). The selection of these 7 items was fully data-driven and based on their consistent predictive strength across all three subgroups. The items were not chosen manually but selected by an automated machine learning process using feature importance analysis from random forest models. The corresponding reduced item set could then be selected accordingly.

This is only a brief description of the complex creation process of the rGMFM-66 algorithm, as it is not the subject of this study. However, a detailed description, including the exact tuning process with the optimal hyperparameters of the various prediction models tested, is given in the two publications mentioned above. Examples of the algorithmic item selection process and its clinical application are illustrated in the Supplementary Materials of Duran et al. (2022), which the present study builds upon [13,17]. The software for the rGMFM-66 can be obtained by requesting it from the corresponding author.
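The iterative item removal described above can be illustrated with a toy sketch. It is not the published rGMFM-66 pipeline: instead of random forest feature importance it uses the absolute Pearson correlation with the total score as a stand-in importance measure, and a one-predictor linear fit as a stand-in error model. The data are synthetic, and all function names are hypothetical.

```python
import random

def pearson(xs, ys):
    """Pearson correlation of two equally long sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5

def mse_of_linear_fit(xs, ys):
    """MSE of a one-predictor least-squares fit of ys on xs."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    intercept = my - slope * mx
    return sum((y - (intercept + slope * x)) ** 2 for x, y in zip(xs, ys)) / n

def backward_eliminate(items, total, keep):
    """Drop the least important item until `keep` items remain.

    items: one list of item scores per child; total: full score per child.
    Returns the kept item indices and a history of (dropped_item, mse).
    """
    remaining = list(range(len(items[0])))
    history = []
    while len(remaining) > keep:
        # Stand-in importance (NOT random forest importance): |r| with total.
        importance = {j: abs(pearson([row[j] for row in items], total))
                      for j in remaining}
        worst = min(remaining, key=importance.get)
        remaining.remove(worst)
        # Stand-in error: how well the reduced item sum predicts the total.
        reduced_sum = [sum(row[j] for j in remaining) for row in items]
        history.append((worst, mse_of_linear_fit(reduced_sum, total)))
    return remaining, history

# Synthetic data: items 0-5 track a latent ability, items 6-9 are pure noise
# that does not enter the toy total score.
rng = random.Random(0)
items, total = [], []
for _ in range(200):
    ability = rng.uniform(0, 3)
    informative = [min(3, max(0, round(ability + rng.gauss(0, 0.5))))
                   for _ in range(6)]
    noise = [rng.randint(0, 3) for _ in range(4)]
    items.append(informative + noise)
    total.append(sum(informative))

kept, history = backward_eliminate(items, total, keep=6)
print(sorted(kept))
for dropped, mse in history:
    print(dropped, round(mse, 3))
```

In this toy setting the four noise items are removed first and the error does not grow, which mirrors the idea of discarding items as long as the MSE does not increase noticeably. In the published procedure, the importance ranking and the error estimate both come from random forest models fitted within each score group.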

2.3. Statistical Analyses

The GMFM-66 was manually recorded, and these data were then entered into the Gross Motor Ability Estimator (version 2) scoring software for the GMFM to calculate the score [18]. In contrast, the rGMFM-66 was recorded directly in digital form. The software allowed for immediate calculation of the score upon completion of the test.

Both sets of results, including the time taken for each assessment, were documented in an Excel spreadsheet and stored for further data analysis. The statistical analysis was performed using RStudio version 2024.04.2+764 in conjunction with R version 4.1.3 (R Foundation for Statistical Computing, Vienna, Austria). To compare the test results of both assessments, intraclass correlation coefficients (ICC; two-way model, agreement, single measurement) [19] and Bland–Altman plots were used.
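The study itself used R for this analysis. As an illustration only, the following Python sketch computes an ICC for a two-way model with absolute agreement and a single measurement from the usual mean squares, assuming this corresponds to the model described above (often written ICC(A,1)). The function name and the example scores are hypothetical and are not study data.

```python
def icc_a1(scores):
    """ICC for a two-way model, absolute agreement, single measurement.

    scores: list of rows, one per child, each row holding the k
    measurements of that child (here k = 2: GMFM-66 and rGMFM-66).
    """
    n, k = len(scores), len(scores[0])
    grand = sum(sum(row) for row in scores) / (n * k)
    row_means = [sum(row) / k for row in scores]
    col_means = [sum(row[j] for row in scores) / n for j in range(k)]

    # Sums of squares for children (rows), methods (columns), and residual.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in scores for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k / n * (ms_cols - ms_error)
    )

# Hypothetical paired scores (not study data): [GMFM-66, rGMFM-66] per child.
example = [[1.0, 2.0], [2.0, 3.0], [3.0, 4.0]]
print(round(icc_a1(example), 4))
```

Because the agreement form penalizes systematic differences between methods, the constant one-point offset in the example lowers the coefficient to about 0.67 even though the two columns are perfectly correlated.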

The time needed to complete both test procedures and the test scores were compared using Student’s t-tests after confirming that the data were normally distributed (Shapiro–Wilk test). A p-value < 0.05 was considered statistically significant. Cohen’s d for dependent groups was used to evaluate the effect size.
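For paired measurements, the test statistic and the effect size can both be derived from the per-child differences. The sketch below assumes a paired t-test and the d_z variant of Cohen's d (mean difference divided by the standard deviation of the differences); the text does not state which variant was used, and the numbers are hypothetical, not study data.

```python
import math
from statistics import mean, stdev

def paired_summary(first, second):
    """Paired t statistic and Cohen's d_z for two measurements per child."""
    diffs = [b - a for a, b in zip(first, second)]
    n = len(diffs)
    d_mean, d_sd = mean(diffs), stdev(diffs)  # stdev uses n - 1
    t_stat = d_mean / (d_sd / math.sqrt(n))   # compare to t with n - 1 df
    d_z = d_mean / d_sd                        # assumed variant of Cohen's d
    return t_stat, d_z

# Hypothetical assessment durations in minutes (not study data).
gmfm_minutes = [40.0, 35.0, 42.0]
rgmfm_minutes = [18.0, 16.0, 19.0]
t_stat, d_z = paired_summary(gmfm_minutes, rgmfm_minutes)
print(round(t_stat, 2), round(d_z, 2))
```

Under these definitions the two quantities are linked by t = d_z · √n, so a fixed effect size yields a larger t statistic as the sample grows.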

3. Results

The primary outcome of this study was the practicability of the rGMFM-66, assessed by comparing the time required for completion with that of the original GMFM-66. The secondary outcome was its criterion validity, measured by the agreement of rGMFM-66 scores with the full GMFM-66 scores.

3.1. Study Population

The characteristics of the study population are presented in Table 1. A total of 56 children were assessed. Three patients were excluded due to missing time measurements, and one patient was retrospectively excluded based on the study’s exclusion criteria. The majority of the children included in the study were diagnosed with spastic tetra- and diplegia (Table 1).

Table 1. Characteristics of the study population (n = 52).

Children categorized as having unilateral CP in this study exhibited spasticity (i.e., unilateral spastic CP). Terminology was harmonized according to the Surveillance of Cerebral Palsy in Europe (SCPE) classification system to ensure consistency [20].

3.2. Practicability of the rGMFM-66

The rGMFM-66 significantly reduced the time required to assess participants’ gross motor function (p < 0.001), from 38.5 min to 17.2 min. With an effect size of 1.9 (Cohen’s d for paired samples), this represents a large effect (Table 2 and Figure 1A).

Table 2. Results.

Figure 1. Time efficiency (A) and scores (B) of GMFM-66 vs. rGMFM-66. The mean administration time for the rGMFM-66 was significantly (* p < 0.001) lower than for the GMFM-66 (subfigure (A)). Subfigure (B) shows the agreement between the two scores.

(A): Boxplot illustrating the administration times for the GMFM-66 and rGMFM-66. The rGMFM-66 required significantly less time for administration compared to the GMFM-66 (p < 0.001).
(B): Scatter plot showing the correlation between GMFM-66 and rGMFM-66 scores. Each point represents one participant. The dashed line represents the line of equality, indicating strong agreement between both measurements.

3.3. Criterion Validity of the rGMFM-66 Compared to the GMFM-66

The mean score obtained with the rGMFM-66 (52.2 points ± 18.1) was slightly higher than the mean score of the GMFM-66 (50.7 points ± 17.4), with the difference being statistically significant (p = 0.008, Table 2, Figure 2). The agreement between the two methods was evaluated using the intraclass correlation coefficient (ICC), which yielded a value of 0.970 (95% CI: 0.942–0.983, Table 2). The scatter plot of the GMFM-66 and rGMFM-66 scores is given in Figure 1B. The results of the Bland–Altman statistics are given in Table 2 and Figure 2.

Figure 2. Bland–Altman plot showing the agreement between the GMFM-66 and rGMFM-66 scores. The differences between the two measurements are plotted against their mean. The middle dashed line represents the mean difference, and the other dashed lines indicate the limits of agreement (mean ± 1.96 SD).
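The quantities shown in a Bland–Altman plot can be computed as in the following sketch. It assumes that differences are taken as GMFM-66 minus rGMFM-66, a direction that is consistent with the reported limits but is not stated explicitly; the function name and the example scores are hypothetical, not study data.

```python
from statistics import mean, stdev

def bland_altman(first, second):
    """Mean difference (bias) and 95% limits of agreement."""
    diffs = [a - b for a, b in zip(first, second)]
    bias = mean(diffs)
    spread = 1.96 * stdev(diffs)  # stdev uses n - 1
    return bias, bias - spread, bias + spread

# Hypothetical scores (not study data).
gmfm = [10.0, 20.0, 30.0]
rgmfm = [12.0, 21.0, 33.0]
bias, lower, upper = bland_altman(gmfm, rgmfm)
print(round(bias, 2), round(lower, 2), round(upper, 2))
```

For the plot itself, each child's difference would be drawn against the mean of that child's two scores, with horizontal lines at the bias and at the two limits.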

4. Discussion

The study results showed that the rGMFM-66 reduced the time required for the assessment by approximately 50% (Table 2, Figure 1A). The study population consisted of children and adolescents with all GMFCS levels (Table 1). The agreement analysis demonstrated excellent agreement between the GMFM-66 and rGMFM-66 scores, with an ICC of 0.970 (95% CI: 0.942–0.983, Table 2). The ICC was slightly lower (0.970 vs. 0.997) than in the previously published retrospective analysis [13]. Furthermore, the Bland–Altman limits of agreement in the current study were wider than in the retrospective analysis by Duran et al. (2022) ((−9.7; 6.5) vs. (−2.7; 2.9)) [13]. These differences are probably due to the fact that the current analysis involved two measurements of gross motor skills on two different days, whereas in the retrospective analysis the GMFM-66 and the rGMFM-66 were determined from the same measurement of gross motor skills. The current prospective study design therefore also captures, e.g., diurnal fluctuations in gross motor function, which are reflected in the agreement between the two scores.

Interestingly, the rGMFM-66 resulted in a slightly higher average score than the GMFM-66 (52.2 vs. 50.7 points, Table 2). According to Wang et al. (2006), changes of ≥1.5 points are considered small, ≥2.5 points moderate, and ≥3.5–4.0 points large, so this difference is small but potentially clinically significant [21]. This systematic deviation between the two scores was not seen in the retrospective analysis of agreement. It can probably be explained by a learning effect, as the rGMFM-66 measurement always took place after the GMFM-66 measurement. This means that the children had learned the test tasks during the GMFM-66 measurement and were able to perform them somewhat better during the rGMFM-66 measurement. The interval between the two measurements was 1 to at most 2 days; hence, an effect induced by the therapy received during the rehabilitation program in which the study participants took part is unlikely. From a clinical perspective, this minor score increase should be interpreted cautiously, and future studies should examine whether this trend persists across different subgroups.

The rGMFM-66 seems to be a valid alternative to the GMFM-66, offering a significant reduction in assessment time, and making it more efficient for use in clinical and rehabilitative settings. These findings align with the growing body of literature supporting the implementation of AI-based assessments to enhance efficiency while maintaining validity. Such advancements are particularly valuable in clinical practice, where time constraints often limit the feasibility of comprehensive assessments. Beyond the time savings, the rGMFM-66 offers practical advantages that support its clinical applicability. The reduced number of test items simplifies administration and lowers the burden on clinicians, allowing them to focus more on therapeutic evaluation. The software requires only minimal training for experienced therapists and integrates seamlessly into existing clinical routines. These features may lead to cost savings by enabling more patients to be assessed within the same timeframe. Moreover, the shorter duration of the test may enhance acceptance among children and caregivers, particularly in outpatient or community settings where attention spans and time are limited. These practical advantages support the broader applicability of the rGMFM-66 in everyday practice [22].

At the same time, it is crucial to address the challenge of enabling clinicians to integrate new technologies into clinical practice in a safe, effective, and equitable manner to ultimately improve outcomes for children. Additionally, AI-driven technologies have the potential to democratize access to specialized diagnostic capabilities, improve the quality and efficiency of care, expand global access to healthcare, and enhance both the speed and quality of treatment [23].

A limitation of the study is the lack of a separate analysis of the time-saving effect at each GMFCS level due to the small dataset. Given the known variability in motor function across GMFCS classifications, a subgroup analysis would have provided valuable additional insights into the applicability of the rGMFM-66 across different functional profiles. Such analyses were not feasible in the present study due to the limited sample size but are strongly recommended for future research. Additionally, due to the study design, the GMFM-66 and rGMFM-66 were determined in two separate examinations on different days, which limits the evaluation of the agreement between them. However, in a previous retrospective analysis, the agreement between the two methods was already analyzed in a significantly larger study group [13]. Furthermore, individual fluctuations in muscle tone, particularly spasticity, and in passive range of motion, both of which may vary throughout the day, could have influenced the children’s performance on specific GMFM-66 items. Although regular neuro-orthopedic assessments were part of the rehabilitation program, these clinical parameters were not systematically recorded for analysis. Another limitation of this study may be the potential influence of therapeutic interventions such as whole-body vibration (WBV), which was part of the standardized rehabilitation program. WBV has been associated with improvements in lower extremity function in children with cerebral palsy, potentially influencing GMFM-66 performance [24]. Although WBV was not performed immediately prior to GMFM assessments in this study, a residual training effect cannot be entirely ruled out and should be considered when interpreting the results [24,25]. However, the time interval between assessments was short (1–2 days), and no therapeutic sessions were scheduled immediately prior to the GMFM assessments. Future research should include such factors, as they may impact the reproducibility of motor performance assessments.

To address this limitation in future research, alternative study designs should be considered. While same-day testing may not be ethically feasible in pediatric rehabilitation due to fatigue and burden on children, future studies could implement a randomized or counterbalanced design in which the order of test administration is varied across participants. This would allow potential order or learning effects to be statistically controlled, thereby improving the robustness of agreement analyses.

Additionally, it is important to note that the majority of participants in this study were diagnosed with spastic cerebral palsy, primarily bilateral spastic forms. This reflects the clinical population typically enrolled in pediatric neurorehabilitation programs and reinforces the relevance of the rGMFM-66 for widespread clinical application in this subgroup [26]. Since the study sample consisted predominantly of children with spastic cerebral palsy, the generalizability of the findings to other CP subtypes such as dyskinetic or ataxic forms is limited. Although the spastic subtype represents the most common form, accounting for approximately 70–80% of all cases [14], future studies are needed to validate the applicability of the rGMFM-66 in non-spastic populations.

Furthermore, a potential learning effect may have influenced the results, as the children had already become familiar with the test items during the first assessment. This familiarity might have enabled them to perform the tasks more quickly during the second examination. However, given the considerable time savings observed in the rGMFM-66, it is unlikely that this learning effect was the primary factor responsible for the reduced duration. Moreover, as the physiotherapist conducting the rGMFM-66 was aware of the study’s purpose, full blinding was not feasible, which may have introduced a bias toward a shorter assessment duration. Although the rGMFM-66 is a standardized test, using a single clinician may limit variability in administration. Although conducting both assessments on the same day may not be ethically justifiable in pediatric rehabilitation due to the potential burden and fatigue for the children, future studies should, whenever feasible, implement a randomized or counterbalanced design in which the order of GMFM-66 and rGMFM-66 assessments is varied between participants. This would allow for the control of possible order or learning effects and thereby strengthen the robustness of agreement analyses [27]. In future studies, it could be beneficial to consider the potential impact of multiple clinicians on the results.

As partial blinding was implemented, the knowledge of the study objective by the assessor conducting the rGMFM-66 may have consciously or unconsciously influenced the test duration. A systematic review and meta-analysis by Pitre et al. (2023) found that outcome assessments without blinding are associated with a moderate risk of overestimating positive effects [28]. Future studies should therefore implement full blinding or introduce appropriate control mechanisms to minimize potential bias in administration time.

While the rGMFM-66 offers substantial time savings and supports efficient clinical workflows, it is essential to emphasize that AI-based assessments should not replace clinical judgment. The clinical expertise and contextual evaluation by therapists remain indispensable in interpreting motor function. AI tools should be regarded as supportive instruments that aid trained professionals in making nuanced decisions, without serving as a replacement for their expertise. Their implementation requires ethical oversight to ensure transparency, accountability, and the respectful use of patient data, especially when guiding therapeutic decisions or evaluating outcomes [9,29]. A detailed summary of the key characteristics and differences between the GMFM-66 and the rGMFM-66 is presented in Table 3.

Table 3. Comparison of the GMFM-66 and the rGMFM-66 regarding structure, administration time, validity, clinical applicability, technological support, and ethical considerations.

Although the sample size does not allow for statistically significant comparisons between the individual GMFCS levels, descriptive trends indicate differences in the extent of time savings achieved through the use of the rGMFM-66. Children with higher GMFCS levels (IV and V) tended to benefit particularly from the shortened test duration, as their motor impairments meant they were only able to complete a limited number of test items, which automatically led to a shorter completion time. Children with lower GMFCS levels (I and II), who were able to complete a wider range of items, showed smaller absolute time reductions but also benefited from the structured and focused design of the rGMFM-66 [30]. Future studies with larger samples stratified by GMFCS level are needed to systematically validate these observations.

Future directions should include stratified analyses by GMFCS level to more precisely evaluate the applicability and time efficiency of the rGMFM-66 across different functional profiles. While descriptive trends from this study indicate greater time savings in children with higher GMFCS levels, larger and more heterogeneous samples are required to systematically validate these observations [21,22]. Moreover, although AI-based assessments contribute to efficiency gains, their use must be accompanied by appropriate ethical oversight. Clinical expertise should remain central to decision-making, and the implementation of such tools must ensure transparency and adherence to data protection standards [23,29,31]. Future applications of AI-based assessments should ensure ethical oversight, particularly as such tools may indirectly influence how therapists and physicians are evaluated. While AI can support clinical decision-making, it must not replace professional judgment. Clear guidelines are needed to safeguard clinical autonomy, transparency, and accountability in the use of AI tools in pediatric rehabilitation [9]. Recent pediatric AI ethics frameworks, such as ACCEPT-AI and PEARL-AI, emphasize the importance of age-appropriate consent, equitable data representation, and transparent algorithm design in studies involving children [32,33]. The present study aligns with these principles by including children aged 3–18 years, obtaining informed consent/assent, and ensuring methodological transparency. To further meet ethical standards of fairness and inclusion, future research should aim to validate AI-assisted tools like the rGMFM-66 across a broader range of CP subtypes beyond the predominantly spastic forms.

5. Conclusions

The rGMFM-66 is a time-saving alternative to the GMFM-66 and shows a high level of agreement with the original method. These findings are consistent with studies advocating the use of AI-based tools to enhance efficiency while maintaining validity in clinical practice. The rGMFM-66 may make standardized gross motor testing in children with CP feasible in clinical situations where it was previously not possible due to limited time resources, e.g., in outpatient pediatric neurological care. Further multicenter studies are recommended to validate the rGMFM-66 across diverse clinical environments and populations and to confirm its robustness and applicability in broader settings.

Author Contributions

Conceptualization, S.S. and I.D.; methodology, I.D.; software, I.D.; validation, S.S., J.B. and I.D.; formal analysis, S.S.; investigation, S.S.; data curation, S.S.; writing—original draft preparation, S.S.; writing—review and editing, K.S., L.S., J.B., E.S. and K.L.; visualization, I.D.; supervision, K.L. and I.D.; project administration, I.D.; funding acquisition, I.D. All authors have read and agreed to the published version of the manuscript.

Funding

This research was funded by CEFAM University Hospital Cologne, Science Budget for 2024.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki, and approved by the Ethics Committee of the University of Cologne (protocol code 22-1380/22. January 2023) for studies involving humans. A detailed description of the registry can be found at www.germanctr.de (DRKS00030419, https://drks.de/search/en/trial/DRKS00030419, access date 1 June 2024).

Informed Consent Statement

Informed consent was obtained from all subjects involved in the study. Written informed consent has been obtained from the patient(s) to publish this paper.

Data Availability Statement

The data presented in this study are available on request from the corresponding author due to privacy and ethical restrictions.

Acknowledgments

We would like to express our gratitude to the participants who supported this study and made it possible. Special thanks go to our colleagues who facilitated the administration of the GMFM-66 and to all who performed the calculation of the GMFM-66 score.

Conflicts of Interest

The authors declare no conflicts of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript; or in the decision to publish the results.

Abbreviations

The following abbreviations are used in this manuscript:

AI	Artificial Intelligence
CP	Cerebral Palsy
GMFM-66	Gross Motor Function Measurement-66
ICC	Intraclass correlation coefficient
rGMFM-66	reduced Gross Motor Function Measurement-66
WBV	Whole Body Vibration Training

References

1. Rosenbaum, P. Cerebral palsy: Is the concept still viable? Dev. Med. Child Neurol. 2017, 59, 564.
2. Ackermann, W.; Stuhlfelder, U. Update Pädiatrie (2): Infantile Zerebralparese—Die häufigste Ursache für das Auftreten körperlicher Behinderungen im Kindesalter. VPT Mag. 2022, 8, 20–21.
3. Blauw-Hospers, C.H.; Hadders-Algra, M. A systematic review of the effects of early intervention on motor development. Dev. Med. Child Neurol. 2005, 47, 421–432.
4. Peterson, M.D. Reframing Cerebral Palsy as a Lifelong Physical Disability. N. Engl. J. Med. 2024, 391, 1668–1670.
5. Strassburg, H.M. (Ed.) Behandlungskonzept bei Kindern mit infantiler Zerebralparese. In Leitlinien Kinder- und Jugendmedizin; Elsevier: Amsterdam, The Netherlands, 2015; pp. R5.1–R5.6.
6. Russell, D.J.; Avery, L.M.; Rosenbaum, P.L.; Raina, P.S.; Walter, S.D.; Palisano, R.J. Improved Scaling of the Gross Motor Function Measure for Children with Cerebral Palsy: Evidence of Reliability and Validity. Phys. Ther. 2000, 80, 873–885.
7. Gross Motor Function Measure-66|RehabMeasures Database. 2017. Available online: https://www.sralab.org/rehabilitation-measures/gross-motor-function-measure-66 (accessed on 13 January 2025).
8. Alotaibi, M.; Long, T.; Kennedy, E.; Bavishi, S. The efficacy of GMFM-88 and GMFM-66 to detect changes in gross motor function in children with cerebral palsy (CP): A literature review. Disabil. Rehabil. 2014, 36, 617–627.
9. van der Veen, S.; van der Leeden, M.; Geleijn, E.; Vossen, P.; Meskers, C.G.M.; Widdershoven, G.A.M. Artificial intelligence to improve rehabilitation care for children with developmental conditions: Some ethical considerations. Dev. Med. Child Neurol. 2023, 65, 12–13.
10. Ismail, L.; Materwala, H.; Karduck, A.P.; Adem, A. Requirements of Health Data Management Systems for Biomedical Care and Research: Scoping Review. J. Med. Internet Res. 2020, 22, e17508.
11. Puce, L.; Pallecchi, I.; Chamari, K.; Marinelli, L.; Innocenti, T.; Pedrini, R.; Mori, L.; Trompetto, C. Systematic Review of Fatigue in Individuals with Cerebral Palsy. Front. Hum. Neurosci. 2021, 15, 598800.
12. Shih, S.T.; Tonmukayakul, U.; Imms, C.; Reddihough, D.; Graham, H.K.; Cox, L.; Carter, R. Economic evaluation and cost of interventions for cerebral palsy: A systematic review. Dev. Med. Child Neurol. 2018, 60, 543–558.
13. Duran, I.; Stark, C.; Saglam, A.; Semmelweis, A.; Lioba Wunram, H.; Spiess, K.; Schoenau, E. Artificial intelligence to improve efficiency of administration of gross motor function assessment in children with cerebral palsy. Dev. Med. Child Neurol. 2022, 64, 228–234.
14. Rosenbaum, P.; Paneth, N.; Leviton, A.; Goldstein, M.; Bax, M.; Damiano, D.; Dan, B.; Jacobsson, B. A report: The definition and classification of cerebral palsy April 2006. Dev. Med. Child Neurol. 2007, 49, 8–14.
15. Han, O.; Tan, H.W.; Julious, S.; Sutton, L.; Jacques, R.; Lee, E.; Lewis, J.; Walters, S. A descriptive study of samples sizes used in agreement studies published in the PubMed repository. BMC Med. Res. Methodol. 2022, 22, 242.
16. Bonett, D.G. Sample size requirements for estimating intraclass correlations with desired precision. Stat. Med. 2002, 21, 1331–1335.
17. Schafmeyer, L.; Losch, H.; Bossier, C.; Lanz, I.; Wunram, H.L.; Schoenau, E.; Duran, I. Using artificial intelligence-based technologies to detect clinically relevant changes of gross motor function in children with cerebral palsy. Dev. Med. Child Neurol. 2023, 66, 226–232.
18. Berweck, S. Anhang: GMAE-Computerprogramm für die GMFM-66. In GMFM und GMFCS—Messung und Klassifikation Motorischer Funktionen; CD-ROM: The Gross Motor Function, Estimator; Russel, D.J., Rosenbaum, P.L., Avery, L.M., Lane, M., Eds.; Huber: Bern, Switzerland, 2006.
19. McGraw, K.O.; Wong, S.P. Forming inferences about some intraclass correlation coefficients. Psychol. Methods 1996, 1, 30–46.
20. Surveillance of Cerebral Palsy in Europe. Surveillance of cerebral palsy in Europe: A collaboration of cerebral palsy surveys and registers. Surveillance of Cerebral Palsy in Europe (SCPE). Dev. Med. Child Neurol. 2000, 42, 816–824.
21. Wang, H.-Y.; Yang, Y.H. Evaluating the Responsiveness of 2 Versions of the Gross Motor Function Measure for Children with Cerebral Palsy. Arch. Phys. Med. Rehabil. 2006, 87, 51–56.
22. Hanna, S.; Russell, D.; Bartlett, D.; Kertoy, M.; Rosenbaum, P.; Wynn, K. Measurement Practices in Pediatric Rehabilitation. Phys. Occup. Ther. Pediatr. 2007, 27, 25–42.
23. Kelly, C.J.; Brown, A.P.Y.; Taylor, J.A. Artificial Intelligence in Pediatrics. In Artificial Intelligence in Medicine; Lidströmer, N., Ashrafian, H., Eds.; Springer International Publishing: Cham, Switzerland, 2022; pp. 1029–1045.
24. Cai, X.; Qian, G.; Cai, S.; Wang, F.; Da, Y.; Ossowski, Z. The effect of whole-body vibration on lower extremity function in children with cerebral palsy: A meta-analysis. PLoS ONE 2023, 18, e0282604.
25. Adaikina, A.; Hofman, P.L.; Gusso, S. The effect of side-alternating vibration therapy on mobility and health outcomes in young children with mild to moderate cerebral palsy: Design and rationale for the randomized controlled study. BMC Pediatr. 2020, 20, 508.
26. Sellier, E.; Platt, M.J.; Andersen, G.L.; Krägeloh-Mann, I.; De La Cruz, J.; Cans, C.; Surveillance of Cerebral Palsy Network. Decreasing prevalence in cerebral palsy: A multi-site European population-based study, 1980 to 2003. Dev. Med. Child Neurol. 2016, 58, 85–92.
27. Schmitt, Y.S.; Hoffman, H.G.; Blough, D.K.; Patterson, D.R.; Jensen, M.P.; Soltani, M.; Carrougher, G.J.; Nakamura, D.; Sharar, S.R. A Randomized, Controlled Trial of Immersive Virtual Reality Analgesia during Physical Therapy for Pediatric Burn Injuries. Burns 2011, 37, 61–68.
28. Pitre, T.; Kirsh, S.; Jassal, T.; Anderson, M.; Padoan, A.; Xiang, A.; Mah, J.; Zeraatkar, D. The impact of blinding on trial results: A systematic review and meta-analysis. Cochrane Evid. Synth. Methods 2023, 1, e12015.
29. Adeniyi, A.; Adeniyi, S. Revolutionizing Healthcare: The Impact of Machine learning and Artificial intelligence. E-Health Telecommun. Syst. Netw. 2024, 13, 87–91.
30. Avery, L.M.; Russell, D.J.; Rosenbaum, P.L. Criterion validity of the GMFM-66 item set and the GMFM-66 basal and ceiling approaches for estimating GMFM-66 scores. Dev. Med. Child Neurol. 2013, 55, 534–538.
31. Weiner, E.B.; Dankwa-Mullan, I.; Nelson, W.A.; Hassanpour, S. Ethical challenges and evolving strategies in the integration of artificial intelligence into clinical practice. PLoS Digit. Health 2025, 4, e0000810.
32. Muralidharan, V.; Burgart, A.; Daneshjou, R.; Rose, S. Recommendations for the use of pediatric data in artificial intelligence and machine learning ACCEPT-AI. npj Digit. Med. 2023, 6, 166.
33. Chng, S.Y.; Tern, M.J.W.; Lee, Y.S.; Cheng, L.T.-E.; Kapur, J.; Eriksson, J.G.; Chong, Y.S.; Savulescu, J. Ethical considerations in AI for child health and recommendations for child-centered medical AI. npj Digit. Med. 2025, 8, 152.

Figure 1. Time Efficiency (A) and Scores (B) of GMFM-66 vs. rGMFM-66. The mean assessment time for the rGMFM-66 was significantly lower (* p < 0.001) than for the GMFM-66 (Subfigure (A)). Subfigure (B) shows the agreement between the two scores.

Figure 2. Bland–Altman plot showing the agreement between the GMFM-66 and rGMFM-66 scores. The differences between the two measurements are plotted against their mean. The middle dashed line represents the mean difference, and the other dashed lines indicate the limits of agreement (mean ± 1.96 SD).
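The quantities drawn in a Bland–Altman plot can be computed from paired scores in a few lines. The following is a minimal sketch, not the authors' analysis code: the function name `bland_altman` and the five score pairs are hypothetical and serve only to illustrate the mean difference and the limits of agreement (mean ± 1.96 SD) described in the caption.

```python
from statistics import mean, stdev


def bland_altman(a, b):
    """Return (bias, lower limit, upper limit) for paired measurements a and b."""
    if len(a) != len(b) or len(a) < 2:
        raise ValueError("need two equally long samples with at least two pairs")
    diffs = [x - y for x, y in zip(a, b)]  # per-child difference a - b
    bias = mean(diffs)                     # mean difference (middle line)
    sd = stdev(diffs)                      # sample SD of the differences (n - 1)
    return bias, bias - 1.96 * sd, bias + 1.96 * sd


# Hypothetical scores for five children (illustrative only, not study data)
gmfm = [40.0, 55.0, 62.0, 48.0, 70.0]
rgmfm = [42.0, 54.0, 65.0, 50.0, 71.0]

bias, lower, upper = bland_altman(gmfm, rgmfm)
print(f"bias = {bias:.2f}, limits of agreement = ({lower:.2f}, {upper:.2f})")
```

In this sketch a negative bias means that the second measure (here the hypothetical rGMFM-66 scores) is higher on average; the sign convention depends on the order in which the two score lists are passed.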

Table 1. Characteristics of the study population (n = 52).

Variable	Value (Mean ± SD) or n (%)
Age, years:months	8:11 (±3:2)
Height, cm	126.0 (±19.6)
BMI, kg/m²	15.3 (±2.7)
Females	14 (26.9%)
Males	38 (73.1%)
CP subtype
Bilateral spastic	37 (71.2%)
Unilateral spastic	7 (13.5%)
Dyskinetic	3 (5.8%)
Ataxic	2 (3.9%)
Mixed	3 (5.8%)
GMFCS level
Level I	4 (7.7%)
Level II	15 (28.8%)
Level III	11 (21.2%)
Level IV	16 (30.8%)
Level V	6 (11.5%)

The results are presented as mean (SD) or count (relative frequency) unless otherwise stated.

Table 2. Results.

	Time saving
	GMFM-66	rGMFM-66	p-value	Effect size
Implementation time, min	38.5 (13.6)	17.2 (5.2)	<0.001	1.9
	Score agreement
	GMFM-66	rGMFM-66	p-value	Effect size
Score, points	50.7 (17.4)	52.2 (18.1)	0.008	−0.09
ICC	0.970 (95% CI: 0.942–0.983)
	Upper limit (97.5%)		Lower limit (2.5%)
Bland–Altman statistics	6.5 (4.5; 8.5)		−9.7 (−11.6; −7.7)

ICC: intraclass correlation coefficient. Effect size was assessed using Cohen’s d for dependent groups. All data are given as mean (SD).
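As a plausibility check on Table 2, the reported summary values can be combined with simple arithmetic. The sketch below is an illustration rather than part of the original analysis; it assumes that the limits of agreement were computed as mean difference ± 1.96 SD (as stated in the caption of Figure 2) and that differences were taken as GMFM-66 minus rGMFM-66.

```python
# Summary values as reported in Table 2
time_gmfm = 38.5    # mean GMFM-66 administration time, min
time_rgmfm = 17.2   # mean rGMFM-66 administration time, min
upper_loa = 6.5     # upper limit of agreement, points
lower_loa = -9.7    # lower limit of agreement, points

# Relative time saving of the rGMFM-66 compared with the GMFM-66
relative_saving = (time_gmfm - time_rgmfm) / time_gmfm

# Assumption: limits of agreement = bias ± 1.96 * SD of the paired differences,
# so the bias is their midpoint and the SD follows from their distance
bias = (upper_loa + lower_loa) / 2
sd_diff = (upper_loa - lower_loa) / (2 * 1.96)

print(f"relative time saving: {relative_saving:.1%}")
print(f"implied mean difference: {bias:.1f} points")
print(f"implied SD of differences: {sd_diff:.1f} points")
```

Under these assumptions, the time saving is about 55%, and the implied mean difference of about −1.6 points agrees, within rounding, with the difference between the reported mean scores (50.7 vs. 52.2).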

Table 3. Comparison of the GMFM-66 and the rGMFM-66 regarding structure, administration time, validity, clinical applicability, technological support, and ethical considerations.

Criterion	GMFM-66	rGMFM-66
Number of items	66 items (full test range)	Approximately 34 items on average (selected individually)
Administration time	Approx. 30–45 min	Approx. 15–25 min
Validity	High; scientifically validated	Strong agreement with GMFM-66; supported by validation studies
Target population	Children with CP	Children with CP
Standardization	Internationally standardized	Based on retrospective data; prospectively validated
Practicality in daily use	Limited due to time requirements	Increased practicality through reduced assessment time
Technological support	Manual item selection and scoring by clinicians; score calculated after manual entry into software	AI-assisted item selection; items reviewed by clinicians; assessment via tablet/laptop with automated scoring
Application context	Research and clinical practice	Primarily clinical practice with focus on efficiency and reduced burden for the child
Ethical considerations	Minimal, as a manual and established procedure	Requires consideration of data protection, transparency, and responsibility in AI use

Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.

© 2025 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).
