Reliability, Consistency and Temporal Stability of Alberta Infant Motor Scale in Serbian Infants

Our study aimed to analyze the reliability, consistency, and temporal stability of the Alberta Infant Motor Scale (AIMS) in Serbian infants. Additionally, we aimed to present a percentile distribution of AIMS in the tested population. The prospective study included 60 infants who were divided into three age groups: 0–3 months, 4–7 months, and 8–14 months. The Serbian version of AIMS was administered by two raters on two different occasions (test/retest) with a five-day period between tests. The observed inter-rater reliability (intraclass correlation coefficient (ICC)) was more than 0.75 for all AIMS scores, except for standing (ICC 0.655 = moderate) in the age group of 4–7 months on retest between raters. The observed intra-rater reliability (ICC) was more than 0.75 for all AIMS scores except standing (ICC 0.655 = moderate) in the age group of 4–7 months on test–retest for Rater One, and for sitting (ICC 0.671 = moderate) and standing (ICC 0.725 = moderate) in the age group of 0–3 months on test–retest for Rater Two. The Serbian version of AIMS was shown to have high consistency and high reliability with good to high temporal stability. Thus, it can be used in the evaluation of infants’ motor development in Serbia.


Introduction
Assessing infants within the first year of life is complex because motor development is rapid and might be influenced to various degrees by environmental, biological, and social factors [1]. The necessity of proper and reliable estimation of motor development in the first year of life is stressed by an increase in the survival rate of more than 85% in very preterm infants [2]. Furthermore, motor, cognitive, and behavioral impairments were found in more than 50% of this group of infants [2].
In previous reports, it was stressed that numerous environmental as well as biological factors could be responsible for developmental delay and challenged motor skills in children [3][4][5]. This is of great importance since the processes that govern motor development might be influenced before and after birth.
The discrepancy between functional and chronological age can be used to classify the degree of developmental delay as mild, moderate, or severe [6]. Therefore, there is a great need to detect not only the presence of developmental delay but also its degree, so that infants can be included in treatment properly and optimally.
Among the major standardized tests, two types of measures are used: norm-referenced and criterion-referenced [1,7]. Norm-referenced tests can identify motor handicap in children, while criterion-referenced tests are used to evaluate interventional programs and their effectiveness [7].
Alberta Infant Motor Scale (AIMS) is a norm-referenced, observational, and performance-based measure [8]. It has been shown to have high sensitivity and specificity in motor deficit detection [9]. Furthermore, AIMS is used in the evaluation of functional capacities, spontaneous movement activities, and quality of movement [8,9].
Our study aimed to analyze the reliability, consistency, and temporal stability of AIMS in Serbian infants. Additionally, we aimed to present the percentile distribution of AIMS in the tested population.

Study Group
The prospective study included 60 infants who were referred to University Children's Hospital (UCH) for the evaluation of motor development. Participants were recruited from the neonatology and pediatric rehabilitation departments of UCH on the basis of being at risk for motor delay by having one or more risk factors. The inclusion criteria were age between 0-14 months and being at risk for motor delay. The risk factors for motor delay were birth weight below 1500 g, gestational age below 32 weeks, Apgar score lower than 7 at 1 and 5 min, central nervous system infection, intraventricular hemorrhage, and chronic lung disease [10,11]. Both genders were included. The participants were divided into three age groups: 0-3 months, 4-7 months, and 8-14 months [10].

Translation Process
We used the forward-backward method for translation of the AIMS from English into Serbian [12,13]. The translation process for AIMS was done according to the principles of a framework for translation and cultural adaptation [14]. Two board-certified physiatrists with five or more years of clinical experience were engaged in the translation of the AIMS into Serbian, generating two versions. After comparison, and under the supervision of a translator, these versions were merged into the final forward translation. The translator who participated in the translation was fluent in English and Serbian. The back-translation into English (backward translation) was done by another independent translator who was also fluent in English and Serbian. After consensus had been reached on the discrepancies between the forward and backward translations, the probationary merged AIMS version was tested on ten infants who were selected from the database of medical records in the pediatric rehabilitation medicine UCH ambulatory setting. Two infants per day were chosen by chance from the ambulatory database for each of five days in one week. Parents or legal guardians were then contacted and invited to participate. The response rate was 100%. Since AIMS is an observational scale, no feedback was received, and after the expert panel meeting, the final translated version of AIMS was introduced.
Two board-certified physiatrists (Rater 1 and Rater 2) tested the AIMS on the pediatric population. In Serbia, board-certified physiatrists are specialists in physical medicine and rehabilitation who can independently perform the functional assessment of patients. For the pediatric population, they perform examinations that also include the estimation of motor development in infants. Both raters had five or more years of experience in pediatric rehabilitation practice and neurodevelopmental assessment of children, and had no prior experience in AIMS scoring. Raters were informed about the administration, testing procedures, rating criteria, and scoring of the AIMS. They attended four hours of practical assessment over two days on the implementation and scoring of the AIMS. Both raters initially tested 16 children with the AIMS on two occasions with five days between the test and retest. Eligible participants were selected from the database of medical records in the pediatric rehabilitation medicine UCH ambulatory setting. Four infants per day were chosen by chance from the ambulatory database for each of four days in one week. Parents or legal guardians were then contacted and invited to participate. These children were not included in the study analysis. After initial testing, 60 infants were recruited and tested with the Serbian version of the AIMS twice (test: initial; retest: five days after initial testing). To avoid potential bias and ensure independent scoring, the two raters were not allowed to discuss their findings.

Statistical Analysis
The obtained results were presented as whole numbers (N), percentages (%), mean values (MV), and standard deviations (SD). Differences in mean values between raters on both occasions, as well as between the test and retest for every rater, were analyzed by the Mann-Whitney U test. Total AIMS scores were also presented as percentiles (10th, 25th, 50th, 75th, and 90th percentile) for both raters on both occasions.
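As an illustration of the percentile presentation described above, the following is a minimal sketch of how the five reported percentile ranks could be obtained from a list of total scores. The function name `aims_percentiles` is hypothetical, and the interpolation method ("inclusive") is an assumption, since the text does not state which percentile definition was used.

```python
from statistics import quantiles


def aims_percentiles(total_scores, ranks=(10, 25, 50, 75, 90)):
    """Return the requested percentiles of a collection of total scores.

    Assumption: linear interpolation over the observed range
    (method="inclusive"); the paper does not specify the method.
    """
    # quantiles(..., n=100) returns 99 cut points; cut point p-1 is the
    # p-th percentile.
    cuts = quantiles(total_scores, n=100, method="inclusive")
    return {p: cuts[p - 1] for p in ranks}


if __name__ == "__main__":
    # Made-up scores for illustration only, not study data.
    example_scores = [4, 6, 7, 9, 10, 12, 13, 15, 18, 21]
    for rank, value in aims_percentiles(example_scores).items():
        print(f"{rank}th percentile: {value:.2f}")
```

Other percentile definitions can give slightly different values in small samples, so the method should be fixed before comparing distributions across raters or occasions.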
The consistency of the AIMS scores between the test and retest for both raters and between raters was assessed with Cronbach's alpha. A Cronbach's alpha above 0.9 was deemed excellent, 0.8-0.9 good, 0.7-0.8 acceptable, 0.6-0.7 questionable, and 0.5-0.6 poor internal consistency [16].
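The consistency measure described above can be sketched as follows. This is an illustrative computation of Cronbach's alpha from the standard formula, treating each scoring occasion (or rater) as one "item"; the function names and the handling of values below 0.5, which the text does not classify, are assumptions.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for k score lists, each with one score per infant.

    alpha = k / (k - 1) * (1 - sum(item variances) / variance(row totals))
    """
    k = len(items)
    if k < 2:
        raise ValueError("at least two score lists are required")
    n = len(items[0])
    if any(len(scores) != n for scores in items):
        raise ValueError("all score lists must have the same length")
    item_variance_sum = sum(variance(scores) for scores in items)
    totals = [sum(row) for row in zip(*items)]  # per-infant summed score
    return k / (k - 1) * (1 - item_variance_sum / variance(totals))


def interpret_alpha(alpha):
    """Map alpha to the bands listed in the text."""
    if alpha > 0.9:
        return "excellent"
    if alpha >= 0.8:
        return "good"
    if alpha >= 0.7:
        return "acceptable"
    if alpha >= 0.6:
        return "questionable"
    if alpha >= 0.5:
        return "poor"
    return "below the listed bands"  # not classified in the text


if __name__ == "__main__":
    # Made-up test and retest totals for illustration only.
    test = [10, 12, 15, 20]
    retest = [11, 12, 16, 19]
    alpha = cronbach_alpha([test, retest])
    print(f"alpha = {alpha:.3f} ({interpret_alpha(alpha)})")
```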

Results
In Table 1, the characteristics of the tested participants are presented. There was excellent internal consistency between the test and retest for both raters (Cronbach's alpha for Rater 1: 0-3 months = 0.941, 4-7 months = 0.998, and 8-14 months = 0.998).
In Table 2, the mean values of the AIMS scores are presented for every subgroup and as the total score. There was no significant difference in the mean values between raters in any age group, either on initial examination (p > 0.05) or at retest (p > 0.05) (Table 2). The highest scores were for the prone position in every age group, and the lowest were for the standing position.
Inter-rater and intra-rater reliability of the AIMS scores are presented in Table 3. The observed inter-rater reliability (ICC) was more than 0.75 for all AIMS scores (range from 0.782 for the standing task in the age group of 4-7 months on test to 0.995 for the total score in the age group of 4-7 months on test), except for standing (0.655 = moderate) in the age group of 4-7 months on retest between raters (Table 3). Differences in the mean values between raters, both for the test and the retest, were assessed by the Mann-Whitney U test and were non-significant (p > 0.05).
The observed intra-rater reliability (ICC) was more than 0.75 for all AIMS scores (range from 0.782 for the standing task in the age group of 4-7 months for Rater 2 to 0.997 for the standing task in the age group of 8-14 months for Rater 1 and Rater 2, and for the total score in the age group of 4-7 months for Rater 1), except for standing (0.655 = moderate) in the age group of 4-7 months on test-retest for Rater 1, and for sitting (0.671 = moderate) and standing (0.725 = moderate) in the age group of 0-3 months on test-retest for Rater 2 (Table 3). Differences in the mean values for each rater between the test and retest were assessed by the Mann-Whitney U test and were non-significant (p > 0.05).
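To make the ICC values above more concrete, the following is a minimal sketch of one common ICC form. The text does not state which ICC model was used, so the choice here of a two-way random-effects, absolute-agreement, single-measurement coefficient (often written ICC(2,1)) is an assumption, as is the function name `icc_2_1`.

```python
def icc_2_1(ratings):
    """ICC(2,1) from a table with one row per infant, one column per rater
    (or per occasion, for intra-rater reliability).

    Assumption: two-way random effects, absolute agreement, single measure.
    """
    n = len(ratings)      # number of infants
    k = len(ratings[0])   # number of raters or occasions
    grand = sum(sum(row) for row in ratings) / (n * k)
    row_means = [sum(row) / k for row in ratings]
    col_means = [sum(row[j] for row in ratings) / n for j in range(k)]

    # Two-way ANOVA sums of squares.
    ss_rows = k * sum((m - grand) ** 2 for m in row_means)
    ss_cols = n * sum((m - grand) ** 2 for m in col_means)
    ss_total = sum((x - grand) ** 2 for row in ratings for x in row)
    ss_error = ss_total - ss_rows - ss_cols

    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_rows - ms_error) / (
        ms_rows + (k - 1) * ms_error + k * (ms_cols - ms_error) / n
    )


if __name__ == "__main__":
    # Made-up scores for illustration only: Rater 2 scores one point higher.
    offset_scores = [[1, 2], [2, 3], [3, 4]]
    print(f"ICC(2,1) = {icc_2_1(offset_scores):.3f}")
```

Under this form, a constant offset between raters lowers the coefficient even when the ranking of infants is identical, which is why absolute-agreement ICCs are usually reported alongside a comparison of mean values.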
Inter-rater reliability of the AIMS total score on the test for all tested infants between Raters 1 and 2 was evaluated by the Bland-Altman analysis (Figure 1). For Rater 1 and Rater 2, the mean difference was 0.15, the SD of the differences was 0.80, the lower limit was −1.42 (for −1.96 SD), and the upper limit was 1.72 (for +1.96 SD).
Table 3. Inter-rater and intra-rater reliability of the AIMS scores.
In Figure 2, percentile distributions for each rater on both occasions in different age groups are presented. The most consistent percentile distributions for the total AIMS scores were for Rater 1 (test-retest) in the age group between 0-3 months and for Rater 2 (test-retest) in the age group between 8-14 months. Percentile distributions, especially for the 25th and 90th percentiles, did not show obvious temporal fluctuations in each age group.
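The Bland-Altman limits reported above follow directly from the mean difference and the SD of the differences (mean ± 1.96 SD). The sketch below shows the calculation both from paired scores and from the reported summary values; the function names are hypothetical and the paired scores in the example are made up.

```python
from statistics import mean, stdev


def limits_of_agreement(mean_diff, sd_diff, z=1.96):
    """Lower and upper 95% limits of agreement from summary statistics."""
    return mean_diff - z * sd_diff, mean_diff + z * sd_diff


def bland_altman(scores_a, scores_b):
    """Mean difference, SD of differences, and limits for paired scores."""
    if len(scores_a) != len(scores_b):
        raise ValueError("both raters must score the same infants")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    bias = mean(diffs)
    sd = stdev(diffs)  # sample SD of the paired differences
    lower, upper = limits_of_agreement(bias, sd)
    return bias, sd, lower, upper


if __name__ == "__main__":
    # Check against the summary values reported in the text.
    lower, upper = limits_of_agreement(0.15, 0.80)
    print(f"limits of agreement: {lower:.2f} to {upper:.2f}")

    # Made-up paired total scores for illustration only.
    rater_1 = [12, 18, 25, 31, 40]
    rater_2 = [12, 17, 26, 31, 39]
    print(bland_altman(rater_1, rater_2))
```

With the reported mean difference of 0.15 and SD of 0.80, the calculation reproduces the limits of −1.42 and 1.72 given in the text.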

Discussion
In this study, we aimed to translate the AIMS into Serbian and to analyze the reliability, consistency, and temporal stability of the Serbian version in the Serbian pediatric population. The final translated version of AIMS followed the forward-backward method of translation to reduce the possibility of a culturally biased translation [18]. The possible roles of cultural differences in different populations, particularly for the AIMS scores, were elaborated in the study by De Kegel et al. [19], where it was noticed that sleep position, playing time in prone or supine positions, and the use of a sitting device might be associated with lower motor scores in Flemish infants versus the Canadian reference group from 1990-1992. However, another study on Greek infants showed a similar course of gross motor maturity during the first 18 months when compared to Canadian infants measured by the AIMS, with a few exceptions [20].
Our study demonstrated that the Serbian version of the AIMS had high internal consistency, with Cronbach's alpha values ranging from 0.937 to 0.998, implying high homogeneity among variables.
Considering inter-rater reliability, there was strong agreement between the two raters for the total scores in each age group of the Serbian version of AIMS (ICC 0.887-0.995), and thus good to high reliability. Furthermore, the mean values of the scores for each subgroup and in total for each age group did not differ significantly between raters. This indicates that the inter-rater reliability results for this version of AIMS can be considered correct [21].
For the evaluation of the temporal stability of the Serbian version of AIMS, intra-rater reliability for the total scores in each age group was good to high (ICC 0.860-0.997), with non-significant differences between mean values for each subgroup and for the total of each age group. Furthermore, percentile distributions, particularly the 25th and 90th percentiles, did not show obvious temporal fluctuations in each age group. However, in infants, it was previously identified that percentile fluctuations of motor abilities might not necessarily be indicative of motor dysfunction [22]. Moreover, Darrah et al. suggested that for normally developing infants, the motor development rate is not stable [22]. We have presented the lower percentile rank values at the 10th percentile, even though in the study by Darrah et al., it was suggested that for children of eight months and above, the 5th percentile should be considered as the optimal value [23]. However, the sensitivity value for a group of infants at eight months is higher for the 10th percentile than for the 5th [23]. Therefore, this Serbian version of AIMS was shown to have satisfactory temporal stability.
For intra-rater reliability, in the age group between 0-3 months of life, the sitting and standing values were lower for Rater 2 (ICC 0.671; ICC 0.725, respectively), while for those between 4-7 months, the standing values were lower for Rater 1 (ICC 0.655) and Rater 2 (ICC 0.782). The lower values might be explained by the difficult assessment of younger infants in situations with poor cooperation. Our findings are, to a certain degree, in line with previous reports, particularly for standing components in the early infant period [17].
Proper evaluation of the motor development of infants, particularly those who are at an increased risk for a delay, is of great importance for adequate and timely inclusion in effective treatment with proper follow-up. It was stated that AIMS could potentially be useful in the detection of motor deficits at an early stage in high-risk infants [17]. Effective decision making and strategies for the good management of children with motor delay are still insufficient.

Study Limitations
There are several limitations to this study. Even though it was conducted at the referral center "University Children's Hospital" with its highly skilled and educated personnel, the first limitation is that only one center participated (single-center design); thus, further studies in other centers in the country are advised to increase sensitivity. Another limitation is the small number of participants, which might affect the study's statistical power. Furthermore, children were not randomly selected, and no other established infant motor tests, such as the Bayley Motor Scale or the Peabody Gross Motor Scale, were performed concurrently on infants at risk for motor delay to support the findings related to AIMS.

Conclusions
The Serbian version of AIMS was shown to have high consistency and high reliability with good to high temporal stability. Thus, it can be used in the evaluation of infant motor development in Serbia.
Author Contributions: I.P., D.N., M.L., and D.C., conceptualization, data curation, methodology, supervision, writing original draft; D.F., S.M., and Z.G., formal analysis, investigation; P.P., formal analysis, writing original draft. All authors have read and agreed to the published version of the manuscript.
Funding: This research received no external funding.