Reliability, Validity, and Responsiveness of the Chinese Learning Accomplishment Profile (C-LAP)

Objectives: To evaluate the reliability, validity, and responsiveness of the Chinese Learning Accomplishment Profile in China. Methods: 12,098 participants aged from 0 to 36 months from 30 provinces (mostly from Shanghai) in China were enrolled between 2013 and 2020. The reliability was reflected by Pearson correlation coefficients, Cronbach’s alpha coefficients and standard errors; the validity was shown by the coefficients between the dimensions, and we also evaluated the responsiveness as a supplement to the validity. Results: Reliability: in six domains among each subgroup, Pearson correlation coefficients between developmental age and chronological age ranged from 0.89 to 0.98, Cronbach’s alpha coefficients from 0.71 to 0.99, and standard errors from 0.15 to 2.76. Validity: after controlling for chronological age, the correlation coefficients between the dimensions were between 0.18 and 0.78, and most of them were below 0.70. Responsiveness: developmental age of all domains obtained via the Chinese Learning Accomplishment Profile system changed significantly (p < 0.001) with time (gap of 1–3 months), and the standardized response mean ranged from 0.66 to 2.45. Conclusions: The Chinese Learning Accomplishment Profile is suitable for assessing children’s development in Shanghai, but still needs confirmation when used in other provinces in China due to the great differences between regions in China.


Introduction
Children's learning performance is highly associated with early neuropsychological development, especially between 0 and 3 years of age [1]. In recent decades, many scales for assessing children's neuropsychological development have emerged, which can be divided into two types: normative assessment and criterion-referenced assessment. The former evaluates a child's performance mainly by comparing it with that of typically developing children of the same chronological age [2]. Although this yields the child's relative developmental level, it is difficult for this type of assessment to provide teachers or parents with specific parenting guidance [3]. Criterion-referenced assessment tools provide a detailed sequential order of developmental skills in each area of development [4,5]; the items are skills, arranged from the easiest to the most complex. Subsequent items are tested only when the previous items have been passed. Because the items are sorted by difficulty, they can show in more detail which specific skills a child lacks in each dimension.
The Early Learning Accomplishment Profile (E-LAP), developed in 1969, is a classic criterion-referenced assessment system, the items of which were drawn from a number of well-known normative assessment tools [3]. By incorporating these items in sequential order, the E-LAP offers distinct advantages over other, more normative assessment devices, and it has been found to be reliable and valid [6]. The E-LAP examines six domains of child development: Gross Motor, Fine Motor, Cognitive, Language, Self-Help, and Social Emotional. Each domain can be used to assess a different aspect of children's developmental age. Although the E-LAP was developed to be used only in conjunction with other instruments to assess child development, and cannot be used as a diagnostic tool [6], it has been used on its own in many studies to evaluate children aged from 0 to 36 months [7][8][9].
To better assist Chinese pediatricians with evaluating children's development, we translated the E-LAP into a Chinese edition (Chinese Learning Accomplishment Profile (C-LAP)) with reference to the ITC guidelines for translating and adapting tests (ITC, 2017) [10].
The translation and adaptation processes considered linguistic, psychological, and cultural differences between China and the USA through the choice of experts with relevant expertise. We selected samples of sufficient size and relevance for the empirical analyses, with relevant characteristics for the intended use of the test.
In this paper, reliability, validity, and responsiveness were estimated to provide statistical evidence for the adapted version of the assessment system in China.

Participants
A prospective study was conducted among 12,098 participants aged from 0 to 36 months from 30 provinces in China (mostly from Shanghai) between 2013 and 2020. Participants with abnormal results in the routine physical examination, or with diagnosed developmental disorders or diseases, including cerebral palsy (CP), autism spectrum disorder (ASD), Down Syndrome (DS), etc., were excluded. Each child who participated in the study completed the C-LAP test to assess developmental age, which refers to the extent of the child's development at a certain age. Other information, such as parents' educational level, age, and the child's chronological age, was obtained by standardized questionnaires for further study.
Written informed consent to participate was obtained from the parent or legal guardian of the child. This study was approved by the Institutional Review Board of the School of Public Health, Fudan University (IRB00002408, FWA00002399; approval number IRB#2019-04-0741).

Statistical Analysis
Due to the long period of data collection, the data were separated into two groups for analysis: data collected from 2013 to 2015, and data collected from 2016 to 2019. Since most participants were from Shanghai, they were also divided into two groups for analysis, based on whether or not they were from Shanghai. R 3.6.2 was used for statistical analysis [11]. All tests were two-sided, and p < 0.05 was considered statistically significant.

Reliability
Reliability refers to the test's consistency in replications [12]. Reliability can be evaluated with several methods, including the test-retest method, alternative-form method, split-half method, and inter-item consistency (internal consistency) method. In our research, we measured the correlations between chronological age and developmental age with Pearson correlation coefficients (r) and estimated Cronbach's alpha, which indicates internal consistency. Internal consistency reflects the coherence of the items in each domain: a high Cronbach's alpha indicates good internal consistency and therefore supports the reliability of the test. We also evaluated the standard error of measurement (SEM) of the C-LAP system. SEM is calculated by the following equation [13]:
$$\mathrm{SEM} = S\sqrt{1 - A}$$
where:
SEM = standard error of measurement
S = standard deviation of the test results
A = Cronbach's alpha
SEM is negatively associated with the test's reliability. If the test were completely reliable, the alpha coefficient would be 1, making SEM equal to 0. However, if the test were completely unreliable, with an alpha coefficient of 0, SEM would be equal to the standard deviation of the test results.
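As an illustration of the two reliability quantities described above, the following minimal Python sketch computes Cronbach's alpha from an item-score matrix and then the SEM. It assumes the usual form SEM = S * sqrt(1 - alpha), which matches the limiting cases stated in the text (alpha = 1 gives SEM = 0; alpha = 0 gives SEM = S). The function names and the toy pass/fail scores are hypothetical and are not taken from the study, whose analysis was run in R on developmental ages rather than raw totals.

```python
from math import sqrt
from statistics import stdev, variance


def cronbach_alpha(item_scores):
    """Cronbach's alpha for rows of item scores (one row per child)."""
    k = len(item_scores[0])  # number of items in the domain
    # Sample variance of each item (column) across children.
    item_vars = [variance([row[j] for row in item_scores]) for j in range(k)]
    # Sample variance of each child's total score.
    total_var = variance([sum(row) for row in item_scores])
    return k / (k - 1) * (1 - sum(item_vars) / total_var)


def standard_error_of_measurement(sd, alpha):
    """SEM = S * sqrt(1 - A); the max() guards against rounding below zero."""
    return sd * sqrt(max(0.0, 1.0 - alpha))


# Hypothetical pass/fail (1/0) scores: 5 children x 4 items, illustration only.
toy_scores = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 0, 0],
    [1, 1, 1, 1],
    [0, 0, 0, 0],
]
alpha = cronbach_alpha(toy_scores)
total_sd = stdev([sum(row) for row in toy_scores])
print(round(alpha, 3), round(standard_error_of_measurement(total_sd, alpha), 3))
```

Because SEM scales with the standard deviation, two groups with similar alpha values can still have different SEMs if their score spreads differ.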

Validity
Validity refers to the extent of the association between the test's result and the true characteristics being measured, which is also considered to be the accuracy of the test [14]. Construct validity was examined as evidence of the C-LAP's validity. Pearson's r coefficients were calculated between domains to examine the relationships between them. Because age was considered an important confounder, we calculated the coefficients both without and with adjustment for age, using Pearson zero-order correlations and partial correlations, respectively.
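The difference between a zero-order correlation and an age-adjusted partial correlation can be sketched as follows. This is a minimal Python illustration using the standard first-order partial correlation formula, r_xy.z = (r_xy - r_xz * r_yz) / sqrt((1 - r_xz^2)(1 - r_yz^2)); the function names and the toy values are hypothetical, and the study's own computation in R may have been implemented differently.

```python
from math import sqrt
from statistics import mean


def pearson_r(x, y):
    """Zero-order Pearson correlation between two equal-length sequences."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def partial_r(x, y, z):
    """Correlation between x and y after controlling for a single variable z."""
    r_xy, r_xz, r_yz = pearson_r(x, y), pearson_r(x, z), pearson_r(y, z)
    return (r_xy - r_xz * r_yz) / sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))


# Hypothetical developmental ages (months) for two domains, illustration only.
age_months = [1, 2, 3, 4]
fine_motor = [2, 1, 2, 5]
cognitive = [0, 5, 0, 5]

print(round(pearson_r(fine_motor, cognitive), 3))              # zero-order
print(round(partial_r(fine_motor, cognitive, age_months), 3))  # age-adjusted
```

The toy values are constructed so that the two domains are related only through age: the zero-order correlation is positive, while the partial correlation is essentially zero.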

Responsiveness
Responsiveness, which is an aspect of validity, refers to the test's ability to measure clinical change [15]. We selected participants who had completed the C-LAP test twice. These subjects were divided into three groups according to the time gap between the two tests. Paired t-tests were used to examine whether the subjects' results changed between the two tests. We calculated the standardized response mean (SRM) to quantify responsiveness. SRM is calculated by dividing the mean change score by the standard deviation of the change scores [16]. According to Cohen [17], a test with an SRM of 0.5 to 0.8 is considered moderately responsive, while a test with an SRM of 0.8 or larger is markedly responsive.
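The SRM calculation described above can be sketched in a few lines of Python. The function names and toy developmental ages are hypothetical; the thresholds of 0.5 and 0.8 follow the text, while the label for values under 0.5 is an assumption added only so that the function always returns something. The paired t statistic is included because it equals the SRM multiplied by the square root of the number of pairs.

```python
from math import sqrt
from statistics import mean, stdev


def standardized_response_mean(first, second):
    """Mean change divided by the standard deviation of the change scores."""
    changes = [b - a for a, b in zip(first, second)]
    return mean(changes) / stdev(changes)


def paired_t_statistic(first, second):
    """Paired t statistic, which equals SRM * sqrt(n) for n pairs."""
    return standardized_response_mean(first, second) * sqrt(len(first))


def responsiveness_label(srm):
    """Cohen-style labels; the 'below moderate' label is an assumption."""
    size = abs(srm)
    if size >= 0.8:
        return "marked"
    if size >= 0.5:
        return "moderate"
    return "below moderate"


# Hypothetical developmental ages (months) at two visits, illustration only.
first_test = [10, 12, 14, 16]
second_test = [12, 13, 17, 18]

srm = standardized_response_mean(first_test, second_test)
print(round(srm, 2), responsiveness_label(srm))
```

Because the t statistic grows with sample size while the SRM does not, a large sample can give a highly significant paired t-test even when the SRM is only moderate, which is why both are reported.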
All methods were carried out in accordance with the relevant guidelines and regulations.

Demographic Characteristics
The basic information gathered from the 12,098 participants is presented in Table 1 by age group. Among all the participants, 7063 (58.4%) completed the test between 2013 and 2015, and most participants (91.8% of the total) were from Shanghai. Children aged 13 months or older made up the largest proportion, but the numbers of children aged from 1 to 5 months and from 6 to 12 months were still large enough for analysis. The gender ratio was 1.19 males per female. Characteristics of the samples used for examining SRM for children's developmental age obtained from two tests with a time gap of 1-3 months are shown in Tables S1 and S2.

Reliability
Pearson correlation results, which are detailed in Tables 2 and 3, showed that developmental age and chronological age were highly associated regardless of where the subjects were from or when they were tested (r ranged from 0.89 to 0.98). Children aged from 1 to 5 months cannot be assessed in the Self-Help domain. As detailed in Table 4, the Cronbach's alpha coefficients were quite high (0.71 to 0.99) in the six domains among all the subgroups. As presented in Table 5, the standard errors of measurement were acceptable (0.15 to 2.76). However, the SEM was larger for older children than for younger children: it was larger in the group aged 25 to 36 months than in the group aged 13 to 24 months, and larger in the group aged 6 to 12 months than in the group aged 1 to 5 months.

Validity
Tables 6 and 7, respectively, show the matrix of zero-order correlations and partial correlations in different provinces from 2013 to 2015, while those from 2016 to 2019 are presented in Tables 8 and 9. The correlations between the evaluation results in the various areas reflect whether one dimension can be distinguished from the others. Before adjusting for chronological age, the correlation coefficients between developmental ages in different areas were high in both periods (0.85-0.97). After controlling for chronological age, the correlation coefficients between the dimensions decreased markedly, ranging from 0.18 to 0.78. Among these partial correlation coefficients, those between fine motor and cognition, and between language and cognition, were notably higher, reaching 0.62 and 0.63, respectively, in Shanghai, and 0.78 and 0.72 in other provinces.
In the correlation matrices, partial correlation coefficients are presented above the diagonal, while zero-order correlation coefficients are presented below the diagonal.

Responsiveness
For children who completed the two tests with a time gap of 3 months (Table 10), paired t-test results showed that developmental age, obtained via the C-LAP system, changed significantly (p < 0.001) over time. The SRM (0.74 to 2.45) also showed that the C-LAP system had excellent responsiveness to change over time, reflecting the real developmental improvement of children regardless of domain or age group. The SRM in the younger age group of 1-12 months was larger than that in the older age groups, indicating greater responsiveness. For children who completed two tests with a time gap of 1 month or 2 months (Table 11), paired t-test results also showed that developmental age changed significantly (p < 0.001) between the two tests. The SRMs for tests 1 month apart were smaller than those for tests 2 months apart, illustrating the better responsiveness of the C-LAP system to a 2-month change than to a 1-month change. With a time gap of 1 month, the SRM indicated moderate responsiveness for the Fine Motor, Language, and Self-Help domains. The SRM indicated excellent responsiveness for the other domains with a 1-month gap, and for all domains with a 2-month gap.

Discussion
Previous studies have shown that the E-LAP system has a high level of inter-rater reliability, internal consistency, and convergent validity [8,18,19]. The retest consistency was also reported as excellent, ranging from 0.93 to 0.998. The E-LAP was thus indicated to be reliable and valid for assessing children's development. For the C-LAP, this study found that, in the two time periods of 2013-2015 and 2016-2019, the correlation coefficients between chronological and measured developmental age of children in Shanghai and other provinces were between 0.89 and 0.97, indicating that the developmental age obtained via the C-LAP test was strongly associated with children's chronological age. Compared to the studies evaluating the reliability and validity of the E-LAP, this study demonstrated that the C-LAP had a relatively higher Cronbach's alpha of 0.71 to 0.99, with standard errors of measurement lower than 3. These data indicated high internal consistency of the C-LAP. We found that the SEMs in older age groups (e.g., 25-36 months) were relatively higher than those in younger age groups (e.g., 13-24 months), while the Cronbach's alpha coefficients differed little. Since SEM is positively related to the standard deviation of the observed scores and negatively related to Cronbach's alpha, the standard deviation of the developmental scores must be larger in the older age groups. The smaller SEM in younger children indicates that the C-LAP is more reliable when assessing younger children's developmental age.
As developmental age in the six domains was highly associated with chronological age, the developmental ages in each domain should be highly associated with each other if chronological age is not controlled, which is shown in the zero-order correlation results below the diagonal of the correlation matrix. The partial correlation results, which are presented above the diagonal of the correlation matrix, showed that, after controlling for chronological age, the developmental ages of different dimensions of children in Shanghai were correlated to a certain extent, but none of these partial correlation coefficients was above 0.7; thus, the structural validity was demonstrated to be relatively ideal. However, in provinces other than Shanghai, the partial correlations between fine motor and cognition, and between language and cognition, were higher (>0.7). This, in a sense, indicated that these dimensions were associated with each other, and other recent studies have also documented a clear relationship between them [20]. In addition, several items are shared between domains in the C-LAP system, which makes the partial correlation coefficients between fine motor and cognition, as well as between language and cognition, higher than the others. The relatively smaller sample size from other provinces compared to Shanghai, and the direct pooling of data from different provinces, may also have caused these partial correlation coefficients to be larger than those in Shanghai. Hence, a larger sample size is needed for further verification in other provinces.
In this study, longitudinal data were also used to identify differences in developmental age in the different domains, acquired from two assessments of the same child with time gaps of 1 to 3 months. The results indicated that the differences between the two measurements were significant whether they were three, two, or even one month apart. This illustrated the high discriminant validity of the C-LAP system, which sensitively detected short-term changes in children's neuropsychological development in different areas. The standardized response mean (SRM) also demonstrated that the C-LAP system was moderately or markedly responsive to children's development. The C-LAP was also more responsive in younger age groups than in older age groups. This may be the result of the faster development of younger children compared to older ones, which makes developmental changes in younger children easier to detect using the C-LAP system.
However, this study has several limitations. First, children with developmental disorders or diseases were not included in this study, so we could not analyze the reliability and validity of the C-LAP when assessing children with disabilities. Second, test-retest reliability and concurrent validity were not analyzed due to the lack of corresponding data; these will be examined in a further study. Third, more than 90% of the participants were from Shanghai, which limits the generalizability of this research. Although we gathered 989 participants from other provinces in China and obtained high Cronbach's alpha coefficients, the reliability and validity of the C-LAP system still need to be re-examined in those provinces.

Conclusions
This study aimed to evaluate the C-LAP system, which was translated from the E-LAP to Chinese, and found that the C-LAP is generally reliable, valid, and moderately to markedly responsive in Shanghai. The C-LAP was found to be more appropriate for younger children compared with older ones regarding its reliability and responsiveness. However, the results for older children, aged from 1 to 3 years, were still outstanding. Thus, the C-LAP is suitable for assessing child development in Shanghai, but still needs confirmation when used in other provinces in China due to the great differences between regions of China.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/children8110974/s1, Table S1: Characteristics of the samples used for examining SRM for children's developmental age obtained from two tests with a time gap of 3 months, Table S2: Characteristics of the samples used for examining SRM for children's developmental age obtained from two tests with a time gap of 1-2 months.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The datasets are available on reasonable request. If anyone wants to get access to the data, please send an email to chend19@fudan.edu.cn.