Article
Peer-Review Record

A Comparison of Different Methods for Measuring Individual Change in Kindergarten Children

Educ. Sci. 2025, 15(11), 1475; https://doi.org/10.3390/educsci15111475
by Theresa Pham *, Janis Oram, Daniel Ansari, Marc F. Joanisse, Christine Stager and Lisa M. D. Archibald
Reviewer 1: Anonymous
Reviewer 2: Anonymous
Submission received: 8 September 2025 / Revised: 20 October 2025 / Accepted: 22 October 2025 / Published: 3 November 2025

Round 1

Reviewer 1 Report

Comments and Suggestions for Authors

Manuscript Title: A comparison of different methods for measuring individual change in kindergarten children

Dear Authors,

Thank you for the opportunity to review your manuscript. The topic is highly relevant to the fields of education and developmental psychology. Please find my detailed feedback below:

Abstract

  1. Line 4: When you mention “change at group level,” please clarify what specific type of change you are referring to.
  2. Line 18: Kindly specify the practical tool you are referring to here.

Introduction

  1. Lines 30–31: When discussing research that uses statistics to quantify group-level change, please provide an example of a specific study to support your point and strengthen clarity.
  2. Line 34: Why is change measured only through increases in scores? Is this an assumption? Please clarify.
  3. Lines 35–41: This section is largely descriptive. Consider integrating relevant research to provide greater critical depth.
  4. Line 77: While the point you raise is valid, note that developmental change may also vary across countries. Cultural, social, and economic factors should be acknowledged here.
  5. Lines 89–90: Please clarify what you mean by “experimenter-created measures require more resources.” In what ways?
  6. Lines 116–118: The point about negative change in standard scores would be stronger if contextualized in terms of developmental delay and the attainment of developmental milestones.
  7. Line 126: Consider also addressing how norms may vary across cultures.
  8. Line 138: When discussing minimal statistical training among practitioners, could you clarify if there is a defined threshold for this?
  9. Line 141: References for the RCI are quite dated. Please add more recent research where possible.
  10. Line 224: Rather than simply stating “mixed results,” please elaborate on the specific findings of the different studies.
  11. Line 238: I suggest framing this as a research aim rather than “research goals,” as this may better reflect the foundation of the manuscript.

Method

  1. In the Participants section, please state the inclusion and exclusion criteria.
  2. Since demographic data were not collected, assumptions about participants (e.g., being monolingual and from a high socio-economic background) may affect the reliability of the results. Please address this.
  3. Include the attrition rate.
  4. In the Materials section, you note that norming for the Reading Readiness Screening Test is still ongoing. This raises the question of whether another standardized measure could have been used to assess children’s vocabulary.
  5. Please provide the psychometric properties of the tools used, where applicable.
  6. The Procedure section is very brief. Please expand to include details such as information sheets, parental consent, and debriefing. Additionally, when mentioning that research assistants were trained, specify the number of assistants, the type and duration of training, and other relevant details to support replication.

Results and Discussion

  1. The results are clearly presented and easy to follow.
  2. Lines 466–482: In the opening paragraph of the discussion, please link your findings more explicitly to the stated research aims.
  3. Lines 496–499: The statement here appears to contrast with what you noted in the introduction (that negative change does not necessarily indicate lack of progress). Please reconcile this for consistency.
  4. Lines 508–512: You mention that different methods yielded slightly different results but do not indicate which method may be more accurate. This could be highlighted as a limitation of the study.
  5. In the Limitations section, consider including known-group validity testing (e.g., comparing a screening test with a gold-standard diagnostic tool). Additionally, you refer to “screening tools” in general—please specify which tool(s) you mean and why they were selected.

Thank you once again for your submission. I hope these comments will be helpful in refining your manuscript.

 

Comments for author File: Comments.pdf

Author Response

Reviewer 1:

Thank you for the opportunity to review your manuscript. The topic is highly relevant to the fields of education and developmental psychology. Please find my detailed feedback below:

RESPONSE: Thank you for your careful reading of our manuscript. We appreciate your constructive and encouraging feedback and have made changes accordingly to improve the quality of our manuscript.

Abstract

Comment 1: Line 4: When you mention “change at group level,” please clarify what specific type of change you are referring to.

RESPONSE 1: We have clarified the phrase “change at group level”. The phrase has been revised to read “change due to learning over time” (page 1, line 5). In the context of our study, we are interested in change as it relates to student learning across the school years, that is, a student’s progress in developing academic-relevant competencies, which is referenced on line 12 of the abstract.

Comment 2: Line 18: Kindly specify the practical tool you are referring to here.

RESPONSE 2: We have revised the sentence to specify the practical tool as the calculator that was created to help clinicians and educators calculate individual change: “a practical tool (Excel-based Growth Calculator)” (page 1, line 19).
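For context, the core computation that such a growth calculator automates can be sketched in a few lines. The Python sketch below is illustrative only: the function name, example scores, and reliability value are our assumptions, not taken from the manuscript or from the calculator itself; it uses the Jacobson-Truax reliable change criterion discussed later in this review.

```python
import math

def reliable_change_index(time1, time2, sd1, reliability):
    """Jacobson-Truax reliable change index for a single student.

    time1, time2 -- the student's scores at the two time points
    sd1          -- standard deviation of the comparison sample at time 1
    reliability  -- test-retest reliability of the measure
    """
    sem = sd1 * math.sqrt(1 - reliability)   # standard error of measurement
    s_diff = math.sqrt(2 * sem ** 2)         # standard error of the difference
    rci = (time2 - time1) / s_diff
    return rci, abs(rci) > 1.96              # |RCI| > 1.96 ~ reliable change at p < .05

# Hypothetical example: raw score 12 -> 19, sample SD 4.0, reliability .85
rci, changed = reliable_change_index(12, 19, 4.0, 0.85)
print(f"RCI = {rci:.2f}; reliable change: {changed}")
```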

Introduction

Comment 3: Lines 30–31: When discussing research that uses statistics to quantify group-level change, please provide an example of a specific study to support your point and strengthen clarity.

RESPONSE 3: We thank the Reviewer for the excellent suggestion of providing specific references to strengthen the argument made here. Given that our study measured change in students’ learning across the kindergarten years in a full-day kindergarten program, we have provided examples of prior research studies and reviews that have examined the effects of full-day kindergarten using group-level frequentist statistics, which, in turn, influenced policy decisions for all students. The specific text where we address this point is as follows:

“Take, for example, the effects of full-day (vs. half-day) kindergarten on developing academic-relevant competencies, which have typically been studied at the group level using frequentist statistics (see Cooper et al., 2010 for a review). Subsequently, developmental research informs policy decisions such as implementing a full-day kindergarten curriculum for all students, given that full-day kindergarten is efficacious on average (e.g., Canada: Akbari et al., 2023; United States: Kauerz, 2005).” (pages 1-2, lines 42-47)

Comment 4: Line 34: Why is change measured only through increases in scores? Is this an assumption? Please clarify.

RESPONSE 4: We appreciate the Reviewer’s careful reading of our manuscript. We have clarified the sentence to state that change can be indexed by either increases or decreases in scores over time (page 2, line 50).

Comment 5: Lines 35–41: This section is largely descriptive. Consider integrating relevant research to provide greater critical depth.

RESPONSE 5: We have more clearly justified this paragraph by including relevant research discussing the challenges of measuring change. The following references are new and were added to this section:

Bedard, K., & Dhuey, E. (2006). The persistence of early childhood maturity: International evidence of long-run age effects. The Quarterly Journal of Economics, 121(4), 1437–1472. https://doi.org/10.1162/qjec.121.4.1437

Anderman, E. M., Gimbert, B., O’Connell, A. A., & Riegel, L. (2015). Approaches to academic growth assessment. British Journal of Educational Psychology, 85(2), 138–153. https://doi.org/10.1111/bjep.12053

The following references were cited in the original manuscript but have now been incorporated into this section:

Duff, K. (2012). Evidence-based indicators of neuropsychological change in the individual patient: Relevant concepts and methods. Archives of Clinical Neuropsychology, 27(3), 248–261. https://doi.org/10.1093/arclin/acr120

Frijters, J. C., Lovett, M. W., Sevcik, R. A., & Morris, R. D. (2013). Four methods of identifying change in the context of a multiple component reading intervention for struggling middle school readers. Reading and Writing, 26(4), 539–563. https://doi.org/10.1007/s11145-012-9418-z

Hendricks, E. L., & Fuchs, D. (2020). Are individual differences in response to intervention influenced by the methods and measures used to define response? Implications for identifying children with learning disabilities. Journal of Learning Disabilities, 53(6), 428–443. https://doi.org/10.1177/0022219420920379

Maassen, G. H., Bossema, E., & Brand, N. (2009). Reliable change and practice effects: Outcomes of various indices compared. Journal of Clinical and Experimental Neuropsychology, 31(3), 339–352. https://doi.org/10.1080/13803390802169059

Comment 6: Line 77: While the point you raise is valid, note that developmental change may also vary across countries. Cultural, social, and economic factors should be acknowledged here.

Comment 7: Line 126: Consider also addressing how norms may vary across cultures.

RESPONSE: We agree with the Reviewer that this is an important point to consider when studying developmental change and using norm-referencing. To address both Comments #6 and #7, we have added text in the revised manuscript to acknowledge the limitations of norm-referenced tests with respect to potential cultural and linguistic biases. The added text is on page 3, lines 119-127:

“Relatedly, an important point to consider is the potential bias of norm-referenced tests if the culture and language of the student being assessed are different from those of the normative group (e.g., Lasher & Cockcroft, 2017; Higgins & Lefebvre, 2024). This can mean a risk of overidentification/misdiagnosis of culturally and linguistically diverse students as having learning difficulties when relying on norm-referencing (e.g., Paradis et al., 2013). It would be important to have feasible methods for capturing growth across the range of individual differences, including those developing typically or even struggling individuals who change but may not have normalized.”

Comment 8: Lines 89–90: Please clarify what you mean by “experimenter-created measures require more resources.” In what ways?

RESPONSE 8: We have clarified the types of resources required: “experimenter-created measures require more resources including a representative sample for norming and statistical training to convert raw scores to standard scores and percentiles.” (page 3, lines 108-110). Further, we expand on this discussion point on lines 147-157.

Comment 9: Lines 116–118: The point about negative change in standard scores would be stronger if contextualized in terms of developmental delay and the attainment of developmental milestones.

RESPONSE 9: We have contextualized the point about negative change in standard scores by including work by Farmer et al. (2020) discussing how norm-referenced scores often decrease over time among people with neurodevelopmental disorders who exhibit slower-than-average increases in ability. Respectfully, we also maintained our original discussion about alternative interpretations of standard scores decreasing over time. We want to acknowledge that interpreting (standard) scores is not clear-cut, and hence scores should be used as only one piece of evidence to guide educational and clinical decisions.

Comment 10: Line 138: When discussing minimal statistical training among practitioners, could you clarify if there is a defined threshold for this?

RESPONSE 10: While there may not be an established guideline quantifying the amount of statistical training among practitioners, we appreciate the Reviewer’s concern and have developed this section by adding references that support the idea of minimal statistical training among practitioners. We included recent work by Lakhlifi et al. (2023), who found that professionals (in that study, physicians and medical students) struggle with aspects of research methods, such as understanding p-values (page 4, line 164).

Comment 11: Line 141: References for the RCI are quite dated. Please add more recent research where possible.

RESPONSE 11: Respectfully, we have decided to retain the original references given that these are the pioneering studies on the reliable change index from which the equation used in the current study was derived. However, we have also included references to recent research where possible.
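For reference, the classic formulation of the index from those pioneering studies, as given by Jacobson and Truax (1991), is shown below; whether the manuscript’s equation matches this form exactly is our assumption:

$$\mathrm{RCI} = \frac{x_2 - x_1}{S_{\mathrm{diff}}}, \qquad S_{\mathrm{diff}} = \sqrt{2\,S_E^{2}}, \qquad S_E = s_1\sqrt{1 - r_{xx}},$$

where $x_1$ and $x_2$ are the time 1 and time 2 scores, $s_1$ is the time 1 standard deviation, and $r_{xx}$ is the test-retest reliability of the measure.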

Comment 12: Line 224: Rather than simply stating “mixed results,” please elaborate on the specific findings of the different studies.

RESPONSE 12: We have decided to remove this section given its redundancy with the text on page 6 (lines 230-242) discussing the specific findings of the different studies, which reviews similarities and differences between the change score indices from prior work.

Comment 13: Line 238: I suggest framing this as a research aim rather than “research goals,” as this may better reflect the foundation of the manuscript.

RESPONSE 13: We thank Reviewer 1 for this suggestion and have reframed these as “research aims” instead (page 6, line 262).

Method

Comment 14: In the Participants section, please state the inclusion and exclusion criteria.

RESPONSE 14: We have specified that “There were no specific criteria listed for participation, as we were interested in obtaining an unselected, representative sample of kindergarten students.” (lines 287-289). This means that our sample comprised an unselected group of children, including typically developing and struggling learners, which tends to reflect the typical characteristics of diverse students in kindergarten classrooms.

Comment 15: Since demographic data were not collected, assumptions about participants (e.g., being monolingual and from a high socio-economic background) may affect the reliability of the results. Please address this.

RESPONSE 15: We thought critically about how to address this Reviewer’s and Reviewer 2’s concerns about the available demographic data. Please refer to Reviewer 2, Comment 3 for our full response. For convenience, we reiterate the sections relevant to addressing the current comment here. First, we reviewed the demographic data reported in the larger study and have incorporated additional information where available. The following section was added:

“The larger study did not collect specific information about the demographic characteristics for our sample (e.g., ethnicity, language(s) spoken, socioeconomic status, etc.), nor did the school share this information. There was limited information from a demographic survey that was discontinued after the first year of the study. As per parent report (n = 101), children were rated to be ‘good’ on average at: counting and recognizing numbers; letter names; number relationships; quantity concepts; understanding patterns. Children were rated to be ‘satisfactory’ on average at: letter sounds; meaning of written words. Further, children ‘never’ or ‘rarely’ (once or twice a month) attended English as a Second Language or Literacy programs in the six months before kindergarten. Based on previous studies with cohorts from this School Board (removed for peer review), we could only speculate that the sample might be largely monolingual English with a high socioeconomic status.” (page 7, lines 292-303)

Importantly, we recognize this limitation and have highlighted it in the discussion:

“Given the potential lack of linguistic and cultural diversity in our sample, this could limit the generalizability of the findings and the application of the Excel-based Growth Calculator to a broader population. We provide instructions to change the summary data to match the sample characteristics of the context, if available, which, in turn, could improve the accuracy of the calculator. We encourage future research to investigate the individual change indices reviewed here across different cultural, ethnic, racial, and socioeconomic groups.” (page 17, lines 714-720).

Comment 16: Include the attrition rate.

RESPONSE 16: We have included information on the attrition rate and possible explanations: “There was 47.57% attrition between the two years. The main reason for attrition was that the larger study focused on kindergarten students in Year 2. Students in Year 1 were included in the study if they were part of a split Year 1/2 class, as all students in the class were invited to participate. Other possible reasons for attrition included child absence, child moving schools, child withdrawal from the study, and school withdrawal (Board-level policy changes, labour shortages and disruptions).” (page 7, lines 284-290)

Comment 17: In the Materials section, you note that norming for the Reading Readiness Screening Test is still ongoing. This raises the question of whether another standardized measure could have been used to assess children’s vocabulary.

RESPONSE 17: We understand the Reviewer’s concern. However, we had to strike a balance between forming a comprehensive screening tool that tapped across academic domains and the feasibility of administering the assessment in a timely manner for young children. For example, the Expressive Vocabulary Test, Third Edition (EVT-3), a widely used standardized measure of children’s expressive vocabulary, takes approximately 20 minutes to complete, whereas our entire assessment battery spanning language, reading, and mathematics took approximately 30-40 minutes to complete. We do, however, note in the limitations section the use of screening or assessment tools with few items to measure change (page 17, lines 700-709).

Comment 18: Please provide the psychometric properties of the tools used, where applicable.

RESPONSE 18: Psychometric properties of the tools used are provided in the revised manuscript. Specifically, the test-retest reliability metric is available from the larger study. For some measures (e.g., Sentence recall), we were able to find additional metrics to report as well.

Comment 19: The Procedure section is very brief. Please expand to include details such as information sheets, parental consent, and debriefing. Additionally, when mentioning that research assistants were trained, specify the number of assistants, the type and duration of training, and other relevant details to support replication.

RESPONSE 19: Before our initial submission, we thought that additional methodological details could be left to the report of the larger study in order to keep the focus on the novelty of the current work as a methodological study employing multiple indices to assess change at the individual level for all students. However, we are sensitive to Reviewer 1’s concerns about having a detailed Procedure section to support replication. As suggested, we have explicitly stated that all students in the participating classrooms were invited to participate in the study, and that parents received an information letter about the study (page 7, lines 274-279) and could opt in to receive a summary of the study (page 8, lines 318-319).

In terms of the training research assistants received, we have included the following information:

“All participants were tested individually in a single session lasting 30-40 minutes by trained research assistants (RAs). There were on average 14 trained RAs (range = 7-20) administering the assessment each year. Each RA attended a 2-3 hour training session. Training included viewing a video recording of the second author providing step-by-step explanations of administering and scoring the test battery, conducted with a practice child participant who was not included in the study. The RAs also had opportunities to practice administering and scoring each measure. We created a website that contained the training video and other resources for RAs to access during and after training.” (pages 7-8, lines 305-312).

Results and Discussion

Comment 20: The results are clearly presented and easy to follow.

RESPONSE 20: Thank you for your positive comment.

Comment 21: Lines 466–482: In the opening paragraph of the discussion, please link your findings more explicitly to the stated research aims.

RESPONSE 21: We have explicitly linked the findings to the research aims by presenting the research aims in parentheses.

Comment 22: Lines 496–499: The statement here appears to contrast with what you noted in the introduction (that negative change does not necessarily indicate lack of progress). Please reconcile this for consistency.

RESPONSE 22: We thank the Reviewer for highlighting this potential contradiction. In response to Reviewer 1, Comment 9, we have revised the current manuscript to suggest that a negative change could indicate a lack of progress. As a result, we believe that the interpretations made in this section about using the change score to support each student’s needs are justified.

Comment 23: Lines 508–512: You mention that different methods yielded slightly different results but do not indicate which method may be more accurate. This could be highlighted as a limitation of the study.

RESPONSE 23: We agree with Reviewer 1 that, because our results, along with prior work (e.g., Frijters et al., 2013; Hendricks & Fuchs, 2020), have found that different methods yield slightly different results, we cannot provide a conclusive recommendation of the most accurate method. We discuss how methods could be chosen based on best practice guidelines (page 15, lines 597-600) but acknowledge this as a limitation overall: “Overall, we acknowledge that different methods yielding slightly different results might preclude us from concluding which method may be most accurate. There needs to be further investigation comparing the methods to ascertain the accuracy of each index for evaluating young children’s academic progress.” (page 15, lines 625-628).

Comment 24: In the Limitations section, consider including known-group validity testing (e.g., comparing a screening test with a gold-standard diagnostic tool). Additionally, you refer to “screening tools” in general—please specify which tool(s) you mean and why they were selected.

RESPONSE 24: We have revised the limitations section to specify the screening tools used (lines 700-702), justified the need to balance having a comprehensive tool that taps across academic domains against resource and time constraints (lines 707-709), and called for future work to investigate the psychometric properties of the measures used in the current study (lines 691-693).


Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

The manuscript entitled “A Comparison of Different Methods for Measuring Individual Change in Kindergarten Children” addresses an important issue in educational and clinical research: how to measure individual change in young children’s academic skills. This study compares four methods: normalisation, Reliable Change Index (RCI), Standardised Individual Difference (SID), and Standardised Regression-Based (SRB), using data from 157 kindergarten children assessed over one year. The authors argue that individual-level indices provide more nuanced insights than group-level statistics and propose a practical calculator for practitioners. Although the manuscript is well-structured, with a clear introduction, detailed methodology, and comprehensive discussion, several issues limit the clarity, rigour, and generalisability of the study’s findings, which are explained in greater detail in the following paragraphs.

Regarding the theoretical and conceptual framework, while the introduction provides a rationale for measuring individual change, it lacks a strong theoretical underpinning. The discussion of why the authors selected these four methods could be expanded beyond practical considerations. I highly recommend incorporating theoretical perspectives on developmental trajectories and educational assessment (e.g., Vygotskian or dynamic assessment frameworks) and explaining how these methods align with or challenge existing models of academic growth.

Regarding the sample description and generalisability, the absence of demographic details (e.g., socioeconomic status, linguistic background, ethnicity) is a significant limitation. The authors acknowledge this but do not discuss its implications for interpretation. I highly recommend providing any available contextual information (e.g., school board characteristics) and discussing how the lack of demographic diversity might affect the applicability of findings to other populations.

Regarding the statistical analysis and interpretation, the interpretation of agreement between methods (κ values) is clear, but the practical implications of discrepancies (e.g., normalisation vs. growth methods) need elaboration. I recommend discussing why normalisation identified more change and what this means for practitioners. Could this lead to over-identification of progress? How should educators reconcile conflicting results?

Having said that, I highlight that the manuscript addresses a critical gap in early education research by focusing on individual rather than group-level change, provides a practical tool for practitioners, enhancing translational impact, and employs multiple indices, allowing for a nuanced comparison of methods. However, the manuscript requires substantial improvements in theoretical framing, methodological transparency, and discussion of practical implications. 

Author Response

Reviewer 2:

Comment 1: The manuscript entitled “A Comparison of Different Methods for Measuring Individual Change in Kindergarten Children” addresses an important issue in educational and clinical research: how to measure individual change in young children’s academic skills. This study compares four methods: normalisation, Reliable Change Index (RCI), Standardised Individual Difference (SID), and Standardised Regression-Based (SRB), using data from 157 kindergarten children assessed over one year. The authors argue that individual-level indices provide more nuanced insights than group-level statistics and propose a practical calculator for practitioners. Although the manuscript is well-structured, with a clear introduction, detailed methodology, and comprehensive discussion, several issues limit the clarity, rigour, and generalisability of the study’s findings, which are explained in greater detail in the following paragraphs.

RESPONSE 1: We thank Reviewer 2 for the critical review of our manuscript and for offering insightful recommendations. We have made changes in response to your suggestions in order to strengthen the quality of our manuscript. Please see our responses to each comment below.

Comment 2: Regarding the theoretical and conceptual framework, while the introduction provides a rationale for measuring individual change, it lacks a strong theoretical underpinning. The discussion of why the authors selected these four methods could be expanded beyond practical considerations. I highly recommend incorporating theoretical perspectives on developmental trajectories and educational assessment (e.g., Vygotskian or dynamic assessment frameworks) and explaining how these methods align with or challenge existing models of academic growth.

RESPONSE 2: We have strengthened our Introduction by including theoretical motivations for our current work. Relatedly, we have explained in the Discussion how our work aligns with theoretical and empirical approaches in assessing growth in student learning. Below you will find passages elaborating on the theoretical framework throughout the revised manuscript:

Introduction:

“A social interactionist approach to learning acknowledges that learning in educational settings is a dynamic process that reflects the interplay between the student and their interactions with others and the environment (Vygotsky, 1978). What is to be measured, therefore, is the individual student’s progress or growth in the educational context. Indeed, there has been growing interest among policymakers and researchers in shifting the focus away from the mean or average effect for groups and focusing instead on how student learning evolves over time for the individual student (Anderman et al., 2015). It would follow that available methods for assessing individual-level change need to be systematically compared, motivating the current study.” (page 1, lines 26-34).

Discussion:

“Being able to track individual progress (or lack thereof) is in line with a Vygotskian theoretical framework, in which educators or clinicians could support each student’s needs and make equitable educational decisions for individual students in order to improve learning outcomes (Vygotsky, 1978; Anderman et al., 2015).” (page 14, lines 571-574).

Comment 3: Regarding the sample description and generalisability, the absence of demographic details (e.g., socioeconomic status, linguistic background, ethnicity) is a significant limitation. The authors acknowledge this but do not discuss its implications for interpretation. I highly recommend providing any available contextual information (e.g., school board characteristics) and discussing how the lack of demographic diversity might affect the applicability of findings to other populations.

RESPONSE 3: We are sensitive to this Reviewer’s and Reviewer 1’s concerns about the available demographic data. In our initial submission of the manuscript, we had referred interested readers to the larger study, which contained more information about the demographic details available for the school and participants. However, for transparency, we have included the available information in the revised manuscript and in this response letter.

First, the location of the study context was masked to ensure a blind review process. In this school board, there are 67 publicly funded elementary schools. The larger study recruited 16 elementary schools across the board to participate, of which 14 were in urban settings and 2 were in rural settings.

Second, we have reviewed demographic data reported in the larger study and have incorporated additional information where available. The following section was added to the revised manuscript:

“The larger study did not collect specific information about the demographic characteristics for the sample (e.g., ethnicity, language(s) spoken, socioeconomic status, etc.), nor did the school share this information. There was limited information available from a demographic survey that was discontinued after the first year of the study. As per parent report (n = 101), children were rated to be ‘good’ on average at: counting and recognizing numbers; letter names; number relationships; quantity concepts; understanding patterns. Children were rated to be ‘satisfactory’ on average at: letter sounds; meaning of written words. Further, children ‘never’ or ‘rarely’ (once or twice a month) attended English as a Second Language or Literacy programs in the six months before kindergarten. Based on previous studies with cohorts from this School Board (removed for peer review), we could only speculate that the sample might be largely monolingual English with a high socioeconomic status.” (lines 292-303)

Additional information that we could extrapolate about the participants is that the sample represented the full range expected of an unselected sample. Indeed, there were no specific criteria listed for participation, in order to obtain an unselected, representative sample of kindergarten students (lines 287-289). Empirical evidence that could corroborate the unselected sample comes from Woodcock-Johnson III (Woodcock & Johnson, 1990) Tests of Achievement standardized test scores available for some Grade 1 and 2 students from the larger sample. Scores were available for 218 Grade 1 students (of whom 74 were participants in the current study) and 124 Grade 2 students (of whom 24 were participants in the current study). Scores mapped onto the central tendency in the average range, as would be expected with a large sample. Specifically, standard scores ranged from 80-106 (SD = 13-40). Standard scores are scaled to a mean of 100 (SD = 15) relative to a larger sample of typically developing children.

Finally, and importantly, we recognize this limitation and have highlighted it in the discussion:

“Given the potential lack of linguistic and cultural diversity in our sample, this could limit the generalizability of the findings and the application of the Excel-based Growth Calculator to a broader population. We provide instructions to change the summary data to match the sample characteristics of the context, if available, which, in turn, could improve the accuracy of the calculator. We encourage future research to investigate the individual change indices reviewed here across different cultural, ethnic, racial, and socioeconomic groups.” (lines 711-717).

Comment 4: Regarding the statistical analysis and interpretation, the interpretation of agreement between methods (κ values) is clear, but the practical implications of discrepancies (e.g., normalisation vs. growth methods) need elaboration. I recommend discussing why normalisation identified more change and what this means for practitioners. Could this lead to over-identification of progress? How should educators reconcile conflicting results?

RESPONSE 4: We appreciate this insightful comment from Reviewer 2. We thought critically about why the normalization method might have identified more students as having changed compared to the other indices and what this means for clinical practice, without overinterpreting the results. We first need to contextualize the normalization method: it can only inform on the trajectory of low-scoring students. It could be the case that some children with low initial scores at the beginning of kindergarten were showing a compensatory growth pattern (e.g., Luwel et al., 2010; Lervåg & Hulme, 2010). This means they were showing fast growth and, to a certain degree, were catching up with typically developing children (i.e., those within normalized levels at the beginning). This could help educators and clinicians identify low-scoring children in need of additional resources or supports that would facilitate academic growth. An alternative explanation could be regression to the mean, in that a time 2 score may fall within normalized levels simply because the time 1 score was below average. This interpretation aligns with the normalization calculation, as regression to the mean and practice effects are not accounted for in that calculation, in contrast to the remaining reliable change indices. As we have noted throughout the revised manuscript, practitioners should avoid relying solely on test scores to make educational and clinical decisions. It is also important to note that all indices converged on not identifying a change in the majority of this subgroup (63%, or 34/54). The additional text is as follows:

“Finally, the normalization method performed differently compared to the other indices in our study, aligning with prior work (Frijters et al., 2013; Hendricks & Fuchs, 2020). There may be several explanations. The normalization method was focused on children with relatively low initial scores. Children with lower initial scores have been found to generally improve more than those with higher initial scores (e.g., Luwel et al., 2010; Lervåg & Hulme, 2010). Relatedly, another explanation may be regression to the mean, as scores for this subgroup were relatively low at time 1. Indeed, the normalization equation does not account for additional variables (e.g., regression to the mean, practice effects). These interpretations tentatively suggest that about a quarter of the children (n = 13/54) with the lowest scores at the beginning are potentially showing a compensatory pattern, catching up with the rest of the sample by the second year of formal education. But perhaps only to a certain degree, as most of them did not make a change that was considered reliable based on our indices (n = 11/13). Practically, this might mean that practitioners should avoid prematurely ceasing services once these at-risk children seem to be within normalized levels. Instead, these children may have a high potential for growth that may need more time to be realized. Notably, the majority of children in the low-scoring subgroup (n = 34/54) did not make a change according to any of the indices, highlighting that test scores should only be used as one source of data to guide educational and clinical decisions. Overall, we acknowledge that different methods yielding slightly different results might preclude us from concluding which method may be most accurate. There needs to be further investigation comparing the methods to ascertain the accuracy of each index for evaluating young children’s academic progress.” (page 15, lines 607-628).
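To make the divergence between the indices concrete, the short Python sketch below shows how a child can cross a normalization cutoff between time 1 and time 2 without meeting the conventional reliable change criterion. The cutoff, standard deviation, reliability, and scores are hypothetical, not the study’s data.

```python
import math

def sem_diff(sd, reliability):
    """Standard error of a difference score under the Jacobson-Truax model."""
    sem = sd * math.sqrt(1 - reliability)
    return math.sqrt(2 * sem ** 2)

# Hypothetical low-scoring subgroup on a standard-score metric
# (mean 100, SD 15), with a cutoff of 85 for "within normalized levels".
CUTOFF, SD, RELIABILITY = 85, 15, 0.80
children = {"A": (78, 88), "B": (80, 84), "C": (70, 93)}

s_d = sem_diff(SD, RELIABILITY)
for name, (t1, t2) in children.items():
    normalized = t1 < CUTOFF <= t2       # crossed the cutoff: "changed" by normalization
    rci = (t2 - t1) / s_d
    reliable = abs(rci) > 1.96           # conventional 95% criterion
    print(f"{name}: normalization={normalized}, RCI={rci:.2f}, reliable={reliable}")
```

In this toy example, child A is flagged as changed by normalization but not by the RCI, while child C satisfies both, mirroring the partial agreement between indices described above.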

Comment 5: Having said that, I highlight that the manuscript addresses a critical gap in early education research by focusing on individual rather than group-level change, provides a practical tool for practitioners, enhancing translational impact, and employs multiple indices, allowing for a nuanced comparison of methods. However, the manuscript requires substantial improvements in theoretical framing, methodological transparency, and discussion of practical implications. 

RESPONSE 5: We once again thank the Reviewer for such thorough and supportive feedback and for the enthusiasm for this work.

Round 2

Reviewer 1 Report

Comments and Suggestions for Authors

The authors have addressed the comments and feedback.

Reviewer 2 Report

Comments and Suggestions for Authors

Dear Authors,

I have carefully reviewed the revised version of the manuscript entitled “A Comparison of Different Methods for Measuring Individual Change in Kindergarten Children”, submitted for evaluation to Education Sciences.

I appreciate that the authors have taken into consideration the comments and suggestions made in the first review of this manuscript. In this new version, I believe that the authors have substantially improved their manuscript and corrected the weak points of the first version.

More specifically, I have noted that the authors have made the requested changes to the introduction and discussion of the manuscript, incorporating theoretical perspectives on developmental trajectories and educational assessment. In addition, the authors have clarified the methodological aspects requested and expanded the analysis of the results. Also, the authors have considerably expanded the discussion of the results and the implications of their study.

Finally, I consider that the manuscript has been sufficiently improved to warrant publication in Education Sciences. Therefore, the decision on this manuscript is to ACCEPT.
