1. Introduction
One of the goals of teaching computer science (CS) to young children is to foster the development of computational thinking (CT) skills that can promote broader problem-solving abilities in other disciplines and everyday life (Wing, 2006). CT can be defined as the ability to frame and solve problems in a manner that can be carried out by information-processing agents (Relkin, 2022; Wing, 2006, 2011). Coding includes a number of distinct skills that are supported by CT constructs. In this paper, we present an assessment that measures a subset of CT and related skills thought to be foundational for future success in learning to code. The research team adopted the term “coding readiness” to use language that made the associated CS and CT concepts more readily understood by educators and families.
In the past two decades, many curricula, educational toys, and online platforms have been created to teach CT to young children (Bell & Vahrenhold, 2018; Yang et al., 2024; Bers et al., 2022). This has created a need for measures of CT suitable for early childhood classrooms (Lockwood & Mooney, 2018). Measurement of CT skills is vital because it gives educators insight into students’ progress and allows them to measure the success of CS lessons. A majority of existing CT assessments were developed for older children and often require coding skills to complete (Ocampo et al., 2024). In recent years, several unplugged measures have been developed, extending CT assessment to elementary school-age children regardless of their coding experience. Unplugged assessment uses tasks that are analogous to coding activities but do not require programming experience to complete.
Developing reliable unplugged CT measures for younger, preschool-aged children has proven more challenging for various reasons. Preschool children tend to have shorter attention spans and more limited linguistic abilities than older children, thus complicating the process of CT assessment (Bracken & Nagle, 2007; Meisels, 2007). High-quality assessment of children of this age often requires trained administrators to individually assess children. In addition, it is important to accommodate children’s emerging literacy skills by eliminating any requirement for the child to read (Snow & Van Hemel, 2008).
One CT measure that is developmentally appropriate for the preschool age group is STEM + C (Dominguez et al., 2022). The assessment includes a series of open-ended, hands-on tasks designed to elicit young children’s understanding of CT skills and concepts. STEM + C is suitable for CT research in preschool-age children. However, it may be difficult to implement in routine classroom settings because it requires specific manipulatives (e.g., LEGO Duplos, toy vehicles, and Unifix cubes), as well as trained administrators, to ensure reliable administration and scoring.
Another instrument developed specifically for children ages 3 to 5 is TechCheck-PreK. It includes fifteen items that measure six of the seven developmentally appropriate Powerful Ideas of Computer Science and CT put forth by Bers (2018). Extensive research has shown that TechCheck has suitable psychometric properties for assessing CT skills in young children (Relkin et al., 2020, 2021; Relkin & Bers, 2021). However, TechCheck-PreK has shown lower levels of reliability (α = 0.5) in preschool children than comparable versions of TechCheck for older children. This lower reliability may, in part, be attributable to a greater tendency for impulsive responses in younger children (Relkin, 2022). Additionally, TechCheck-PreK includes computing systems-related hardware/software questions that may not be applicable to all coding educational interventions.
In the current study, we modified the TechCheck-PreK assessment to improve reliability and assess additional constructs related to coding readiness. The new Coding Readiness Assessment (CRA) combines ease of administration with automated data collection and scoring. The assessment was designed to assess the full range of skill levels in Pre-K-age children.
  2. Method
  2.1. Assessment Development
The CRA measures outcomes that were specified in a CT curriculum created by codeSpark (an educational technology company), the National Head Start Association, and RAND Corporation, with funding from the National Science Foundation. codeSpark led the creation of the curriculum and chose the CT constructs included in the curriculum.
To select CT and coding-readiness constructs, computer science and child development experts drew from the literature on coding educational concepts from prekindergarten into early elementary school. Six of the seven constructs (sequencing, patterns, looping, modeling, modularity, and debugging) were derived from the work of Bers. Seminal works that informed these constructs include Coding as a Playground (Bers, 2018) and Teaching Computational Thinking and Coding to Young Children (Bers, 2021). In some cases, construct names were changed to be easily understood by a broader, non-technical audience of educators and families. For example, the construct that Bers (2018) referred to as “Algorithms” has been reformulated as the “Sequencing” construct. Bers’ “Control Structures” were embodied in two constructs called “Patterns” and “Looping.” Additional insights came from McCormick and Hall’s (2022) literature review on CT in preschool settings and Brennan and Resnick’s (2012) framework for CT. The construct of “Spatial Reasoning” was based on several studies emphasizing the importance of spatial reasoning in preschool education (Wai et al., 2009; Lubinski, 2010; Bower et al., 2022). The constructs and skills assessed by the CRA within this framework are shown in Table 1.
  2.2. Participants
The data in this study were collected as part of a sub-study in a randomized controlled trial (RCT) testing a new CT intervention for preschool-age students. We recruited 80 teachers from 49 Head Start centers in 14 states across the United States. Of those, 16 declined to participate, resulting in a total of 64 teachers in 42 centers in 13 states being enrolled. A total of 917 students were consented to participate, with 802 completing the CRA at baseline and 835 at the endpoint. A subset of n = 720 children took the assessment at both timepoints. Table 2 shows the demographic variables of all participants in this study. As this study occurred solely in Head Start classrooms, all families had income below the federal poverty line or had demonstrated a similar need for enrollment. Teachers assessed children during school hours in their classrooms. Children were allowed to discontinue participation at any time.
  2.3. Procedures
  2.3.1. Initial Field Testing
Initially, the CRA consisted of nine items from the TechCheck-PreK assessment (Relkin, 2022) and ten newly created items. Researchers administered the initial field testing with five children (ages 3, 4, 4, 5, and 5, respectively) to evaluate developmental appropriateness and user experience. Children were asked to explain why they chose their answers to give researchers a better understanding of their responses. We found that five items required minor formatting improvements and/or revision of the prompts. These items were modified before carrying out a feasibility study.
  2.3.2. Feasibility Study
Forty-nine children participated in the feasibility study using the modified 19 items from the initial field test. Teachers administered the assessments to the children, with 38 completing both the baseline and post-assessments. Administration time averaged under 8 min at both timepoints. Scores were normally distributed, with some slight ceiling effects noted in 5-year-olds and floor effects in 3-year-olds. In an interview after the assessment was administered, all three teachers involved in the study reported that children were often impulsive in their responses, a phenomenon that was also observed with the TechCheck-PreK.
To reduce ceiling and floor effects, two additional items were added to the assessment, resulting in a total of 21 items. Researchers conducted debriefing sessions with the three educators who administered the assessment. All reported noticing impulsivity in children’s responses. They stated that children sometimes answered questions before the prompts were read out loud. To address impulsivity, our team explored various strategies. One idea involved moving the tablet away from the child while the prompt was being read and then moving it back for the child to select their response. However, this method risked introducing errors if the educator did not remember to move the tablet or did not do so quickly enough. Ultimately, the method that we found to be most effective was a modification to the Qualtrics online survey platform that introduced a brief delay between item presentation and the timeframe for response. Representative questions illustrating the nature of the assessment are shown in Appendix A (Table A1).
  2.3.3. Full-Scale Testing
Administration of the 21-item Coding Readiness Assessment (CRA) was carried out by early childhood educators. Each educator watched a seven-minute video that prepared them to act as administrators. The video provided educators with instructions on preparing for the assessment, administering it, and troubleshooting unusual situations that may occur during the assessment period. Educators were then asked to take a short certification survey assessing their understanding of how to administer the assessment. They were allowed to retake the certification survey until they scored at least 75%. The educators in this study scored an average of 92% on the certification survey.
Students were assessed individually using tablet computers (iPad or Android) to present the assessment on the Qualtrics platform. At the start, children were given two practice questions, along with feedback, to ensure they understood the format of the assessment and how to select a response. Administrators read a script aloud to prompt the child for each question. After a brief pause, they asked the child to indicate their answer either verbally (i.e., describing the option) or non-verbally (i.e., by pointing to or touching the screen). Children were allowed up to one minute to indicate their answers. One point was allotted for each correct response. Each item was automatically scored using the Qualtrics platform (Qualtrics, 2020, Provo, UT; https://www.qualtrics.com/blog/citing-qualtrics/, accessed on 22 January 2024).
  2.4. Data Analysis
Analyses were conducted using the Stata statistical software version 17. When exploring differences between two categories, we performed t-tests with the “-ttest-” command. When exploring differences across multiple categories, we performed an ANOVA analysis with the “-anova-” command. We augmented this analysis with regressions that predicted scores with age and race/ethnicity categories.
Given that the nesting of children in centers results in correlated errors among children in centers, our model clusters standard errors at the center level. We employed the “-reg-” command with the “-cluster-” option.
To explore possible bias in the assessment items, we performed a Mantel–Haenszel Differential Item Functioning (DIF) analysis using the “-mhdif-” command. We illustrate the distribution of scores by presenting kernel density plots that use a Gaussian kernel, operationalized through the “-two-way kernel-” command.
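To make the DIF procedure concrete, the following is a minimal sketch of the textbook Mantel–Haenszel chi-square for a single item, with children stratified by total score. It is an illustration under stated assumptions, not the Stata routine used in the analysis: the function name, the tuple layout, and the use of a 0.5 continuity correction are choices made for this example.

```python
def mantel_haenszel_chi2(strata):
    """Mantel-Haenszel chi-square (1 df) for one item.

    strata: list of (a, b, c, d) counts, one tuple per total-score stratum:
      a = reference group correct, b = reference group incorrect,
      c = focal group correct,     d = focal group incorrect.
    The tuple layout is an assumption made for this sketch.
    """
    observed = expected = variance = 0.0
    for a, b, c, d in strata:
        n = a + b + c + d
        if n < 2:
            continue  # a stratum with fewer than two children carries no information
        n_ref, n_foc = a + b, c + d
        n_correct, n_incorrect = a + c, b + d
        observed += a
        expected += n_ref * n_correct / n
        variance += n_ref * n_foc * n_correct * n_incorrect / (n * n * (n - 1))
    if variance == 0:
        return 0.0
    # 0.5 is the usual continuity correction; clamp at zero so small gaps do not inflate the statistic
    return max(0.0, abs(observed - expected) - 0.5) ** 2 / variance


# Hypothetical counts: one stratum where the reference group answers correctly more often
example_stat = mantel_haenszel_chi2([(15, 5, 5, 15)])
```

Values above 3.84 correspond to p < 0.05 on one degree of freedom, which is the threshold at which an item would be flagged for follow-up.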
A combination of Classical Test Theory (CTT), Item Response Theory (IRT), and descriptive analyses was used to evaluate the psychometric properties of the instrument. 3PL IRT models were selected based on goodness of fit to examine the assessment’s difficulty, discrimination, and guessing parameters. The “-irt-” package in Stata was used to estimate these models. We analyzed the internal consistency of the assessment by calculating Cronbach’s alpha with the “-alpha-” command and test–retest reliability by calculating the correlation between scores across assessment waves using the “-pwcorr-” command.
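As an illustration of the internal-consistency statistic named above, Cronbach's alpha can be computed directly from a matrix of scored responses. The sketch below is a plain-Python rendering of the standard formula, not the Stata implementation; the function name and the toy response matrix are hypothetical.

```python
from statistics import pvariance


def cronbach_alpha(scores):
    """Cronbach's alpha for a list of rows (one per child) of item scores (0 or 1).

    Population variances are used for both the items and the totals;
    alpha depends only on their ratio, so sample variances give the same result.
    """
    k = len(scores[0])  # number of items
    item_variances = [pvariance([row[j] for row in scores]) for j in range(k)]
    total_variance = pvariance([sum(row) for row in scores])
    return (k / (k - 1)) * (1 - sum(item_variances) / total_variance)


# Hypothetical responses from four children on three items
toy_scores = [
    [1, 1, 1],
    [1, 1, 0],
    [1, 0, 0],
    [0, 0, 0],
]
toy_alpha = cronbach_alpha(toy_scores)
```

For the toy matrix, the item variances sum to 0.625 and the total-score variance is 1.25, so alpha is 1.5 × (1 − 0.5) = 0.75.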
Finally, we performed confirmatory factor analysis with the “-sem-” command in Stata. We assessed goodness of fit with four statistics: the chi-square statistic, Comparative Fit Index (CFI), Tucker Lewis Index (TLI), and Root Mean Square Error of Approximation (RMSEA). A model that adequately fits the data generally has a CFI and TLI near 0.95 or greater and an RMSEA near 0.06 or less (Hu & Bentler, 1999). To improve model fit and explore a version of the assessment that would have a lower burden on teachers and students, we eliminated items with standardized factor loadings with an absolute value of less than 0.50 (Ryberg et al., 2020).
  3. Results
  3.1. Descriptives
The 21-item CRA was administered 1637 times across baseline and post assessments in an efficacy trial of a new codeSpark curriculum. Twenty-five assessments were not completed (98.5% completion rate). The CRA took an average of 9.8 min for children to complete (range 4.6–36.4 min).
Table 3 shows total score descriptives at baseline. Girls’ average score was slightly higher than boys’. This difference was not statistically significant for a model taking into account the ages and races of participants. DIF analysis indicates that three items favored boys: question 2 (χ²(1) = 6.27, p < 0.05), question 3 (χ²(1) = 5.91, p < 0.05), and question 5 (χ²(1) = 7.55, p < 0.01). Two items favored girls: question 9 (χ²(1) = 4.90, p < 0.05) and question 10 (χ²(1) = 31.04, p < 0.001). These five items are worthy of further study to ensure that the assessment is free of bias. The approximately even split between questions favoring girls and boys suggests that, overall, the assessment is not biased towards one gender over another.
There were significant differences by age. T-tests indicate that five-year-old children scored 1.29 points higher than four-year-olds (p < 0.001), and four-year-olds scored 2.07 points higher than three-year-olds (p < 0.001). Total scores for all three ages were normally distributed, with no clear ceiling or floor effects (see Figure 1 for kernel density plots by age).
For the subset of children for whom race and ethnicity data were available (n = 427), we investigated how scores differed across five categories of race and ethnicity (White, Black, Hispanic, Multiracial, and children from other races). Table 4 shows the average scores and sample size of each racial group at baseline. ANOVA analysis indicates that Black children scored 1.35 points higher than White children (p < 0.05). Hispanic children and children from other races and ethnicities scored about the same as White children (0.60 points higher, p = 0.322, and 0.37 points lower, p = 0.615, respectively). Multiracial children scored 2.44 points higher than White children (p = 0.001).
To explore the robustness of the ANOVA results, we conducted a linear regression examining if the total score was predicted by age, gender, and race. White, male, and three-year-old children were chosen as reference groups. In a model that clustered standard errors to take into account nesting, none of the racial/ethnic groups were statistically distinguishable, and gender was not a significant predictor of outcome. However, age was still a significant predictor, with four-year-olds (β = 1.85, p < 0.001) and five-year-olds (β = 3.42, p < 0.001) scoring significantly higher than three-year-olds.
  3.2. Reliability and Psychometrics
In order to evaluate a more complete range of ability levels, we combined baseline and post data for both the treatment and control groups in the psychometric analyses. Baseline and post assessments were administered approximately three months apart. Internal consistency of the CRA was found to be satisfactory, α = 0.78 (Tavakol & Dennick, 2011). Test–retest reliability, estimated in the control group only, was acceptable, r = 0.65 (Matheson, 2019). To select an IRT model for our analysis, we compared model fit using likelihood ratio tests for a 3PL model compared to a 2PL model (χ²(1) = 172.82, p < 0.001), a 2PL compared to a 1PL (χ²(1) = 508.51, p < 0.001), and 3PL compared to 1PL models (χ²(2) = 681.32, p < 0.001). The results indicate a significantly better fit for the 3PL model. Figure 2 displays plots of the overall test characteristic curve and test information function. The test characteristic curve was close to ideal, with a sigmoidal shape centered near zero. The test information function also peaked near zero and showed only minor asymmetry.
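The likelihood ratio comparisons above can be checked with a few lines of arithmetic. The sketch below is illustrative only: it takes the reported statistics and degrees of freedom as given, uses closed-form chi-square tail probabilities that hold for one and two degrees of freedom, and the helper name `chi2_sf` is an assumption of this example.

```python
import math


def chi2_sf(x, df):
    """Upper-tail chi-square probability; closed forms for df = 1 and df = 2 only."""
    if df == 1:
        return math.erfc(math.sqrt(x / 2))
    if df == 2:
        return math.exp(-x / 2)
    raise ValueError("this sketch only handles df = 1 or df = 2")


# Statistics and degrees of freedom as reported in the text
reported = {
    "3PL vs 2PL": (172.82, 1),
    "2PL vs 1PL": (508.51, 1),
    "3PL vs 1PL": (681.32, 2),
}
p_values = {name: chi2_sf(stat, df) for name, (stat, df) in reported.items()}

# For nested models the stepwise statistics should sum to the overall one, up to rounding
stepwise_sum = reported["3PL vs 2PL"][0] + reported["2PL vs 1PL"][0]
```

All three tail probabilities fall far below 0.001, and the two stepwise statistics sum to 681.33, which matches the reported 681.32 up to rounding.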
Difficulty and discrimination values are shown in Table 5. As intended in the design of the assessment, we found a range of item difficulty values. All items had acceptable discrimination values, with 13 out of the 21 items showing particularly strong discrimination (Bichi & Talib, 2018). The estimated guessing parameter of 0.28 was acceptable for a three-option multiple-choice assessment. As a robustness check, we also estimated a 3PL model using the baseline data only and found similar results to the pooled model (see Appendix B; Table A2 and Figure A1).
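For readers less familiar with the model, the standard three-parameter logistic item response function is sketched below. The symbols are conventional notation rather than notation taken from this paper: θ is a child's latent ability, a_i and b_i are the discrimination and difficulty of item i, and c is the guessing parameter, assumed here to be shared across items because a single estimate is reported.

```latex
P(X_i = 1 \mid \theta) = c + (1 - c)\,\frac{\exp\{a_i(\theta - b_i)\}}{1 + \exp\{a_i(\theta - b_i)\}}
```

Under this form, a child whose ability equals an item's difficulty (θ = b_i) answers correctly with probability (1 + c)/2, which is 0.64 when c = 0.28.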
  3.3. Item and Construct Analysis
We examined the percentage of children who answered each item correctly at baseline and found that there was a higher than chance level performance on all 21 items (see Figure 3).
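With three response options per item, blind guessing would yield about one third correct. One way to make the comparison with chance concrete is a one-sided exact binomial tail probability. The sketch below is an illustration and not the analysis reported here; the function name and the counts in the usage lines are hypothetical.

```python
from math import comb


def binom_upper_tail(k, n, p):
    """P(X >= k) for X ~ Binomial(n, p), summed exactly over the upper tail."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


CHANCE = 1 / 3  # three-option items, as described in the text

# Hypothetical item: 18 of 30 children answer correctly
p_value = binom_upper_tail(18, 30, CHANCE)
above_chance = p_value < 0.05
```

An item would be read as exceeding chance when this tail probability is small, since that many correct answers would be unlikely under pure guessing.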
Figure 4 shows the percentage of control-group children answering correctly for items pooled according to the CT construct they were intended to probe. Notably, every construct increased in the three-month interval of the study, suggesting that the instrument is sensitive to change, whether due to natural development, testing effects, or other factors.
   3.4. Confirmatory Factor Analysis
We conducted a confirmatory factor analysis (CFA) to examine if each of the CT constructs was grouped as intended. The model with the best fit included 13 of the 21 items and six of seven constructs, excluding looping. At least two questions were associated with each of the six factors included in the model. This model had an acceptable fit: χ²(DF) = 401.751, p < 0.001, RMSEA = 0.063, CFI = 0.950, and TLI = 0.927.
  4. Discussion
We created and evaluated a new unplugged measure of CT for preschool-aged children. The CRA was easy to administer, reliable, showed acceptable psychometric properties, and was sensitive to change. Teachers in a variety of classrooms and centers throughout the nation were able to successfully administer the assessment during the school day.
Younger children scored lower than older children at baseline. This finding aligns with other work that has shown that CT skills develop over time, even in the absence of specific CS educational interventions (Relkin, 2022).
Based on confirmatory factor analysis, thirteen questions were correctly assigned to the CT constructs they were intended to probe. Despite the inclusion of two questions intended to probe looping, confirmatory factor analysis did not show a clear association between the items designed to probe this construct. Future work may improve upon probes of looping for children in this age group.
A small difference was observed in the average CRA scores of girls and boys, but the difference was not statistically significant after taking into account the ages and races of participants. Previous research has found that girls may be subject to STEM stereotypes that inhibit their performance in CS-related areas (Sullivan, 2019; Master et al., 2021). In the current study, there was no significant difference favoring boys over girls, as has been observed previously with older children. This finding suggests that these gender-related stereotypes may not develop until later in childhood, reinforcing the importance of introducing STEM at an early age.
Prior research has found that, due to societal inequalities, Black and Hispanic children at older ages may be at a disadvantage when it comes to STEM-related fields as compared to their White counterparts (Margolis et al., 2017; Wang & Hejazi Moghadam, 2017; Google & Gallup, 2016, 2020). We observed numerical differences by race and ethnicity; namely, Multiracial and Black children scored higher than White children. However, these differences were not significant in a model that took into account age and gender, with standard errors clustered by center. Prior work has found that race/ethnicity may be a proxy for other variables, such as socio-economic status (SES) (Williams et al., 2016). However, we did not have suitable data to factor SES into our analyses. In light of the small sample sizes of some of the groups, we cannot state conclusively whether there are differences in performance between racial/ethnic groups on the CRA. Additional studies with a larger sample that includes representative racial and ethnic groups and other background characteristics, such as SES, are needed to determine if there are differences.
  5. Limitations
This study was not designed to assess the effects of normal development on CRA performance. While clear differences in average scores were detected as a function of age, the age groups were not matched in terms of their background demographics. Therefore, the observed differences between age groups cannot be considered a substitute for longitudinal observations of intra-individual development.
While the sample size of this study was adequate to carry out psychometric modeling, there was insufficient power to draw firm conclusions about whether race or family income are significant predictors of CRA performance. Although girls averaged slightly higher scores than boys, further studies will be needed to establish whether or not there is a gender difference in performance on the CRA.
The creation of the additional new CRA items was guided by insights drawn from a review of the recent literature on CT and coding readiness in young children, lending face validity to the assessment. This study did not formally establish the CRA’s construct validity. However, the CRA is based in part on TechCheck-PreK, which has established construct validity for the assessment of young children’s CT skills.
The present sub-study did not evaluate the extent to which the CRA predicts readiness to participate in a coding educational intervention. The RCT results will establish this measure’s utility in relation to educational computer science interventions. The current results suggest the CRA has suitable properties to be used as a measure in future studies examining the impact of interventions and factors such as age on coding readiness in young children.
  6. Conclusions
It can be difficult to assess CT skills in Pre-K children for a variety of reasons, including their short attention spans and limitations of verbal abilities. Individual administration by an adult is typically required; however, early childhood educators often face challenges such as limited time and insufficient support staff. As a brief screening assessment that can be completed in an average of less than 10 min, the CRA is well-suited to address these challenges.
   
  
    Author Contributions
Conceptualization, C.D., J.F.P. and E.R.; data curation, C.D.; software, C.D., V.L.J. and E.R.; formal analysis, C.D. and E.R.; methodology C.D., E.R. and J.F.P.; resources, E.R., C.D. and J.F.P.; writing—original draft preparation, E.R. and C.D.; writing—review and editing, E.R., C.D., V.L.J. and J.F.P.; visualization, C.D. and E.R.; supervision, J.F.P.; project administration, C.D. and V.L.J.; funding acquisition, C.D. and J.F.P.; investigation C.D. and V.L.J. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Science Foundation, grant number 2122436.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki, and approved by the RAND IRB (Human Subjects Protection Committee) (protocol 2021-N0258; approval date 4 June 2021).
Informed Consent Statement
Teachers gave informed consent, and parents were given the option to opt out of their child’s participation in the study. Children gave their verbal assent to participate.
Data Availability Statement
The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.
Conflicts of Interest
The authors declare no conflicts of interest.
  Appendix A. Representative Questions from the Coding Readiness Assessment
  
    
  
  
    Table A1.
    Representative questions from the Coding Readiness Assessment.
  
 
  Appendix B. Baseline Scores 3PL Modeling, Test Characteristic Curve and Test Information Function at Baseline
  
    
  
  
    Table A2.
    Baseline Scores 3PL Modeling.
  
 
  
| Question | Difficulty | Discrimination | Guessing |
|---|---|---|---|
| 1 | 1.973 | 1.204 | 0.27 |
| 2 | 1.994 | 0.778 | |
| 3 | 2.842 | 0.807 | |
| 4 | 1.753 | 0.35 | |
| 5 | 1.549 | 0.24 | |
| 6 | 0.985 | 1.113 | |
| 7 | 0.372 | 3.747 | |
| 8 | 0.598 | 2.053 | |
| 9 | 2.018 | −1.265 | |
| 10 | 2.139 | −0.976 | |
| 11 | 0.652 | 2.184 | |
| 12 | 1.899 | −0.072 | |
| 13 | 1.584 | −0.663 | |
| 14 | 0.693 | −0.918 | |
| 15 | 1.269 | −0.05 | |
| 16 | 1.504 | 0.547 | |
| 17 | 1.459 | 0.062 | |
| 18 | 1.554 | 1.44 | |
| 19 | 1.523 | 0.55 | |
| 20 | 1.105 | 0.579 | |
| 21 | 0.778 | 0.788 | |
      
 
  
    
  
  
    Figure A1.
      Test characteristic curve and test information function at baseline.
  
 
  
 
References
- Bell, T., & Vahrenhold, J. (2018). CS unplugged—How is it used, and does it work? In H.-J. Böckenhauer, D. Komm, & W. Unger (Eds.), Adventures between lower bounds and higher altitudes: Essays dedicated to Juraj Hromkovič on the occasion of his 60th birthday (pp. 497–521). Springer.
- Bers, M. (Ed.). (2021). Teaching computational thinking and coding to young children. IGI Global.
- Bers, M. U. (2018). Coding as a playground: Programming and computational thinking in the early childhood classroom. Routledge.
- Bers, M., Strawhacker, A., & Sullivan, A. (2022). The state of the field of computational thinking in early childhood education. In OECD education working papers no. 274. OECD Publishing.
- Bichi, A. A., & Talib, R. (2018). Item response theory: An introduction to latent trait models to test and item development. International Journal of Evaluation and Research in Education, 7(2), 142–151.
- Bower, M., Wood, L. A., & Lister, R. (2022). Exploring the development of computational thinking in preschoolers: A systematic review. Computing Education, 179, 104401.
- Bracken, B., & Nagle, R. (2007). Psychoeducational assessment of preschool children (4th ed.). Routledge.
- Brennan, K., & Resnick, M. (2012, April 13–17). New frameworks for studying and assessing the development of computational thinking [Conference session]. 2012 Annual Meeting of the American Educational Research Association, Vancouver, BC, Canada. Available online: https://scratched.gse.harvard.edu/ct/files/AERA2012.pdf (accessed on 16 December 2024).
- Dominguez, X., Leones, T., Kamdar, D., & Gracely, S. (2022). Preschool problem solvers: Developing assessment tasks to measure young children’s learning of computational thinking skills and practices [Unpublished manuscript, Digital Promise].
- Google & Gallup. (2016). Diversity gaps in computer science: Exploring the underrepresentation of girls, blacks, and hispanics. Google Inc.
- Google & Gallup. (2020). Current perspectives and continuing challenges in computer science education in U.S. K-12 schools. Google Inc.
- Hu, L., & Bentler, P. M. (1999). Cutoff criteria for fit indexes in covariance structure analysis; conventional criteria versus new alternatives. Structural Equation Modeling: A Multidisciplinary Journal, 6, 1–55.
- Lockwood, J., & Mooney, A. (2018). Computational thinking in education: Where does it fit? A systematic literary review. International Journal of Computer Science Education in Schools, 2(1), 41–60.
- Lubinski, D. (2010). Spatial ability and STEM: A sleeping giant for talent identification and development. Personality and Individual Differences, 49(4), 344–351.
- Margolis, J., Estrella, R., Goode, J., Holme, J. J., & Nao, K. (2017). Stuck in the shallow end: Education, race, and computing. MIT Press.
- Master, A., Meltzoff, A. N., & Cheryan, S. (2021). Gender stereotypes about interests start early and cause gender disparities in computer science and engineering. Proceedings of the National Academy of Sciences of the United States of America, 118(48), e2100030118.
- Matheson, G. J. (2019). We need to talk about reliability: Making better use of test-retest studies for study design and interpretation. PeerJ, 7, e6918.
- McCormick, K. I., & Hall, J. A. (2022). Computational thinking learning experiences, outcomes, and research in preschool settings: A scoping review of literature. Education and Information Technologies, 27, 3777–3812.
- Meisels, S. J. (2007). Accountability in early childhood: No easy answers. In R. C. Pianta, M. J. Cox, & K. L. Snow (Eds.), School readiness and the transition to kindergarten in the era of accountability (pp. 31–47). Paul H. Brookes Publishing Co.
- Ocampo, L. M., Corrales-Álvarez, M., Cardona-Torres, S. A., & Zapata-Cáceres, M. (2024). Systematic review of instruments to assess computational thinking in early years of schooling. Education Sciences, 14(10), 1124.
- Qualtrics. (2020). Qualtrics [Software]. Qualtrics. Available online: https://www.qualtrics.com (accessed on 22 January 2024).
- Relkin, E. (2022). The development of computational thinking skills in young children [Ph.D. Dissertation, Tufts University]. ProQuest Dissertations and Theses Database.
- Relkin, E., & Bers, M. U. (2021, April 21–23). TechCheck-K: A measure of computational thinking for kindergarten children [Conference session]. 2021 IEEE Global Engineering Education Conference (EDUCON), Virtual Conference.
- Relkin, E., de Ruiter, L., & Bers, M. U. (2020). TechCheck: Development and validation of an unplugged assessment of computational thinking in early childhood education. Journal of Science Education and Technology, 29(4), 482–498.
- Relkin, E., de Ruiter, L., & Bers, M. U. (2021). Learning to code and the acquisition of computational thinking by young children. Computers & Education, 169, 104222.
- Ryberg, R., Her, S., Temkin, D., Madill, R., Kelley, C., Thompson, J., & Gabriel, A. (2020). Measuring school climate: Validating the education department school climate survey in a sample of urban middle and high school students. AERA Open, 6(3).
- Snow, C. E., & Van Hemel, S. B. (2008). Early childhood assessment: Why, what, and how. National Academies Press.
- Sullivan, A. A. (2019). Breaking the STEM stereotype: Reaching girls in early childhood. Rowman & Littlefield Publishers.
- Tavakol, M., & Dennick, R. (2011). Making sense of Cronbach’s alpha. International Journal of Medical Education, 2, 53–55.
- Wai, J., Lubinski, D., & Benbow, C. P. (2009). Spatial ability for STEM domains: Aligning over 50 years of cumulative psychological knowledge solidifies its importance. Journal of Educational Psychology, 101(4), 817–835.
- Wang, J., & Hejazi Moghadam, S. (2017, March 8–11). Diversity barriers in K-12 computer science education: Structural and social [Conference session]. 2017 ACM SIGCSE Technical Symposium on Computer Science Education (pp. 615–620), Seattle, WA, USA.
- Williams, D. R., Priest, N., & Anderson, N. B. (2016). Understanding associations among race, socioeconomic status, and health: Patterns and prospects. Health Psychology, 35(4), 407–411.
- Wing, J. M. (2006). Computational thinking. Communications of the ACM, 49(3), 33–35.
- Wing, J. M. (2011). Research Notebook: Computational thinking—What and why? In The link magazine. Carnegie Mellon. Available online: https://www.cs.cmu.edu/link/research-notebook-computational-thinking-what-and-why (accessed on 16 December 2024).
- Yang, W., Su, J., & Li, H. (2024). Demystifying early childhood computational thinking: An umbrella review to upgrade the field. Future in Educational Research, 2, 458–477.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.
      
© 2024 by the authors. Licensee MDPI, Basel, Switzerland. This article is an open access article distributed under the terms and conditions of the Creative Commons Attribution (CC BY) license (https://creativecommons.org/licenses/by/4.0/).