Abstract
As professional development (PD) programs aimed at improving early childhood educators’ teaching efficacy in integrated STEM (science, technology, engineering, mathematics) become more prevalent, understanding how best to evaluate their outcomes is increasingly important. This study compared two self-report survey designs commonly adopted in program evaluation—traditional pre-test–post-test (TPP) and retrospective pre-test–post-test (RPP)—within a year-long integrated STEM PD initiative for early childhood educators. Twenty-five educators completed the TPP survey, fifty-five completed the RPP survey, and a subset of twenty-four completed both. The study examined whether these two designs differ in detecting change in teaching efficacy in literacy, mathematics, science, and nutrition. Findings revealed statistically significant increases across all content areas using both survey methods, with large effect sizes. Comparisons between traditional and retrospective pre-test scores showed no statistically significant differences, suggesting that response shift bias may not have meaningfully affected results in this context. The findings indicate that either survey approach can be appropriate for measuring affective outcomes such as self-efficacy. Informed by these findings and prior research, this study concludes that the choice between traditional and retrospective survey designs should be guided by practical considerations, such as program setting, time and efficiency, and the constructs measured, and not only by the validity and reliability of the specific survey design. These results offer valuable guidance for evaluators seeking efficient and valid tools for assessing PD in early childhood teacher education and suggest the need for further research exploring different self-report measures in diverse educational contexts.
1. Introduction
Early childhood educators play a critical role in fostering young children’s curiosity, interest, and foundational understandings of content areas (Nesmith & Cooper, 2020). Yet, many report feeling underprepared or lacking confidence in teaching science, technology, engineering, and mathematics (STEM) in ways that are developmentally appropriate, playful, and integrated with other content (Brenneman et al., 2019). As a result, professional development (PD) programs aimed at enhancing early childhood educators’ competencies in teaching integrated STEM are on the rise (e.g., Erol & İvrendi, 2024; Maiorca et al., 2023). Strengthening early childhood educators’ pedagogical knowledge and confidence in teaching STEM is essential not only for improving instructional quality but also for sustaining the early childhood teaching workforce (Kewalramani et al., 2025).
To gain an effective understanding of the outcome of PD participation, robust evaluation efforts are necessary (Darling-Hammond et al., 2017; Smylie, 2014). Evaluation helps illuminate not just whether educators learn as a result of PD, but how learning reflects deeper shifts in their perceptions of themselves as professionals. While many tools exist for assessing PD outcomes, self-report measures are commonly used and particularly useful for evaluating affective constructs such as perceptions, attitudes, and values, when objective performance metrics are either unfeasible or unsuitable for the given context (Hill, 2019). However, methodological debates persist regarding how best to collect self-report data to accurately reflect participants’ growth. Although numerous studies have explored PD in early childhood STEM education (see reviews by Brunsek et al., 2020; Egert et al., 2018; MacDonald et al., 2021), fewer have examined the methodological validity of the approaches used to assess change in participating educators’ beliefs and perceptions.
This study investigates early childhood educators’ self-reported changes in teaching self-efficacy across four domains (i.e., literacy, mathematics, nutrition, and science) following participation in an integrated STEM PD program called Seeds to STEM (S2S). More importantly, it compares two commonly used survey designs in PD evaluation: the traditional pre-test–post-test (TPP) and the retrospective pre-test–post-test (RPP). In the TPP design, the same set of questions is administered twice, before (time 1) and after (time 2) the program intervention, to look for growth or differences in participant outcomes as a result of engaging in the program (Bell, 2010). In the RPP design, participants are tasked with answering the same set of questions twice, reflecting first on “then” (pre-intervention) and second on “now” (post-intervention) in one sitting after a program has been completed (Howard, 1980). By examining how each design captures educators’ change in perception, this study contributes both methodologically and substantively to the field of early childhood teacher preparation. Methodologically, we offer insight into how evaluation design influences our understanding of teacher growth and professional identity development. Substantively, this study highlights how PD participation can strengthen early childhood educators’ sense of efficacy in teaching integrated STEM subjects. Practical considerations are also discussed to help evaluators make informed decisions about design and data collection. Three research questions were addressed:
RQ1. To what extent did early childhood educators’ self-reported teaching efficacy (outcome expectancy and self-efficacy) in literacy, mathematics, nutrition, and science significantly change as a result of participating in S2S when using a TPP survey design?
RQ2. To what extent did early childhood educators’ self-reported teaching efficacy (outcome expectancy and self-efficacy) in literacy, mathematics, nutrition, and science significantly change as a result of participating in S2S when using an RPP survey method?
RQ3. Is there a statistically significant difference in early childhood educators’ self-reported baseline (TPP) and retrospective-pre (RPP) teaching efficacy (outcome expectancy and self-efficacy) in literacy, mathematics, nutrition, and science?
2. Literature Review
Evaluating Perceived Outcomes of Early Childhood PD
Professional development (PD) plays a vital role in enhancing instructional quality and supporting child development in early childhood settings (Egert et al., 2018). As PD initiatives increasingly emphasize integrated approaches to STEM teaching, evaluating their outcomes has become essential and methodologically challenging. Effective PD evaluation requires tools that capture change not only in teachers’ knowledge and practices but also in affective domains such as self-efficacy. There are two key concepts within Bandura’s (1977, 1997) self-efficacy theory forming the theoretical foundation of this study: personal teaching efficacy expectancy refers to an individual’s estimate of their ability to perform an activity, while outcome expectancy pertains to the anticipated result of a behavior. Importantly, self-efficacy is not a fixed trait and is affected by various factors. Empirical research has demonstrated that self-efficacy can be influenced by PD (e.g., Pfitzner-Eden, 2016; von Suchodoletz et al., 2018). Strong teacher efficacy beliefs have been associated with improved instructional quality, career engagement, and leadership practices among early childhood educators (Guo et al., 2010; Lipscomb et al., 2022; Pitkäniemi et al., 2025).
Within the context of integrated STEM, teacher efficacy becomes particularly critical. Early childhood educators often report lower confidence in teaching science and engineering compared to literacy or mathematics, citing limited content knowledge, insufficient preparation, and perceived developmental inappropriateness of STEM for young learners (Brenneman et al., 2019; MacDonald et al., 2021). Implementing integrated STEM requires teachers to make interdisciplinary connections, which can be challenging for educators with limited STEM knowledge and experience (Kelley & Knowles, 2016). Research suggests well-designed PD can enhance educators’ self-efficacy in teaching integrated STEM (DeJarnette, 2018; Zhou et al., 2023).
Given the emphasis of PD on professional growth both cognitively and affectively, teachers’ perceptions and beliefs, such as self-efficacy, serve as key indicators of program impact and are therefore important constructs to measure in PD evaluation. As these constructs are inherently subjective and evolve through reflection, self-report surveys are among the most common instruments for assessing PD outcomes (see Liu et al., 2025 for review). Surveys offer a direct way to capture educators’ perceptions, beliefs, and attitudes that are not easily observable (Krosnick, 1999). Although alternatives (e.g., classroom observations, interviews) can provide richer contextual information, they are often resource-intensive and less feasible in PD evaluation contexts. Self-report surveys therefore provide a practical and scalable method for measuring perceived outcomes in early childhood PD. Importantly, the methodological design of surveys, dictating how and when self-report data are collected, is essential to ensure the validity of PD evaluation findings.
3. Survey Designs for Measuring PD Outcomes
Understanding how different survey designs capture change in affective domains is critical for evaluating teacher PD outcomes in early childhood contexts. The following sections review two commonly employed self-report approaches, the traditional pre-test–post-test (TPP) and the retrospective pre-test–post-test (RPP), and examine their methodological strengths, limitations, and existing empirical evidence comparing these designs in educational contexts.
4. Traditional Pre-Test–Post-Test Design
TPP designs enable the establishment of baseline measurements of participants’ knowledge, skills, attitudes, or behaviors before an intervention, thus providing a clear comparison point (Pratt et al., 2000). Additionally, collecting pre-test scores can reduce error variance, resulting in more robust statistical analyses than those obtained from survey designs lacking pre-test data (Dimitrov & Rumrill, 2003). However, numerous challenges exist in collecting and utilizing true pre–post data to effectively evaluate the perception outcome among program participants. One major concern is response shift bias, which occurs when an individual’s evaluation reference regarding the target construct being assessed shifts as a result of the intervention (Howard, 1980; Howard & Dailey, 1979).
Response shift bias can be observed when participants initially rate themselves highly on a pre-test due to “meta-ignorance” (being unaware of their ignorance; Dunning, 2011) but then rate themselves lower on the post-test as they become more aware of the room for improvement, even when their objective knowledge or skills have increased. For example, a pre-K teacher may initially rate their classroom management highly but, after PD, realize their standards were previously too low and adjust their internal scale accordingly. In this case, a ceiling effect is created in which pre-test scores are so high that there is little room for improvement on the post-test (Chyung et al., 2020; K. D. Edwards & Soland, 2024), underrepresenting growth and disguising the true outcomes of participating in educational programs, as empirically demonstrated in various studies (e.g., Drennan & Hyde, 2008; Goedhart & Hoogstraten, 1992; Kowalski, 2023; Pratt et al., 2000).
TPP has other drawbacks regarding participant recruitment, matching, and internal validity. Since self-reported data are collected twice, TPP is critiqued for its time- and resource-consuming nature for all parties involved (the program, evaluators, and participants; Schiekirka et al., 2013). Getting the same participants to voluntarily complete both pre- and post-surveys poses a practical challenge in program evaluation. Matching participants from pre- to post-survey while maintaining anonymity can also be problematic if participants do not use the same information or forget their identifiers (e.g., ID numbers; Geldhof et al., 2018). Furthermore, for longitudinal programs, history and maturation threats may arise and compromise the internal validity of the outcomes (Cook & Campbell, 1979), meaning that any significant change identified when comparing pre- and post-self-report results may be caused by factors emerging during the program duration other than the intervention itself. Despite empirical evidence demonstrating the prevalence of response shift bias and other disadvantages, many evaluators continue to favor TPP because of its historical roots and familiarity.
5. Retrospective Pre-Test–Post-Test Design
Given these various validity concerns with the TPP design described in the previous section, RPP has been advocated by some scholars as an alternative to TPP in program evaluation. Compared to TPP, the use of a retrospective pre-test helps reduce response shift bias and provides a more economical and efficient means to collect evaluation data (Little et al., 2019; Sibthorp et al., 2007). Another benefit of RPP is its potential to facilitate feelings of efficacy and provide reflective learning opportunities among participants after the program (Hill & Betz, 2005). A retrospective design is recommended, particularly when the survey focuses on measuring change in affective constructs such as beliefs, preferences, attitudes, and values (Kowalski, 2023; Little et al., 2019). Generally, RPP tends to indicate larger program effects than TPP, partially due to the minimizing of response shift biases and ceiling effects seen in TPP designs (Geldhof et al., 2018).
An RPP design is not without its drawbacks, however. In some cases, adopting retrospective ratings instead of true pre–post self-reports may increase bias rather than reduce it: recall bias (or memory distortion) theory calls into question the accuracy of a retrospective pre-test (Schwartz & Rapkin, 2004). The elapsed time between data collection and the recalled experience is critical for recall accuracy, indicating that longer intervals may lead to greater distortion on the “then” test (Hipp et al., 2020). Additionally, personal recall theory suggests that retrospective reports can be reconstructed based on individuals’ implicit theories of stability or change (Pearson et al., 1992; Ross, 1989). In other words, regardless of the extent of actual change from the program, participants may underrate change when operating under an implicit theory of stability and overrate change when guided by a theory of change. Other sources of validity concern in RPP surveys include selection bias, participants’ effort justification, impression management/social desirability, and motivation to present oneself as better than the past self (e.g., Hill & Betz, 2005; Talari & Goyal, 2020). Inflated estimates of change are likely to be found particularly when the “then” and “now” items are presented next to each other (Hill, 2019), a format that also lengthens the survey and makes it more burdensome for participants to complete in one sitting.
6. Existing Research Comparing TPP and RPP
With the strengths and limitations of both TPP and RPP designs established, it is important to consider empirical evidence comparing their performance in practice. Existing evidence indicates that the two designs do not always produce different outcomes. Comparing results between traditional and retrospective pre-tests on 17 items of a behavior survey from an adult nutrition education program, Auld et al. (2017) identified only one item that yielded significantly different results, suggesting that the same outcomes could be found with either approach. This finding is consistent with an earlier study by Bhanji et al. (2012), in which medical students enrolled in a pediatric course were able to identify learning using either a TPP or RPP design for self-assessment. Interestingly, the self-identified change in learning using either survey design was not significantly correlated with the change in knowledge identified on a multiple-choice exam (an objective measure), meaning that while students were able to “identify” that learning happened using self-reports (surveys), they were not able to accurately “quantify” learning without objective measures.
This discrepancy between self-reported and objective measurement results illuminates a potentially larger issue with self-reported measures. Both prospective and retrospective self-report ratings have flaws and cannot replace objective measures. Nonetheless, objective measures are not always feasible in teacher education evaluation work and are not ideal for assessing subjective changes in affective constructs (e.g., beliefs, values, attitudes). For teacher PD evaluators, the choice between different designs or measures is often a tradeoff between the scientific rigor required to accurately evaluate programs (likely with time- and resource-intensive approaches) and a desire to be as unobtrusive for participants as possible (Hill & Betz, 2005). This interplay between methodological robustness and participant burden is embedded in the debate and design choices between traditional and retrospective approaches.
Despite the ongoing discourse surrounding the efficacy of TPP and RPP designs, empirical studies often lack a comprehensive examination of the psychometric properties and validity evidence for the self-report measures deemed essential to support any comparison of different designs (Hill, 2019). Some of the programs that have served as comparison contexts spanned only several days (e.g., Bhanji et al., 2012; Kowalski, 2023). Given that recall accuracy is related to the time elapsed since the event, it is meaningful to compare findings from RPP and TPP designs for longer PD programs. Empirical research also tends to focus on item-level comparisons (e.g., Drennan & Hyde, 2008; Kowalski, 2023; Pratt et al., 2000), leaving a gap in understanding how these designs perform at the construct level. This gap is especially notable in early childhood education, where (a) self-report measures, including surveys, are commonly adopted to evaluate PD initiatives; (b) unique workforce characteristics, including high program variability, diverse entry pathways, and varied professional experiences, may influence how early childhood educators interact with self-report instruments (e.g., Bassok et al., 2016; Watts et al., 2023); and (c) programs often operate with limited funding and staffing, coupled with high turnover and short release time (Whitebook et al., 2018), making evaluation approaches that minimize burden while maintaining validity especially necessary. For these reasons, understanding how self-report measures function in early childhood contexts addresses a unique and practical need in early childhood PD evaluation.
Comparing different survey designs is therefore necessary in helping evaluators make informed decisions about particular measures adopted. Set in the context of a year-long pre-K STEM-related teacher learning program, this study compared results from traditional and retrospective pre–post-surveys on eight sub-scales of a validated measure of teaching self-efficacy, ultimately aiming to elucidate whether there are significant differences in perception outcome detected during evaluation when data are collected via these two designs from the same participants. By doing so, we sought to contribute to the larger dialogue concerning the methodological choice between TPP and RPP designs to measure change in teacher participants’ self-reported outcomes as a result of participating in early childhood teacher PD.
7. Methods
7.1. Study Context
The Seeds to STEM (S2S) project is a year-long integrated STEM and literacy PD initiative for early childhood educators focused on gardening and funded by the National Institutes of Health. A main goal of S2S is to support early childhood educators in assisting young children aged 3–5, along with their families, in developing informed and nutritious eating habits. The year-long PD sequence begins with a three-hour workshop conducted at convenient locations, such as early learning centers or museums. Following this initial training, early childhood educators receive individualized coaching to support mathematics, science, nutrition, and literacy curriculum integration in their classrooms. Each site is assigned a team of two to three coaches, who conduct at least four annual visits to model instructional techniques and provide direct feedback. S2S curriculum is informed by established pedagogical frameworks and museum-based educational practices, guided by three fundamental principles: (a) interactive, play-based STEM learning (Aikenhead, 2001, 2006); (b) fostering curiosity through role-playing and exploration (S. Edwards & Edick, 2013; NAEYC, 2002); and (c) a child-centered instructional approach (Lerkkanen et al., 2016). Early childhood educational best practices intentionally integrate literacy because it underpins nearly all early learning. Accordingly, S2S incorporated targeted literacy elements within all 16 inquiry-based STEM and nutrition lessons to support development across domains (see Koskey et al., 2025). Additionally, the curriculum aligns with the Head Start Health (2013) guidelines and the Next Generation Science Standards (2013), ensuring its consistency with recognized early learning objectives. Currently, S2S implementation is underway in both Los Angeles, CA, and Philadelphia, PA, across a variety of early learning settings, such as public and private childcare centers, preschools, and home-based facilities.
7.2. Instrumentation
As part of the evaluation for S2S, multiple surveys were developed in alignment with the desired PD outcomes, and each underwent a rigorous validation process (see Koskey et al., 2025; May et al., 2024; May et al., 2025). One survey used in evaluation with early childhood educators is entitled the Preschool Teaching Efficacy Belief Instruments (P-TEBI; May et al., 2022). The P-TEBI was modified from the widely used Science Teaching Efficacy Belief Instrument (STEBI; Riggs & Enochs, 1989) for use with early childhood educators to measure their teaching self-efficacy across four content areas (science, mathematics, nutrition, and literacy) due to the integrated nature of S2S. Bandura’s (1977, 1997) self-efficacy theory guided the development of the original STEBI and its adaptation for the P-TEBI, resulting in two overarching P-TEBI scales: Preschool Personal Teaching Efficacy Beliefs (PPTEB) and Preschool Teaching Outcome Expectancy Beliefs (PTOEB).
The P-TEBI consists of 40 items in total (PPTEB = 20 items; PTOEB = 20 items), with each scale containing five items related to teaching self-efficacy across the four content areas. An example item from the PPTEB scale is, “I know enough about science to effectively teach it to my students.” An example from the PTOEB scale is, “When the skills of students improve, it is often because their teacher found a better way of teaching” (see May et al., 2022, for full set of items). Statements were tailored for each content area and rated on a 4-point Likert-type scale (1 = “Strongly Disagree”, 2 = “Disagree”, 3 = “Agree”, 4 = “Strongly Agree”). For the retrospective survey, each section (representing the four content areas) contained identical items presented twice under separate subsections: a “Then” version and a “Now” version. The only difference between the two subsections was the main instructional stem (e.g., “Circle your agreement for Science Teaching Beliefs before the Program” and “Circle your agreement for Science Teaching Beliefs after the Program”). A prior measurement study (Koskey et al., 2025) documented strong reliability for the two main scales (PPTEB = 0.97; PTOEB = 0.96) and across the content-specific sub-scales (ranging from 0.82 to 0.99) when used with a preschool teacher sample. Furthermore, robust evidence of internal structure has been reported for the P-TEBI, with each content-specific subscale functioning as a unidimensional construct (May et al., 2022). Thus, consistent with guidelines for the intended P-TEBI score interpretation and use (May et al., 2022), the current study examines differences in early childhood educators’ content-specific teaching efficacy and outcome expectancy across TPP and RPP survey designs.
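To make the instrument’s scoring structure concrete, the sketch below illustrates one way the eight content-specific subscale scores could be computed from item-level responses. It is an illustrative reconstruction rather than the project’s actual scoring procedure, and the item column names (e.g., science_ppteb_1) are hypothetical placeholders.

```python
# Illustrative sketch of P-TEBI subscale scoring (not the authors' code).
# Assumes a data frame with one row per educator and one column per item,
# rated 1-4; column names such as "science_ppteb_1" are hypothetical.
import pandas as pd

CONTENT_AREAS = ["science", "mathematics", "nutrition", "literacy"]
SCALES = ["ppteb", "ptoeb"]   # personal teaching efficacy, outcome expectancy
ITEMS_PER_SUBSCALE = 5        # 4 areas x 2 scales x 5 items = 40 items total

def subscale_means(responses: pd.DataFrame) -> pd.DataFrame:
    """Average the five 1-4 ratings for each content-area-by-scale subscale."""
    scores = pd.DataFrame(index=responses.index)
    for area in CONTENT_AREAS:
        for scale in SCALES:
            cols = [f"{area}_{scale}_{i}" for i in range(1, ITEMS_PER_SUBSCALE + 1)]
            scores[f"{area}_{scale}"] = responses[cols].mean(axis=1)
    return scores
```

For the retrospective administration, the same scoring would be applied separately to the “Then” and “Now” item blocks to yield Retro Pre and Post subscale scores.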
7.3. Data Collection and Sample
Early childhood educators participating in S2S were invited to complete the P-TEBI survey at two time points: before the start of programming (“Pre”) and after they had completed S2S, when the same set of items was administered twice on the same page, asking them to reflect on “then,” before the program (“Retro Pre”), and “now,” after the program (“Post”). All surveys were administered in paper-and-pencil format for ease of access for participants. Early childhood educators who completed the survey were assigned IDs so that responses could be matched across time points while maintaining anonymity.
All early childhood educators participating in S2S (N = 55 at the time of this study) were asked to complete surveys as part of the program’s evaluation. Educators received the instructional materials necessary to deliver S2S lessons, but no additional incentives were provided for participation in surveys. To be included in the study sample, educators were required to complete at least 80% of survey items related to a research question. Thus, sample sizes differ across the study’s research questions. Twenty-five educators (45%) participated in the TPP survey (consisting of Pre and Post, addressing RQ1), and 55 (100%) participated in the RPP survey (consisting of Retro Pre and Post, addressing RQ2). A subset (n = 24, 44%) took part in both surveys at two time points, meaning that they completed both the Pre and Retro Pre surveys (addressing RQ3). Table 1 summarizes the key demographic features of the participants forming the three samples that addressed each research question. Across samples, participants were predominantly female, largely Black or African American, and most held a college degree. The majority served as Lead Teachers, with slightly more educators located in Los Angeles than in Philadelphia. Participants worked primarily in center-based early childhood settings, followed by home- and school-based programs (see Table 1 for details). Regardless of geographic location or early childhood setting, S2S was designed for consistent implementation and delivery across sites.
Table 1.
Demographic Features of Study Participants.
8. Analysis
For each of the eight P-TEBI subscales, the mean score was computed at the relevant time points: Pre (for the TPP design), Retro Pre (for the RPP design), and Post. Because all research questions (RQs) focus on within-participant comparison and each content-specific subscale functions as a unidimensional construct (May et al., 2022), a series of paired-sample t-tests (two-tailed) were used to examine the difference in group means for each dependent variable (e.g., mathematics self-efficacy). For RQ1 (TPP design), paired t-tests compared Pre and Post scores to determine whether self-efficacy and outcome expectancy changed over the course of the PD when measured with a traditional pre-test–post-test design. For RQ2 (RPP design), paired t-tests compared Retro Pre and Post scores to assess perception change captured through a retrospective pre-test–post-test design. For RQ3 (baseline comparison), paired t-tests compared Pre and Retro Pre scores among participants who completed both measures to examine whether the two survey designs produced statistically different baseline estimates. Additionally, effect sizes (eta-squared) were calculated for each paired-sample t-test conducted. Whereas statistical significance indicates that the differences between group means were unlikely to occur by chance and is influenced by sample size, effect size provides a meaningful measure of the magnitude of group differences or change over time (Tabachnick & Fidell, 2019).
All analyses were run in SPSS (Version 29).
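As a concrete illustration of this analytic approach, the sketch below reproduces one paired comparison in Python; because the study’s analyses were run in SPSS, this is an assumed equivalent rather than the authors’ procedure. The eta-squared conversion shown, η² = t²/(t² + df), is a common formula for paired-sample t-tests, and the benchmarks noted in the comments follow Cohen’s (1988) conventions; neither is explicitly reported in the text.

```python
# Illustrative Python equivalent of one paired comparison (the study used SPSS v29).
import numpy as np
from scipy import stats

def paired_comparison(time1: np.ndarray, time2: np.ndarray) -> dict:
    """Two-tailed paired-sample t-test with an eta-squared effect size."""
    mask = ~np.isnan(time1) & ~np.isnan(time2)   # keep complete pairs only
    t1, t2 = time1[mask], time2[mask]
    result = stats.ttest_rel(t2, t1)             # e.g., Post vs. Pre
    df = t1.size - 1
    # Common conversion for paired t-tests: eta^2 = t^2 / (t^2 + df).
    # Cohen (1988) benchmarks: ~0.01 small, ~0.06 medium, ~0.14 large.
    eta_sq = result.statistic ** 2 / (result.statistic ** 2 + df)
    return {"n": t1.size, "t": result.statistic, "p": result.pvalue, "eta_sq": eta_sq}

# Usage: one call per subscale and comparison, e.g., Pre vs. Post (RQ1),
# Retro Pre vs. Post (RQ2), and Pre vs. Retro Pre (RQ3).
```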
9. Results
9.1. RQ1: Comparing Traditional Pre- and Post-Survey Results
Paired-sample t-tests revealed statistically significant increases across content areas and subscales from traditional pre-PD to post-PD survey administration (p-values ranged from <0.001 to 0.05). Effect sizes were large regardless of content area and subscale (η² ranged from 0.23 to 0.49), indicating that 23–49% of the variance in the efficacy subscales can be explained by program participation. Table 2 provides the sample sizes, mean ratings, t-statistics, and effect sizes (η²) when comparing traditional pre- and post-survey responses.
Table 2.
Statistical Results Comparing Traditional Pre- to Post-Survey Responses.
9.2. RQ2: Comparing Retrospective Pre- and Post-Survey Results
Similarly, paired-sample t-tests revealed statistically significant increases across content areas and subscales from retrospective pre-PD to post-PD survey administration (all p-values < 0.001). Effect sizes were again large irrespective of content area and subscale (η² ranged from 0.24 to 0.42), suggesting that 24–42% of the variance in the efficacy subscales can be accounted for by program participation. Table 3 provides the sample sizes, mean ratings, t-statistics, and effect sizes (η²) when comparing retrospective pre- and post-survey responses.
Table 3.
Statistical Results Comparing Retrospective Pre- to Post-Survey Responses.
9.3. RQ3: Comparing Traditional Pre-Survey Results and Retrospective Pre-Survey Results
Paired-sample t-tests showed no significant differences between Pre and Retro Pre scores (p-values ranged from 0.071 to 0.801; see Table 4). Further, effect sizes were small (η² ranged from 0.003 to 0.015) for the subscales in three content areas (literacy, mathematics, and science), meaning that approximately 1% or less of the variance in the efficacy subscales could be explained by the type of pre-test administered (traditional versus retrospective). For nutrition, the effect size was medium for the self-efficacy subscale (η² = 0.120) and large for the outcome expectancy subscale (η² = 0.154). This suggests that more variance in the nutrition efficacy subscales may be explained by the type of pre-test than in the other content areas, although the differences were not statistically significant.
Table 4.
Statistical Results Comparing Traditional Pre-tests and Retrospective Pre-tests.
10. Discussion
This study addressed the need for research comparing the effectiveness of traditional and retrospective pre–post-survey designs in evaluating changes among early childhood educators participating in professional development (PD). Unlike prior studies that have examined change at the item level (e.g., Auld et al., 2017; Drennan & Hyde, 2008; Kowalski, 2023; Pratt et al., 2000), this study compared how the two designs performed at a construct level, focusing on teaching self-efficacy across multiple domains. Analysis showed that S2S teachers reported statistically significant growth in their teaching self-efficacy across content areas from pre-PD to post-PD survey administration, regardless of the design (TPP or RPP). Additionally, there were no statistically significant differences between traditional and retrospective pre-test self-efficacy scores, regardless of the subscale and content area. Results from this study align with previous research, which demonstrated that there is little difference between TPP and RPP methods (e.g., Auld et al., 2017; Bhanji et al., 2012). While other program evaluators have previously advocated for an RPP design (Little et al., 2019) or utilization of both TPP and RPP (Auld et al., 2017; Bhanji et al., 2012), our study suggested that either design yielded comparable results in measuring change in teaching efficacy within the context of early childhood educator professional development. Substantively, these findings suggest that well-designed, long-term integrated STEM PD can strengthen early childhood educators’ sense of efficacy in teaching integrated STEM subjects, aligning with prior research (DeJarnette, 2018; Zhou et al., 2023).
Two factors may help explain why the TPP and RPP designs yielded comparable results in this study. First, the year-long PD program emphasized ongoing cycles of teacher reflection, which may have reduced response shift bias by helping participants recalibrate their understanding of their knowledge and skills throughout the program. Second, the P-TEBI survey employed in the current study to measure changes in participants’ teaching self-efficacy underwent nearly a full year of development and validation (May et al., 2022). The high quality of the instrument supported the validity of inferences drawn from the resulting data and likely strengthened the consistency of measurement across survey designs. Together, these features of the program and instrument may have contributed to the similarity in results observed between the TPP and RPP approaches in our study context.
10.1. Implications for Early Childhood Teacher Program Evaluation
Since teacher PD effectiveness is directly tied to instructional quality and workforce retention (e.g., Kewalramani et al., 2025), improving PD evaluation through the appropriate selection of survey design can support early childhood policy efforts aimed at strengthening the early childhood education workforce. Given the comparable growth in perceptions found with the TPP and RPP methods, it is important for early childhood PD evaluators to consider their setting and allocated resources when making methodological decisions. The following sections present considerations that evaluators should weigh when selecting a survey type and its distribution strategy, informed by the findings from this study and prior research.
10.2. Time and Efficiency Considerations
Although many evaluators continue to favor traditional pre–post designs, the TPP method has clear drawbacks regarding time and efficiency (e.g., Schiekirka et al., 2013). In early learning settings, where teachers balance demanding schedules, these challenges can be amplified. TPP requires multiple instances of data collection (i.e., before and after the intervention). Logically, this requires a greater time commitment on behalf of both participants and researchers/evaluators compared to RPP, which uses a single instance of data collection. Increased incentives may be needed to encourage participants to complete a survey at both time points. Additionally, matching participants from pre- to post- while maintaining anonymity can be problematic if participants do not use the same information or forget their identifiers (e.g., ID numbers; Geldhof et al., 2018). When considered alongside our findings, these factors suggest that RPP survey designs may be more advantageous for early childhood PD evaluators in terms of time and efficiency—a conclusion also supported by prior research (e.g., Little et al., 2019).
Considerations of time are also related to theoretical concerns regarding response shift bias. TPP designs have been critiqued for their sensitivity to response shift bias (Chyung et al., 2020; K. D. Edwards & Soland, 2024; Kowalski, 2023). However, the similar traditional and retrospective pre-test scores found in this study indicate that response shift bias may be more of a theoretical than a practical concern in our particular context, a conclusion similar to those of several prior studies comparing these survey designs (e.g., Auld et al., 2017; Bhanji et al., 2012). As noted earlier, the absence of detectable response shift bias might be attributed to the extended program design that emphasized ongoing teacher reflection. Teacher programs seeking to minimize response shift bias may similarly benefit from incorporating intentional opportunities for participants to revisit and reassess their understanding throughout the learning process. Because our study is limited to one program focused on early childhood educators’ integrated STEM teaching outcomes, future research should investigate this same phenomenon using various samples and contexts to enhance the field’s understanding and better inform evaluation survey selection decisions.
11. Construct Considerations
The construct being measured should also be considered when selecting survey methods. Although previous scholars have recommended RPP designs for affective constructs (Kowalski, 2023; Little et al., 2019), our findings suggest that either TPP or RPP designs can be appropriate. Changes in cognitive constructs, however, may be more accurately measured using a TPP design because of concerns related to recall bias theory (Pratt et al., 2000; Schwartz & Rapkin, 2004). Therefore, TPP may be preferable when recall accuracy is a priority in evaluating early childhood teacher PD programs. In contrast, one may hypothesize that surveys measuring participants’ perceived knowledge are more susceptible to response shift bias and meta-ignorance, where participants have some knowledge about a construct prior to a program intervention but are unaware of how much they do not know (Dunning, 2011). Future research should examine a range of affective and cognitive constructs using different survey designs to identify which methods are best suited for each construct and context. Ultimately, it is essential for early childhood PD evaluators to thoroughly consider the specific construct (representing the targeted teacher learning outcomes) they intend to measure in order to select the survey design most appropriate for effectively capturing change over time within that construct.
12. Limitations
This study has several limitations. First, the same participants were not used to address all research questions due to the voluntary nature of participation and the attrition common in longitudinal research and program evaluations (e.g., Beddoes, 2024; Rathore, 2022). As a result, group differences such as variations in engagement may have influenced the findings. For example, educators who took both the TPP and RPP pre-tests may have been more engaged with S2S than those who only completed the RPP, potentially showing greater change on the TPP comparison than would have been observed if all RPP respondents had taken the TPP pre-test. Sample variation also has implications for data-analytic decisions. For example, a repeated-measures ANOVA would be more robust than multiple t-tests, and post hoc analyses would offer the opportunity to examine where group differences are located, but a repeated-measures ANOVA would limit the sample to individuals with no missing data (i.e., Pre, Retro Pre, and Post). Future studies should aim to use consistent samples across survey designs or examine how participant characteristics contribute to differences in perception outcomes. Second, while the study compares TPP and RPP, two of the three common approaches for administering self-report measures, it does not include a direct measure of perceived change, in which participants report perceived change at the conclusion of a program without any pre- or post-tests. Future research could incorporate such measures to provide a more comprehensive understanding of the relationships among TPP, RPP, and perceived change.
Admittedly, internal validity threats are inherent in both TPP and RPP designs. Both are susceptible to social desirability bias (Bhattacherjee, 2012), as are most self-report measures. If time and resources allow, evaluation efforts would benefit from pairing these survey designs with more objective indicators of changes in participants’ knowledge and instructional practice to holistically assess program outcomes. Moreover, without an experimental design that controls for confounding variables, the perceived changes observed cannot be attributed solely to PD program participation. Although S2S was delivered in Philadelphia and Los Angeles to enhance external validity, contextual differences between sites remain a limitation, as they may have shaped teachers’ implementation experiences and survey responses, despite outcomes being analyzed in aggregate. Given the practical limitations of experimental designs in teacher PD evaluation, this study does not attempt to ascertain program impact. Rather, it contributes by offering important considerations about when TPP or RPP methods are most appropriate for capturing change in early childhood teacher participants. Future research should continue to refine the use of these approaches to advance outcome measurement in early childhood teacher PD contexts.
13. Concluding Thoughts
Methodologically rigorous evaluation is critical for advancing the quality of early childhood PD. This study contributes to that effort by demonstrating that both TPP and RPP survey designs yielded comparable results of change in teaching self-efficacy within the context of a year-long integrated STEM PD program, suggesting that both designs can be appropriate for assessing affective constructs such as teaching self-efficacy. For evaluators, the choice between these designs should be guided not only by psychometric considerations but also by the practical considerations of PD programs in early childhood education, such as time, efficiency, participant accessibility, and desired teacher learning outcomes. Beyond methodological considerations, our findings underscore the important role of integrated STEM PD in supporting early childhood educators’ self-efficacy. Finally, it is equally crucial that the measurement instrument (e.g., questionnaire, inventory, scale) is of high quality. Early childhood PD evaluators should be able to provide theoretical and/or empirical evidence supporting the validity and reliability of survey outcomes (AERA et al., 2014). Instruments that have undergone rigorous development and validation, with the P-TEBI as an example, effectively remove measurement concerns as a confounding factor when selecting between survey administration designs (i.e., TPP or RPP), allowing such decisions to be based on the specific evaluation context and available resources. Moving forward, these results point to several implications for PD evaluation policy and methodological practice. PD evaluators should prioritize context-appropriate survey designs, high-quality instruments, and evaluation systems well aligned with program goals. Future research and policy efforts should continue to refine guidance on selecting survey designs that balance methodological rigor with the practical realities of early childhood teacher education.
Author Contributions
Conceptualization, Y.F. and T.A.M.; methodology, Y.F. and T.A.M.; software, Y.F., C.H. and T.D.F.; validation, T.A.M. and K.L.K.K.; formal analysis, Y.F., C.H. and T.D.F.; investigation, Y.F., C.H. and T.D.F.; resources, T.A.M.; data curation, Y.F., C.H. and T.D.F.; writing—original draft preparation, Y.F., C.H., T.D.F. and T.A.M.; writing—review and editing, T.A.M. and K.L.K.K.; visualization, Y.F., C.H. and T.D.F.; supervision, T.A.M.; project administration, Y.F.; funding acquisition, T.A.M. All authors have read and agreed to the published version of the manuscript.
Funding
This research was funded by the National Institutes of Health grant number [R25GM142028].
Institutional Review Board Statement
The study was approved by the Institutional Review Board of Drexel University (protocol 2010008163, approved 10 June 2021).
Informed Consent Statement
Written informed consent for participation was obtained from all subjects involved in the study.
Data Availability Statement
Restrictions apply to the datasets. The datasets presented in this article are not readily available because they contain sensitive or personally identifiable information, and sharing them could compromise participant confidentiality and privacy. Requests to access the datasets should be directed to Yiyun Fan at yiyunfanfan@gmail.com.
Conflicts of Interest
The authors declare no conflict of interest.
References
- Aikenhead, G. S. (2001). Students’ ease in crossing cultural borders into school science. Science Education, 85(2), 180–188.
- Aikenhead, G. S. (2006). Science education for everyday life: Evidence-based practice. Teachers College Press.
- American Educational Research Association (AERA), American Psychological Association, & National Council on Measurement in Education. (2014). Standards for educational and psychological testing. American Educational Research Association.
- Auld, G., Baker, S., McGirr, K., Osborn, K. S., & Skaff, P. (2017). Confirming the reliability and validity of others’ evaluation tools before adopting for your programs. Journal of Nutrition Education and Behavior, 49, 441–450.
- Bandura, A. (1977). Self-efficacy: Toward a unifying theory of behavior change. Psychological Review, 84(2), 191–215.
- Bandura, A. (1997). Self-efficacy: The exercise of control. W.H. Freeman and Company.
- Bassok, D., Fitzpatrick, M., Greenberg, E., & Loeb, S. (2016). Within- and between-sector quality differences in early childhood education and care. Child Development, 87(5), 1627–1645.
- Beddoes, K. (2024). Five years later: Lessons and insights from a longitudinal, mixed-methods study. International Journal of Social Research Methodology, 27(6), 805–810.
- Bell, B. A. (2010). Pretest–posttest design. In N. J. Salkind (Ed.), Encyclopedia of research design. SAGE Publications.
- Bhanji, F., Gottesman, R., de Grave, W., Steinert, Y., & Winer, L. R. (2012). The retrospective pre–post: A practical method to evaluate learning from an educational program. Academic Emergency Medicine, 19, 189–194.
- Bhattacherjee, A. (2012). Social science research: Principles, methods, and practices. University of South Florida.
- Brenneman, K., Lange, A., & Nayfeld, I. (2019). Integrating STEM into preschool education: Designing a professional development model in diverse settings. Early Childhood Education Journal, 47(1), 15–28.
- Brunsek, A., Perlman, M., McMullen, E., Falenchuk, O., Fletcher, B., Nocita, G., Kamkar, N., & Shah, P. S. (2020). A meta-analysis and systematic review of the associations between professional development of early childhood educators and children’s outcomes. Early Childhood Research Quarterly, 53, 217–248.
- Chyung, S. Y., Hutchinson, D., & Shamsy, J. A. (2020). Evidence-based survey design: Ceiling effects associated with response scales. Performance Improvement, 59(6), 6–13.
- Cohen, J. (1988). Statistical power analysis for the behavioral sciences. Routledge.
- Cook, T. D., & Campbell, D. T. (1979). Quasi-experimentation: Design & analysis issues for field settings. Houghton Mifflin.
- Darling-Hammond, L., Hyler, M. E., & Gardner, M. (2017). Effective teacher professional development. Learning Policy Institute.
- DeJarnette, N. K. (2018). Implementing STEAM in the early childhood classroom. European Journal of STEM Education, 3(3), 18.
- Dimitrov, D. M., & Rumrill, P. D., Jr. (2003). Pretest-posttest designs and measurement of change. WORK: A Journal of Prevention, Assessment & Rehabilitation, 20(2), 159–165.
- Drennan, J., & Hyde, A. (2008). Controlling response shift bias: The use of the retrospective pre-test design in the evaluation of a master’s programme. Assessment & Evaluation in Higher Education, 33(6), 699–709.
- Dunning, D. (2011). The Dunning-Kruger effect: On being ignorant of one’s own ignorance. In M. P. Zanna & J. M. Olson (Eds.), Advances in experimental social psychology (Vol. 44, pp. 247–296). Elsevier Academic Press.
- Edwards, K. D., & Soland, J. (2024). How scoring approaches impact estimates of growth in the presence of survey item ceiling effects. Applied Psychological Measurement, 48(3), 147–164.
- Edwards, S., & Edick, N. A. (2013). Culturally responsive teaching for significant relationships. Journal of Praxis in Multicultural Education, 7(1), 4.
- Egert, F., Fukkink, R. G., & Eckhardt, A. G. (2018). Impact of in-service professional development programs for early childhood teachers on quality ratings and child outcomes: A meta-analysis. Review of Educational Research, 88(3), 401–433.
- Erol, A., & İvrendi, A. (2024). STEM professional development of early childhood teachers. Psychology in the Schools, 62(1), 86–112.
- Geldhof, G. J., Warner, D. A., Finders, J. K., Thogmartin, A. A., Clark, A., & Longway, K. A. (2018). Revisiting the utility of retrospective pre-post designs: The need for mixed-method pilot data. Evaluation and Program Planning, 70, 83–89.
- Goedhart, H., & Hoogstraten, J. (1992). The retrospective pretest and the role of pretest information in evaluation studies. Psychological Reports, 70, 699–704.
- Guo, Y., Piasta, S. B., Justice, L. M., & Kaderavek, J. N. (2010). Relations among preschool teachers’ self-efficacy, classroom quality, and children’s language and literacy gains. Teaching and Teacher Education, 26(4), 1094–1103.
- Head Start Health. (2013). Improving Head Start for School Readiness Act of 2007, Pub. L. No. 110–134, 121 Stat. 1363 (2007). Available online: https://www.congress.gov/110/plaws/publ134/PLAW-110publ134.pdf (accessed on 14 December 2025).
- Hill, L. G. (2019). Back to the future: Considerations in use and reporting of the retrospective pretest. International Journal of Behavioral Development, 44(2), 184–191.
- Hill, L. G., & Betz, D. L. (2005). Revisiting the retrospective pretest. American Journal of Evaluation, 26(4), 501–517.
- Hipp, L., Bünning, M., Munnes, S., & Sauermann, A. (2020). Problems and pitfalls of retrospective survey questions in COVID-19 studies. In U. Kohler (Ed.), Survey research methods (Vol. 14, No. 2, pp. 109–114). European Survey Research Association.
- Howard, G. S. (1980). Response-shift bias: A problem in evaluating interventions with pre/post self-reports. Evaluation Review, 4(1), 93–106.
- Howard, G. S., & Dailey, P. R. (1979). Response-shift bias: A source of contamination of self-report measures. Journal of Applied Psychology, 64(2), 144–150.
- Kelley, T. R., & Knowles, J. G. (2016). A conceptual framework for integrated STEM education. International Journal of STEM Education, 3(1), 11.
- Kewalramani, S., Devi, A., & Ng, A. (2025). Supporting early childhood preservice teachers to effectively integrate STEM in their future teaching practice. Education Sciences, 15(2), 189.
- Koskey, K. L. K., May, T. A., & Provinzano, K. (2025). Development and validation study of the Preschool Teaching Efficacy Belief Instruments. Early Education and Development, 1–22.
- Kowalski, M. J. (2023). Measuring changes with traditional and retrospective pre-posttest self-report surveys for a brief intervention program. Evaluation and Program Planning, 99, 102323.
- Krosnick, J. A. (1999). Survey research. Annual Review of Psychology, 50, 537–567.
- Lerkkanen, M. K., Kiuru, N., Pakarinen, E., Poikkeus, A. M., Rasku-Puttonen, H., Siekkinen, M., & Nurmi, J. E. (2016). Child-centered versus teacher-directed teaching practices: Associations with the development of academic skills in the first grade at school. Early Childhood Research Quarterly, 36, 145–156.
- Lipscomb, S. T., Chandler, K. D., Abshire, C., Jaramillo, J., & Kothari, B. (2022). Early childhood teachers’ self-efficacy and professional support predict work engagement. Early Childhood Education Journal, 50(4), 675–685.
- Little, T. D., Chang, R., Gorrall, B. K., Waggenspack, L., Fukuda, E., Allen, P. J., & Noam, G. G. (2019). The retrospective pretest–posttest design redux: On its validity as an alternative to traditional pretest–posttest measurement. International Journal of Behavioral Development, 44(2), 175–183.
- Liu, J., Wang, K., & Pan, Z. (2025). The effectiveness of professional development in the self-efficacy of in-service teachers in STEM education: A meta-analysis. Behavioral Sciences, 15(10), 1364.
- MacDonald, A., Danaia, L., Sikder, S., & Huser, C. (2021). Early childhood educators’ beliefs and confidence regarding STEM education. International Journal of Early Childhood, 53(3), 241–259.
- Maiorca, C., Martin, J., Burton, M., Roberts, T., & Tripp, L. O. (2023). Model-eliciting activities: Pre-service teachers’ perceptions of integrated STEM. Education Sciences, 13(12), 1247.
- May, T. A., Koskey, K. L. K., & Provinzano, K. (2024). Developing and validating the preschool nutrition education practices survey. Journal of Nutrition Education and Behavior, 56(8), 545–555.
- May, T. A., Koskey, K. L. K., & Provinzano, K. (2025). Development and validation of the preschool Mathematics Education Practices Survey. Journal of Applied Measurement. Available online: https://jamntnu.net/issues.html (accessed on 14 December 2025).
- May, T. A., Provinzano, K. P., & Koskey, K. L. K. (2022). PreK teaching efficacy belief instrument suite (mathematics, science, nutrition, literacy). Available online: https://www.tandfonline.com/doi/abs/10.1080/10409289.2025.2585295 (accessed on 14 December 2025).
- NAEYC. (2002). Early learning standards: Creating the conditions for success. National Association for the Education of Young Children (NAEYC), National Association of Early Childhood Specialists in State Departments of Education (NAECS/SDE). Available online: https://www.naeyc.org/sites/default/files/globally-shared/downloads/PDFs/resources/position-statements/position_statement.pdf (accessed on 14 December 2025).
- Nesmith, S. M., & Cooper, S. (2020). Elementary STEM learning. In C. Johnson, M. Mohr-Schroeder, T. Moore, & L. English (Eds.), Handbook of research on STEM education (pp. 101–114). Routledge.
- Next Generation Science Standards (NGSS). (2013). Next generation science standards: For states, by states. Available online: https://www.nextgenscience.org (accessed on 14 December 2025).
- Pearson, R. W., Ross, M., & Dawes, R. M. (1992). Personal recall and the limits of retrospective questions in surveys. In J. M. Tanur (Ed.), Questions and survey questions: Meaning, memory, expression, and social interactions in survey (pp. 65–94). SAGE Publications.
- Pfitzner-Eden, F. (2016). Why do I feel more confident? Bandura’s sources predict preservice teachers’ latent changes in teacher self-efficacy. Frontiers in Psychology, 7, 1486.
- Pitkäniemi, H., Hirvonen, R., Heikka, J., & Suhonen, K. (2025). Teacher efficacy, its sources, and implementation in early childhood education. Early Childhood Education Journal, 53, 1705–1715.
- Pratt, C. C., McGuigan, W. M., & Katzev, A. R. (2000). Measuring program outcomes: Using retrospective pretest methodology. American Journal of Evaluation, 21(3), 341–349.
- Rathore, D. (2022). Overcoming data collection challenges and establishing trustworthiness: The need for flexibility and responsiveness in research. Waikato Journal of Education, 27(2), 47–51.
- Riggs, I. M., & Enochs, L. G. (1989, March 30–April 1). Toward the development of an elementary teacher’s science teaching efficacy belief instrument. The Annual Meeting of the National Association for Research in Science Teaching, San Francisco, CA, USA.
- Ross, M. (1989). Relation of implicit theories to the construction of personal histories. Psychological Review, 96(2), 341–357.
- Schiekirka, S., Reinhardt, D., Beißbarth, T., Anders, S., Pukrop, T., & Raupach, T. (2013). Estimating learning outcomes from pre- and posttest student self-assessments: A longitudinal study. Academic Medicine, 88(3), 369–375.
- Schwartz, C. E., & Rapkin, B. D. (2004). Toward a theoretical model of quality-of-life appraisal: Implications of findings from studies of response shift. Health and Quality of Life Outcomes, 2, 14.
- Sibthorp, J., Paisley, K., Gookin, J., & Ward, P. (2007). Addressing response-shift bias: Retrospective pretests in recreation research and evaluation. Journal of Leisure Research, 39(2), 295–315.
- Smylie, M. A. (2014). Teacher evaluation and the problem of professional development. Mid-Western Educational Researcher, 26(2), 97–111. Available online: http://www.mwera.org/MWER/volumes/v26/issue2/v26n2-Smylie-POLICY-BRIEFS.pdf (accessed on 14 December 2025).
- Tabachnick, B. G., & Fidell, L. S. (2019). Using multivariate statistics (7th ed.). Pearson.
- Talari, K., & Goyal, M. (2020). Retrospective studies–utility and caveats. Journal of the Royal College of Physicians of Edinburgh, 50(4), 398–402.
- von Suchodoletz, A., Jamil, F. M., Larsen, R. A. A. A., & Hamre, B. K. (2018). Personal and contextual factors associated with growth in preschool teachers’ self-efficacy beliefs during a longitudinal professional development study. Teaching and Teacher Education, 75, 278–289.
- Watts, T. W., Jenkins, J. M., Dodge, K. A., Carr, R. C., Sauval, M., Bai, Y., Escueta, M., Duer, J., Ladd, H., Muschkin, C., Peisner-Feinberg, E., & Ananat, E. (2023). Understanding heterogeneity in the impact of public preschool programs. Monographs of the Society for Research in Child Development, 88(1), 7–182.
- Whitebook, M., McLean, C., & Austin, L. J. (2018). Early childhood workforce index 2018. Center for the Study of Child Care Employment, University of California. Available online: https://cscce.berkeley.edu/wp-content/uploads/2022/04/Early-Childhood-Workforce-Index-2018.pdf (accessed on 14 December 2025).
- Zhou, X., Shu, L., Xu, Z., & Padrón, Y. (2023). The effect of professional development on in-service STEM teachers’ self-efficacy: A meta-analysis of experimental studies. International Journal of STEM Education, 10(1), 37.