Lessons Learned from 10 Experiments That Tested the Efficacy and Assumptions of Hypothetical Learning Trajectories

Although reformers have embraced learning trajectories (LT, also called learning progressions) as an important tool for improving mathematics education, the efficacy and assumptions of LT-based instruction are largely unproven. The aim of a recently completed research project was to fill this void. Fulfilling this aim was more challenging than many supporters of LT-based instruction might imagine. A total of 10 experiments were undertaken, of which 5 demonstrated that LT-based instruction was significantly more efficacious than a counterfactual involving either a Teach-to-Target/Skip-Level approach (Assumption 1) or the same unordered activities (Assumption 2). The results of the remaining studies were non-significant either for theoretical (2) or methodological (3) reasons. In the five indicating LTs' efficacy, we found that some LTs consist of levels that are facilitative conditions for the next higher level and, thus, may be helpful but perhaps not necessary for the subsequent level.


Introduction
A hypothetical learning trajectory (HLT) is an extension of a learning progression or learning trajectory (LT) that also includes instructional goals and activities [1]. Specifically, HLTs in mathematics education consist of three components [2][3][4]:

1. A goal is the target developmental level. Goals are based on the structure of mathematics, societal needs, and research on children's thinking about and learning of mathematics and require input from experts in mathematics, mathematics education, educational policy, and developmental psychology [5][6][7].

2. A developmental progression is a sequence of theoretically and research-based increasingly sophisticated patterns of thinking that most children pass through on the way to achieving the goal or target. Theoretically, each level serves as a foundation for successful learning of subsequent levels.

3. Instructional activities include theory and research-based curricular tasks and pedagogical strategies designed explicitly to promote the development of each level.
The conventional wisdom in the mathematics education community holds that HLTs are an important tool in improving mathematics education. Indeed, it may seem obvious that instruction (a) should promote lower levels of knowledge to lay the foundation for a goal at a higher level compared to teaching to a target (focusing directly on a goal) and (b) is more efficacious than using a project approach that entails instructional activities without regard to developmental order.
Such assumptions are consistent with the conclusions of an Institute of Education Sciences (IES) Practice Guide: Teaching Math to Young Children [8]. The purpose of the IES Practice Guide was to review the research literature and make instructional recommendations based on this evidence and expert opinions. Frye et al. found moderate evidence Learning progressions have captured the imaginations and rhetoric of school reformers and education researchers as one possible elixir for getting K-12 education "on track" . . . Learning progressions and research on them have the potential to improve teaching and learning; however, we need to be cautious . . . The enthusiasm gathering around learning progressions might lead to giving heavy weight to one possible solution when experience shows single solutions to education reform come and go.
More recently, Lobato and Walters [1] noted that the empirical evidence supporting the efficacy and assumptions of learning progression/LT-based instruction is (still) surprisingly limited.
To provide such evidence, we proposed and IES funded the HLT Project, "Evaluating the Efficacy of Learning Trajectories in Early Mathematics". Sections 2-4 summarize the project's rationale, methods, and results, respectively. Spoiler alert: Corroborating the efficacy and basic assumptions of HLT-based instruction was challenging. Section 5 discusses theoretical reasons for our inconsistent findings and underscores why developing and utilizing the potential of HLT-based instruction is challenging, and Section 6 focuses on methodological reasons for some findings, with implications for future research projects. Section 7 summarizes our conclusions.

Goals
The overarching goal of our HLT Project was to rigorously evaluate the efficacy of using LTs as a curricular and pedagogical tool and the key assumptions on which HLT-based instruction is based. To ensure the findings were generalizable, we conducted multiple experiments across various mathematical topics and age groups. We studied the preschool and kindergarten ages because HLTs are particularly important for early childhood mathematics education for several interrelated reasons. One is that early childhood educators too often have minimal, if any, training on mathematics development and education. As a result, they frequently underestimate young children's (informal) mathematical knowledge, mechanically teach the lessons specified in a curriculum guide or textbook, and focus only on the most basic numeracy content, which many, or even most, children have already learned [11][12][13][14][15]. Indeed, because of a negative disposition towards mathematics instruction, many early childhood teachers do not set any mathematical goals or use any mathematical curriculum or resources, and instead rely on (hit-or-miss) opportunities that emerge from children's play or routine activities [13][16][17][18][19][20]. The pedagogical knowledge and learning expectations of teachers of academically at-risk children are particularly unlikely to foster numeracy [21][22][23][24][25]. Another reason is that LTs have been well developed for the preschool-kindergarten age range.
We now turn to the two key assumptions of HLT-based instruction.
• Assumption 1. Instruction in which LT levels are taught consecutively (e.g., for children at level n, using instructional activities to foster level n + 1 and then n + 2 before instruction on a goal or target-level knowledge at level n + 3) results in greater learning than instruction that immediately and solely targets level n + 3 (or higher levels), namely the "Skip-Level" or "Teach-to-Target" approach.
• Assumption 2. Instruction aligned with an LT sequence results in greater learning than instruction that either uses a traditional curriculum's activities and sequence (business as usual) or uses the same activities as those of the LT but chosen and ordered to fit a theme-based project.
The arguments pro and con for each assumption and the evidence regarding the efficacy of HLT-based instruction are addressed in turn.
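The contrast between the instructional conditions named in the two assumptions can be made concrete with a minimal sketch. The function names, the integer coding of LT levels, and the use of a seeded shuffle to stand in for a theme-driven ordering are illustrative assumptions, not part of the HLT Project's materials.

```python
import random


def lt_sequence(current_level: int, target_level: int) -> list[int]:
    """LT-based instruction: teach every level between the child's
    current level and the target, in developmental order."""
    return list(range(current_level + 1, target_level + 1))


def teach_to_target_sequence(current_level: int, target_level: int) -> list[int]:
    """Teach-to-Target/Skip-Level instruction: teach only the target level."""
    return [target_level]


def unordered_sequence(current_level: int, target_level: int, seed: int = 0) -> list[int]:
    """Same activities as the LT condition, but not in developmental order.
    A seeded shuffle is a hypothetical stand-in for a theme-based ordering."""
    levels = lt_sequence(current_level, target_level)
    random.Random(seed).shuffle(levels)
    return levels


# A child at level n = 0 with a target at level n + 3:
print(lt_sequence(0, 3))               # [1, 2, 3]
print(teach_to_target_sequence(0, 3))  # [3]
print(sorted(unordered_sequence(0, 3)) == lt_sequence(0, 3))  # True: same content
```

Assumption 1 compares the first two sequences; Assumption 2 compares the first with the third, which holds content constant and varies only order.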

Assumption 1
The first assumption is that instruction should move children from their present level to the next higher level and continue in this manner until the instructional goal is reached. Proponents of traditional didactic instruction (a Teach-to-Target approach) continue to argue that teaching to a skill (direct instruction and drill of target knowledge) is the most mathematically rigorous and efficient way to ensure accurate target-level knowledge (see [7][26][27][28][29][30]). Such an approach avoids promoting the informal and error-prone strategies of lower levels and the slow movement through these lower levels. An example of this "Teach-to-Target" approach is the "worked examples" method: explicitly describing and illustrating how to solve a new type of problem, including the why (conceptual rationale) for each step [31][32][33]. Some evidence supports the Teach-to-Target approach [27][34][35][36][37], although the research designs often do not include other research-validated approaches.
In contrast, those interested in educational reform have long recommended building on prior knowledge as a means of overcoming the limitations of rote memorization engendered by traditional, didactic instruction. For example, in his 1892 "Talk to Teachers", the eminent psychologist William James [38] advocated meaningful memorization: "When we wish to fix a new thing in a pupil's [mind], our . . . effort should not be so much to impress and retain it as to connect it with something already there . . . If we attend clearly to the connection, the connected thing will . . . likely . . . remain within recall" (pp. 101-102). In a similar vein, Piaget [39] argued that "the fundamental relation from the point of view of pedagogical . . . application" is not associations, but assimilation: "the integration of any sort of reality into [an existing] structure" (p. 16).
Since the late 19th century, when research on the development of mathematical knowledge exploded, educational reformers have become increasingly interested in developing, promoting, and using such an approach [40]. A basic assumption for using HLT-based instruction is that it is more efficacious than teaching a target-level competence directly [2,8].
Consider, for example, the classic example of achieving fluency with basic sums such as 3 + 4 = 7. Traditional didactic instruction focuses on direct imposition of the knowledge: repeated exposure and practice of the basic facts, frequently accompanied by suppression of children's existing slow and sometimes error-prone informal strategies [41,42]. If such efforts fail to result in memorization by rote, exposure and practice are increased and the correct answer is provided if a child responds incorrectly or does not respond quickly [43].
In contrast to this one-phase approach, mathematics educators have long recommended achieving meaningful memorization using three phases [44][45][46]. In Phase 1, children are encouraged to develop efficient counting strategies to better detect patterns and relations among basic sums. In Phase 2, children next use discovered mathematical regularities to devise reasoning strategies such as the near-doubles reasoning strategy (e.g., 3 + 4 = [3 + 3] + 1 = 6 + 1 = 7). This strategy builds on prior knowledge by relating an unknown near double (3 + 4) to a known double (3 + 3 = 6) and a known add-1 combination (6 + 1 = 7). If this effort fails, knowledge of these prerequisites would be checked and, if need be, remedied. For example, if a child was not fluent with the add-1 combination 6 + 1 = 7, the prerequisite knowledge for its fluency, namely number-after relations (e.g., when we count, the number after six is seven), would then be checked. Once fluency with number-after relations was achieved, remedial instruction would next focus on encouraging a child to recognize the connection between adding 1 and the structure of the counting sequence, that is, the number-after rule for adding 1 (e.g., the sum of 7 + 1 is the number after seven in the counting sequence, or eight). In Phase 3, children achieve fluency by either automatizing reasoning strategies or internalizing families of related facts [2][47][48][49].
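The chain of prerequisites behind the near-doubles strategy can be sketched in a few lines. This is only an illustration of the reasoning described above; the function names and the table of known doubles are hypothetical and do not model any child's actual knowledge.

```python
# Hypothetical store of doubles a child is assumed to know fluently (1+1 ... 9+9).
KNOWN_DOUBLES = {n: n + n for n in range(1, 10)}


def number_after(n: int) -> int:
    """Number-after relation: the next number in the counting sequence."""
    return n + 1


def add_one(n: int) -> int:
    """Number-after rule for adding 1: n + 1 is the number after n."""
    return number_after(n)


def near_double(a: int, b: int, known_doubles: dict[int, int] = KNOWN_DOUBLES) -> int:
    """Near-doubles strategy: relate a + b (where the addends differ by 1)
    to a known double plus an add-1 combination."""
    small, large = sorted((a, b))
    if large - small != 1:
        raise ValueError("near-doubles applies only when the addends differ by 1")
    if small not in known_doubles:
        # The prerequisite (a known double) is missing and would need checking.
        raise LookupError(f"double {small} + {small} is not yet known")
    return add_one(known_doubles[small])


print(near_double(3, 4))  # 3 + 4 = (3 + 3) + 1 = 6 + 1 = 7
```

Each function depends on the one above it, which mirrors the order in which the prerequisites would be checked when the strategy fails.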
Similar to this example, HLTs can highlight developmentally appropriate and important goals (e.g., the importance of number-after relations as a basis for fluency with basic sums) and help focus instructional efforts on them. HLTs underscore how children typically develop and the need to consider what they must already know to make progress and what level of instruction is within their comprehension (e.g., a child who does not know number-after relations is unlikely to achieve fluency with add-1 combinations, let alone near doubles such as 3 + 4) [50,51]. HLTs, then, spotlight the need for formative assessment to determine where children are developmentally on a progression, so that instruction can target their learning needs with meaningful and effective learning tasks. For these reasons and more, researchers, educators, and policy makers have recommended HLTs as a useful tool for teachers in helping them to understand, promote, and assess children's mathematical learning [2,4,8,47,52,53].

Assumption 2
The second assumption of an LT approach is that there is a sequence of such levels of learning and teaching that is determined by research-based developmental progressions and that instruction is more efficacious if it promotes each level in turn. Postulating that each level of knowledge builds hierarchically on the concepts and processes of the previous levels stands in contrast to some traditional early childhood curricular organizations: theme, project, and emergent approaches [54][55][56][57][58][59]. In these approaches, a theme (e.g., "colors"), a project (e.g., visiting an apple orchard and making applesauce), or an emergent issue (e.g., building a bus when children expressed interest in buses spontaneously) determines the sequencing of activities. For example, if the theme is colors, children are asked to sort by color; if it involves apples, children might count the seeds in an apple or cut them and talk about "halves". Thus, the activity is chosen for its fit to the classroom work, which is ostensibly more meaningful and connected for the child and thus will lead to greater learning.
In Experience and Education, Dewey [60] summarized the lessons he learned from his own efforts to reform education. He argued that instruction cannot simply consist of a hodgepodge of activities without clear educational purposes. Teachers must strive to provide educative experiences (experiences that lead to worthwhile learning or a basis for later learning), not mis-educative experiences (activities for the sake of activity and that may impede development). According to Dewey's "principle of interaction," educative experiences result from an interaction of external factors (e.g., the nature of the subject matter and teaching practices) and internal factors (e.g., a child's developmental readiness and interests). Unless a theme, project, or emergent issue is carefully chosen and developed with important goals and students' range of developmental levels in mind, instruction may violate Dewey's principle of interaction and, thus, be inefficient, ineffective, or even detrimental. Some, many, or even most children may not be developmentally ready for the instruction or may be developmentally too advanced for it. Although careful integration of mathematics into daily routines and instruction in other areas can be valuable, doing so without regard to the mathematical goals and developmental progressions of an LT may be mis-educative. Although children's interest should guide instructional decisions, children's interests are malleable, and teachers can inspire new interests.

Existing Evidence of Efficacy and Its Limitations
Before we began the HLT Project, the following critical question had yet to be answered causally: "Which approach, HLT-based, Teach-to-Target, or Theme/Project/Emergent-based, results in better mathematical outcomes for preschool children?" Although LT-based instruction is often recommended as a valuable educational tool, there is surprisingly little empirical support for this belief or its underlying assumptions [1,8]. Most research has focused on empirically validating the developmental levels of HLTs by using a cross-sectional methodology or tracking the progress of individuals over time (e.g., [61][62][63][64]). Relatively little research has involved closely examining the impact of instructional scaffolding on children's movement along an HLT compared to not doing so.
Moreover, although considerable research has shown interventions that have HLTs as a component are efficacious in promoting numeracy, little research has directly or systematically examined their unique contribution or assumptions [8]. For instance, a preschool curriculum based on HLTs promoted numeracy significantly more than did business-as-usual instruction (effect size, 1.07) or an intervention organized by mathematical topics (effect size, 0.47 [65]). Although the HLT and topically based interventions were closely matched in terms of content and superior performance of the former might be due to using an HLT, the two curricula had other differences (e.g., different activities and integrated versus discrete content) that might account for the performance difference.

Methods
The HLT Project entailed scientific and rigorous tests not heretofore conducted on the HLT construct by designing experiments that had the following three characteristics:

1. Ensured causal interpretation of the findings via randomized controlled trials.

2. Ensured a control group received an intervention that was as similar as possible to the HLT intervention, except for a single defining attribute of the HLT construct.

3. Identified each participant's location on an LT at pretest and ensured an equivalent baseline for posttest comparison of interventions on the dependent measure(s).
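One common way to combine the first and third characteristics is to randomize within strata defined by pretest LT level, so that conditions start from an equivalent baseline. The sketch below illustrates that general technique only; the source does not describe the project's actual randomization procedure, and the function name, data layout, and condition labels are assumptions.

```python
import random
from collections import defaultdict


def assign_by_pretest_level(pretest_levels: dict[str, int],
                            conditions: list[str],
                            seed: int = 0) -> dict[str, str]:
    """Randomly assign children to conditions within each pretest LT level,
    keeping the conditions balanced on starting level (stratified assignment)."""
    rng = random.Random(seed)
    by_level = defaultdict(list)
    for child, level in pretest_levels.items():
        by_level[level].append(child)

    assignment = {}
    for level in sorted(by_level):
        children = sorted(by_level[level])
        rng.shuffle(children)  # random order within the stratum
        # Randomize which condition receives any leftover child in an odd-sized stratum.
        order = rng.sample(conditions, k=len(conditions))
        for i, child in enumerate(children):
            assignment[child] = order[i % len(order)]
    return assignment


# Hypothetical pretest data: child ID -> LT level at pretest.
pretest = {"c1": 1, "c2": 1, "c3": 1, "c4": 1, "c5": 2, "c6": 2}
print(assign_by_pretest_level(pretest, ["LT", "TtT/Skip"]))
```

With this scheme, any posttest difference between conditions cannot be attributed to one group having started further along the LT.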

Research Design to Test Assumption 1
To test the assumption that progressively teaching one level above a child's existing level on an LT should be more efficacious than skipping a level and directly teaching to the target level, seven experiments were undertaken that involved a comparison of LT-based instruction and a control group, which received the same target-level instruction but skipped prior levels.

Research Design to Test Assumption 2
To test the assumption that presenting instruction in the developmental order hypothesized by an LT matters, three experiments compared an experimental group that received LT-based training (activities ordered by an LT) with a counterfactual group, which received the same activities but not ordered by an LT, and (typically) a business-as-usual (BAU) control group, which received only classroom experiences.

Results for Assumption 1
Table 1 shows that the seven experiments that evaluated Assumption 1 produced different results. Four found that progressively teaching one level above a child's existing level on an LT was more efficacious than skipping a level and directly teaching to the target level: Experiment 3 [66], Experiment 4 [67], Experiment 7 [68], and Experiment 10 [69]. Unpublished Experiments 1, 2, and 9 had a positive impact, but not above and beyond that of the Teach-to-Target intervention, owing to methodological problems.
The results of Experiments 3, 4, and 7 indicate that the LT-based instructional approach is efficacious in various ways. For example, Experiment 3 indicated that children in the Teach-to-Target group showed an aversion to math, whereas the HLT-taught children exhibited engagement. HLT participants in Experiments 4 and 7 involving arithmetic showed growth not only in correct answers but also in the use of more sophisticated strategies. Indeed, although the Teach-to-Target intervention in Experiment 7 had a heavier dosage of target-level arithmetic instruction, the HLT-based instruction produced significantly and (as measured by effect size) substantially more accurate solutions at and above the target level. This is striking in that the counterfactual children spent all their instructional time at the target level, far more than the HLT children. Nevertheless, the HLT children scored higher on items measuring that level (and those measuring levels above) than the counterfactual children.

Results for Assumption 2
As Table 1 shows, the three experiments that evaluated Assumption 2 similarly produced mixed results. Experiment 8 [70] indicated that using activities ordered by an HLT was more efficacious than using the same, albeit unordered, activities. That is, whereas the experiments testing Assumption 1 used mostly different activities across conditions, the children in both conditions of this experiment experienced the same activities. Thus, the results specifically showed the importance of following the developmental progression.
However, Experiment 5 [71] and Experiment 6 [72,73] found that the HLT-based intervention produced significant learning, but not significantly more than the intervention involving the same unordered activities. The next two sections discuss possible reasons for the mixed results.

Discussion of Theoretical Issues
Why, despite our own belief in LT-based instructional approaches, was it so difficult to corroborate the efficacy and underlying assumptions of such an approach? Four theoretical factors might account for the inconsistent results.

Nature of the Relation between Successive Levels
Earlier levels in an HLT may support later levels either by facilitating the latter or serving as a developmental prerequisite (a necessary condition) for the target knowledge. As an example of a developmental prerequisite, consider two concepts in object counting. The count-to-cardinal concept, also known as the cardinality principle (CP), entails understanding that the last number word said when counting a set indicates the total number of items in that set (e.g., counting a set of five blocks as "one, two, three, four, five" and recognizing that there are "five" blocks in all). The cardinal-to-count concept serves as the conceptual basis for counting out a specified number of items: to produce a given quantity, count out objects to that number. Fuson [74] hypothesized that the count-to-cardinal concept (or CP) serves as a developmental prerequisite for the cardinal-to-count concept, which holds that a cardinal label of a set such as "five" indicates what the last number word would be if the set were counted. In essence, the cardinal-to-count concept is the inverse of the count-to-cardinal concept and serves as the rationale for the counting-out procedure. For instance, "five" in the request "give me five blocks" specifies that the counting-out process should stop when the count reaches five.
With facilitative relations, a messy middle can be expected. That is, though success on an earlier facilitative level increases the probability of success on a later target knowledge, knowledge of any one facilitator may or may not be evident before the target knowledge emerges. With modest facilitators particularly, a child might skip one or even more levels, or appear to do so, and still learn higher target knowledge.
This might account for why, in Experiment 5 [71] and Experiment 6 [72,73], the experimental intervention based on an HLT resulted in significantly improved patterning knowledge but not significantly better than the counterfactual intervention, which involved the same unordered activities. Teaching the levels in order was not crucial for promoting an advanced level of patterning knowledge.
For Experiment 3 [66], Experiment 4 [67], Experiment 7 [68], and Experiment 8 [70], the experimental intervention based on an HLT resulted in significant improvement above and beyond that of the counterfactual. Nevertheless, in these experiments, some control participants achieved success on the target level without instruction on precursor levels. Such results are consistent with an HLT that embodies strong facilitators, but not prerequisites, for the target knowledge. (Both prerequisite, or necessary, and facilitative relationships are postulated by Hierarchical Interactionalism [2].) In Table 1, the evidence indicated that the HLT in Experiment 7 was nearly a necessity for kindergartners with the lowest entry level. These results suggest that the earliest levels in the HLT are more critical than later levels and are probably unwise to skip and/or that the greater the distance between a child and the target level, the more important is the adjustment of instruction to the child's level.
For Experiment 10 [69], the HLT involved a hypothesized conceptual prerequisite for a target concept and skill. With one exception (described in the fourth bullet below), participants pretested at a level below the conceptual prerequisite had negligible or no success on target tasks. The HLT-based experimental intervention resulted in significant improvement on both conceptual and procedural fluency dependent measures above and beyond the improvement of the counterfactual (Teach-to-Target intervention). Specific findings include:

• Five of the seven participants who received the HLT-based intervention, which included prior training on the conceptual prerequisite, had (some) success on the target-concept measure; six of seven, on the target procedural-fluency measure.
• The one HLT participant who was unsuccessful on both the conceptual and the procedural-fluency task had negligible success learning the conceptual prerequisite.
• Seven of the eight participants who were trained on the target concept and skill but not the prerequisite concept had (almost) no success learning the target knowledge.
• Finally, post hoc analysis indicated that the exceptional Teach-to-Target participant who mastered both the target concept and skill not only exhibited the best pretest performance of the sample but appeared to have learned the prerequisite concept during the pretesting.
Overall, then, of the seven children who exhibited knowledge of the prerequisite concept before the target training, six appeared to benefit from the target-level training and exhibited some success on the measure of target understanding (see Table 2). Of the eight children who did not exhibit knowledge of the prerequisite concept before the target training, the target-level training resulted in no success on the measure of target understanding in seven (Teach-to-Target) cases and negligible success in another (HLT participant). The corresponding results for the target skill were that all seven prerequisite knowers achieved (some) success, whereas seven not-knowers had no success and one had minimal success (see Table 3). The lack of a messy middle is strongly consistent with prerequisite knowledge involving a necessary relation and, in such cases, instructional order (including not skipping the lower level) is important.

Note. LT = HLT-based intervention; TtT/Skip = Teach-to-Target/Skip Level(s) counterfactual; Unord = same but unordered instructional activities counterfactual; BAU = business-as-usual (passive) control condition. a The 180 children were assigned to one of the three sub-experiments depending on their initial (pretest) level of development. b Slavin and Smith [75] caution that effect sizes for small-n studies, such as Experiments 4, 5, and 11, are more variable than those of large-n studies. Thus, the former produce less reliable and replicable estimates of program impact than the latter. They further note that the most important source of this greater variability may be what Cronbach et al. [76] call "superrealization". Superrealization refers to high implementation fidelity due to better monitoring and more input by experimenters than would be available at scale. Slavin and Smith conclude that, although this variable may not impact internal validity, it can appreciably affect external validity.
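The Experiment 10 tallies reported in the text for the target concept and target skill can be laid out as two-by-two counts to show what the absence of a "messy middle" looks like. This is a minimal sketch built only from the counts stated in the text; grouping "negligible" or "minimal" success with "no success" is a simplifying assumption made here, and the labels and function name are hypothetical.

```python
from fractions import Fraction

# (children with some success, children in group), as reported in the text.
# Assumption: negligible/minimal success is counted as no success.
TALLIES = {
    "target concept": {
        "knew prerequisite": (6, 7),
        "did not know prerequisite": (0, 8),
    },
    "target skill": {
        "knew prerequisite": (7, 7),
        "did not know prerequisite": (0, 8),
    },
}


def success_rate(measure: str, group: str) -> Fraction:
    """Proportion of a group showing some success on a measure."""
    successes, total = TALLIES[measure][group]
    return Fraction(successes, total)


for measure, groups in TALLIES.items():
    for group in groups:
        print(f"{measure:15s} {group:27s} {success_rate(measure, group)}")
```

Under a purely facilitative relation, some children without the prerequisite would be expected to succeed anyway; here that cell is empty for both measures.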

Qualitative Differences between Successive Levels
Even with a succession of prerequisite levels that involve necessary conditions, if two successive levels are highly similar, children may spontaneously construct the higher level from the lower level learned with the support of instruction. That is, with little or no external help, students may generalize learning to the next level. Achieving the lower level (via instruction) may effectively be a necessary and sufficient condition for achieving the higher level. Alternatively, children might spontaneously construct a lower but "skipped" level as they learn the higher level with the support of instruction, "filling in" the knowledge of the skipped level [2,77]. In such cases, skipping instruction on the next level and focusing on the next higher level would be warranted, at least for some students.

Number of Paths to Target Knowledge
Various scholars have questioned whether there is a single path for all key ideas, that is, whether an HLT can be considered the only or even the best path to a goal [2,5,77,78]. For example, Lesh and Yoon [79] proposed that some knowledge domains might be characterized as the diametric opposite of a linear, ladder-like LT, namely a web of knowledge. With multiple pathways of facilitators, the middle ground between initial knowledge and the target knowledge can be especially messy.

Validity of the LT
Some domains such as early patterning have been researched less than other domains such as counting, number, and arithmetic development. Thus, the relations among levels of knowledge or thinking of the former are less clear than those of the latter. Experiment 5 [71] entailed evaluating the LT for early knowledge of repeating patterns summarized in Figure 1. One unresolved question of particular interest was: Where should translating a repeating pattern into letters fit in a patterning HLT? Logically, such a competence fits the definition of Level 3 (Children can abstract a pattern and translate it into new media), which in Sarama and Clements' [2] original LT was combined with Level 4 (Children can identify the core of a repeating pattern (the smallest portion of the pattern that repeats to create the rest of the pattern)). For the LT training, then, translating repeating patterns was postponed until after participants received Level-2 training. Interestingly, two popular early childhood mathematics curricula, Building Blocks [80] and Mathematics Their Way [81], regularly use letters to label patterns from the beginning of patterning instruction. This approach was used in the counterfactual training.
Baroody et al. [71] observed that children in both conditions struggled mightily with translating patterns into different materials (e.g., translating the circle-square-circle-square-circle-square pattern depicted above into triangle-hexagon-triangle-hexagon-triangle-hexagon or, in a few cases, even a circle-square repeating pattern involving different colors). In contrast, they quickly learned to translate repeating patterns into letters (e.g., translating • • • into the plastic alphabet letters: ABABAB). Using letters to label the elements of a pattern or its core, then, seems to be a distinct form of translating patterns, qualitatively different from translating a pattern into other objects (see also [72]). As a result, the counterfactual ("unordered") intervention may have conferred two advantages:

1. The early use of letters to label the elements of a pattern may have fostered the Level-2 competencies (e.g., extending a repeating pattern) in counterfactual participants.

2. Early use of letters to label the core of a pattern may have helped some such participants achieve Level-4 competence (identifying the core of a repeating pattern). (Parenthetically, translating a pattern into different objects (listed as a Level-3 competence in Figure 1) may be more challenging and facilitated by an explicit understanding of the concept of a core unit (listed as a Level-4 competence in Figure 1). This conjecture is consistent not only with Baroody et al.'s [71] observations but with Fyfe et al.'s [82] finding that using letters to identify unit cores was efficacious in promoting the ability to translate a pattern into different objects. Although an implicit consideration of unit may naturally help some children to translate a repeating pattern into different materials, more explicit and systematic instruction that first involves using letters to label the elements of a pattern (Level 2) and then the core of a pattern may provide a better basis for most children to tackle this challenging task.)

Yilmaz et al. [73] reported eye-tracking data that indicated Level-2 children implicitly attend to the core when, say, extending a pattern and only later construct the explicit knowledge that permits success on the core-identification task used to assess Level 4. That is, experiences constructing Level 2 implicitly draw attention to the core and can facilitate explicit attention to the core during Level-4 training, whether conducted simultaneously or afterward. So, another reason for the indistinct impact of HLT-based instruction and instruction based on the same unordered activities is that existing patterning LTs, such as that in Figure 1, may have been based on incomplete information, that is, on research that did not adequately examine children's implicit patterning knowledge.

Discussion of Methodological Issues
Another barrier to confirming the efficacy of HLT-based instruction and its assumptions is the set of methodological challenges that such research poses. We first discuss five general challenges and then illustrate these issues with a description of our efforts to study a particular domain (early cardinality development). These are by no means the only challenges. However, we believe that their explication may increase the quantity and quality of future research.

Issues with the Starting Level
When evaluating the efficacy and assumptions of HLT-based instruction, careful attention must be paid to identifying a participant's starting developmental level, to ensuring that enough participants are at an appropriate starting level to achieve adequate statistical power, and to equating the learning conditions on this variable. For example, Baroody et al. [71] reported that, unlike type of intervention, starting level was significantly related to learning the target knowledge (core identification). The two HLT and three unordered participants who exhibited partial Level-2 competence at pretest all achieved success on the target (Level-4) task at posttest. In contrast, among participants who were at Level 1 at pretest, only three of the six HLT-like participants and one of the five non-HLT children did so. Given that Level 3 should perhaps follow Level 4, the five participants who started with partial knowledge of Level 2 were already close to the target level, whereas those who started at Level 1 were a full level away from it. With a larger sample of children who start at Level 1, then, type of intervention might have made a significant difference.
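To make the small-sample point concrete, the counts reported above for children who started at Level 1 (3 of 6 HLT-like participants versus 1 of 5 non-HLT participants succeeding at posttest) can be run through a two-sided Fisher exact test. The choice of test, the function name `fisher_exact_two_sided`, and the tenfold scale-up are illustrative assumptions, not the analysis reported in [71].

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher exact p-value for the 2x2 table [[a, b], [c, d]].

    Rows are conditions; columns are success and failure at posttest.
    """
    row1, row2 = a + b, c + d
    successes = a + c
    total_tables = comb(row1 + row2, successes)

    def weight(k):
        # Number of ways to place k of the successes in row 1.
        return comb(row1, k) * comb(row2, successes - k)

    lowest = max(0, successes - row2)
    highest = min(row1, successes)
    observed = weight(a)
    # Sum every table that is no more probable than the observed one.
    extreme = sum(
        weight(k) for k in range(lowest, highest + 1) if weight(k) <= observed
    )
    return extreme / total_tables


# Counts from the text: Level-1 starters who reached the target at posttest.
p_observed = fisher_exact_two_sided(3, 3, 1, 4)

# Hypothetical: the same proportions with ten times as many children.
p_scaled = fisher_exact_two_sided(30, 30, 10, 40)

print(f"observed sample (n = 11): p = {p_observed:.3f}")
print(f"hypothetical sample (n = 110): p = {p_scaled:.4f}")
```

With the 11 observed children, the difference is far from significant (p is about 0.55), whereas the same proportions in a hypothetical sample ten times larger would fall below the conventional .05 threshold. This only illustrates the role of sample size; it does not show that the true proportions match the observed ones.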

Sacrifice of Ecological Validity
Research requires trade-offs between internal and external validity (e.g., between controls that permit a clear conclusion and results that can be generalized to actual classrooms). The positive impact of HLT-based instruction may be greater outside of the controlled sequence of activities used in the present project. For example, in the experiments that compared an HLT-based intervention with an intervention using the same unordered activities [70][71][72], the former involved a fixed sequence of activities, regardless of a child's progress. This was necessary to equate coverage and dosage and eliminate these factors as possible confounds or alternative explanations. However, HLTs are recommended as resources to support more flexible instruction based on formative assessment [2,8]. That is, typically the use of HLTs involves immediately moving to the next higher level once a level is attained and only after this level is attained.

Small Sample Size
A possible explanation for the non-significant finding of Experiment 5, for example, was that the small sample provided insufficient power to detect a real difference. However, a follow-up with three times the number of participants per group (Experiment 6) also yielded a non-significant difference [72].

Entangling Lower with Higher Levels of Instruction
An analysis of Experiment 1 revealed two inter-related reasons for the lack of a significant difference between the HLT-based intervention and the Teach-to-Target intervention. One was that the target-level activities from the Building Blocks curriculum used for both interventions involved both target-level and lower-level competencies. Another was that, despite their research protocol training, the trainers naturally did what educators do, which was to help a child with both levels of competencies regardless of the child's assignment (having trainers teach only one condition would have introduced a possible confound). In effect, the two types of intervention were not clearly distinct. A lack of fidelity to both the HLT and counterfactual also plagued Experiment 2. The plan was to have 180 children starting at the same level, but the population was so diverse that children were assigned to three different levels and thus three instructional conditions. Despite additional professional development, trainers found it difficult to accurately enact the six different instructional conditions (often doing 3 or more each day with different children).

Imprecise Dependent Measures
The operational definition of target-level competencies needs to be precise. The dependent measures for Experiment 1 and Experiment 2 (in Table 1), the first two efforts to examine a cardinality LT, involved tasks drawn from the TEMA-3 [83] and REMA [84]. Two 'how many?' tasks, namely cardinality rule with 8 items (after counting 8 items, asking a child how many) and how many pennies (after counting 8 pennies), served to gauge prerequisite knowledge (Level 2, the count-to-cardinal concept, or CP). A give-n task (put 5, 7, and 10 boxes in a cart) served partly to gauge target-level knowledge (Level 4, cardinal-to-count, producing a set). Unfortunately, these tasks do not precisely measure conceptual understanding at Levels 2 and 4.
Whereas the count-to-cardinal concept or cardinality principle (CP) entails understanding that the last number word used to count a collection also indicates its total number of items, Fuson [74] observed that many children can learn the cardinality rule (stating the last number word is an acceptable response to the how many question) by rote, without recognizing that it represents the total. Thus, children successful on the cardinality rule with 8 items may or may not have constructed the hypothesized prerequisite (Level-2) knowledge for Level 4.
Children successful on the give-n task ('put 5 [then 7, and finally 10] boxes in a cart') almost certainly understand the Level 4 (cardinal-to-count) concept (a cardinal term such as "seven" indicates what the last number word would be if a collection is counted). This advanced cardinal concept is the basis for knowing when to stop the counting-out process (e.g., put 7 boxes in the cart, stop counting out boxes when "seven" is reached). However, the task involves executing a counting-out procedure that requires remembering the requested number, counting items as they are put in the cart, and comparing the count to the requested number. In brief, a child might understand the cardinal-to-count concept but respond incorrectly because of a procedural slip up.

A Case in Point: Cardinality Development
Some researchers agree with Fuson's [74] hypothesis that the ability to count out a requested number of items and its conceptual rationale (Level 4 in Table 4) should build on earlier levels of cardinal-number understanding (Levels 1 to 3 in Table 4) [8,85]; others do not [86,87].

Table 4. A possible learning progression of key aspects of pre-counting and counting-based cardinal number knowledge and their type of mapping, conceptual basis, and direct measure [8,74,85].

Aspect of Cardinal Number | Conceptual Basis | Mapping | Direct Measure
Pre-meaningful counting (verbal subitizing-based) cardinality development
Level 1A: number recognition (n-knower levels) | Cardinal representation of a small number underlies immediate subitizing of 1, 2, or 3 | Quantity-to-word (via subitizing) | How-many task
Level 1B: putting out a requested n (also commonly called n-knower levels) | Cardinal representation of small numbers used to subitize when 1, 2 or 3 have been put out | Word-to-quantity (via subitizing) | Give-n task
Counting-based cardinality development
Predict last n word and give-n tasks
a Meaningfully attaining Level 2 may be preceded by learning the last-word rule, which can be applied without understanding to achieve success on the how-many task.

Experiment 9: Lessons Learned, Part 1
To evaluate the validity of the hypothesized LT (Table 4), the lessons learned in Experiments 1 and 2 were then applied to Experiment 9. Specifically, a conservation of numerical identity task was added to check whether correct responses on the 'how many?' task were due to a cardinality rule learned by rote or the meaningful count-to-cardinal concept (cardinality principle). This task required a child not only to generate the cardinal number for a collection of 5 or 6 by counting but also to apply this outcome meaningfully to recognize whether a transformation affected the total (addition or subtraction of 1) or not (change in appearance). The scoring of the give-n task was modified to distinguish between errors that violate the cardinal-to-count concept (e.g., counting out all the available items or counting out more than the requested number) and minor errors that do not violate the principle.
Overview of Experiment 9. This effort entailed randomly assigning 10 participants to the HLT condition (4 boys, mean age = 3.55 years, 5 African American, 3 multiracial, 8 free/reduced lunch) and 10 to the Teach-to-Target condition (4 boys, mean age = 3.8 years, 3 African American, 2 multiracial, 9 free/reduced lunch). An analysis revealed that both groups improved significantly and substantially at delayed posttest on the give-n task but that the HLT-Like group did not improve significantly more than the Teach-to-Target group.
Methodological issues with Experiment 9. Three issues appeared to account for the nonsignificant difference. Two involved the starting level. One compromising issue was that children were included in Experiment 9 regardless of how far below the target level they were developmentally. For example, the lowest-functioning child in the experiment could not initially subitize even one and two and had trouble counting one-to-one with collections beyond two. Despite focused remedial efforts, this HLT-assigned child did not improve on these foundational competencies. This makes sense given the relatively long time needed to construct verbal concepts of "one" and "two" [88,89]. Training on the hypothesized prerequisite cardinality concept (count-to-cardinal concept) and, thus, the more advanced target-level knowledge (cardinal-to-count concept and counting-out procedure) had no impact. A second issue with starting level was that, at pretest, half of the children included in Experiment 9 could occasionally count out a collection of 5 or more upon request. That is, although highly inconsistent in their performance on the give-n task, they sometimes appeared to apply the cardinal-to-count concept (i.e., stopped their counting-out process at the requested number).
A third issue was assessing the cardinal-to-count concept (Level 4) directly and reliably. Despite the more lenient scoring of the give-n task, a performance failure due to the cognitive demands of implementing the counting-out procedure might still have underestimated understanding of the cardinal-to-count concept. For instance, even if a child understood the concept, the demands on attention and memory required to remember the requested number, count out items, and/or compare the count to the requested number might cause a slip up [40].

Experiment 10: Lessons Learned, Part 2
Building on Experiments 1, 2, and 9, Experiment 10 was undertaken.
Methodological improvements to Experiment 10. Three modifications were implemented. First, children who could not recognize 1 and 2 were excluded from the experiment as developmentally unready.
Second, to better test the hypothesis of whether skipping a level makes a difference, only children who had not already achieved Level 2 and who did not have more than minimal success counting out 5 to 7 items were included.
Third, observations during the training phase of Experiment 10 suggested that a stop-at-n task, which involved asking a child to stop a Muppet's counting-out process at the requested number, might serve as an effective measure of the cardinal-to-count concept. A child who recognizes that the requested number represents the cardinal value of the requested collection and should be the stopping point of the counting-out process (i.e., understands the cardinal-to-count concept) should be successful on the stop-at-n task. Unlike the give-n task, this task relieved children of the demands of counting out a collection themselves (minimized cognitive demands and performance failure). The stop-at-n task, then, was adopted as the dependent conceptual measure and the give-n task was retained as the dependent procedural fluency measure in Experiment 10.
Experiment 10: Results and limitations. As noted previously, the results of Experiment 10 clearly indicated that the count-to-cardinal concept (cardinality principle) is a developmental prerequisite for the cardinal-to-count concept and counting-out collections beyond the subitizing range. Unclear is whether the prerequisite Level 2 is a necessary condition for Level-4 competencies, as hypothesized by Fuson [74], or a necessary and sufficient condition, which is essentially equivalent to Sarnecka and Carey's [87] hypothesis that the concepts are indistinct or develop simultaneously. For a prerequisite involving a necessary condition, all the data should be distributed among cells A, B, and C with cell D = 0, as it is in Table 2 [90]. For a necessary and sufficient condition, all the data should be distributed between cells B and C with cells A and D = 0, as it is in Table 3.
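The cell logic described above can be sketched in a few lines of Python. The sketch assumes that cell D holds children who show the target concept without the prerequisite and that cell A holds children who show the prerequisite without the target; this mapping is inferred from the definitions of necessary and sufficient conditions, not copied from Tables 2 and 3. The function names and the sample data are hypothetical.

```python
from collections import Counter


def tally_cells(children):
    """Count (has_prerequisite, has_target) pairs into four named cells."""
    counts = Counter(children)
    return {
        "prerequisite_only": counts[(True, False)],  # assumed to be cell A
        "both": counts[(True, True)],
        "neither": counts[(False, False)],
        "target_only": counts[(False, True)],  # assumed to be cell D
    }


def describe_pattern(cells):
    """Name the logical relation the cell counts are consistent with."""
    if cells["target_only"] > 0:
        # Some children reached the target without the lower level.
        return "not necessary (at most facilitative)"
    if cells["prerequisite_only"] > 0:
        return "necessary but not sufficient"
    return "necessary and sufficient"


# Hypothetical posttest outcomes: (has Level 2, has Level 4) for each child.
sample = [(True, True)] * 6 + [(True, False)] * 3 + [(False, False)] * 5
cells = tally_cells(sample)
print(cells)
print(describe_pattern(cells))
```

A strict zero-count rule like this ignores measurement error, so with a small sample an empty cell is weak evidence; a handful of additional children could change the label.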
Aside from the conflicting results, the problem is that the sample is too small to be sure what the distribution would be in each table for the population of young children. (The COVID pandemic interrupted data collection midstream.) There is another reason Sarnecka and Carey's [87] alternative hypothesis that the count-to-cardinal concept (CP) underlies both meaningful one-to-one counting and fluency with counting out a specified number of items (i.e., that the count-to-cardinal and the cardinal-to-count concepts are indistinct) cannot be discounted. According to this alternative hypothesis, the HLT participants significantly and substantially outperformed the Teach-to-Target children, because the former received training on the count-to-cardinal concept (CP) and the latter did not and, thus, the former had a greater dosage of counting-based cardinality training overall.
One way to critically test Fuson's [74] hypothesis against Sarnecka and Carey's [87] alternative hypothesis would be to track longitudinally whether an understanding of the count-to-cardinal and the cardinal-to-count concepts evolves sequentially or simultaneously. Another would be to train children who have achieved Level 1 in Table 1 (i.e., are developmentally ready for Level 2) on the count-to-cardinal concept (CP). If the count-to-cardinal concept (CP) is a necessary condition for the cardinal-to-count concept, as Fuson hypothesizes, and the two concepts are clearly distinct (i.e., involve a significant conceptual leap), then participants should significantly improve on the former but not the latter and skipping Level 2 to achieve Level 4 would not be an option. (Currently, there is too little evidence to determine whether the number-constancy concepts, which are extensions of the count-to-cardinal concept, are a necessary or facilitative condition for the cardinal-to-count concept and counting out; thus, it is unclear whether Level 3 can be skipped.) If the count-to-cardinal concept (CP) is indistinct from the cardinal-to-count concept as Sarnecka and Carey hypothesize (i.e., the former is effectively a necessary and sufficient condition for the latter), then theoretically participants should improve on both tasks to an equal degree. If the count-to-cardinal concept is a necessary condition for the cardinal-to-count concept but the two concepts are only somewhat distinct, then the results could be messy: significantly more participants may or may not improve on the former than on the latter. If, contrary to what the present results indicate, the count-to-cardinal concept (CP) is only a facilitative condition for the cardinal-to-count concept, then Level 2 may be skippable in achieving Level 4, and some portion of the comparison group trained only on Level 4 may achieve the cardinal-to-count concept.
In brief, corroborating the efficacy and assumptions of HLT-based instruction, in general, and the validity of Fuson's hypothesis, in particular, is challenging for both theoretical and methodological reasons.

Conclusions
Overall, then, the evidence of the HLT Project corroborates the efficacy and basic assumptions of an HLT-based approach. Nevertheless, as the case of testing Fuson's hypothesis [74] about cardinality development illustrates, much research still needs to be performed to evaluate whether an HLT's developmental progression consists of facilitative levels or developmental prerequisites. Except in cases where an LT involves prerequisite knowledge (i.e., at least a necessary condition) for a higher level that is qualitatively different, a messy middle can be expected, and some children without a lower level of knowledge can be expected to achieve a higher level of knowledge. This is consistent with the theory upon which the LTs examined in the HLT Project are based, which does not require that levels be prerequisites to be educationally useful (and that different types of developmental progressions exist [2]). Further, the theory recognizes that some children can learn multiple contiguous levels of thinking in parallel. However, recall that the HLT in Experiment 7 was nearly necessary for kindergartners with the lowest entry level. This implies that at least for some children learning some topics, the greater the distance between a child and the target level, the more important the adjustment of instruction to the child's level.
Different topics may have quite distinct conceptual structures. For example, consider Rittle-Johnson et al.'s [91] view that existing LTs for early knowledge of repeating patterns might better be characterized as a "construct map", that is, a probabilistic continuum of knowledge rather than distinct phases of knowledge. Indeed, in our patterning experiments (Experiments 5 and 6), the evidence indicates a series of facilitative factors. Even so, refinement of the patterning HLT (Figure 1) and of the methodology may yet identify one or more prerequisite levels for later levels.
Further, Hierarchical Interactionalism theory [2] states that HLTs are hypothetical in two ways. First, they must be realized with teachers and children. Second, they should continually be improved based on new information. Therefore, we interpret the null results of the patterning studies (Experiments 5 and 6) as a valid caution that an LT approach is only as good as the LT it uses. However, our analyses also indicate that the null results were due to faults not so much in the LT approach itself but in the LT (which has already been substantially revised, e.g., see LearningTrajectories.org). The more research on a given topic, the more valid future versions of that topic's LT.
In summary, creating and evaluating HLTs are challenging but worthwhile tasks, as the HLT Project illustrates. Even the three Assumption 1 experiments that did not significantly favor the HLT-based instruction for methodological reasons, and thus cannot be considered a valid test, served as pilot studies to work out the intricate methodology needed for the successful corroboration of the cardinality LT (in Table 4). HLT Project results also indicate that the benefits justify meeting the challenges.

Data Availability Statement: Data for the experiments, after all planned research is published, will be processed by the Scholarly Commons office of the University of Denver Library system and shared on the Inter-University Consortium for Political and Social Research (ICPSR) system.

Conflicts of Interest:
The authors declare no conflict of interest.