1. Introduction
Many students struggle to develop strong reading skills, with national and international assessments documenting persistent gaps in literacy achievement. These challenges reflect a range of issues in literacy instruction, including misalignment between research and teaching practice (
Moats, 2020) and the reliance on a single, uniform model of literacy instruction for all students (
Compton-Lilly et al., 2023). The COVID-19 pandemic disrupted instruction and learning, resulting in many students falling even further behind in reading (
Kennedy & Strietholt, 2023). Because third-grade reading proficiency is a critical milestone that strongly predicts students’ long-term academic trajectories (
Fiester, 2010), schools are increasingly implementing high-impact tutoring programs as a targeted strategy to support struggling readers (
C. D. Robinson & Loeb, 2021).
A recent randomized controlled trial (RCT) examining the efficacy of a virtual early literacy tutoring program for students from kindergarten to second grade compared one-on-one to two-on-one virtual tutoring. Consistent with the prior literature, the study found that one-on-one tutoring produced larger gains in students’ literacy achievement, with the estimated effect size of one-on-one tutoring approximately twice that of two-on-one tutoring (
C. D. Robinson et al., 2024). Students assigned to receive two-on-one tutoring did not experience learning gains compared to the control group. The analysis in this paper builds on this RCT by exploring differences that may explain why one-on-one and two-on-one sessions differ in effectiveness. The virtual tutoring environment, with the video and audio that it produces, provides an unprecedented opportunity to study the details of tutoring sessions. By analyzing tutor pedagogy, we may uncover mechanisms that provide insights into why one-on-one tutoring tends to be more effective than two-on-one tutoring, and, potentially, how to improve two-on-one approaches so that schools can leverage the value of tutoring at a lower cost.
Advancements in computational methods have made it possible to transcribe audio recordings and analyze those transcriptions at scale, offering insights into human interactions and behaviors. By employing techniques in natural language processing (NLP), machine learning (ML), and large language models (LLMs), researchers have created automated discourse models and analytical tools to identify teacher and tutor talk moves (e.g.,
Demszky et al., 2021). They also investigate how these interactions influence student engagement and learning outcomes (
Abdelshiheed et al., 2024;
Booth et al., 2024;
Jacobs et al., 2022;
X. Liu et al., 2024;
O’Connor & Michaels, 2019).
To date, no research has systematically analyzed tutor utterances at a large scale to understand the quantity and quality of tutoring provided. Instead, researchers have administered surveys or collected anecdotes from school administrators and practitioners to explore how relationship-building contributes to student engagement or how educators may deploy different pedagogical approaches (e.g.,
Kraft & Lovison, 2025). This study leverages computational methods and audio-recordings of tutoring from the RCT of one-on-one versus two-on-one tutoring (
C. D. Robinson et al., 2024) to uncover how tutoring differs across these settings. It asks:
How does tutor pedagogy differ between two-on-one and one-on-one formats in ways that might explain their differential effectiveness? We find that students in one-on-one sessions receive more time focused on personalized content and relationship building than those in two-on-one sessions and that tutors employ different cultural, linguistic, and relational approaches in the two contexts to enhance content learning and foster relationships.
2. Background
High-impact tutoring—tutoring with a consistent tutor, using data and high-quality instructional materials, over a substantial number of hours—has emerged as one of the most effective interventions for accelerating student learning (
C. Robinson et al., 2021;
Nickow et al., 2024). Research consistently shows that incorporating tutoring interventions into the school day leads to substantial positive effects on student learning, with especially strong impacts on early literacy (
Dietrichson et al., 2017;
Nickow et al., 2024). These interventions can also strengthen student engagement and improve attendance (
Lee et al., 2024), perhaps because they facilitate a consistent connection with a dedicated tutor who can cultivate a sense of belonging and accountability (
Guryan et al., 2021). Recent research also provides evidence of positive effects for students affected by school closures and learning disruptions from the COVID-19 pandemic (
Cortes et al., 2025;
Lee et al., 2024).
Recruiting tutors is a key challenge in scaling tutoring (
Groom-Thomas et al., 2023). While effective tutors can come from a variety of backgrounds (
Nickow et al., 2024), finding enough tutors in the local community to support all struggling students can be difficult, especially in rural communities and areas with tight labor markets (
Groom-Thomas et al., 2023). One solution is virtual tutoring, where the students are in school but the tutor is not and the tutoring session takes place online. Virtual tutoring expands the labor market beyond geographic boundaries, which allows schools to provide tutoring even if there are not enough tutors in the local area. Virtual tutoring also addresses the cost of commuting, which can be substantial relative to the compensation. Many school districts invested in virtual tutoring in response to the pandemic, and recent research shows that the virtual approach can be effective for improving student learning (
Carlana & La Ferrara, 2021;
Gortazar et al., 2023;
Hashim et al., 2025;
C. D. Robinson et al., 2024).
Funding is another key challenge to tutoring uptake: effective tutoring programs require many tutors and experienced educators to oversee and coach tutors (
Groom-Thomas et al., 2023). To manage the costs of high-impact tutoring, which often range from
$1200 to
$2500 per student, education agencies often increase the student-tutor ratio to two or more students per tutor in a session. While larger student-tutor ratios can reduce costs and may facilitate peer (student-student) learning, larger ratios are generally associated with smaller learning gains (
Nickow et al., 2024). The differential effectiveness between one-on-one and two-on-one tutoring motivates our analysis of the differences between one-on-one and two-on-one tutoring sessions.
3. Theoretical Framework
Tutoring improves student learning outcomes by addressing two major challenges of classroom teaching: the diverse academic levels and needs of students and the inability to provide students with the attention and human relationship they need for motivation and engagement (
Guryan et al., 2023;
Roorda et al., 2017;
Scales et al., 2020). The key question—and the focus of this study—is how the dynamics of tutoring differ between one-on-one and two-on-one formats across the two critical ingredients of tutoring: personalization of academic content and relationship building.
According to
Vygotsky (
1978), learning occurs when a child interacts with people in their environment and collaborates with peers. In the context of our study, students’ literacy development may depend on the structure of the tutoring sessions, particularly concerning the student-tutor ratio and peer relationships. On one hand, one-on-one tutoring may be more effective than two-on-one tutoring because it allows for more personalized instruction, enabling the tutor to focus on the material that benefits the student most. Furthermore, one-on-one tutoring can foster a stronger relationship between the tutor and the student, which may increase engagement and motivation for learning. On the other hand, two-on-one tutoring may promote peer collaboration and social learning, potentially enhancing engagement and learning if tutors can capitalize on effective peer interaction.
We use tutors’ utterances during tutoring sessions as a proxy to characterize the social processes that enable student learning. We explore both the quantity of time tutors spend on instruction and relationship building in one-on-one and two-on-one sessions and the quality of interactions during those times. By examining these differences in detail, we can better understand the mechanisms that might explain variations in student learning outcomes and inform best instructional practices for small-group tutoring. We further break down our measures of quantity and quality as follows.
3.1. Quantity
Tutor attention mediates the value of instructional time. Educator attention is a limited resource (
Kahneman, 1973). In a one-on-one session, tutors may direct their undivided attention to their sole student, while in two-on-one sessions tutors may split their attention between their students (
Zhang et al., 2025). Thus students in two-on-one sessions experience three distinct types of instructional time: instruction directed at both them and the other student, instruction directed at only them, and instruction directed only at the other student. Personalized instruction directed at one student in a two-on-one session may be approximately equivalent to that in a one-on-one session, though the presence of the other student may introduce social dynamics, such as collaboration or competition, that may influence learning (
Johnson & Johnson, 2009;
Zajonc, 1965). However, the value of instruction directed at the other student or at both students together is unclear.
Quantity of time spent on relationship building also may affect learning and differ between one-on-one and two-on-one sessions. A meta-analysis finds that positive student-teacher relationships are associated with higher engagement and increased student achievement (
Roorda et al., 2017). High-impact tutoring emphasizes consistent tutor-student pairings because this consistency provides more time to build stronger tutor-student relationships (
C. D. Robinson & Loeb, 2021). Again, the scarcity of tutor attention means that two-on-one students are likely to get less direct relationship-building time with their tutor than one-on-one students.
Differences in quantity between one-on-one and two-on-one tutoring may also stem from differences in the time spent on disruptions and from logistical or structural components of tutoring, such as the number and length of sessions and the pace of instruction (i.e., the tutor speaking speed). On the one hand, students in one-on-one tutoring may benefit from more frequent sessions because scheduling is simpler with just two people involved. On the other hand, students in two-on-one tutoring might experience longer sessions and a faster tutor speaking pace, as the tutor works to address the needs of multiple students. Examining the number and length of sessions, tutor talk time, tutor-student pairing consistency, and tutor speaking speed could allow us to understand if students in two-on-one and one-on-one sessions receive different quantities of instruction.
3.2. Quality
Tutoring sessions likely vary in the quality, as well as the quantity, of interactions, whether instructional or relationship-building. For example, one-on-one sessions may allow tutors to personalize instruction and relationship building more than they can in two-on-one sessions. We use personalization as a key indicator of the quality of targeted instruction and relationship building because both types of interactions likely hinge on their level of personalization.
According to Vygotsky, the quality of instruction is closely tied to how well it is personalized. He advanced the concept of the Zone of Proximal Development (ZPD) for which personalized instruction should target just above a student’s current ability level (
Vygotsky, 1978). In one-on-one tutoring, tutors can identify their student’s ZPD and provide the necessary scaffolding to support learning within that range (
Vygotsky, 1978;
Wood et al., 1976). In two-on-one tutoring, tutors must simultaneously manage two distinct ZPDs. If the students have differing abilities, the tutor may encounter cognitive overload, where external demands (different student needs) and internal processing (individual scaffolding strategies) exceed their available attentional resources (
Sweller, 1988). This overload can reduce a tutor’s effectiveness in teaching (
Feldon, 2007).
Similarly, one-on-one tutoring may be better suited for developing highly personalized student-teacher relationships because the tutor can focus entirely on understanding and connecting with a single student. In contrast, two-on-one tutoring requires the tutor to cultivate two relationships at once, which may result in less in-depth knowledge of each individual student and weaker personalization (
Ichii, 2022).
3.3. Research Questions
Ultimately, one-on-one sessions may deliver a higher quantity of instruction and relationship building by offering the tutor’s undivided attention and a higher quality through more personalized content and interactions. This study is the first to compare the quantity and quality of instruction and relationship building in one-on-one versus two-on-one tutoring, using a large set of tutoring session transcripts from an RCT (
C. D. Robinson et al., 2024). Specifically, we address the following research questions:
RQ1: How do the structural features of tutoring vary between one-on-one and two-on-one tutoring? In particular, how does tutoring differ in the number and length of sessions, tutor talk time, tutor-student pairing consistency, and tutor speaking speed?
RQ2: How does time allocation vary between one-on-one and two-on-one tutoring? In particular, how much talk time do tutors spend on instruction, relationship building, and classroom management, and how do tutors in two-on-one sessions allocate their attention between their students for these topics?
RQ3: How does the content of tutors’ instructional time and relationship-building time vary between one-on-one and two-on-one tutoring? In particular, do tutors in one-on-one sessions provide more individualized instruction and more individualized interactions during relationship-building?
4. Program and Study Context
OnYourMark Education (OYM) is a virtual early literacy tutoring program with curriculum grounded in the Science of Reading. OYM uses Amplify’s mCLASS intervention to target the basic literacy skills evaluated in the Dynamic Indicators of Basic Early Literacy Skills (DIBELS), with a focus on phonics, phonological awareness, fluency, vocabulary, and comprehension. OYM tutoring is delivered in one-on-one or two-on-one sessions embedded into the school day. The program aims to promote positive tutor-student relationships through a small student-tutor ratio and by pairing students with a consistent tutor.
During the 2022–23 school year, Uplift Education, a charter management organization in Texas, partnered with OYM to provide early literacy tutoring to 2206 kindergarteners, first graders, and second graders in 12 of its elementary schools as part of an RCT to evaluate the effectiveness of the tutoring program. The intervention targeted students who were “below” or “well below” grade-level benchmarks on the beginning-of-year reading assessments. Students eligible to receive tutoring at these schools were randomized into one of three groups: control (business-as-usual without tutoring), individualized tutoring (1 student:1 tutor), or tutoring in pairs (2 students:1 tutor). Students assigned to receive tutoring met their tutor online for 20 min during the school day, four times per week. Tutoring rolled out in September and continued through May. The present study builds on this RCT by exploring the session-level data collected during the experimental year using NLP methods.
The RCT was conducted using a stratified randomization design. To prioritize equity and address school staff concerns about randomization, approximately ten high-need students per school were guaranteed tutoring slots and excluded from the analytic sample. After reserving these seats, school leaders identified additional eligible students based primarily on beginning-of-year DIBELS assessments, with final selections left to local discretion. Eligible students were grouped into pairs based on school, scheduling availability, and overlapping areas of foundational literacy need, such as naming letter sounds, developing phonemic awareness, identifying regular and irregular words, and sounding out and blending. A total of 2085 students were ultimately randomized within 34 school-by-grade strata. Student pairs were assigned as clusters to either treatment or control conditions, ensuring balance across grade levels and initial performance. Treatment students were further randomized to receive either one-on-one or two-on-one tutoring, and then assigned to tutors at random. This process resulted in 510 students in one-on-one tutoring, 570 students in two-on-one tutoring, and 1005 students in the control condition (see
C. D. Robinson et al., 2024 for more details on the study design). The present study focuses on a subset of the 1080 treatment students, whose tutoring sessions were recorded and analyzed.
5. Data
This study uses the session metadata and video recordings from the OYM RCT, spanning roughly ten weeks from February to May 2023. During this time, OYM recorded a total of 23,825 tutoring sessions. A typical OYM tutoring session is scheduled for 20 min; however, the actual duration varied based on when the tutor and student(s) logged onto the platform and began the lesson. Some sessions were cut short due to technical issues or attendance problems, while others were extended to meet instructional needs. Consequently, the dataset includes session recordings of varying lengths, ranging from 1 to 40 min. To focus on the most common tutor-student interactions, we excluded the shortest 15% and longest 15% of sessions for each group size from the analysis. As a result, the sample for analysis includes 16,629 session recordings: 10,474 from students assigned to one-on-one settings and 6155 from students assigned to two-on-one settings. Due to logistical difficulties, 27.10% of two-on-one sessions were conducted with just one student and 1.02% of one-on-one sessions were conducted with two students. Of these sessions, 4451 took place in kindergarten, 6377 in first grade, and 5646 in second grade, while the remaining 155 sessions involved students from different grades.
To explore our outcome of interest, tutor pedagogy as demonstrated by their language, we utilized automatic speech recognition (specifically WhisperX) to transcribe audio recordings. For this study, we focus exclusively on the transcripts of tutors because the students’ transcripts are not reliable due to persistent background noise in the tutoring venues (e.g., classrooms, hallways, or cafeterias). The mean word error rate (WER) for tutor utterances, calculated from a random sample of 12 transcripts stratified by grade and group size, is 16.24% (SD = 4.10), indicating 83.76% of the transcription is correct. This rate is within the acceptable performance range for automatic speech recognition, especially given the wide range of WERs of WhisperX (3–60% depending on the quality and difficulty of the datasets;
Bain et al., 2023;
Ferraro et al., 2023;
Radford et al., 2023). Our WER was driven primarily by substitutions (9.28%), followed by deletions (4.52%) and insertions (2.44%). Our sample consists of 3,754,469 tutor utterances in total. On average, tutors spoke 224 utterances (1277 words) per session. We constructed our outcome measures from these transcripts, as detailed below in the description of the methods used to address each research question.
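The substitution, deletion, and insertion rates reported above follow the standard WER decomposition obtained from an edit-distance alignment of a reference transcript against the ASR output. A minimal sketch of that decomposition (our own illustration, not the evaluation script used in the study):

```python
def wer_breakdown(ref_words, hyp_words):
    """Word error rate via edit-distance alignment.

    Returns (wer, substitution_rate, deletion_rate, insertion_rate),
    each expressed relative to the number of reference words.
    """
    R, H = len(ref_words), len(hyp_words)
    # dp[i][j] = (cost, subs, dels, ins) for aligning ref[:i] with hyp[:j]
    dp = [[None] * (H + 1) for _ in range(R + 1)]
    dp[0][0] = (0, 0, 0, 0)
    for i in range(1, R + 1):
        dp[i][0] = (i, 0, i, 0)  # delete all remaining reference words
    for j in range(1, H + 1):
        dp[0][j] = (j, 0, 0, j)  # insert all remaining hypothesis words
    for i in range(1, R + 1):
        for j in range(1, H + 1):
            if ref_words[i - 1] == hyp_words[j - 1]:
                dp[i][j] = dp[i - 1][j - 1]  # match: no edit
            else:
                sub, dele, ins = dp[i - 1][j - 1], dp[i - 1][j], dp[i][j - 1]
                best = min(sub, dele, ins, key=lambda t: t[0])
                if best is sub:
                    dp[i][j] = (best[0] + 1, best[1] + 1, best[2], best[3])
                elif best is dele:
                    dp[i][j] = (best[0] + 1, best[1], best[2] + 1, best[3])
                else:
                    dp[i][j] = (best[0] + 1, best[1], best[2], best[3] + 1)
    cost, subs, dels, ins = dp[R][H]
    return cost / R, subs / R, dels / R, ins / R
```

For example, aligning the reference "the cat sat on the mat" with the hypothesis "the cat sit on mat" yields one substitution and one deletion, for a WER of 2/6.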
We link these transcriptions to student-level and tutor-level data from OYM and the school district’s administrative data. The student-level data includes grade, date of birth, race/ethnicity, gender, whether the student received free or reduced-price lunch or was otherwise indicated as economically disadvantaged based on the receipt of other public assistance, whether the student had an Individualized Education Plan or 504 Plan, whether the student was designated as an English learner, and the student’s availability for tutoring within the school day. In the sample, more than 65 percent of the students were Hispanic, 25 percent were Black, 4 percent were White, 1 percent were Asian American, and 3 percent were multiracial. Additionally, approximately 34 percent of the students were English learners, and 6 percent had disabilities. Furthermore, over 93 percent of the students were economically disadvantaged.
The student-level data also includes student beginning, middle and end of school year achievement scores from DIBELS. DIBELS is a widely used and well-validated set of measures that assess the acquisition of literacy skills (
Smolkowski & Cummings, 2016). The components of DIBELS align closely with the early literacy skills targeted by the OYM intervention. The DIBELS composite score consists of various subtests, typically administered for 60 s each, that measure specific literacy subskills, such as letter sounds, decoding, and reading fluency. These subtests are purposefully ordered to assess specific language skills as they develop, and the content varies by grade level.
OYM provided administrative information including the name of the tutor and tutoring attendance. Demographic information about tutors was collected through an optional survey. Among the tutors who responded (with a response rate around 50 percent), 23 percent were former teachers, while 12 percent were part-time teachers. Approximately 34 percent had graduated from college. The racial and ethnic composition of the tutors was 13 percent Hispanic, 27 percent Black, 35 percent White, and 3 percent Asian. Due to the high rate of missingness among tutor demographic data, we were unable to perform heterogeneity analyses by tutor characteristics.
This study received approval from the Stanford Institutional Review Board (IRB #68027), which granted a waiver of parental consent and student assent. We de-identify all transcripts by removing names before they are processed through text classifiers. The results presented in this paper are aggregated to ensure student confidentiality. Any personally identifiable information (PII) is only used to link transcripts to student demographics before being removed from the dataset.
6. Methods
The goal of this study is to explore how tutor pedagogy differs across one-on-one and two-on-one sessions in ways that may explain their varying effectiveness. We employ text analysis techniques to annotate and analyze the transcripts of tutoring sessions based on tutor recordings.
In this paper we report the results of bivariate comparisons. Because students and tutors were randomized into one-on-one or two-on-one tutoring, we do not need to control for potentially confounding factors when comparing the session types. We do run analyses controlling for all available pre-treatment measures (see
Table A1 in
Appendix A), but the results are so similar that we do not report them in the main text.
6.1. RQ1: Examining the Structural Features of Tutoring
Our first research question asks: how do the structural features of tutoring vary between one-on-one and two-on-one tutoring? Structural features define the frequency and intensity of the tutoring. We investigate the structural features of the tutoring sessions, focusing on how one-on-one and two-on-one sessions differ in the following aspects:
number of sessions per week
session length
percent of sessions with the same tutor
tutor talk time
tutor speaking rate (speed)
To address this research question, we calculate the frequency and duration of each tutoring session and tutor utterance using timestamps from the platform’s metadata. We define a tutor utterance as a single instance of uninterrupted tutor speech. We determine the number of attended sessions per week at the student level based on their assigned conditions. We measure tutor consistency by the percentage of sessions student(s) met with the same tutor. We define session length as the overlapping period when both the tutor and at least one of the students are present. We tokenize each tutor utterance to count the number of words, aggregating these counts at the session level. We measure utterance duration using the timestamps. By combining word counts with duration, we estimate the tutors’ speaking speed in words per minute at the session level. This metric provides insight into the pacing of tutor speech during the sessions.
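The speaking-speed measure combines per-utterance word counts with timestamp-derived durations. A minimal sketch of that session-level aggregation, assuming a hypothetical `(start_sec, end_sec, text)` utterance format (not the platform's actual schema):

```python
def session_speech_stats(utterances):
    """Aggregate per-utterance word counts and durations to the session level.

    `utterances` is a list of (start_sec, end_sec, text) tuples, as might be
    derived from platform timestamps and ASR transcripts (format is assumed).
    Returns (total_words, talk_minutes, words_per_minute).
    """
    total_words = sum(len(text.split()) for _, _, text in utterances)
    talk_minutes = sum(end - start for start, end, _ in utterances) / 60.0
    wpm = total_words / talk_minutes if talk_minutes > 0 else 0.0
    return total_words, talk_minutes, wpm
```

Note that the denominator is tutor talk time (the summed utterance durations), not the full session length, so silences between utterances do not deflate the estimated speaking rate.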
6.2. RQ2: Examining Time Use During Tutoring Sessions Through Text Classification
Our second research question asks: how do time allocation and tutor attention differ between one-on-one and two-on-one tutoring across the following interaction types? We distinguish three types of time use:
content instruction
relationship building
classroom management
To address this question, we develop text classification models to label tutor utterances. Text classification models, or text classifiers, take written text as an input and return a label as an output. We utilize text analysis techniques based on machine learning and pre-trained contextual embeddings (Bidirectional Encoder Representations from Transformers [BERT], see
Devlin et al., 2019, or Robustly Optimized BERT Pretraining Approach [RoBERTa], see
Y. Liu et al., 2019). Specifically, we have domain experts hand-annotate tutor transcripts, and then train an open-source transformer model (BERT or RoBERTa) based on these annotations. We train three separate text classifiers to label utterances according to content instruction, relationship building, and classroom management. Note that we borrow the term “classroom management” from the whole class setting to refer to non-instructional tasks related to maintaining order by addressing behavioral or tech issues. We also develop an additional text classifier to determine how the tutor allocates their attention in two-on-one sessions between students. By combining the labeled utterances with their timestamps from session log files, we analyze the tutor talk time and the percentage of tutor utterances for different types of tutor interactions in each tutoring session.
Content Instruction Classifier: The tutoring program uses a curriculum based on the Science of Reading for young learners. We annotate utterances related to reading skills—such as sounding out words, blending or segmenting syllables, and identifying rhymes—as content instruction. Utterances that do not pertain to content instruction mainly include greetings, small talk, playing games, and disruptions (e.g., troubleshooting technical issues or managing behavior). This binary text classifier (content instruction vs. non-content instruction) achieves an accuracy score of 0.84.
Relationship-Building Classifier: Building strong teacher-student relationships can be achieved through a variety of methods. We fine-tuned this classifier to identify tutor utterances that exemplified four categories of relationship-building practices: providing motivational praise for student learning, making personal connections with students, showing care and affection, and engaging in enjoyable activities together (
Breiseth, 2020;
National Student Support Accelerator [NSSA], n.d.), as well as a null category representing non-relationship building.
Table 1 provides sample tutor utterances for each category. This multi-class relationship-building classifier achieved an accuracy score of 0.87 (macro-F1 = 0.66), with better performance in classifying utterances that establish personal connections (F1-score = 0.80) or demonstrate care and affection (F1-score = 0.68).
Classroom-Management Classifier: This classifier consists of three categories: behavioral classroom management, technical classroom management, and non-classroom management.
Table 1 provides sample tutor utterances for each category. The classroom management classifier achieved an accuracy score of 0.95 (macro-F1 = 0.83), with an F1-score of 0.86 for identifying technical challenges and 0.67 for identifying behavioral issues.
Attention Classifier: One complication in comparing time allocation for two-on-one and one-on-one sessions arises because the tutor in a two-on-one session is choosing not only what to focus time on but also how to split their attention between their two students. In the two-on-one setting, students experience three forms of attention: direct attention, observation of their peers’ direct attention, and shared attention targeted at both them and their peer. To account for this variation, we employed an Attention Classifier (see
Zhang et al., 2025 for more information) to categorize utterances into four groups: direct instruction for “Student A,” instruction directed toward a peer “Student B,” instruction aimed at both students (“Both”), or instruction aimed at only one student, with ambiguity regarding the recipient (“Unknown”). By cross-tabulating these attention labels with the three other classifiers (content instruction, relationship building, and classroom management), we can investigate how tutors allocate their attention between students at various points during the session.
6.3. RQ3: Examining the Details of Personalized Instruction and Relationship Building
Our third research question asks: how does the content of tutors’ instructional time and relationship-building time vary between one-on-one and two-on-one tutoring? To answer this question, we qualitatively examine how tutors facilitate content instruction and build relationships, as these are two key components of tutoring that contribute to student learning and engagement (
Neitzel et al., 2022;
Nickow et al., 2024). Specifically, we investigate whether tutors provide more personalized content instruction and relationship building in one-on-one tutoring sessions compared to two-on-one sessions.
For content instruction, we utilize the utterance labels from our second research question. We apply BERTopic, a topic modeling algorithm based on pre-trained BERT contextual embeddings, to cluster and summarize ten topics (i.e., themes) and compare the frequency of these topics across formats (
Grootendorst, 2022). For relationship building, we use our multi-class relationship building classifier from research question two to explore the frequency of relationship-building subcategories according to group size. For the topics and subcategories that show significant differences between the two group sizes, we conduct a deeper qualitative analysis of the transcripts, focusing on the language used by tutors.
7. Results
7.1. RQ1 Finding: Similar Structural Features of Tutoring
Overall, tutoring sessions in one-on-one and two-on-one formats exhibit similar distributions of structural features, as shown in
Figure 1. We examine the mean differences between these formats by analyzing each feature separately in the following sections.
7.1.1. Number of Sessions per Week
During the ten-week period from February to May 2023, students assigned to one-on-one tutoring attended an average of 25.64 sessions (SD = 6.93), while those in two-on-one tutoring attended an average of 25.46 sessions (SD = 6.93). The difference between the two groups is not statistically significant (p = 0.65). Students in both settings attended an average of 2.65 sessions per calendar week, with no statistically significant difference (p = 0.81).
7.1.2. Percentage of Sessions with the Same Tutor
Students consistently received tutoring from the same tutor, regardless of group size. In one-on-one tutoring, students met with the same tutor an average of 88.74 percent of the time (SD = 16.07), while those in two-on-one tutoring had a similar average of 88.60 percent (SD = 15.79). The difference between the two groups is not statistically significant (p = 0.88).
7.1.3. Session Length and Tutor Talk Time
The average length of a tutoring session was slightly longer in the two-on-one format (M = 19.73 min, SD = 1.38) than in the one-on-one format (M = 18.84 min, SD = 1.67). This difference is statistically significant (p < 0.001). Tutors also spoke for a slightly longer duration in two-on-one sessions (M = 9.04 min, SD = 2.16) than in one-on-one sessions (M = 8.27 min, SD = 2.10), again showing a statistically significant difference (p < 0.001). These variations may be due to the logistical challenges of the two-on-one format, such as waiting for the second student to join before starting the lesson.
7.1.4. Tutor Speaking Speed
Tutors spoke slightly faster in two-on-one sessions than in one-on-one sessions. Tutors’ average speaking speed was 149.73 words per minute (WPM; SD = 25.62) overall, with two-on-one sessions at 151.10 WPM (SD = 23.98) compared to one-on-one sessions at 148.94 WPM (SD = 26.50), p < 0.001. While statistically different, these rates are qualitatively quite similar, given that typical speaking speeds are 140–160 WPM. Put another way, a two-WPM difference amounts to only about 40 extra words over the course of a 20 min session, in which an average of 1277 total words are spoken.
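To make the magnitude concrete, the arithmetic behind the "about 40 extra words" figure can be written out, using only the means reported above:

```python
# Back-of-envelope check: extra words implied by a ~2 WPM speed difference
# over a 20 min session (means taken from the text).
wpm_2on1 = 151.10
wpm_1on1 = 148.94
session_min = 20

extra_words = (wpm_2on1 - wpm_1on1) * session_min  # exact means: ~43 words
rounded_extra = round(2 * session_min)             # the text's "about 40 extra words"
```

Either way, the difference is a small fraction of the roughly 1277 words spoken in an average session.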
7.2. RQ2 Finding: Varying Time Use and Substantial Difference in Tutor Attention
7.2.1. Time Use Across Different Interaction Types
Figure 2 displays the distribution of tutor talk time (in minutes) by interaction type: content instruction, relationship building, and classroom management. In a typical tutoring session, without disaggregating by group size, tutors spent approximately six minutes of talk time on content instruction and close to one minute each on relationship building and classroom management. Relative to one-on-one sessions, tutors in two-on-one sessions spent an average of 0.53 min more on content instruction (p < 0.001), 0.02 min more on relationship building (p < 0.01), and 0.31 min more on classroom management (p < 0.001). In seconds, that is approximately 31.55 s more on content instruction and 18.50 s more on classroom management. In percentage terms, time spent in two-on-one sessions is 9.81 percent greater for content instruction and 63.33 percent greater for classroom management than in one-on-one sessions. These percentages are large, especially for classroom management, but the absolute amount of time spent on management is quite small in both formats.
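As a consistency check on the figures above, the reported minute differences and percentage gaps can be combined to back out the implied one-on-one baselines, which should match the "approximately six minutes" and "close to one minute" described in the text:

```python
# All inputs are numbers reported in the text.
content_diff_min = 0.53   # two-on-one minus one-on-one, content instruction
content_pct_gap = 0.0981  # 9.81 percent greater in two-on-one
mgmt_diff_min = 0.31      # two-on-one minus one-on-one, classroom management
mgmt_pct_gap = 0.6333     # 63.33 percent greater in two-on-one

# Implied one-on-one baselines (difference / relative gap).
content_base = content_diff_min / content_pct_gap  # ~5.4 min of content talk
mgmt_base = mgmt_diff_min / mgmt_pct_gap           # ~0.49 min of management talk
```

The implied baselines (about 5.4 min and about half a minute) line up with the session-level description, which is why the large management percentage corresponds to a small absolute difference.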
7.2.2. Allocation of Tutor Attention in Two-on-One Sessions
Within the time use described above, tutors allocate their attention differently in the two tutoring formats. In one-on-one sessions, a single student is the primary recipient of the tutor’s attention. In two-on-one sessions, by contrast, tutors must distribute their attention across two students. Although tutors in both formats may experience brief periods of disengagement, these moments likely occur across tutoring conditions, and we do not have evidence that they systematically differ between one-on-one and two-on-one sessions.
Figure 3 shows the percentages of utterances that tutors directed to each student by interaction type across group sizes. In one-on-one sessions, 65.20 percent of utterances were related to content instruction. In two-on-one sessions, tutors dedicated an average of 65.66 percent of utterances to content instruction, shared between two students: 23.13 percent directed to both students, 13.26 percent to one student (“Student A”), 14.64 percent to the other (“Student B”), and 14.63 percent ambiguous (directed to either Student A or Student B). Assuming an equal split of the ambiguous utterances, each student receives individualized content instruction in 21.27 percent of utterances (13.95 percent, the average of the shares directed at Student A and Student B individually, plus 7.32 percent from half of the ambiguous utterances). In contrast, a student in a one-on-one session receives direct, individualized content instruction in 65.20 percent of utterances.
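The arithmetic behind the 21.27 percent figure can be laid out explicitly, using the utterance shares reported in the text:

```python
# Percentages of tutor content-instruction utterances in two-on-one sessions
# (from the text).
both_students = 23.13  # directed at both students
student_a = 13.26      # directed only at Student A
student_b = 14.64      # directed only at Student B
ambiguous = 14.63      # directed at either A or B (unresolvable)

# Per-student individualized share, splitting ambiguous utterances equally.
individual_avg = (student_a + student_b) / 2                  # 13.95
per_student_individualized = individual_avg + ambiguous / 2   # ~21.27

# Sanity check: components sum to the overall content-instruction share.
total_content = both_students + student_a + student_b + ambiguous  # ~65.66
```

The same components also give the "watch their peer receive direct instruction" share discussed next, since each student's individualized share equals the peer's by symmetry of the equal-split assumption.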
A student in a two-on-one session also receives content instruction in other non-individualized ways: these students receive content instruction directed at both them and their peer in 23.13 percent of utterances and watch their peer receive direct instruction in 21.27 percent of utterances. However, instruction directed at two students, rather than individualized for one, may be less effective than individualized instruction given that tutoring relies on personalization. The value of watching instruction directed to a peer is likely even smaller because the instruction may not even be relevant to the student who is watching.
Tutor attention is also split between the two students during two-on-one tutoring for interactions related to relationship building (Figure 3). Again assuming an equal split of the ambiguous utterances, students in two-on-one sessions receive only about one third as much relationship building directed solely at them as students in one-on-one tutoring. To be clear, they also receive relationship building directed at both them and their peer, but it may be less personalized. Even if we treat this type of interaction as equivalent to relationship building directed only at them, they still receive only two thirds of the relationship building that one-on-one students receive. Note that we do not count relationship-building utterances directed only at the other student, because we believe it is unlikely that watching the tutor build a relationship with the other student benefits the first student.
Simply put, students in two-on-one sessions receive substantially less personalized content instruction and relationship building. Not only are large portions of the session directed at both them and their peer, and thus are less personalized, but substantial parts of the session are directed only to their peer, which may not be relevant to them at all.
7.3. RQ3 Finding: Details of Personalized Instruction and Relationship Building
Through topic modeling and our multi-class relationship building classifier, we find that one-on-one and two-on-one tutors employ different cultural, linguistic, and relational approaches to enhance content learning and foster relationships.
7.3.1. Individualized Content Instruction
Figure 4 illustrates the clusters of topics derived from BERTopic modeling, summarizing tutors’ utterances during content instruction. Over 75 percent of the content-instruction utterances consist of brief, spontaneous positive feedback (Topic 1), including phrases such as “good job,” “great,” “well done,” “excellent,” “nice,” and “perfect.” Other utterances pertain to specific learning tasks, such as phonics, blending sounds, punctuation, comprehension, and tests.
The percentage differences across tutoring formats are small for most topics, with the exceptions of Topic 3 (referring to a specific student) and Topic 7 (utterances in Spanish). The difference in Topic 3 (lower occurrence in one-on-one than in two-on-one sessions) aligns with the turn-taking and attention-splitting nature of two-on-one tutoring, as demonstrated by the attention classification analysis. Topic 7 highlights that tutors draw on shared cultural and linguistic resources, especially Spanish, substantially more frequently during one-on-one sessions. The percentage difference for Topic 7 (0.13 percent of all utterances in one-on-one sessions versus 0.02 percent in two-on-one sessions) translates to 0.17 and 0.03 utterances per session in one-on-one and two-on-one sessions, respectively, p < 0.001.
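As a back-of-envelope check, the Topic 7 shares and per-session counts reported above jointly imply a typical number of tutor utterances per session:

```python
# Numbers from the text for Topic 7 (Spanish) in one-on-one sessions.
share_1on1 = 0.0013      # 0.13 percent of all utterances
per_session_1on1 = 0.17  # Spanish utterances per session

# Implied tutor utterances per session (~131).
implied_utterances = per_session_1on1 / share_1on1

# Applying the same base to the two-on-one share recovers the reported ~0.03.
share_2on1 = 0.0002
per_session_2on1 = implied_utterances * share_2on1
```

The two reported per-session counts are thus mutually consistent with a common utterance base of roughly 130 tutor utterances per session.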
Table 2 provides examples of tutor utterances in Spanish, showcasing how tutors leverage Spanish as a valuable resource for both content instruction and relationship building. This increased use of linguistic resources could happen for two reasons linked to personalization. First, a two-person conversation is generally more individualized than a three-person conversation (e.g., all three members of a three-person conversation must speak Spanish or else the use of Spanish will exclude someone from the conversation). Second, one-on-one tutors can get to know their student better than two-on-one tutors can (one-on-one students receive a larger percentage of direct relationship building utterances than two-on-one students), and, as a result, one-on-one tutors may be more likely to learn important cultural and linguistic information about their student, such as whether they speak Spanish.
We also examined tutors’ use of Spanish over the course of the program. The increased use of Spanish towards the end of the program coincided with tutors selecting the nursery rhyme “See You Later, Alligator,” which includes the line “Mañana, Iguana.” We also found that approximately 60 tutors regularly employed Spanish during their sessions, while others did not. This pattern indicates that tutors who were familiar with these linguistic resources applied them consistently, potentially fostering a welcoming environment for multilingual learners.
7.3.2. Personalized Relationship Building
Next, we explore the frequency, as a percentage of utterances, of relationship-building subcategories (Figure 5). We find that, in one-on-one sessions compared to two-on-one sessions, tutors provide more motivational praise (mean difference [MD] = 0.16, p < 0.001), make more personal connections (MD = 0.23, p < 0.001), and spend a larger percentage of utterances on having fun with students (MD = 0.15, p < 0.001). In contrast, tutors express similar levels of care or affection in both session types (MD = 0.08, p = 0.10).
The differences in relationship building may stem from the social nature of one-on-one versus two-on-one tutoring sessions. Two-person conversations allow for more individualized interactions between the tutor and the one student, while three-person conversations require a focus on the communal interests of the tutor and their two students.
Tic-tac-toe is a popular game in OYM tutoring. Tutors mentioned “tic-tac-toe” in 25.18 percent of one-on-one sessions and 21.29 percent of two-on-one sessions (p < 0.001). OYM encouraged tutors to play tic-tac-toe as a means of engaging students because the game is simple to learn and quick to play, fitting well within the 20 min session duration. Tutors used tic-tac-toe both as an instructional tool and as a form of positive reinforcement (reward). In instructional contexts, tutors might say, “We’re playing tic-tac-toe, but you need to say the word correctly to place your mark,” or “Now, it’s time for a tic-tac-toe game, but we’ll use sight words.” As positive reinforcement, tutors might say, “If we finish all of these sentences, do you want to play tic-tac-toe?” or “If we have enough time, we will definitely play tic-tac-toe.” The game typically occurred at the end of the session but could also be played at the beginning as a make-up reward if time ran out in a previous session.
Tutors brought up “tic-tac-toe” more frequently during one-on-one sessions, and they also played the game differently in those sessions.
Table 3 contains excerpts from tutor utterances related to the game in both two-on-one and one-on-one settings. Tic-tac-toe is designed for two players. In two-on-one tutoring sessions, students typically play against each other (“Okay, let’s do a speed run on tic-tac-toe. You guys have to take turns, okay?”); if one student does not want to play, the other misses out (“I don’t blame Student A for not wanting to play tic-tac-toe. He and I played a lot last semester.”). In contrast, during one-on-one sessions, the student has the chance to play against the tutor, receiving undivided attention from an educator.
The dynamics of the interactions may depend on whether the engagement is between students or between an adult and a student. Student–student interactions might create more tension or competition, while adult–student interactions tend to encourage trust and engagement. The caring adult can guide the student with strategies for winning (e.g., “No, you don’t want to go there, do you? You should go somewhere else.”) and boost the student’s confidence with compliments (“It’s hard to beat you.” “How did you do that? You beat me!”).
8. Discussion
This study analyzes transcription data to examine the differences in tutor pedagogy between two-on-one and one-on-one tutoring sessions and how these differences may account for the gap in effectiveness between one-on-one and two-on-one tutoring, as reported in the RCT (
C. D. Robinson et al., 2024). Our transcripts come from a highly structured early literacy tutoring program delivered virtually during the school day. Tutors and students were randomly assigned to the one-on-one or two-on-one setting, and, as a result, our comparisons are not confounded by selection bias and are unlikely to be confounded by omitted variable bias.
We find that structural features, such as session frequency and tutor speaking speed, are similar between two-on-one and one-on-one settings. Our analysis also finds that the amount of time spent on content instruction and relationship building is similar across settings, though more time is spent on classroom management in two-on-one sessions. However, the attention classifier that we construct reveals that students in two-on-one sessions receive substantially less personalized instruction and relationship building. Tutors in two-on-one sessions spend large portions of the session addressing both students at once or directly addressing the other student in the session. In our analysis of the details of personalization, we find that tutors in one-on-one sessions were more likely to speak Spanish with their students and have a larger portion of utterances dedicated to the relationship building subcategories of motivational praise, personal connections, and having fun together. While the magnitude of these differences is small, over the course of an academic year, these differences add up.
These results suggest that the differing learning outcomes may stem from variations and constraints in how tutors deliver content and build rapport. Specifically, students assigned to two-on-one tutoring receive far less individualized attention than students assigned to one-on-one sessions in both content instruction and relationship building. Students in two-on-one sessions also have to tolerate additional distractions from the extra time spent on classroom management, which could lead to disengagement.
This study is the first to use NLP methods to systematically compare tutor pedagogy in one-on-one and two-on-one tutoring sessions. While a large body of research has examined effective instructional strategies, prior work has largely relied on small-scale qualitative approaches that are difficult to generalize or deploy at scale. By leveraging transcription data from a virtual tutoring program, we quantitatively capture instructional and relational features—such as personalization and relationship building—that are typically studied qualitatively. The alignment of our findings with prior theory and evidence provides confidence that our NLP measures meaningfully capture instructional dynamics, while also demonstrating how these methods can be used to uncover mechanisms underlying differences in tutoring effectiveness and to inform improvements to lower-cost tutoring models.
Peer learning is one of the potential strengths of small-group tutoring relative to one-on-one tutoring, yet it does not happen organically; it requires intentional facilitation of meaningful interactions, especially in virtual learning environments. In this study, students typically attend two-on-one tutoring sessions individually, using their own computers and headphones. Student pairs are occasionally in the same room, but they are often in separate spaces. Regardless of their physical proximity, students often struggle to connect with peers through a computer screen without external prompting (
Vrieling-Teunter et al., 2022). The early literacy curriculum further complicates matters, as it relies heavily on the deliberate practice of lower-cognitive skills, such as recall and reproduction. Mastery of these foundational concepts requires one-on-one attention from tutors, as opposed to cooperative discussions that are more suitable for higher-order problem-solving in subjects like mathematics or reading comprehension.
Given the existing funding shortages and associated cost implications, two-on-one or small-group tutoring will likely become more prevalent, underscoring the need for understanding how to make these settings as effective as they can be. This study shows that the approaches—such as playing tic-tac-toe for rapport-building—that work effectively in the one-on-one setting may not work as well in the two-on-one setting. Tutoring in the two-on-one setting may benefit from explicitly leveraging peer learning techniques and from using curricular materials that incorporate collaborative teamwork to reduce individual students’ downtime. Using games that everyone in the session, including tutors and students, can participate in may also better foster teamwork and a sense of belonging. Some providers already enhance their tutoring sessions with well-designed, high-quality computer-assisted learning tools in small group contexts, which have demonstrated positive outcomes in both student learning and cost-effectiveness (
Bhatt et al., 2024;
Cortes et al., 2025;
Slavin et al., 2011). Well-organized professional development for tutors may also improve two-on-one tutoring, especially in helping novice tutors develop their engagement skills. Recent advancements in generative artificial intelligence have created opportunities to provide coaching to tutors in real time at low cost (
Agarwal et al., 2024;
Demszky et al., 2023;
Kim, 2024;
L’Enfant, 2024;
Wang et al., 2024).
9. Limitations
This study has several limitations. First, although we leverage student-level random assignment to one-on-one or two-on-one tutoring to examine tutor practices associated with the greater effectiveness of one-on-one tutoring, our analysis does not identify which specific pedagogical strategies causally affect student learning. This limitation reflects the fact that tutor practices are not themselves randomly assigned and may be correlated with unmeasured tutor or student characteristics. Nonetheless, documenting systematic differences in tutor practices across tutoring formats provides useful descriptive evidence that can inform future experimental work on effective tutoring pedagogy.
Second, the OYM program, like many early literacy tutoring programs, is highly structured, and while this structure may reduce variation among tutors, it may also limit tutors’ ability to adapt instruction to new instructional contexts, such as serving two students simultaneously. The OYM program was originally designed to be delivered one-on-one. Because this study examines the first ever implementation of the program’s two-on-one tutoring model, the observed approaches in the two-on-one sessions may not fully reflect pedagogical approaches optimized for student engagement and learning in that format. Future iterations of this research might better capture instructional approaches specifically designed for effective two-on-one tutoring.
Third, we include only tutor transcripts, not student transcripts, in our analysis due to the poor quality of the student transcripts, which were affected by background noise and the challenges of transcribing student voices. Consequently, our conclusions focus on tutor attention and language. We anticipate that future iterations of this work will incorporate student transcriptions, perhaps by examining student responses to the tutor’s language or analyzing the extent and quality of student–student interactions during two-on-one sessions.
Finally, our session-level data collection spanned 10 weeks, from February to May 2023, whereas OYM tutoring was implemented from September through May in Uplift Schools. Collecting data across the full academic year would have strengthened the analysis by enabling the examination of longitudinal changes in tutor language. Nevertheless, the 10 weeks of data we collected, encompassing over 16,000 tutoring sessions, represent a robust and continuous sample of tutor language.
10. Conclusions
We examine mechanisms that may drive the differences in student outcomes between two-on-one and one-on-one tutoring. By leveraging NLP to assess tutor attention and time allocation, this study suggests that differences in the quality and quantity of instructional attention may help explain why Robinson and colleagues (2024) found one-on-one virtual tutoring to be effective in improving student learning gains, while finding less evidence of impacts for two-on-one virtual tutoring. In the near future, integrating artificial-intelligence-powered feedback with these analytics could create a comprehensive framework for professional development, empowering tutors to refine their skills in real-time. This integration, in turn, may lead to improved educational outcomes for students. The findings of this study highlight the importance of data-driven approaches in identifying best practices, and suggest the potential for tailoring professional development efforts to meet the diverse needs of tutors. This research opens up opportunities for future studies that expand on these methods, leading to the development of more innovative teaching strategies that enhance student learning and engagement.
Author Contributions
Conceptualization, H.H., D.G., C.D.R. and S.L.; methodology, H.H. and D.G.; software, H.H. and D.G.; validation, H.H. and D.G.; formal analysis, H.H., D.G., C.D.R. and S.L.; investigation, H.H. and D.G.; resources, H.H., D.G. and C.D.R.; data curation, H.H. and D.G.; writing—original draft preparation, H.H. and D.G.; writing—review and editing, H.H., D.G., C.D.R. and S.L.; visualization, H.H.; supervision, C.D.R. and S.L.; project administration, H.H.; funding acquisition, C.D.R. and S.L. All authors have read and agreed to the published version of the manuscript.
Funding
This research was conducted within grant #283995—Evaluating and enhancing the effectiveness of high-impact early literacy tutoring, funded by the Overdeck Family Foundation, and grant #2023-3173, funded by the Smith Richardson Foundation. The APC was funded by the Smith Richardson Foundation.
Institutional Review Board Statement
The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of Stanford University (protocol code: 68027; date of approval: 18 September 2025).
Informed Consent Statement
The study was granted a waiver of parent consent and student assent due to minimal risk.
Data Availability Statement
The datasets presented in this article are not readily available because of restrictions in the Data Usage Agreements. Requests to access the datasets should be directed to the corresponding author.
Acknowledgments
We are grateful to our partners in this research, On Your Mark and Uplift Education. We especially thank Nick Erber, Mindy Sjoblom, Ashley Chin Morefield, and Aaron Schlessman for their invaluable contribution to project implementation. We also thank Pearl for their technical support. We thank the Overdeck Family Foundation for their generous support of this research. We also thank the Smith Richardson Foundation for their support of our full research program. We received insightful feedback and support from the National Student Support Accelerator team, in particular Xander Beberman, JP Martinez Claeys, Natalia Ortega, Ana Trindade Ribeiro, Nancy Waymack, and Ashley Zhang.
Conflicts of Interest
The authors declare no conflicts of interest.
Appendix A
Table A1.
Correlations between Tutor Utterances and Group Size.
| Predictors | Session Length Est. | p | Tutor Talk Est. | p | Content Talk Est. | p | Relationship Building Talk Est. | p | Classroom Management Talk Est. | p |
|---|---|---|---|---|---|---|---|---|---|---|
| Two-on-One | 0.85 | <0.001 | 0.80 | <0.001 | 0.54 | <0.001 | 0.04 | 0.091 | 0.31 | <0.001 |
| EL | 0.16 | 0.003 | −0.12 | 0.308 | 0.05 | 0.563 | −0.08 | 0.001 | 0.01 | 0.613 |
| Female | 0.00 | 0.921 | −0.00 | 0.988 | −0.11 | 0.185 | 0.02 | 0.406 | −0.01 | 0.570 |
| Hispanic | 0.02 | 0.752 | 0.01 | 0.949 | 0.04 | 0.641 | −0.00 | 0.879 | 0.01 | 0.666 |
| SPED | 0.05 | 0.616 | −0.16 | 0.521 | −0.06 | 0.739 | −0.08 | 0.015 | 0.08 | 0.089 |
| MOY Test Score | 0.00 | 0.032 | −0.00 | 0.403 | −0.01 | <0.001 | 0.00 | 0.080 | 0.00 | 0.971 |
| (Intercept) | 17.90 | <0.001 | 9.02 | <0.001 | 8.74 | <0.001 | 0.60 | 0.001 | 0.47 | 0.001 |
| Observations | 16,535 | | 16,535 | | 16,535 | | 16,535 | | 16,535 | |
| R²/R² adj. | 0.072/0.072 | | 0.031/0.031 | | 0.033/0.033 | | 0.008/0.008 | | 0.089/0.089 | |
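The session-level regressions in Table A1 can be sketched as ordinary least squares of each outcome on a two-on-one indicator plus student covariates. The sketch below uses synthetic data with a hypothetical effect mirroring the table's Session Length estimate, and a plain least-squares solve rather than the authors' actual estimation code:

```python
import numpy as np

# Synthetic session-level data: two-on-one indicator and one covariate (EL).
rng = np.random.default_rng(0)
n = 1000
two_on_one = rng.integers(0, 2, n)
el = rng.integers(0, 2, n)

# Simulate session length with a true two-on-one effect of 0.85 min,
# mirroring the table's estimate (illustrative only).
session_length = 17.9 + 0.85 * two_on_one + 0.16 * el + rng.normal(0, 1.5, n)

# OLS via least squares; coefficients are [intercept, two_on_one, el].
X = np.column_stack([np.ones(n), two_on_one, el])
beta, *_ = np.linalg.lstsq(X, session_length, rcond=None)
```

With enough sessions, the estimated two-on-one coefficient recovers the simulated effect; the actual analysis additionally includes the Female, Hispanic, SPED, and MOY test score covariates shown in the table.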
References
- Abdelshiheed, M., Jacobs, J. K., & D’Mello, S. K. (2024). Aligning tutor discourse supporting rigorous thinking with tutee content mastery for predicting math achievement. arXiv. Available online: https://arxiv.org/abs/2405.06218.
- Agarwal, N., Sangwan, M., & Agarwal, E. (2024). Revolutionizing pedagogy: Exploring AI coaching in enhancing teacher professionalism. In Integrating generative AI in education to achieve sustainable development goals (pp. 264–281). IGI Global. Available online: https://www.igi-global.com/chapter/revolutionizing-pedagogy/348807 (accessed on 28 April 2025).
- Bain, M., Huh, J., Han, T., & Zisserman, A. (2023). Whisperx: Time-accurate speech transcription of long-form audio. arXiv, arXiv:2303.00747. [Google Scholar]
- Beckner, C., Blythe, R., Bybee, J., Christiansen, M., Croft, W., Ellis, N. C., Holland, J., Ke, J., Larsen-Freeman, D., & Schoenemann, T. (2009). Language is a complex adaptive system: Position paper. Language Learning, 59(1), 1–26. [Google Scholar] [CrossRef]
- Bhatt, M. P., Guryan, J., Khan, S. A., LaForest-Tucker, M., & Mishra, B. (2024). Can technology facilitate scale? Evidence from a randomized evaluation of high dosage tutoring (No. w32510). National Bureau of Economic Research. Available online: https://www.nber.org/papers/w32510 (accessed on 28 April 2025).
- Bloom, B. S. (1984). The 2 sigma problem: The search for methods of group instruction as effective as one-to-one tutoring. Educational Researcher, 13(6), 4–16. [Google Scholar] [CrossRef]
- Booth, B. M., Jacobs, J., Bush, J. B., Milne, B., Fischaber, T., & DMello, S. K. (2024, March 18–22). Human-tutor coaching technology (htct): Automated discourse analytics in a coached tutoring model. 14th Learning Analytics and Knowledge Conference (pp. 725–735), Kyoto, Japan. [Google Scholar] [CrossRef]
- Breiseth, L. (2020). 10 Strategies for building relationships with ELLs. Colorín Colorado. Available online: https://www.colorincolorado.org/article/building-relationships-ells (accessed on 28 April 2025).
- Carlana, M., & La Ferrara, E. (2021). Apart but connected: Online tutoring and student outcomes during the COVID-19 pandemic. (EdWorkingPaper: 21-350). Annenberg Institute at Brown University. [Google Scholar] [CrossRef]
- Center on Reinventing Public Education [CRPE]. (2023). The state of the American student: Fall 2023, we are failing older students: Bold ideas to change course. Available online: https://crpe.org/wp-content/uploads/The-State-of-the-American-Student-2023.pdf (accessed on 28 April 2025).
- Center on Reinventing Public Education [CRPE]. (2024). The state of the American student: Fall 2024, Solve for the most complex needs: A path forward as pandemic effects reverberate. Available online: https://crpe.org/wp-content/uploads/CRPE_SOS2024_FINAL.pdf (accessed on 28 April 2025).
- Compton-Lilly, C., Spence, L. K., Thomas, P. L., & Decker, S. L. (2023). Stories grounded in decades of research: What we truly know about the teaching of reading. The Reading Teacher, 77(3), 392–400. [Google Scholar] [CrossRef]
- Cortes, K. E., Kortecamp, K., Loeb, S., & Robinson, C. D. (2025). A scalable approach to high-impact tutoring for young readers. Learning and Instruction, 95, 102021. [Google Scholar] [CrossRef]
- Demszky, D., Liu, J., Hill, H. C., Sanghi, S., & Chung, A. (2023). Improving teachers’ questioning quality through automated feedback: A mixed-methods randomized controlled trial in brick-and-mortar classrooms. (EdWorkingPaper: 23-875). Retrieved from Annenberg Institute at Brown University. [Google Scholar] [CrossRef]
- Demszky, D., Liu, J., Mancenido, Z., Cohen, J., Hill, H., Jurafsky, D., & Hashimoto, T. (2021). Measuring conversational uptake: A case study on student-teacher interactions. arXiv, arXiv:2106.03873. [Google Scholar] [CrossRef]
- Devlin, J., Chang, M. W., Lee, K., & Toutanova, K. (2019, June 2–7). Bert: Pre-training of deep bidirectional transformers for language understanding. 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1 (Long and Short Papers), pp. 4171–4186), Minneapolis, MN, USA. [Google Scholar] [CrossRef]
- Dietrichson, J., Bøg, M., Filges, T., & Jørgensen, A.-M. K. (2017). Academic interventions for elementary and middle school students with low socioeconomic status: A systematic review and meta-analysis. Review of Educational Research, 87(2), 243–282. [Google Scholar] [CrossRef]
- Feldon, D. F. (2007). Cognitive load and classroom teaching: The double-edged sword of automaticity. Educational Psychologist, 42(3), 123–137. [Google Scholar] [CrossRef]
- Ferraro, A., Galli, A., La Gatta, V., & Postiglione, M. (2023). Benchmarking open source and paid services for speech to text: An analysis of quality and input variety. Frontiers in Big Data, 6. [Google Scholar] [CrossRef]
- Fiester, L. (2010). Early warning! Why reading by the end of third grade matters (KIDS COUNT special report). Annie E. Casey Foundation. Available online: https://www.aecf.org/resources/early-warning-why-reading-by-the-end-of-third-grade-matters (accessed on 28 April 2025).
- Gee, J. P. (2017). A sociocultural perspective on early literacy development. In Handbook of early literacy research (Vol. 1, pp. 30–42). The Guilford Press. [Google Scholar]
- Gortazar, L., Hupkau, C., & Roldan, A. (2023). Online tutoring works: Experimental evidence from a program with vulnerable children. (EdWorkingPaper: 23-743). Annenberg Institute at Brown University. [Google Scholar] [CrossRef]
- Gromada, A., & Shewbridge, C. (2016). Student learning time: A literature review. (OECD education working papers, No. 127). OECD Publishing. [Google Scholar] [CrossRef]
- Groom-Thomas, L., Leung, C., Loeb, S., Pollard, C., Waymack, N., & White, S. (2023). Challenges and solutions: Scaling tutoring programs. IDB. [Google Scholar] [CrossRef]
- Grootendorst, M. (2022). BERTopic: Neural topic modeling with a class-based TF-IDF procedure. arXiv, arXiv:2203.05794. [Google Scholar]
- Guryan, J., Christenson, S., Cureton, A., Lai, I., Ludwig, J., Schwarz, C., Shirey, E., & Turner, M. C. (2021). The effect of mentoring on school attendance and academic outcomes: A randomized evaluation of the Check & Connect program. Journal of Policy Analysis and Management, 40(3), 841–882. [Google Scholar] [CrossRef]
- Guryan, J., Ludwig, J., Bhatt, M. P., Cook, P. J., Davis, J. M., Dodge, K., Farkas, G., Fryer, R. G., Jr., Mayer, S., Pollack, H., Steinberg, L., & Stoddard, G. (2023). Not too late: Improving academic outcomes among adolescents. American Economic Review, 113(3), 738–765. Available online: https://www.aeaweb.org/articles?id=10.1257/aer.20210434 (accessed on 28 April 2025).
- Hashim, S., Miles, K. P., & Croke, E. (2025). Experimental evidence on the impact of tutoring format and tutors: Findings from an early literacy tutoring program. (EdWorkingPaper: 25-1176). Annenberg Institute at Brown University. [Google Scholar] [CrossRef]
- Holland, P., Alfaro, P., & Evans, D. (2015). Extending the school day in Latin America and the Caribbean (SSRN scholarly paper no. 2620017). Social Science Research Network. Available online: https://papers.ssrn.com/abstract=2620017 (accessed on 15 October 2025).
- Ichii, N. (2022, January 6–9). Students’ perceptions toward dyads and triads in the English classroom. The IAFOR International Conference on Education – Hawaii 2022 Official Conference Proceedings (pp. 245–255), Honolulu, HI, USA.
- Jacobs, J., Scornavacco, K., Harty, C., Suresh, A., Lai, V., & Sumner, T. (2022). Promoting rich discussions in mathematics classrooms: Using personalized, automated feedback to support reflection and instructional change. Teaching and Teacher Education, 112, 103631.
- Johnson, D. W., & Johnson, R. T. (2009). An educational psychology success story: Social interdependence theory and cooperative learning. Educational Researcher, 38(5), 365–379.
- Kahneman, D. (1973). Attention and effort. Prentice-Hall. Available online: https://philpapers.org/rec/kahaae (accessed on 28 April 2025).
- Kennedy, A. I., & Strietholt, R. (2023). School closure policies and student reading achievement: Evidence across countries. Educational Assessment, Evaluation and Accountability, 35(4), 475–501.
- Kim, J. (2024). Leading teachers’ perspective on teacher-AI collaboration in education. Education and Information Technologies, 29(7), 8693–8724.
- Kraft, M. A., & Lovison, V. S. (2025). The effect of student–tutor ratios: Experimental evidence from a pilot online math tutoring program. Educational Evaluation and Policy Analysis.
- Lee, M. G., Loeb, S., & Robinson, C. D. (2024). Effects of high-impact tutoring on student attendance: Evidence from the OSSE HIT initiative in the District of Columbia. (EdWorkingPaper: 24-1107). Annenberg Institute at Brown University.
- L’Enfant, J. (2024). AI as a reflective coach in graduate ESL practicum: Activity theory insights into student-teacher development. European Journal of Open, Distance & E-Learning, 26(1), 1–19.
- Liu, X., Zhang, J., Barany, A., Pankiewicz, M., & Baker, R. S. (2024). Assessing the potential and limits of large language models in qualitative coding. In Y. J. Kim, & Z. Swiecki (Eds.), Advances in quantitative ethnography (Vol. 2278, pp. 89–103). Springer. ICQE 2024, Communications in Computer and Information Science.
- Liu, Y., Ott, M., Goyal, N., Du, J., Joshi, M., Chen, D., Levy, O., Lewis, M., Zettlemoyer, L., & Stoyanov, V. (2019). RoBERTa: A robustly optimized BERT pretraining approach. arXiv, arXiv:1907.11692.
- Mason, J. M., & Sinha, S. (1992). Emerging literacy in the early childhood years: Applying a Vygotskian model of learning and development. Available online: https://eric.ed.gov/?id=ED348667 (accessed on 28 April 2025).
- Moats, L. C. (2020). Teaching reading “is” rocket science: What expert teachers of reading should know and be able to do. American Educator, 44(2), 4. Available online: https://files.eric.ed.gov/fulltext/EJ1260264.pdf (accessed on 28 April 2025).
- National Student Support Accelerator [NSSA]. (n.d.). Toolkit for tutoring programs: Relationship building. National Student Support Accelerator. Available online: https://nssa.stanford.edu/tutoring/instruction/relationship-building (accessed on 28 April 2025).
- Neitzel, A. J., Lake, C., Pellegrini, M., & Slavin, R. E. (2022). A synthesis of quantitative research on programs for struggling readers in elementary schools. Reading Research Quarterly, 57(1), 149–179.
- Nickow, A., Oreopoulos, P., & Quan, V. (2024). The promise of tutoring for PreK–12 learning: A systematic review and meta-analysis of the experimental evidence. American Educational Research Journal, 61(1), 74–107.
- O’Connor, C., & Michaels, S. (2019). Supporting teachers in taking up productive talk moves: The long road to professional learning at scale. International Journal of Educational Research, 97, 166–175.
- Patall, E. A., Cooper, H., & Allen, A. B. (2010). Extending the school day or school year: A systematic review of research (1985–2009). Review of Educational Research, 80(3), 401–436.
- Radford, A., Kim, J. W., Xu, T., Brockman, G., McLeavey, C., & Sutskever, I. (2023, July 23–29). Robust speech recognition via large-scale weak supervision. International Conference on Machine Learning (pp. 28492–28518), Honolulu, HI, USA. Available online: https://proceedings.mlr.press/v202/radford23a.html (accessed on 28 April 2025).
- Robinson, C., Kraft, M., Loeb, S., & Schueler, B. (2021). Design principles for accelerating student learning with high-impact tutoring. (Design principles brief #30: Academic acceleration). EdResearch for Action. Available online: https://files.eric.ed.gov/fulltext/ED656753.pdf (accessed on 28 April 2025).
- Robinson, C. D., & Loeb, S. (2021). High-impact tutoring: State of the research and priorities for future learning. (EdWorkingPaper: 21-384). Annenberg Institute at Brown University.
- Robinson, C. D., Pollard, C., Novicoff, S., White, S., & Loeb, S. (2024). The effects of virtual tutoring on young readers: Results from a randomized controlled trial. Educational Evaluation and Policy Analysis, 47(4), 1245–1265.
- Roorda, D. L., Jak, S., Zee, M., Oort, F. J., & Koomen, H. M. Y. (2017). Affective teacher–student relationships and students’ engagement and achievement: A meta-analytic update and test of the mediating role of engagement. School Psychology Review, 46(3), 239–261.
- Scales, P. C., Pekel, K., Sethi, J., Chamberlain, R., & Van Boekel, M. (2020). Academic year changes in student-teacher developmental relationships and their linkage to middle and high school students’ motivation: A mixed methods study. The Journal of Early Adolescence, 40(4), 499–536.
- Slavin, R. E., Lake, C., Davis, S., & Madden, N. A. (2011). Effective programs for struggling readers: A best-evidence synthesis. Educational Research Review, 6(1), 1–26.
- Smolkowski, K., & Cummings, K. D. (2016). Evaluation of the DIBELS diagnostic system for the selection of native and proficient English speakers at risk of reading difficulties. Journal of Psychoeducational Assessment, 34(2), 103–118.
- Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
- Valdés, G. (2015). Latin@s and the intergenerational continuity of Spanish: The challenges of curricularizing language. International Multilingual Research Journal, 9(4), 253–273.
- Vrieling-Teunter, E., Henderikx, M., Nadolski, R., & Kreijns, K. (2022). Facilitating peer interaction regulation in online settings: The role of social presence, social space and sociability. Frontiers in Psychology, 13.
- Vygotsky, L. S. (1978). Mind in society: The development of higher psychological processes. Harvard University Press.
- Vygotsky, L. S. (1981). The genesis of higher mental functions. In J. V. Wertsch (Ed.), The concept of activity in Soviet psychology (pp. 144–188). Sharpe. Available online: https://www.taylorfrancis.com/chapters/edit/10.4324/9781003575429-5/genesis-higher-mental-functions-vygotsky (accessed on 28 April 2025).
- Vygotsky, L. S. (1986). Thought and language (A. Kozulin, Ed.). MIT Press.
- Wang, R. E., Ribeiro, A. T., Robinson, C. D., Loeb, S., & Demszky, D. (2024). Tutor CoPilot: A human-AI approach for scaling real-time expertise. arXiv, arXiv:2410.03017.
- Wertsch, J. (Ed.). (1985). Culture, communication, and cognition: Vygotskian perspectives. Cambridge University Press.
- Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2), 89–100.
- Yeşil Dağlı, Ü. (2019). Effect of increased instructional time on student achievement. Educational Review, 71(4), 501–517.
- Zajonc, R. B. (1965). Social facilitation. Science, 149(3681), 269–274.
- Zhang, Q., Wang, R. E., Ribeiro, A. T., Demszky, D., & Loeb, S. (2025). Educator attention: How computational tools can systematically identify the distribution of a key resource for students. arXiv, arXiv:2502.20135.
- Zuengler, J., & Miller, E. (2006). Cognitive and sociocultural perspectives: Two parallel SLA worlds? TESOL Quarterly, 40(1), 35–58.
Disclaimer/Publisher’s Note: The statements, opinions and data contained in all publications are solely those of the individual author(s) and contributor(s) and not of MDPI and/or the editor(s). MDPI and/or the editor(s) disclaim responsibility for any injury to people or property resulting from any ideas, methods, instructions or products referred to in the content.