Article

Bridging Accessibility Gaps in Dyslexia Intervention: Non-Inferiority of a Technology-Assisted Approach to Dyslexia Instruction

1 Scottish Rite for Children, Dallas, TX 75219, USA
2 Division of Developmental-Behavioral Pediatrics, University of Texas Southwestern Medical Center, Dallas, TX 75390, USA
* Author to whom correspondence should be addressed.
Educ. Sci. 2025, 15(11), 1460; https://doi.org/10.3390/educsci15111460
Submission received: 31 August 2025 / Revised: 17 October 2025 / Accepted: 18 October 2025 / Published: 2 November 2025
(This article belongs to the Special Issue Students with Special Educational Needs in Reading and Writing)

Abstract

Dyslexia is a highly prevalent learning disability characterized by deficits in specific cognitive and linguistic skills that impair accurate and fluent reading of written words. Intensive, comprehensive, multicomponent interventions are effective in improving outcomes for students with dyslexia, but effective curriculum delivery has traditionally required an educator with substantial training. Many school systems lack sufficient staff with this level of expertise to meet the needs of all their struggling readers. To address this gap, a technology-assisted dyslexia intervention was developed to provide teacher support through a virtual human avatar, significantly reducing training time while maintaining the comprehensive scope and structure of a traditional intervention model. This paper evaluates the comparative efficacy of the tech-assisted delivery model and the traditional model across two independent substudies. Results from quasi-experimental observational substudies in both laboratory school (n = 82) and public-school (n = 157) samples demonstrate non-inferiority of the tech-assisted instruction compared to the traditional delivery model, i.e., comparable student progress in reading and spelling. Furthermore, despite differences in the educator training model, implementation fidelity was equivalently strong (>90%) in both groups. Findings provide encouraging evidence for the scalability of effective dyslexia intervention through technology-based support at the level of the teacher. Implications for practice and questions for future research are discussed.

1. Introduction

Dyslexia is a highly prevalent learning disability, affecting approximately 10% of the population (Wagner et al., 2020). Although phenotypic variations are seen across languages, the characteristic deficits of dyslexia in alphabetic languages such as English are associated with disruptions in phonological processing, along with other cognitive and linguistic risk factors (Catts & Petscher, 2022; Ring & Black, 2018). In English, these characteristic deficits are code-based, including impairments in phonological processing, word reading, and/or spelling, which manifest as inaccurate and/or inefficient word-level skills (e.g., Melby-Lervåg et al., 2012; Reis et al., 2020). Many individuals with dyslexia also exhibit weaknesses in meaning-based skills, often secondary to word-level reading and spelling deficits: primary deficits in word-level skills may be compounded by reduced reading experience, resulting in relative weaknesses in broader meaning-based skills such as reading comprehension and vocabulary. Regardless of etiology, reading and listening comprehension deficits are also common in individuals with dyslexia (Georgiou et al., 2022).
Children with reading difficulties are at greater risk of academic failure and poor educational achievement. However, the consequences of failure to learn to read extend well beyond the academic setting. Children with dyslexia also experience increased rates of internalizing and externalizing problems (e.g., anxiety, depression, or behavioral problems; Francis et al., 2019; Georgiou et al., 2024). In the United States, legislative efforts at both the federal and state levels have aimed to increase protections for students with reading disabilities such as dyslexia and to support their identification and intervention. The number of students receiving special education services in public schools continues to rise, exceeding 15% of the total student population (Irwin et al., 2024). Under the U.S. Individuals with Disabilities Education Act, all students in public schools are entitled to education which meets their unique learning needs, emphasizing the importance of evidence-based, effective instruction at all levels of the curriculum (D. Fuchs & Fuchs, 2006; U.S. Department of Education, 2004). At present, forty-nine states in the US have additional laws in place related to dyslexia identification and/or intervention (National Center on Improving Literacy, 2025). Many states specify characteristics and/or targets of intervention to ensure practices align with the evidence base (Odegard et al., in press). In turn, these efforts have increased the demand for highly trained educators and therapists to provide high-quality instruction to the growing population of identified struggling readers. However, the national shortage of certified special educators and reading specialists continues to grow, and these positions are increasingly difficult to fill (Bureau of Labor Statistics, U.S. Department of Labor, 2025; Irwin et al., 2024; National Center for Education Statistics, 2023). Many school systems struggle to implement best practices due to logistical constraints such as insufficient resources and inadequate training (Piasta et al., 2020).
As technological advances make their way into the educational realm, they bring with them the capacity to improve pedagogical efficiency and the scalability of reading instruction by reducing costs and teacher training demands (e.g., Stein et al., 2021). Many technology-based learning platforms are redesigning the educational format at the level of the student: digitizing instruction and using sophisticated algorithms to enable personalized learning and practice experiences. Little is known about the effectiveness of these approaches in supporting the educator through the intensive instructional sequence of a dyslexia intervention.

1.1. Dyslexia Intervention

The heterogeneous nature of deficits experienced by struggling readers necessitates comprehensive, multicomponent reading instruction. This instruction supports both code-based skills (phonological processing and orthographic pattern recognition) and meaning-based skills (vocabulary instruction, comprehension strategies), and provides ample opportunity for practice and application in a variety of connected and authentic text settings (Castles et al., 2018).
The positive effects of multicomponent interventions on reading outcomes for struggling readers are well documented (Boucher et al., 2024; Gersten et al., 2020; Hall et al., 2023). In addition to providing a breadth of content covering various code- and meaning-based skills, intervention should be increasingly intensive, often through daily, small-group or one-on-one instructional settings (L. S. Fuchs et al., 2017). For severely impacted students, interventions are also often extensive, spanning more than one academic year, with longer durations yielding stronger effects, particularly for those who demonstrate limited response (Al Otaiba et al., 2023; Wanzek & Vaughn, 2008). Together, these factors can strain the infrastructure of a modern school system, leading to high caseloads, large intervention group sizes, and other logistical challenges which can be a barrier to student learning.

1.2. Traditional Approaches to Teacher Training

In Texas and other US states, educators who provide dyslexia intervention are required to complete additional training related to the identification of dyslexia and effective instructional practices to address characteristic deficits of the disorder. However, this level of training does not cover the depth of linguistic structure and therapeutic intervention practices that characterize the certification requirements for advanced dyslexia specialization. Furthermore, most teacher preparation programs do not cover the depth of English language structure required for dyslexia intervention (Cox, 1985). As few as 20% of teacher training programs adequately prepare teachers to deliver evidence-based reading instruction, including advanced phonological awareness and orthographic structure (Greenberg et al., 2014).
Therapy-level training programs designed to develop the skills and knowledge required to provide rigorous intervention for students with dyslexia are extensive, involving both classroom instruction and supervised teaching over several years. Educators who receive this level of training demonstrate greater depth and breadth of knowledge in literacy-related constructs relative to standard teacher preparation, regardless of teacher experience and degree (McMahan et al., 2019). Teachers with this level of training also tend to elicit better outcomes for their students (Porter et al., 2022). However, this level of knowledge takes time to amass and consolidate, with peak performance on measures of teacher knowledge appearing years after completing training (McMahan et al., 2019).
The combined barriers of the extensive time and resources required for a teacher to attain the necessary credentials and the limited capacity of qualified training facilities place significant constraints on the number of extensively trained dyslexia professionals (e.g., dyslexia therapists) in schools. Simultaneously, the number of public-school students who are identified as needing access to high-quality dyslexia instruction is rising. For example, the number of students with dyslexia in Texas public schools has more than doubled in the past five years (Texas Education Agency, 2024). Hence, the current demand for dyslexia therapists exceeds the number available to provide this intensive and extensive instruction. Together, these factors create a perennial challenge for school systems to provide consistent, high-quality, evidence-based instruction across a diverse educational landscape.

1.3. Rise of Technology-Assisted Interventions

Technology has become an invaluable tool in education, particularly in providing personalized learning experiences, transforming the way both students and educators engage with learning content. Digital solutions can help to provide cost-efficient, scalable, and standardized instruction. Tech-based learning platforms can provide flexible, adaptable content while remaining aligned with legislative or other governing educational standards (Stein et al., 2021; Torgesen et al., 2010). Additionally, tech-based solutions provide opportunities for increased student engagement in an increasingly digital world. However, there is yet to be consensus in the literature on the effectiveness of technology-based instructional methods, particularly for students with learning disabilities and other unique educational needs.
Digitized reading instruction can be used to increase accessibility, individualization, and intensity of reading instruction, while reducing teacher training needs (e.g., Stein et al., 2021; Torgesen et al., 2010). Efforts have long been underway to develop effective, scalable solutions to integrating effective technologies into core learning systems (Connor et al., 2022; Nye et al., 2014). Although many programs demonstrate promising results, findings surrounding efficacy and implementation factors vary (Cheung & Slavin, 2013). A resounding conclusion is the irrefutable importance of teacher involvement: technology may supplement but should not supplant live instruction. This is particularly important for vulnerable learners, such as those at risk for learning disabilities such as dyslexia.
A handful of studies to date have investigated the impact of computer-assisted instruction for struggling readers in a Tier 2-type instructional setting. For example, Chambers et al. (2011) developed a computerized adaptation of a traditional English reading intervention which integrates collaborative learning strategies with computer-based multicomponent reading instruction for students in Grades 1–2 (e.g., Peer Assisted Learning Strategies, McMaster et al., 2006). Lesson components targeted a wide range of reading skills, including phonological awareness, decoding, fluency, vocabulary, reading comprehension, and writing, delivered in 30 min daily sessions over the course of a full academic year. Importantly, the lessons were not led by certified teachers but rather were facilitated by a trained tutor or paraprofessional who could answer questions, monitor engagement, and provide feedback. Results revealed non-inferiority of the computer-assisted instruction in comparison to traditional live instruction, in that students benefitted as much as, or more than, those in traditional settings. A follow-up study using a similar approach confirmed these findings and demonstrated superiority of the computer-assisted instruction over a business-as-usual control in early elementary grades (Madden & Slavin, 2017). In both studies, the authors propose that the triadic approach (tutor-student-computer) is a cost-effective alternative to live instruction. A similar study in the Netherlands examined the effects of an extensive computerized Dutch reading intervention for first grade students working one-on-one with trained paraprofessionals, reporting significantly greater outcomes compared to a business-as-usual control group (Regtvoort et al., 2013). The extent to which students benefit from technology-based interventions may also vary based on learner characteristics such as age and baseline reading abilities, and the type of skill targeted (cf. Barnes et al., 2024; McMaster et al., 2023).
For students with dyslexia, immediate corrective feedback over extended practice opportunities can help to provide an accurate reading model and establish solid foundational knowledge of their language’s orthographic structure. One study has investigated the use of technology-assisted multicomponent reading interventions to supplement teacher-led direct instruction using validated instructional programs (Torgesen et al., 2010). Students received direct instruction from trained teachers, followed by integrated computer-based practice and application activities aligned with the teacher’s lesson content. The instructional timeline was consistent with an extensive reading intervention, with daily 50 min lessons over the course of a full academic year. In one experimental group, the lesson content was developed for use on a computer. A second experimental group received previously developed instruction which was adapted for digital implementation. Students who received either technology-assisted intervention outperformed a business-as-usual control group, though the two experimental groups did not differ from each other. It is not clear whether technology facilitated student learning better than the traditional program designs. Furthermore, in this instance, the implementation of technology at the level of the student did not alleviate the logistical burden of highly trained personnel. Rather, these teachers had extensive training and experience in reading intervention, received 18 h of pre-service training from leading educational researchers, and received more than 50 contact hours over the course of the intervention year to assist with student behavior and performance issues. These students were not identified with dyslexia and their teachers were not trained dyslexia therapists.

1.4. Rationale for the Study

Technology holds promise both in ensuring consistency in the delivery of instructional content and in supporting the teacher preparation process, but technology-assisted instruction at the level of the teacher has yet to be examined in the reading intervention literature. In much of the available research, technology is leveraged to provide individualized instruction and practice for students in general education and Tier 2-type intervention settings, with promising results. However, a critical gap in the scalability of specially designed instruction for dyslexia lies at the level of the teacher. Technology can support scalability by improving fidelity and reducing training demands (Benner et al., 2011; Piasta et al., 2020). However, little is known about how technology can be leveraged to help teachers deliver effective dyslexia intervention while reducing training demands and increasing access to high-quality interventions across diverse educational contexts. For example, phonemic awareness and decoding are critical aspects of reading intervention in developing knowledge of the alphabetic principle, but these are the key elements that many educators are least trained to deliver (Greenberg et al., 2014). Little research has investigated the effectiveness of using technology to present complex aspects of instruction with the same rigor (explicit, systematic, multisensory) as dyslexia interventionists in order to support educators without therapy-level training. No studies to date have evaluated whether and how technology may support dyslexia intervention at the level of the teacher: reducing training demands, providing in-depth documentation through scripted teacher manuals and tutorials, and presenting the more complex aspects of instruction to ensure fidelity and accuracy.
Given the rapid rise of technology in education and the need for increased access to evidence-based instruction, a comparison of growth across these two instructional approaches is warranted to examine whether technology-assisted instruction is a viable approach to improving the scalability of dyslexia interventions. In educational research, as in many other fields, the translational process involves various steps of empirical evaluation to understand factors related to the generalizability of an intervention (e.g., Solari et al., 2020). Efficacy studies are typically conducted in tightly controlled environments, such as a private clinic, or through researcher-led implementation studies in schools.
Generalizability of an intervention’s impact is documented through replication of findings and implementation in real-world settings (i.e., effectiveness studies). Effectiveness studies are critical in the translation of intervention science because routine educational settings typically involve a more diverse array of learners and school-related factors (e.g., under-resourced schools, schools with high poverty rates, rural schools) than do clinical settings. These factors, along with methodological considerations such as implementation fidelity and observational study design can contribute to smaller effects for field-based studies (e.g., Cheung & Slavin, 2016; Hulleman & Cordray, 2009; O’Donnell, 2008; Solari et al., 2020; Varghese et al., 2021).
The current study had two objectives: to evaluate the efficacy of a systematic multicomponent dyslexia intervention across traditional and technology-assisted classrooms, and to investigate factors related to the effectiveness and scalability of technology-assisted dyslexia intervention in a routine (i.e., public school) setting. Towards these goals, we conducted two observational substudies with the following aims:
Substudy 1
Aim 1: To provide evidence confirming the effectiveness of the traditional instructional method under study. We expected reading and spelling performance to significantly improve over the course of intervention, in contrast to stable performance during a pre-intervention waitlist period.
Aim 2: To establish evidence supporting the non-inferiority of a technology-assisted approach to delivering the same intervention in a well-controlled setting. We predicted that children across class types would not differ in post-intervention reading and spelling performance.
Substudy 2
Aim 3: To compare characteristics of intervention implementation across groups at the level of lesson delivery through classroom observation. We predicted that groups would not differ in adherence to lesson structure, but that the technology-assisted group would demonstrate more errors related to the quality of delivery (e.g., pacing, redirection, use of Socratic questioning).
Aim 4: To investigate differences across groups in rates of skill development over the course of intervention. We expected students to progress at similar rates of skills development, with foundational skills improving early during the intervention timeline, compared to protracted rates of development for higher-order skills.

2. Methods

2.1. Substudy 1: Effects of the Instruction

2.1.1. Participants

Participants came from a clinically referred sample of students who presented to the clinic with complaints of learning difficulties in an academic setting. Highly trained and experienced clinical staff performed all evaluations. A diagnosis of dyslexia was determined based on child performance on a series of formal and informal reading and language measures, reported family history, school history, and parent and teacher questionnaires. Final diagnostic status was confirmed by both the clinician and attending developmental pediatrician. Children who received a confirmed diagnosis of dyslexia and who did not have access to adequate services at home or school were referred for educational services at the clinic’s laboratory school.
Parents/guardians of laboratory school students were invited to participate in the research study each year. Families were recruited by the research team; parental consent and child assent were obtained for all participants following IRB-approved procedures. Participation in the study was voluntary, and all data were collected from standard procedures recorded in the children's medical records. Whether or not a family decided to participate in the study did not affect the care they received at the hospital. Participants were recruited from thirteen consecutive cohorts that were treated for a diagnosed reading disability at a hospital-based learning disabilities clinic.
A sample of 106 students received dyslexia treatment across thirteen consecutive cohorts (median cohort size = 5 students, range 2 to 12). Of the students who received services during the sample period, 24 did not complete the entire intervention sequence. A total of 82 students completed the curriculum sequence and were included for analysis. Students receiving either type of instruction ranged from 2nd to 7th grade (median = 4th grade), with an average age of 9 years, 6 months (SD = 1 year, 5 months). The sample was 50% female, 60% white/Caucasian, 10% Black/African American, and 17% Hispanic/Latino. Approximately 32% of the sample had a concurrent diagnosis of ADHD and 7% had a comorbid language disorder. A binomial variable representing maternal education was used as a proxy for socioeconomic status. The majority of the sample (51%) reported maternal education of a bachelor's degree or higher.

2.1.2. Procedure

Children who enrolled in the laboratory school in grades 2–8 received one of two approaches to an Orton-Gillingham-based treatment program: a traditional, therapist-led approach (TRAD) or a novel technology-assisted (TECH) approach. Intervention assignment was determined by the medical and educational teams at the clinic and generally focused on forming groups of students at similar ages and ability levels. All students completed standardized assessment batteries to track progress over the course of intervention. Scores from these measures were also collected as available from the initial diagnostic evaluation for comparison. All assessments were conducted by highly trained and experienced members of the diagnostic clinic staff, which included educational diagnosticians, speech pathologists, and psychologists. Each case was overseen by an attending developmental pediatrician with expertise in learning disabilities and neurodevelopmental disorders. Results of each assessment were reviewed by a team of physicians, diagnostic staff, and educational staff.
Traditional Dyslexia Instruction
Classrooms following a traditional approach to instruction used Take Flight: A Comprehensive Intervention for Students with Dyslexia (Avrit et al., 2006). The Take Flight intervention is based on Orton-Gillingham principles and is designed for implementation over a minimum of two academic years (i.e., 230 lesson plans). The curriculum was developed for use as a pull-out program in public schools for small groups of four to six students. The structure and efficacy of this intervention have been described in detail elsewhere (Ring et al., 2017). The core decoding instruction covers 96 grapheme-phoneme correspondence situations in English and their written production (handwriting), as well as 44 common affixes and 45 Greek and Latin word elements (approximately 35% of lesson time). The introduction of each reading concept is complemented by phoneme articulation, phoneme awareness, and spelling activities to emphasize associations between graphemes and phonemes and encoding speech sounds in written form (17% of lesson time). Scaffolded repeated reading of words and phrases is designed to facilitate automatic pattern recognition of target phonics concepts and is incorporated in daily lessons as well as systematically throughout the intervention (18% of lesson time). Direct instruction in reading comprehension and vocabulary incorporates multiple evidence-based practices including strategic collaborative reading, comprehension monitoring, and inferencing (30% of lesson time; Beck et al., 1996; Ogle, 1986; Palincsar & Brown, 1984; Klingner & Vaughn, 1998).
The Take Flight intervention is taught by Certified Academic Language Therapists who have extensive training and knowledge in language and reading development, the identification and support of students with dyslexia and reading disabilities, legislative and historical contexts of dyslexia intervention, and an in-depth understanding of the complex, integrated structure of written language (Academic Language Therapy Association, 2024). The training program for this level of certification is extensive: 200 instructional hours (25 training days) and a minimum of 700 practicum teaching hours, completed over the course of two academic years (see McMahan et al., 2019; Porter et al., 2022; Ring et al., 2017 for more information regarding therapist training). All educators delivering traditional instruction in the current study were Certified Academic Language Therapists who had received training from qualified instructors in the implementation of the Take Flight curriculum. Therapists who complete this training meet Texas Education Agency's requirements for licensure as a Dyslexia Therapist (Texas Education Agency, 2024).
Technology-Assisted Dyslexia Instruction
Technology-assisted classrooms utilized the curriculum Bridges: A Dyslexia Intervention Connecting Teacher, Avatar, and Student. The Bridges program is designed to maintain high-quality, research-based, and highly effective dyslexia instruction while reducing teacher preparation demands. This is achieved by training teachers in evidence-based dyslexia instruction and incorporating a virtual human avatar that presents a portion of the lesson each day. The Bridges program follows the exact scope and sequence of the Take Flight curriculum, with approximately 25% of lesson time incorporating the virtual human avatar and digitized lesson components, presented on a single large screen adjacent to the teacher's instructional area (i.e., interactive whiteboard) and visible to the entire class (see Figure 1). The selection of lesson components to be integrated into the avatar's instruction was determined through a process of extensive interviews and feasibility trials with both experienced and novice curriculum users and students. Specifically, practice item sets were predetermined during curriculum development to target new learning and review previously learned concepts. Prior to beginning practice activities (e.g., decoding application), all relevant phonics concepts, derivative rules and/or spelling patterns, and practice procedures are reviewed by the avatar, and lesson-specific content is projected using interactive whiteboard technology (see Table 1 for a demonstrative example). The live teacher is trained to monitor student engagement and understanding and is provided with detailed implementation manuals to help guide practice activities, monitor student performance, and provide feedback as needed.
The avatar is an animated 3-D virtual dyslexia therapist that can reproduce important articulatory gestures relevant for phoneme production in a human-like and social manner. Audio and video performance-capture recordings of the Take Flight curriculum’s lead author delivering each lesson were analyzed using Faceware computer graphics software and animated through Autodesk Maya to replicate accurate facial expressions and articulatory movement in the avatar’s behavior and lesson delivery. The avatar presents complex aspects of structured dyslexia intervention, which considerably alleviates teacher training demands and reduces teacher training time by more than half. The avatar delivers the focus of the lesson content (e.g., auditory and visual discovery of phonics concepts), provides a motion model for written production of the grapheme(s) in cursive, presents curriculum materials in a dynamic and engaging way, and reviews learned decoding concepts and procedures with precise fidelity and accuracy. Teachers are trained to monitor student performance, provide scaffolding and feedback, and ensure instruction proceeds at an appropriate pace based on student need. See Figure 2 for a schematic representation of teacher-led and avatar-led lesson components.
Traditional training programs which prepare educators for an advanced credential in reading intervention and dyslexia (Certified Academic Language Therapist; Academic Language Therapy Association, 2024) require a master's degree in education, psychology, or a related field, 200 additional hours of coursework, and a minimum of 700 supervised teaching hours. The Bridges program was designed to reduce these training requirements while making minimal adjustments to the content or structure of the intervention at the student level. To participate in Bridges training, educators must be credentialed teachers and must complete 40 h of pre-service training, with 40 additional training hours over the first year of intervention. The training program includes curriculum content, design, and practice, as well as lecture-based training in reading development, dyslexia characteristics, and dyslexia identification. The training meets state requirements for teachers to be a Provider of Dyslexia Instruction (Texas Education Agency, 2024).

2.1.3. Measures

Demographic information was collected for each student including age, grade, sex, and race/ethnicity. Maternal education served as a proxy for socioeconomic status. General cognitive performance was measured using the Full Scale IQ composite score from the Wechsler Abbreviated Scale of Intelligence or the Wechsler Intelligence Scale for Children (Wechsler, 2003, 2011). These measures have strong internal consistencies (rs > 0.90) and are highly correlated, r = 0.83.
To track progress over the course of the two-year intervention period, all children complete a standard battery of norm-referenced language and achievement measures at the onset of intervention (pre-test), after the completion of the first academic year (mid-test), and again at the completion of the second academic year (post-test). For participating students, data from these assessments, as well as relevant results and demographic information from the initial diagnostic evaluation, were collected retrospectively from the child’s medical record.
Phonological processing was measured using the Phonological Awareness composite from the Comprehensive Test of Phonological Processing (CTOPP; Wagner et al., 1999, 2013). The PA composite score is derived from three subtests. The Elision subtest requires participants to elide individual phonemes from verbally presented words to form real word responses. The Blending Words subtest requires participants to combine verbally presented phonemes to form real word responses. The Phoneme Isolation subtest requires participants to provide the first, last, or middle sound from verbally presented real words. The composite measure has reported internal consistency of α = 0.92.
Word recognition and spelling were assessed using the Wechsler Individual Achievement Test (WIAT; Wechsler, 2009). The word reading subtest is an untimed measure of letter and letter-sound knowledge and single word reading. Performance on this test involves both word recognition and phonic decoding skills. The word reading measure has reported an internal consistency of α = 0.97 and test–retest reliability of r = 0.92. The spelling subtest requires written spelling from dictation. For early items, examinees write letters that represent sounds, and for later items, examinees write words dictated within the context of a sentence. The spelling measure has reported an internal consistency of α = 0.95 and test–retest reliability of r = 0.94.
Oral reading rate was measured using the Gray Oral Reading Test (GORT; Wiederholt & Bryant, 2012). This measure requires participants to read passages aloud and respond to orally presented questions. Reading rate reflects the speed with which participants read passages aloud. The reading rate subscale has an internal consistency of α = 0.92 and test–retest reliability of r = 0.91.

2.1.4. Data Analysis

Demographic characteristics and baseline performance on literacy outcomes were first compared across intervention groups using a series of t-tests and chi-squared analyses. Next, we employed a series of analyses to confirm the efficacy of the TRAD treatment before establishing non-inferiority of the TECH treatment. Age-based standard scores were used for all models to examine changes in performance level relative to developmental norms.
The first aim of Substudy 1 was to replicate previous findings supporting the efficacy of the TRAD instruction using a quasi-experimental longitudinal design (Ring et al., 2017). The data collected by the clinic-based laboratory school lends itself to an interrupted time-series design with a variable waitlist period. To address the question of comparative growth across time periods in reading scores, linear mixed effects modeling was used to measure change over time while accounting for dependencies within the data (i.e., time nested within student). Models were fit using maximum likelihood estimation in the nlme package in R (Pinheiro & Bates, 2000; Pinheiro et al., 2023). Unconditional growth models were first fit to assess the variance accounted for by student-level random effects. Intraclass correlations ranged from 0.13 to 0.60, indicating substantial variability in individual student performance. Full models were then fit for each outcome separately, including random intercepts and slopes, and fixed effects of age, maternal education, waitlist interval, time, and waitlist*time interactions. Time was coded as a 4-level factor (diagnosis, pre-test, mid-test, post-test); successive differences contrasts were used to compare each timepoint to the timepoint immediately preceding it. This resulted in three time periods of interest: the waitlist period (diagnosis to pre-test), Intervention Year 1 (pre-test to mid-test), and Intervention Year 2 (mid-test to post-test). Waitlist interval was positively skewed and thus log-transformed prior to analysis. Age and maternal education were included in the models to account for variability related to student demographic characteristics. Continuous covariates were centered prior to analysis. Post hoc pairwise comparisons were conducted using the emmeans package (Lenth, 2025). For all models, normality of residuals and random effects was evaluated using histograms and Q–Q plots. Residuals for all models were normally distributed with a mean of approximately zero.
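As a concrete illustration of this workflow, the following is a minimal sketch in R. It assumes a long-format data frame dat with hypothetical column names (score, time, student, age_c, mat_ed, log_wait); it reconstructs the procedure described above and is not the study's analysis script.

```r
library(nlme)     # linear mixed effects models (Pinheiro & Bates, 2000)
library(MASS)     # contr.sdif() for successive differences contrasts
library(emmeans)  # post hoc pairwise comparisons

# Four-level time factor; successive differences contrasts compare
# each timepoint to the timepoint immediately preceding it
dat$time <- factor(dat$time,
                   levels = c("diagnosis", "pretest", "midtest", "posttest"))
contrasts(dat$time) <- contr.sdif(4)

# Unconditional growth model to gauge student-level variance (ICC)
m0 <- lme(score ~ 1, random = ~ 1 | student, data = dat, method = "ML")

# Full model: random intercepts and slopes, plus fixed effects of centered
# age, maternal education, log-transformed waitlist interval, time,
# and the waitlist-by-time interaction
m1 <- lme(score ~ age_c + mat_ed + log_wait * time,
          random = ~ time | student, data = dat, method = "ML")
summary(m1)

# Post hoc pairwise comparisons across timepoints
emmeans(m1, pairwise ~ time)
```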
The second aim of Substudy 1 was to demonstrate non-inferiority of the TECH sample in comparison to the TRAD sample in a tightly controlled laboratory school setting. Propensity matching was used to ensure group equivalence on all covariates and baseline measures using the MatchIt and optmatch packages in R (Hansen & Klopfer, 2006). Propensity matching reduces bias in estimating treatment effects in observational studies for which random assignment is not possible (Fortson et al., 2012; Leite, 2017). Propensity matching was first attempted with 1:1 fixed-ratio matching methods (nearest neighbor and optimal matching without replacement), using logistic regression to estimate propensity scores. Fixed-ratio approaches resulted in poor balance across groups. We then employed an optimal full matching approach, which uses subclassification to assign at least one treated unit and one control to each group, reducing bias in the covariates while preserving sample size and producing robust estimates of treatment effects (Hansen, 2004; Rosenbaum, 1991). Optimal full matching using logistic regression to estimate the propensity score yielded adequate balance across groups.
Propensity scores from the optimal full matching solution were used to weight subsequent models, and subclass was used to calculate cluster-robust standard errors in estimating the average effect of treatment on the treated (ATT). All demographic covariates were included in the weighted models to further increase precision (Nguyen et al., 2017; What Works Clearinghouse, 2022). To estimate the ATT, separate weighted linear regression models were fit for each post-intervention outcome, with covariates, the pre-test autoregressor, treatment group, and the group*pretest interaction as predictors. G-computation was performed to estimate the ATT using the marginaleffects package in R (Arel-Bundock et al., 2024). Matching subclass membership was used to estimate the ATT standard error. Non-inferiority was determined by examining the means and 95% confidence intervals for each computed ATT. Non-inferiority was defined a priori as an ATT which did not differ significantly from zero (Walker, 2019).
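The matching and estimation steps can likewise be sketched with the packages cited above. Variable names (group coded 1 = TECH, baseline predictors such as wr_pre, and the post-test outcome wr_post) are hypothetical placeholders, and the snippet is an illustrative reconstruction rather than the authors' code:

```r
library(MatchIt)          # propensity score matching
library(marginaleffects)  # g-computation for the ATT

# Optimal full matching on a logistic-regression propensity score
# (method = "full" calls the optmatch package under the hood)
m.out <- matchit(group ~ age_c + mat_ed + pa_pre + wr_pre + sp_pre + rate_pre,
                 data = dat, method = "full", distance = "glm",
                 estimand = "ATT")
summary(m.out)  # inspect standardized mean differences (target < 0.25)

md <- match.data(m.out)  # matched data with weights and matching subclass

# Weighted outcome model: covariates, pre-test autoregressor,
# treatment group, and the group-by-pretest interaction
fit <- lm(wr_post ~ group * wr_pre + age_c + mat_ed,
          data = md, weights = weights)

# G-computation for the ATT; subclass provides cluster-robust SEs
avg_comparisons(fit, variables = "group",
                vcov = ~subclass,
                newdata = subset(md, group == 1),
                wts = "weights")
```

Under the a priori criterion above, non-inferiority would be indicated by an estimated ATT whose 95% confidence interval includes zero.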

2.2. Substudy 2: Factors Related to Scalability

2.2.1. Participants

Students were recruited from 2nd through 5th grades at participating campuses across four school districts. Districts ranged in size from 5000 to 38,000 students, serving urban, suburban, and rural areas in the Southwestern United States. Information about the study was initially shared with eligible educators by district leadership. Educators and other district personnel were invited to attend an information session held by the study team. Educator participation was voluntary and written informed consent was obtained. All participating educators were assigned by district leadership to serve one or more campuses as a dyslexia specialist. Educators providing TRAD instruction had extensive training in the TRAD intervention and had obtained certification as Academic Language Therapists prior to participation in the study. Educators in the TECH group were certified teachers and special educators who had completed district and state requirements regarding dyslexia instruction. None of the TECH educators had prior experience with the TRAD or TECH interventions, nor were they Academic Language Therapists. Participating educators had an average tenure of 17.88 years in education (SD = 9.38), with 3.5 years' experience in their current position at time of enrollment (SD = 3.41). Groups did not differ in overall tenure or experience (ps > 0.29).
Eligible students were previously classified by their school as demonstrating characteristics of dyslexia. Those who were assigned to participating teachers were invited to participate in the study. Eligible students and their families were first notified of their school's participation in the study by their respective dyslexia instructor or district representative. Study information packets were then distributed to all eligible students, and appropriate consent/assent was obtained by the study team for students and their caregivers. Each student was enrolled in the study at entry to their school district-assigned dyslexia intervention program and followed for two academic years. A total of 200 students were recruited over two consecutive academic years as they were beginning their dyslexia intervention program. Of these, 14 were screen failures (outside of target age range, did not qualify for services per district protocol), 24 participated in an intervention other than the TECH and TRAD approaches under study (as part of a larger study on intervention outcomes), and five had incomplete baseline testing, precluding them from the propensity matching analysis described below. A total of 157 students were included for analysis. The majority of the sample was female (54%), white/Caucasian (73%), non-Hispanic (66%), and economically disadvantaged (i.e., eligible for free/reduced lunch [FRL]; 50%). Approximately 18% of the students in the sample had an active English Learner (EL) status reported by their district.

2.2.2. Fidelity

The quantification of treatment fidelity is of particular importance when treatment is delivered in a routine setting, as variations in intervention structure and process can impact outcomes (O’Donnell, 2008; Varghese et al., 2021). Each teacher participating in the study was observed at least once per semester by a trained member of the research team and all reports were reviewed by the first author. Inconsistencies in delivery or other concerns related to implementation were discussed with the second author and other members of the study team responsible for teacher training. Observations were designed to measure fidelity across two domains: adherence and quality. Because lesson sequences differ across lesson plans (new learning vs. application days), adherence was measured by calculating the proportion of observed lesson components over the total number of expected components for that day’s lesson. Measured instructional components included PA, word and sentence level reading, connected text reading, spelling, dictation, repeated reading practice, and reading comprehension. Any omissions, substitutions, or other procedural variations were recorded as a deviation from the lesson sequence.
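Stated as a worked formula (a restatement of the procedure above, not an additional metric), the adherence score for a single observation reduces to

$$\text{Adherence} = \frac{\text{observed lesson components}}{\text{expected components for that day's lesson}}$$

so, for example, an observation in which 19 of 20 expected components were delivered would yield an adherence score of 0.95.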

2.2.3. Procedure

Upon enrollment, select information was collected from each student’s school record, including demographic information and results of district dyslexia eligibility determination. Experienced educational diagnosticians, who were blinded to student intervention assignment, completed a comprehensive battery of language assessments with each participating student three times over the course of the intervention: at the start of the intervention (pre-test), at the end of the first year (mid-test), and at the end of the second year (post-test). All testing was conducted in a quiet location on the student’s home campus. Measures used included standardized, norm-referenced measures of encoding, basic reading, and complex reading skills; outcomes in the present analysis were aligned with those reported in Substudy 1 for consistency and comparison. Age-based standard scores were used for all analyses. The TECH and TRAD interventions utilized in Substudy 2 were the same as those used in Substudy 1.

2.2.4. Measures

For each participant, select demographic variables were collected, including age, grade, sex, race, and ethnicity, as well as free/reduced lunch status (as a proxy for SES) and related comorbidities (ADHD, SLI). General cognitive ability (Full Scale IQ) was recorded when available from the school record to help characterize the sample. The exact measures used varied across students, precluding the interpretation of differences in scores across groups.
Phonological awareness and oral reading rate were assessed with the same measures described in Substudy 1. Phonological awareness was measured using the Phonological Awareness composite score from the Comprehensive Test of Phonological Processing 2nd Edition (CTOPP-2; Wagner et al., 2013). Oral reading rate was measured using the GORT-5 (Wiederholt & Bryant, 2012).
Word reading was assessed using the Word Identification subtest of the Woodcock Reading Mastery Test, 3rd Edition (WRMT-3; Woodcock, 2011). Participants were required to read isolated real words with no time constraints. This untimed word-level reading measure has a reported split-half reliability of 0.98.
Spelling was assessed using the Word Identification and Spelling Test (WIST; Wilson & Felton, 2004). The WIST is a nationally standardized assessment designed specifically for students who are struggling with reading and spelling. This test was selected based on the breadth of orthographic patterns included in the stimuli and the structure of the test which allows for detailed error analysis. The Spelling measure has a reported internal consistency of α = 0.98, and strong convergent validity with other common and reliable measures of reading and spelling (i.e., WIAT-II, WRMT-R/NU, TWS; all rs ≥ 0.8).

2.2.5. Data Analysis

Demographic characteristics and baseline performance on literacy outcomes were first compared across groups. Demographic variables included student age, gender, race, ethnicity, SES (as represented by free/reduced lunch eligibility status), and comorbid diagnoses. The estimation of ATTs followed the same procedure described in Substudy 1.
To address the question of comparative growth across groups in reading scores over time, linear mixed effects modeling was used in line with the procedures described in Substudy 1. Large ICCs were calculated from initial null models (ICCs > 0.6). Because propensity weights are estimated at the level of the student (i.e., level 2 in the models), weights were not included in the following analyses. To improve model precision and enhance interpretability, each full model included all demographic covariates along with the predictors of interest (group, time, group*time). For this substudy, time was coded as a 3-level factor (pre-test, mid-test, post-test) to examine growth over each of the two academic years of intervention.
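For reference, the ICC from a null model is the share of total variance attributable to between-student differences, $\mathrm{ICC} = \tau_{00} / (\tau_{00} + \sigma^2)$. A minimal sketch of this check and of the Substudy 2 model specification, continuing the hypothetical naming from the Substudy 1 example (here with a data frame dat2 and illustrative covariate names):

```r
# Three-level time factor for the two-year public-school timeline
dat2$time <- factor(dat2$time, levels = c("pretest", "midtest", "posttest"))
contrasts(dat2$time) <- MASS::contr.sdif(3)

# ICC from the null model: intercept variance / total variance
m0 <- lme(score ~ 1, random = ~ 1 | student, data = dat2, method = "ML")
vc <- as.numeric(VarCorr(m0)[, "Variance"])
icc <- vc[1] / sum(vc)  # reported ICCs exceeded 0.6

# Full model: demographic covariates plus group, time, and group-by-time;
# propensity weights omitted because they are student-level estimates
m2 <- lme(score ~ group * time + age_c + sex + race + ethnicity + ses,
          random = ~ time | student, data = dat2, method = "ML")
summary(m2)
```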

3. Results

3.1. Substudy 1

Groups were equivalent across all measured demographic characteristics (age, sex, race/ethnicity, maternal education, and comorbidities). The academic and demographic characteristics of the two intervention groups are shown in Table 2 and Table 3. Groups were also equivalent on all pre-intervention measures except for PA, which was significantly weaker for the TECH group compared to the TRAD group, t(76) = 2.73, p = 0.01. Interestingly, the two groups performed similarly on PA at diagnosis. The duration of the waitlist period (i.e., time between initial diagnosis and onset of intervention services) varied for each participant, with an average of 8 months (SD = 8 months). Groups did not differ in duration of the waitlist period.
Cohorts did not differ in demographics or pre-intervention reading ability, except for reading rate, which was significantly weaker for one TRAD cohort consisting of only two students with equally low scores (SS = 65). Analyses were run with and without cohort as a covariate to examine potential impacts of variation in student performance each year. Results did not differ; models without cohort as a covariate are reported here.

3.1.1. Aim 1: Effect of Traditional Intervention

Parameter estimates and model fit indices for each of the full models are presented in Table 4. Results of the full model estimating growth in PA skills revealed significant variability in intercepts across participants, χ2(1) = 5.46, p = 0.02. Slope also varied across participants, χ2(9) = 68.40, p < 0.001. There was a marginally significant fixed effect of waitlist interval (B = 4.51, SE = 2.62, p = 0.06). There were no significant interaction effects between waitlist interval and any of the time contrasts (ps > 0.51). The successive differences contrasts revealed a marginally significant change in PA skill during the waitlist period (B = 3.61, SE = 2.12, p = 0.09). PA significantly increased during both the first year (B = 9.07, SE = 1.08, p < 0.001) and the second year (B = 4.86, SE = 1.40, p < 0.001) of the intervention. Neither age nor maternal education was reliably associated with PA skill (ps > 0.77).
For word reading, significant variability across participants was found for both intercepts (χ2(1) = 45.47, p < 0.001) and slopes (χ2(9) = 107.05, p < 0.001). The relationship between waitlist interval and WR scores was not statistically reliable, nor did waitlist interval interact with any of the time contrasts. WR performance decreased during the waitlist period (B = −2.36, SE = 0.71, p = 0.001), compared to significant gains in WR during the first intervention year (B = 5.40, SE = 0.76, p < 0.001) and second intervention year (B = 6.24, SE = 0.90, p < 0.001). There was a reliable effect of age such that older students demonstrated weaker scores (B = −0.12, SE = 0.05, p = 0.01). Maternal education was positively associated with WR performance (B = 3.99, SE = 1.69, p = 0.02).
Significant variability in spelling performance was found for both intercepts (χ2(1) = 117.01, p < 0.001) and slopes (χ2(9) = 34.66, p < 0.001). Similar patterns were observed for spelling, such that waitlist interval neither predicted spelling performance nor interacted with any of the time contrasts during the intervention period. There was a moderating trend of waitlist interval on spelling gains during the waitlist period, such that longer waitlist periods were associated with decreases in standard score performance (B = −4.93, SE = 2.62, p = 0.06). Conversely, spelling performance increased during both years of intervention; the effect was marginal in Year 1 (B = 1.53, SE = 0.91, p = 0.09) and reached significance in Year 2 (B = 1.45, SE = 0.74, p = 0.04). Post hoc pairwise comparisons confirmed a statistically significant change in spelling performance from pre-test to post-test (B = 3.04, SE = 0.95, p = 0.009). There was a marginally significant effect of maternal education on spelling performance, B = 3.06, SE = 1.61, p = 0.09.
Participants varied significantly in reading rate for both intercepts (χ2(1) = 67.13, p < 0.001) and slopes (χ2(9) = 87.84, p < 0.001). Oral reading rate performance did not reliably change during the waitlist period but significantly improved over the first year of intervention (B = 3.81, SE = 1.46, p = 0.01), and again the second intervention year (B = 5.45, SE = 1.20, p < 0.001). Waitlist interval did not predict reading rate performance, nor did it interact with time for any period of interest (ps > 0.19). Children whose mothers reported higher education levels demonstrated faster oral reading rates (B = 9.69, SE = 3.11, p = 0.004). Age at enrollment did not predict reading rate (p = 0.37).

3.1.2. Aim 2: Non-Inferiority Comparison in Clinical Setting

Propensity matching was first attempted using fixed-ratio approaches; these solutions resulted in poor balance across groups. Optimal full matching using logistic regression to estimate the propensity score yielded adequate balance across groups, with standardized mean differences for all covariates below 0.25 (see Table 5).
Estimated ATTs were small and non-significant for PA (ATT = 1.92, SE = 4.12, p = 0.64), word reading (ATT = 2.09, SE = 4.24, p = 0.62), spelling (ATT = 1.22, SE = 3.07, p = 0.69), and reading rate (ATT = −4.06, SE = 3.72, p = 0.27). Confidence intervals fell within the non-inferiority threshold for each of the four outcome variables, indicating non-inferiority of treatment for children in the TECH group in comparison to those in the TRAD group. Post-intervention outcomes did not reliably favor either instructional approach over the other.

3.2. Substudy 2

Demographic characteristics and baseline performance on literacy outcomes were first compared across groups to examine group equivalence. The TECH and TRAD groups did not differ in age, sex, race, SES, or comorbidities (Table 6). The TECH group consisted of a greater proportion of Hispanic/Latino students and students who were English Learners compared to the TRAD group.
The TECH group performed significantly below the TRAD group across all baseline measures (see Table 7). To ensure equivalence across groups, propensity matching was used as described in Substudy 1 to minimize mean differences across covariates and pre-intervention skill level. Propensity matching using fixed-ratio matching methods again resulted in poor balance across groups. Optimal full matching using logistic regression to estimate the propensity score yielded adequate balance across groups, with standardized mean differences for all covariates below 0.25 (see Table 8).
Groups were also similar across fidelity variables. Adherence was high for TRAD (M = 0.96, SD = 0.04) and TECH (M = 0.95, SD = 0.05) teachers and did not differ across groups (p = 0.38). Measures of instructional quality included lesson pacing, use of direct and immediate feedback, and educator knowledge. Quality was also high for both groups, though the TRAD group (M = 0.99, SD = 0.02) marginally outperformed the TECH group (M = 0.96, SD = 0.05), p = 0.07.

3.2.1. Aim 3: Effectiveness in a Routine Setting

Estimated ATTs were small and non-significant for PA (ATT = −1.53, SE = 1.98, p = 0.44), word reading (ATT = 1.94, SE = 1.89, p = 0.31), and reading rate (ATT = −0.12, SE = 1.54, p = 0.94). There was a marginally significant positive effect of TECH instruction on spelling performance (ATT = 3.53, SE = 1.08, p = 0.05). Post-test performance was similar for TECH and TRAD students across measures, indicating non-inferiority of treatment for children in the TECH group in comparison to those in the TRAD group.

3.2.2. Aim 4: Comparative Growth Across Skills in a Routine Setting

Parameter estimates and model fit indices for each of the full linear mixed models are presented in Table 9. Significant variability was found in intercepts across participants for PA, χ2(1) = 223.77, p < 0.001. Slope also varied across participants, χ2(5) = 12.72, p = 0.03. Results revealed significant gains in PA during the first year (B = 5.66, SE = 1.22, p < 0.001) but not the second year of intervention (p = 0.45). No interactions were found between group and PA growth during either time period. There was a marginally significant effect of group on performance, such that the TECH group performed below the TRAD group (B = −3.64, SE = 1.90, p = 0.06). Race was associated with PA performance (ps < 0.06).
Similar patterns of findings were found for each of the remaining three outcome measures. Significant variability was found across models in participant intercepts (all ps < 0.001) and slopes (all ps < 0.001). Significant increases in performance were found for word reading in Year 1 (B = 1.55, SE = 0.72, p = 0.03) and Year 2 (B = 2.93, SE = 1.06, p = 0.006). Word reading performance was marginally weaker for the TECH group relative to the TRAD group (B = −3.31, SE = 1.90, p = 0.08); there were no interactions between group and time. There was a negative association between word reading and baseline age (B = −0.22, SE = 0.07, p = 0.002), as well as race (Black: B = −5.38, SE = 2.57, p = 0.04) and SES (B = −2.25, SE = 1.13, p = 0.05). There were no associations between the other demographic variables and word reading. Spelling ability significantly improved over Year 1 (B = 3.32, SE = 0.72, p < 0.001) and Year 2 (B = 2.43, SE = 0.92, p = 0.009). The TECH group performed marginally below the TRAD group overall (B = −2.90, SE = 1.68, p = 0.09). Group status did not moderate growth over time (ps > 0.11). Of the demographic covariates, there was a negative association only between race/ethnicity and spelling (Hispanic: B = −2.06, SE = 0.88, p = 0.02). Oral reading rate also significantly improved over Year 1 (B = 2.69, SE = 0.84, p = 0.002) and Year 2 (B = 1.82, SE = 0.83, p = 0.03). Growth did not differ across groups over time (ps > 0.15), though the TECH group again performed below the TRAD group overall (B = −2.91, SE = 1.61, p = 0.07). SES was associated with reading rate (B = −2.59, SE = 0.99, p = 0.01). There were no associations between the other demographic variables and reading rate (ps > 0.18).

4. Discussion

Over the past several decades, the evidence base for effective practices of reading intervention has established several key characteristics: instruction is most effective when direct, systematic, and cumulative. Although individualized instruction is critical in targeting specific deficits at the level of the child, this can be achieved through multicomponent interventions which integrate evidence-based practices across various reading skills (e.g., Al Otaiba et al., 2023; Castles et al., 2018; Hall et al., 2023; Wanzek et al., 2018). The instructional approach under study in the current paper employs evidence-based practices and is designed specifically for students with significant reading challenges (i.e., dyslexia).
The findings presented here confirm and extend previous evidence of the efficacy of the traditional approach to dyslexia intervention (Ring et al., 2017). Specifically, significant gains were made during the intervention, but not during the pre-intervention control period, across various reading skills, including those more resistant to remediation. Findings also support the use of a technology-assisted intervention as an effective approach to dyslexia instruction which produces comparable results to traditional approaches. The two instructional groups did not differ in overall performance level or rate of growth over time, supporting the non-inferiority of technology-based instruction across educational contexts. Furthermore, non-inferiority was not only established when samples were matched on demographic characteristics and baseline reading scores, but was also evident when baseline profiles were not controlled.

4.1. Effect of Instruction

Previous reports demonstrate the effectiveness of the traditional intervention in improving various reading skills for students with dyslexia (i.e., phonological awareness, decoding, word reading, and reading comprehension; Ring et al., 2017). The current study replicated these findings in a more recent sample of students receiving traditional instruction and extended them by examining growth in two additional intervention targets which are less readily remediated: spelling and reading rate. By the end of treatment, students demonstrated substantial growth in all outcome measures, closing or significantly narrowing the gap with their age-equivalent peers based on developmental norms. Furthermore, the gains observed during the intervention period were in stark contrast to patterns of performance during the pre-intervention period. Pre-intervention changes in PA performance were not statistically reliable and were modest in comparison to the robust, significant growth observed over each year of the intervention. Post-intervention PA performance was equivalent to the population average (SS = 100, 50th %ile). This is in line with previous findings demonstrating the malleability of phonological awareness to instruction, particularly when that instruction follows a developmental continuum of phonological skills and is integrated with phonics instruction (Clemens et al., 2021; NICHD, 2000; Lane et al., 2002). Phonological awareness has an established causal relationship with word-level reading and encoding (e.g., Melby-Lervåg et al., 2012). Growth in phonological awareness in the current study was evident alongside growth in both word reading and spelling skills.
Participants fell further behind their peers in word reading skills during the pre-intervention waitlist period. Conversely, word reading scores significantly improved each year of the intervention, bringing the sample within normal limits by the end of intervention. Spelling performance was stable during the pre-intervention period, but improved over the course of the intervention. The fact that spelling growth was less robust than that seen for PA and word reading was expected, as encoding can be a particularly challenging skill to develop for students with reading disabilities (e.g., Wanzek et al., 2006). In the current study, the highly structured, systematic approach to decoding and encoding instruction integrated with phonological awareness was associated with significant improvements in standard score performance for both reading and spelling. Reading rate was stable over the pre-intervention period, followed by significant growth over both years of intervention. The observed growth in reading rate is a notable finding given the generally slow response of higher order reading skills to intervention and relative stability of reading fluency deficits (e.g., Moll et al., 2020; Torgesen et al., 2010). The repeated reading exercises in the present interventions may have contributed to improvements in the automatic recognition of orthographic patterns and words, thereby improving reading speed (cf., Hudson et al., 2020).
Together with previous reports of intervention effects, observed outcomes provide converging evidence to support the efficacy of the traditional instructional approach in improving reading, and extend those findings to include spelling ability and reading rate. Specifically, results revealed significant gains across target outcomes during the intervention period that were not evident during the pre-intervention waitlist period.
Although the duration of the waitlist period varied across participants, waitlist interval did not appear to moderate skill development. Interestingly, however, patterns of skill development over the waitlist period varied across outcomes. For example, age-based performance on a measure of PA was greater for those with longer waitlist intervals between diagnosis and pre-test. This may reflect a general maturational process through which sensitivity to the phonological structure of language increases as children experience both spoken and written language, coupled with the fact that longer waitlist intervals result in longer overall time elapsed between diagnosis and completion of the intervention. Waitlist interval also appeared to moderate change in spelling performance during the waitlist period, such that longer intervals were associated with skill regression prior to the intervention. This is unsurprising given the relative difficulty of spelling tasks in populations of impaired readers and the lack of systematic instruction during this time.

4.2. Non-Inferiority of a Technology-Assisted Approach to Intervention

The objective of the study centered on evaluating non-inferiority of a technology-assisted approach as an extension of the validated methods of the traditional approach (Ring et al., 2017). Towards this end, the second aim of the study was to investigate whether the technology-assisted instructional model was inferior to traditional instruction in remediating reading and spelling skills across two educational contexts: a well-controlled, clinic-based laboratory school, and a routine public education setting. In both contexts, after matching the groups on demographic characteristics and baseline reading skills, post-intervention performance did not differ between students who received the technology-assisted approach and those who received the traditional approach.
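Stated formally, and using a generic non-inferiority margin Δ that is not specified in this excerpt (cf. Walker, 2019), the comparison can be written as:

$$
H_0:\ \mu_{\text{TECH}} - \mu_{\text{TRAD}} \le -\Delta
\qquad \text{vs.} \qquad
H_1:\ \mu_{\text{TECH}} - \mu_{\text{TRAD}} > -\Delta,
$$

with non-inferiority concluded when the lower bound of the confidence interval for the TECH−TRAD difference lies above −Δ.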
Findings supported our hypotheses, as outcomes for the matched groups did not favor either instructional method for PA, word reading, or reading rate. After equating baseline profiles, the TECH group performed marginally better than the TRAD group on a measure of spelling. It is not possible to fully disentangle the relative effects of student-level characteristics such as demographics and baseline achievement from the effects of the instruction received. However, one explanation for greater growth in spelling for the TECH group is that the greater baseline performance of the TRAD group corresponds to greater prior knowledge of orthographic structure, leading the TECH group to “narrow the gap” with their less impaired counterparts in TRAD classes. Furthermore, the additional structure imposed by the TECH intervention sequence may have helped bolster adherence to intervention delivery for these classes, as suggested by slightly higher rates of delivery for specific lesson components including spelling and dictation. Despite baseline differences across groups, students responded positively to the instruction, and in no instance was the performance of the TECH group inferior to that of the TRAD group. As reported in previous studies demonstrating similar effects for traditional and technology-assisted approaches to intervention (e.g., Madden & Slavin, 2017; Torgesen et al., 2010), the current study suggests that technology may be a viable solution for increasing the scalability of high-quality instruction by providing support at the level of the teacher without reducing the overall impact of the instructional content.

4.3. Evidence of Scalability in a Routine Setting

4.3.1. Fidelity of Implementation

The second objective of the study was to examine factors related to the effectiveness and scalability of technology-assisted dyslexia intervention in a routine educational setting. Due to the observational nature of this study, an examination of intervention delivery was first warranted to understand how both methods of intervention were implemented in a routine setting. Fidelity observations indicated high levels of adherence and quality for teachers in both groups. Despite differences in the training models, implementation fidelity was equivalently strong in both instructional approaches. Missing lesson components were often due to slow lesson pacing and other constraints on session time. Given the precise timing of the lesson components delivered by the avatar, enhanced structure in lesson progression, and prescriptive nature of the TECH instructional activities, it is not surprising that teachers in this group demonstrated strong adherence to intervention structure.
Though the two groups did not differ in overall adherence, differences in observation rates were noted for several individual lesson components. On new learning days, teachers were most likely to omit lesson components occurring at the end of the lesson (i.e., spelling and dictation practice). On application days, teachers often delivered comprehension instruction or connected text guided reading, but not both. Although these patterns were observed for teachers of both class types, they were slightly (though not significantly) more common in TRAD classes. The structure and prescriptive support of the TECH approach may help teachers adhere to delivery of all lesson components during an intervention session.

4.3.2. Heterogeneity of Student Profiles

Compositional differences in the school-based sample indicated that students assigned to traditional classes may have less complex risk profiles than those assigned to other reading interventions. Specifically, the TECH group included higher proportions of students from varied ethnic and linguistic backgrounds, and these students were more likely to be from economically disadvantaged homes. These demographic factors are associated with elevated risk for reading problems and have a compounding effect on reading achievement (Solari et al., 2014). It is critical that these students, particularly those at risk for reading difficulties, receive prompt and comprehensive intervention to mitigate widening achievement gaps over time (Miciak et al., 2022; Middleton et al., 2024). Given the heterogeneous nature of the reading deficits demonstrated by students with dyslexia from linguistically diverse and economically disadvantaged homes, an intensive, evidence-based multicomponent reading intervention is well situated to develop multiple reading skills concurrently (Capin et al., 2021; Cho et al., 2019; O’Connor et al., 2019). The present findings provide promising evidence that the TECH approach is effective across educational contexts serving a range of student and teacher backgrounds.
These findings may also point to inherent differentiation in the assignment of students to intervention services based on multiple student needs. For example, the greater prevalence of non-native English speakers in TECH classrooms is likely due in part to lower rates of ESL certification among participating TRAD teachers. As shortages in the number of special educators and teachers with specialized training in the education of vulnerable student populations continue to grow, flexibility and feasibility are critical considerations in the development of new approaches to teacher training (Connor et al., 2022). The reduced training demands of a TECH instructional approach like the one studied here may offer a more feasible path to bridging the credentialing gap, allowing educators who support students with various linguistic needs to also provide intensive, multicomponent dyslexia intervention. The intensive and extensive nature of multicomponent interventions such as those used in this study can provide the comprehensive instructional support which is also beneficial for those from diverse language backgrounds and with high-risk profiles (Capin et al., 2021; Miciak et al., 2022; Middleton et al., 2024; Solari et al., 2014).

4.4. Rate of Skill Development

The present findings are in line with established patterns of skill development during intervention: early, marked growth for foundational skills, along with reliable but relatively modest growth in more complex skills (Boucher et al., 2024; Hall et al., 2023; Hudson et al., 2020; Moll et al., 2020). Early and large standard score gains were found for PA, a foundational skill critical for acquiring the alphabetic principle. This is in line with previous reports of greater malleability of PA skills relative to other reading skills (e.g., Castles et al., 2018; NICHD, 2000). In the clinical sample, significant growth was observed over both years of intervention, whereas modest PA growth was observed during the second intervention year for the public-school sample. By the end of the first year, however, all groups demonstrated age-appropriate PA skill. Given the continued growth in the other measured outcomes, it appears that this level of PA knowledge is sufficient to support reading and spelling skill development. Indeed, given the 3–5 min allocated to explicit PA activities during intervention lessons, students likely approach the optimal cumulative dosage for PA instruction midway through the second intervention year, at which point the focus on PA is greatly reduced (Erbeli et al., 2024). For each of the other three outcomes, standard score performance improved over both years of intervention. Importantly, average post-intervention word reading scores were within normal limits for both groups in both samples, but spelling and reading rate scores were not. Growth in these skills is often protracted and may require additional instruction and practice to further approach developmentally appropriate levels (Fletcher et al., 2018; Middleton et al., 2022; Torgesen et al., 2010). Together, these findings provide additional support for the necessity of interventions which are both intensive and extensive, particularly for severely impaired readers (Al Otaiba et al., 2023; L. S. Fuchs et al., 2017; Wanzek et al., 2018; Wanzek & Vaughn, 2007, 2008). Remediation of significant impairments requires an equally significant investment of time and effort, with severe deficits often requiring longer treatment durations.
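As a rough illustration of the cumulative-dosage point (all figures here are assumptions for the sake of arithmetic; only the 3–5 min per lesson is stated above), taking 4 min of explicit PA work per lesson, 4 lessons per week, and a 36-week school year gives:

$$
4\ \tfrac{\text{min}}{\text{lesson}} \times 4\ \tfrac{\text{lessons}}{\text{week}} \times 36\ \tfrac{\text{weeks}}{\text{year}} = 576\ \text{min} \approx 9.6\ \text{h per year},
$$

so cumulative explicit PA instruction would pass roughly 14 h midway through the second year, consistent with the claim that students approach the optimal dosage around that point.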

4.5. Practical Implications

The importance of intensive, explicit phonics-based multicomponent instruction for struggling readers is well established (Castles et al., 2018; NICHD, 2000). Evidence-based best practices have been increasingly integrated into educational legislation mandating rigorous and high-quality instruction for students with dyslexia and other reading problems (Odegard et al., in press; Youman & Mather, 2013). However, the content and training required to deliver reading intervention in public schools varies widely, and most teacher preparation programs do not adequately prepare teachers to deliver comprehensive reading instruction (Greenberg et al., 2014). Although schools are increasingly required to provide evidence-based reading instruction, shortages in educators who are highly specialized in providing comprehensive intervention for students with severe reading deficits limit the scalability of robust interventions in public-school settings. Technology-assisted instruction holds promise in its ability to supplement teacher-led instruction not only through digital and adaptive student-level practice materials (cf. Stein et al., 2021), but by providing a digital co-teacher who can serve as an additional source of instructional guidance for students, modeling critical concepts of structured literacy with consistency and absolute fidelity. The perfect accuracy of the digital avatar narrows the scope of training necessary for the teacher to focus on curriculum delivery and reading development, increasing the potential for accessibility and scalability of evidence-based dyslexia intervention, particularly in educational systems which are under-resourced or geographically isolated.
Furthermore, the educational context in which these technologies are applied may also influence outcomes. In the current study, standard score gains observed for the school-based sample were modest relative to those in the clinic-based sample (Hulleman & Cordray, 2009). Contextual and structural variability in public education settings can reduce intervention impacts (O’Donnell, 2008; Varghese et al., 2021). Therefore, although the digital avatar presents content with perfect fidelity, it is of utmost importance that the educator receive comprehensive training and detailed support materials to ensure that the intervention is implemented correctly and the students are engaged and accurately applying new knowledge. Despite the reduced training time, the technology-assisted instruction in the current study was delivered with high adherence and implementation quality in routine settings. In addition to the avatar delivering elements of instruction with perfect consistency, the added scripting and lesson prompts for teachers may help ensure all lessons are completed within a given session.
Although significant improvements were observed for both samples over time, there was considerable variability at the level of the individual student in terms of overall performance and rates of skill development. Age-based standard scores significantly improved for all reading measures, but growth was estimated at the group level and therefore is not generalizable to the individual student level. Individual ability, previous instruction, and a host of other child-level characteristics contribute to the progress achieved by any individual student. Further work is needed to understand how to optimize intervention outcomes for all students across all educational contexts, with and without the integration of curriculum technologies (Al Otaiba et al., 2023).

4.6. Limitations and Future Directions

The findings presented in this paper support the efficacy of the instructional model under study and provide evidence toward the scalability of a technology-assisted intervention approach. However, findings should be interpreted within the context of several acknowledged limitations. Although the quasi-experimental approaches described here provide promising evidence for the efficacy of a technology-based intervention approach, groups were assigned through routine practice and not through probability-based assignment. Randomization of group assignment would strengthen the validity of the comparisons under study and therefore the inferences which can be drawn. The non-inferiority of the intervention approaches demonstrated here may not be generalizable to other curricula or other educational contexts. Furthermore, while the current study was designed to understand potential differences in student outcomes across traditional and technology-assisted intervention approaches, the lack of a no-treatment control group precludes any inference of overall efficacy of either treatment. It is not possible to disentangle general cognitive maturation from effects of intervention in the current study without a control comparison. Although students significantly improved in their age-based reading performance over time, the current study does not speak to long-term maintenance of attained skills. Dyslexia is a lifelong condition, and despite the significant gains demonstrated in the current study, these students will likely require continued support by way of intervention and accommodation, particularly as their grade levels advance and content becomes more challenging. Future studies should account for these methodological considerations, utilizing both randomization in assignment and longitudinal follow-up designs to evaluate short- and long-term treatment efficacy for both approaches to instruction. Finally, because the virtual therapist and digital lesson components of the intervention under study focused largely on phonological decoding for instruction in an alphabetic language such as English, similar approaches to teacher support may not be generalizable to instruction in other orthographies which require less instruction in the alphabetic principle and foundational decoding skills.

4.7. Conclusions

Technology has become an invaluable tool in education, particularly in providing personalized learning experiences for students and transforming the way both students and teachers engage with learning content. Technology-assisted instruction may hold promise in its scalability, improving both the efficiency and accessibility of evidence-based instruction in the classroom. By supplementing teacher-led instruction with carefully designed digital elements, more teachers can provide rigorous interventions to students with dyslexia with accuracy and fidelity to process. Findings provide encouraging evidence towards the scalability of dyslexia intervention by providing technology-based support at the level of the teacher.

Author Contributions

Conceptualization, A.E.M., K.J.A. and S.L.F.; Data curation, M.D.; Formal analysis, A.E.M.; Investigation, A.E.M., M.Z. and S.L.F.; Methodology, A.E.M.; Project administration, A.E.M., M.D. and S.L.F.; Resources, K.J.A. and S.L.F.; Software, M.Z. and E.D.; Supervision, S.L.F.; Writing—original draft, A.E.M.; Writing—review & editing, A.E.M., K.J.A., M.Z., E.D., M.D. and S.L.F. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The substudies were conducted in accordance with the Declaration of Helsinki, and approved by the University of Texas Southwestern Medical Center Institutional Review Board (protocol code: STU062011-085, date of approval: 14 July 2011; protocol code: STU2020-0053, date of approval: 12 May 2020).

Informed Consent Statement

Informed consent was obtained from all the participants’ legal guardians/next of kin.

Data Availability Statement

The datasets presented in this article are not readily available because aspects of data are the property of participating districts. Requests to access the datasets should be directed to the corresponding author.

Acknowledgments

We would like to thank the following individuals for their contributions to this work: Jeffrey Black, Jerry Ring, and Gladys Kolenovsky for their roles in the conceptualization and development of the curriculum. We acknowledge Dachia Kearby, Ivonne Tennent, Kathy Robertson, Paul Entzminger, and the Luke Waites Center research team for their support in curriculum implementation and data collection. We also acknowledge Gerald Brinneman, Victor Villarreal, Rudy Avila, Tiffany Lo and other software development contributors for their work on the Virtual Dyslexia project. We thank all the students, families, educators, and other school district personnel who made this research possible. This work was supported in part by a donation from the Moody Foundation.

Conflicts of Interest

All authors are currently employed by Scottish Rite for Children, the not-for-profit publisher of both curricula under study. Authors do not benefit financially from the results reported here. All research was conducted in accordance with the institution’s conflict of interest policies.

References

  1. Academic Language Therapy Association. (2024, January). What is a CALT? Available online: https://www.altaread.org/about/what-is-calt/ (accessed on 14 August 2025).
  2. Al Otaiba, S., McMaster, K., Wanzek, J., & Zaru, M. W. (2023). What we know and need to know about literacy interventions for elementary students with reading difficulties and disabilities, including dyslexia. Reading Research Quarterly, 58(2), 313–332. [Google Scholar] [CrossRef]
  3. Arel-Bundock, V., Greifer, N., & Heiss, A. (2024). How to interpret statistical models using marginaleffects for R and Python. Journal of Statistical Software, 111(9), 1–32. [Google Scholar] [CrossRef]
  4. Avrit, K., Allen, C., Carlsen, K., Gross, M., Pierce, D., & Rumsey, M. (2006). Take flight: A comprehensive intervention for students with dyslexia. Texas Scottish Rite Hospital for Children. [Google Scholar]
  5. Barnes, M. A., Clemens, N. H., Simmons, D., Hall, C., Fogarty, M., Martinez-Lincoln, A., Vaughn, S., Simmons, L., Fall, A.-M., & Roberts, G. (2024). A randomized controlled trial of tutor- and computer-delivered inferential comprehension interventions for middle school students with reading difficulties. Scientific Studies of Reading, 28(4), 411–440. [Google Scholar] [CrossRef]
  6. Beck, I. L., McKeown, M. G., Sandora, C., Kucan, L., & Worthy, J. (1996). Questioning the author: A yearlong classroom implementation to engage students with text. The Elementary School Journal, 96(4), 385–414. [Google Scholar] [CrossRef]
  7. Benner, G. J., Nelson, J. R., Stage, S. A., & Ralston, N. C. (2011). The influence of fidelity of implementation on the reading outcomes of middle school students experiencing reading difficulties. Remedial and Special Education, 32(1), 79–88. [Google Scholar] [CrossRef]
  8. Boucher, A. N., Bhat, B. H., Clemens, N. H., Vaughn, S., & O’Donnell, K. (2024). Reading interventions for students in grades 3–12 with significant word reading difficulties. Journal of Learning Disabilities, 57(4), 203–223. [Google Scholar] [CrossRef]
  9. Bureau of Labor Statistics, U.S. Department of Labor. (2025). Occupational outlook handbook. Special Education Teachers. Available online: https://www.bls.gov/ooh/education-training-and-library/kindergarten-and-elementary-school-teachers.htm (accessed on 14 August 2025).
  10. Capin, P., Cho, E., Miciak, J., Roberts, G., & Vaughn, S. (2021). Examining the reading and cognitive profiles of students with significant reading comprehension difficulties. Learning Disabilities Quarterly, 44(3), 183–196. [Google Scholar] [CrossRef] [PubMed]
  11. Castles, A., Rastle, K., & Nation, K. (2018). Ending the reading wars: Reading acquisition from novice to expert. Psychological Science in the Public Interest, 19(1), 5–51. [Google Scholar] [CrossRef]
  12. Catts, H. W., & Petscher, Y. (2022). A cumulative risk and resilience model of dyslexia. Journal of Learning Disabilities, 55(3), 171–184. [Google Scholar] [CrossRef]
  13. Chambers, B., Slavin, R. E., Madden, N. A., Abrami, P., Logan, M. K., & Gifford, R. (2011). Small-group, computer-assisted tutoring to improve reading outcomes for struggling first and second graders. The Elementary School Journal, 111(4), 625–640. [Google Scholar] [CrossRef]
  14. Cheung, A., & Slavin, R. (2013). Effects of Educational technology applications on reading outcomes for struggling readers: A best-evidence synthesis. Reading Research Quarterly, 48(3), 277–299. [Google Scholar] [CrossRef]
  15. Cheung, A., & Slavin, R. (2016). How methodological features affect effect sizes in education. Educational Researcher, 45(5), 283–292. [Google Scholar] [CrossRef]
  16. Cho, E., Capin, P., Roberts, G., Roberts, G. J., & Vaughn, S. (2019). Examining sources and mechanisms of reading comprehension difficulties: Comparing English learners and non-English learners within the simple view of reading. Journal of Educational Psychology, 111(6), 982–1000. [Google Scholar] [CrossRef] [PubMed]
  17. Clemens, N. H., Solari, E., Kearns, D. M., Fien, H., Nelson, N. J., Stelega, M., Burns, M., St. Martin, K., & Hoeft, F. (2021). They say you can do phonemic awareness instruction “in the dark”, but should you? A critical evaluation of the trend toward advanced phonemic awareness training. Available online: https://osf.io/preprints/psyarxiv/ajxbv_v1 (accessed on 14 March 2022).
  18. Connor, C. M., May, H., Sparapani, N., Hwang, J. K., Adams, A., Wood, T. S., Siegal, S., Wolfe, C., & Day, S. (2022). Bringing assessment-to-instruction (A2i) technology to scale: Exploring the process from development to implementation. Journal of Educational Psychology, 114(7), 1495–1532. [Google Scholar] [CrossRef] [PubMed]
  19. Cox, A. R. (1985). Alphabetic phonics: An organization and expansion of Orton-Gillingham. Annals of Dyslexia, 35(1), 187–198. [Google Scholar] [CrossRef]
  20. Erbeli, F., Rice, M., Xu, Y., Bishop, M. E., & Goodrich, J. M. (2024). A meta-analysis on the optimal cumulative dosage of early phonemic awareness instruction. Scientific Studies of Reading, 28(4), 345–370. [Google Scholar] [CrossRef]
  21. Fletcher, J. M., Lyon, G. R., Fuchs, L. S., & Barnes, M. A. (2018). Learning disabilities: From identification to intervention (2nd ed.). Guilford Publications. [Google Scholar]
  22. Fortson, K., Verbitsky-Savitz, N., Kopa, E., & Gleason, P. (2012). Using an experimental evaluation of charter schools to test whether nonexperimental comparison group methods can replicate experimental impact estimates (NCEE Technical Methods Report 2012-4019). National Center for Education Evaluation and Regional Assistance, Institute of Education Sciences, U.S. Department of Education. Available online: https://ies.ed.gov/ncee/2025/01/20124019-pdf (accessed on 14 August 2025).
  23. Francis, D. A., Caruana, N., Hudson, J. L., & McArthur, G. M. (2019). The association between poor reading and internalising problems: A systematic review and meta-analysis. Clinical Psychology Review, 67, 45–60. [Google Scholar] [CrossRef]
  24. Fuchs, D., & Fuchs, L. S. (2006). Introduction to response to intervention: What, why, and how valid is it? Reading Research Quarterly, 41(1), 93–99. [Google Scholar] [CrossRef]
  25. Fuchs, L. S., Fuchs, D., & Malone, A. S. (2017). The taxonomy of intervention intensity. Teaching Exceptional Children, 50(1), 35–43. [Google Scholar] [CrossRef]
  26. Georgiou, G. K., Martinez, D., Vieira, A. P. A., Antoniuk, A., Romero, S., & Guo, K. (2022). A meta-analytic review of comprehension deficits in students with dyslexia. Annals of Dyslexia, 72(2), 204–248. [Google Scholar] [CrossRef]
  27. Georgiou, G. K., Parrila, R., & McArthur, G. (2024). Dyslexia and mental health problems: Introduction to the special issue. Annals of Dyslexia, 74(1), 1–3. [Google Scholar] [CrossRef] [PubMed]
  28. Gersten, R., Haymond, K., Newman-Gonchar, R., Dimino, J., & Jayanthi, M. (2020). Meta-analysis of the impact of reading interventions for students in the primary grades. Journal of Research on Educational Effectiveness, 13(2), 401–427. [Google Scholar] [CrossRef]
  29. Greenberg, J., McKee, A., & Walsh, K. (2014). Teacher prep review: A review of the nation’s teacher preparation programs. SSRN. [Google Scholar] [CrossRef]
  30. Hall, C., Dahl-Leonard, K., Cho, E., Solari, E. J., Capin, P., Conner, C. L., Henry, A. R., Cook, L., Hayes, L., Vargas, I., Richmond, C. L., & Kehoe, K. F. (2023). Forty years of reading intervention research for elementary students with or at risk for dyslexia: A systematic review and meta-analysis. Reading Research Quarterly, 58(2), 285–312. [Google Scholar] [CrossRef]
  31. Hansen, B. B. (2004). Full matching in an observational study of coaching for the SAT. Journal of the American Statistical Association, 99(467), 609–618. [Google Scholar] [CrossRef]
  32. Hansen, B. B., & Klopfer, S. (2006). Optimal full matching and related designs via network flows. Journal of Computational and Graphical Statistics, 15(3), 609–627. [Google Scholar] [CrossRef]
  33. Hudson, A., Koh, P. W., Moore, K. A., & Binks-Cantrell, E. (2020). Fluency Interventions for elementary students with reading difficulties: A synthesis of research from 2000–2019. Education Sciences, 10(3), 52. [Google Scholar] [CrossRef]
  34. Hulleman, C. S., & Cordray, D. S. (2009). Moving from the lab to the field: The role of fidelity and achieved relative intervention strength. Journal of Research on Educational Effectiveness, 2(1), 88–110. [Google Scholar] [CrossRef]
  35. Irwin, V., Wang, K., Jung, J., Kessler, E., Tezil, T., Alhassani, S., Filbey, A., Dilig, R., & Bullock Mann, F. (2024). Report on the condition of education 2024 (NCES 2024-144). U.S. Department of Education, National Center for Education Statistics. Available online: https://nces.ed.gov/pubsearch/pubsinfo.asp?pubid=2024144 (accessed on 14 August 2025).
  36. Klingner, J. K., & Vaughn, S. (1998). Promoting reading comprehension, content learning, and English acquisition through Collaborative Strategic Reading (CSR). The Reading Teacher, 52(7), 738–747. [Google Scholar]
  37. Lane, H. B., Pullen, P. C., Eisele, M. R., & Jordan, L. (2002). Preventing reading failure: Phonological awareness assessment and instruction. Preventing School Failure: Alternative Education for Children and Youth, 46(3), 101–110. [Google Scholar] [CrossRef]
  38. Leite, W. (2017). Practical propensity score methods using R. Sage Publications. [Google Scholar]
  39. Lenth, R. (2025). emmeans: Estimated marginal means, aka least-squares means (R package version 1.11.0). R Project.
  40. Madden, N. A., & Slavin, R. E. (2017). Evaluations of technology-assisted small-group tutoring for struggling readers. Reading & Writing Quarterly, 33(4), 327–334. [Google Scholar] [CrossRef]
  41. McMahan, K. M., Oslund, E. L., & Odegard, T. N. (2019). Characterizing the knowledge of educators receiving training in systematic literacy instruction. Annals of Dyslexia, 69(1), 21–33. [Google Scholar] [CrossRef]
  42. McMaster, K. L., Fuchs, D., & Fuchs, L. S. (2006). Research on peer-assisted learning strategies: The promise and limitations of peer-mediated instruction. Reading & Writing Quarterly, 22(1), 5–25. [Google Scholar] [CrossRef]
  43. McMaster, K. L., Kendeou, P., Kim, J., & Butterfuss, R. (2023). Efficacy of a technology-based early language comprehension intervention: A randomized control trial. Journal of Learning Disabilities, 57(3), 139–152. [Google Scholar] [CrossRef] [PubMed]
  44. Melby-Lervåg, M., Lyster, S. A., & Hulme, C. (2012). Phonological skills and their role in learning to read: A meta-analytic review. Psychological Bulletin, 138(2), 322–352. [Google Scholar] [CrossRef] [PubMed]
  45. Miciak, J., Ahmed, Y., Capin, P., & Francis, D. J. (2022). The reading profiles of late elementary English learners with and without risk for dyslexia. Annals of Dyslexia, 72(2), 276–300. [Google Scholar] [CrossRef]
  46. Middleton, A. E., Davila, M., & Frierson, S. L. (2024). English learners with dyslexia benefit from English dyslexia intervention: An observational study of routine intervention practices. Frontiers in Education, 9, 1495043. [Google Scholar] [CrossRef]
  47. Middleton, A. E., Farris, E. A., Ring, J. J., & Odegard, T. N. (2022). Predicting and evaluating treatment response: Evidence toward protracted response patterns for severely impacted students with dyslexia. Journal of Learning Disabilities, 55(4), 272–291. [Google Scholar] [CrossRef]
  48. Moll, K., Gangl, M., Banfi, C., Schulte-Körne, G., & Landerl, K. (2020). Stability of deficits in reading fluency and/or spelling. Scientific Studies of Reading, 24(3), 241–251. [Google Scholar] [CrossRef]
  49. National Center for Education Statistics. (2023). Teacher openings in elementary and secondary schools. Condition of education. U.S. Department of Education, Institute of Education Sciences. Available online: https://nces.ed.gov/programs/coe/indicator/tls (accessed on 14 August 2025).
  50. National Center on Improving Literacy. (2025, May). Dyslexia by the numbers. Available online: https://www.stateofdyslexia.org (accessed on 14 August 2025).
  51. National Institute of Child Health and Human Development (NICHD). (2000). Report of the national reading panel. Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction. U.S. Government Printing Office.
  52. Nguyen, T.-L., Collins, G. S., Spence, J., Daurès, J.-P., Devereaux, P. J., Landais, P., & Le Manach, Y. (2017). Double-adjustment in propensity score matching analysis: Choosing a threshold for considering residual imbalance. BMC Medical Research Methodology, 17, 78. [Google Scholar] [CrossRef]
  53. Nye, B. D., Graesser, A. C., & Hu, X. (2014). AutoTutor and family: A review of 17 years of natural language tutoring. International Journal of Artificial Intelligence in Education, 24(4), 427–469. [Google Scholar] [CrossRef]
  54. O’Connor, M., Geva, E., & Koh, P. W. (2019). Examining reading comprehension profiles of grade 5 monolinguals and English language learners through the lexical quality hypothesis lens. Journal of Learning Disabilities, 52(3), 232–246. [Google Scholar] [CrossRef]
  55. Odegard, T. N., Farris, E. A., Middleton, A. E., Rimrodt-Frierson, S. L., & Washington, J. A. (in press). Recent trends in dyslexia legislation. In C. M. Okolo, N. Patton Terry, & L. E. Cutting (Eds.), Handbook of learning disabilities (3rd ed.). Guilford.
  56. O’Donnell, C. L. (2008). Defining, conceptualizing, and measuring fidelity of implementation and its relationship to outcomes in K–12 curriculum intervention research. Review of Educational Research, 78(1), 33–84. [Google Scholar] [CrossRef]
  57. Ogle, D. M. (1986). K-W-L: A teaching model that develops active reading of expository text. The Reading Teacher, 39(6), 564–570. [Google Scholar] [CrossRef]
  58. Palinscar, A. S., & Brown, A. L. (1984). Reciprocal teaching of comprehension-fostering and comprehension-monitoring activities. Cognition and Instruction, 1(2), 117–175. [Google Scholar] [CrossRef]
  59. Piasta, S. B., Farley, K. S., Mauck, S. A., Soto Ramirez, P., Schachter, R. E., O’Connell, A. A., Justice, L. M., Spear, C. F., & Weber-Mayrer, M. (2020). At-scale, state-sponsored language and literacy professional development: Impacts on early childhood classroom practices and children’s outcomes. Journal of Educational Psychology, 112(2), 329–343. [Google Scholar] [CrossRef]
  60. Pinheiro, J. C., & Bates, D. M. (2000). Mixed-effects models in S and S-PLUS. Springer. [Google Scholar]
  61. Pinheiro, J. C., Bates, D. M., & R Core Team. (2023). nlme: Linear and nonlinear mixed effects models (R package version 3.1-163). Available online: https://CRAN.R-project.org/package=nlme (accessed on 1 June 2024).
  62. Porter, S. B., Odegard, T. N., McMahan, M., & Farris, E. A. (2022). Characterizing the knowledge of educators across the tiers of instructional support. Annals of Dyslexia, 72(1), 79–96. [Google Scholar] [CrossRef]
  63. Regtvoort, A., Zijlstra, H., & van der Leij, A. (2013). The effectiveness of a 2-year supplementary tutor-assisted computerized intervention on the reading development of beginning readers at risk for reading difficulties: A randomized controlled trial. Dyslexia, 19(4), 256–280. [Google Scholar] [CrossRef]
  64. Reis, A., Araujo, S., Morais, I. S., & Faisca, L. (2020). Reading and reading-related skills in adults with dyslexia from different orthographic systems: A review and meta-analysis. Annals of Dyslexia, 70(3), 339–368. [Google Scholar] [CrossRef]
  65. Ring, J. J., Avrit, K. J., & Black, J. L. (2017). Take Flight: The evolution of an Orton Gillingham-based curriculum. Annals of Dyslexia, 67(3), 383–400. [Google Scholar] [CrossRef]
  66. Ring, J. J., & Black, J. L. (2018). The multiple deficit model of dyslexia: What does it mean for identification and intervention? Annals of Dyslexia, 68(2), 104–125. [Google Scholar] [CrossRef] [PubMed]
  67. Rosenbaum, P. R. (1991). A characterization of optimal designs for observational studies. Journal of the Royal Statistical Society: Series B (Methodological), 53(3), 597–610. [Google Scholar] [CrossRef]
  68. Solari, E. J., Petscher, Y., & Folsom, J. S. (2014). Differentiating literacy growth of ELL students with LD from other high-risk subgroups and general education peers: Evidence from grades 3–10. Journal of Learning Disabilities, 47(4), 329–348. [Google Scholar] [CrossRef]
  69. Solari, E. J., Terry, N. P., Gaab, N., Hogan, T. P., Nelson, N. J., Pentimonti, J. M., Petscher, Y., & Sayko, S. (2020). Translational science: A road map for the science of reading. Reading Research Quarterly, 55, S347–S360. [Google Scholar] [CrossRef]
  70. Stein, B. N., Solomon, B. G., Kitterman, C., Enos, D., Banks, E., & Villanueva, S. (2021). Comparing technology-based reading intervention programs in rural settings. The Journal of Special Education, 56(1), 14–24. [Google Scholar] [CrossRef]
  71. Texas Education Agency. (2024). Dyslexia handbook: Procedures concerning dyslexia and related disorders. Available online: https://tea.texas.gov/academics/special-student-populations/dyslexia-and-related-disorders (accessed on 14 August 2025).
  72. Torgesen, J. K., Wagner, R. K., Rashotte, C. A., Herron, J., & Lindamood, P. (2010). Computer-assisted instruction to prevent early reading difficulties in students at risk for dyslexia: Outcomes from two instructional approaches. Annals of Dyslexia, 60(1), 40–56. [Google Scholar] [CrossRef] [PubMed]
  73. U.S. Department of Education. (2004). Individuals with disabilities education improvement act, 20 U.S.C. § 1400. Available online: https://sites.ed.gov/idea/statute-chapter-33/subchapter-i/1400 (accessed on 14 August 2025).
  74. Varghese, C., Bratsch-Hines, M., Aiken, H., & Vernon-Feagans, L. (2021). Elementary teachers’ intervention fidelity in relation to reading and vocabulary outcomes for students at risk for reading-related disabilities. Journal of Learning Disabilities, 54(6), 484–496. [Google Scholar] [CrossRef]
  75. Wagner, R. K., Torgesen, J. K., & Rashotte, C. A. (1999). Comprehensive test of phonological processing. Pro-ed. [Google Scholar]
  76. Wagner, R. K., Torgesen, J. K., Rashotte, C. A., & Pearson, N. A. (2013). Comprehensive test of phonological processing (2nd ed.). Pro-Ed. [Google Scholar]
  77. Wagner, R. K., Zirps, F. A., Edwards, A. A., Wood, S. G., Joyner, R. E., Becker, B. J., Liu, G., & Beal, B. (2020). The prevalence of dyslexia: A new approach to its estimation. Journal of Learning Disabilities, 53(5), 354–365. [Google Scholar] [CrossRef]
  78. Walker, J. (2019). Non-inferiority statistics and equivalence studies. BJA Education, 19(8), 267–271. [Google Scholar] [CrossRef] [PubMed]
  79. Wanzek, J., Stevens, E. A., Williams, K. J., Scammacca, N., Vaughn, S., & Sargent, K. (2018). Current evidence on the effects of intensive early reading interventions. Journal of Learning Disabilities, 51(6), 612–624. [Google Scholar] [CrossRef]
  80. Wanzek, J., & Vaughn, S. (2007). Research-based implications from extensive early reading interventions. School Psychology Review, 36(4), 541–561. [Google Scholar] [CrossRef]
  81. Wanzek, J., & Vaughn, S. (2008). Response to varying amounts of time in reading intervention for students with low response to intervention. Journal of Learning Disabilities, 41(2), 126–142. [Google Scholar] [CrossRef] [PubMed]
  82. Wanzek, J., Vaughn, S., Wexler, J., Swanson, E. A., Edmonds, M., & Kim, A.-H. (2006). A synthesis of spelling and reading interventions and their effects on the spelling outcomes of students with LD. Journal of Learning Disabilities, 39(6), 528–543. [Google Scholar] [CrossRef]
  83. Wechsler, D. (2003). Wechsler Intelligence scale for children (4th ed.). Psychological Corporation. [Google Scholar]
  84. Wechsler, D. (2009). Wechsler individual achievement test (3rd ed.). Psychological Corporation. [Google Scholar]
  85. Wechsler, D. (2011). Wechsler abbreviated scale of intelligence (2nd ed.). Psychological Corporation. [Google Scholar]
  86. What Works Clearinghouse. (2022). Procedures and standards handbook (Version 5). Available online: https://ies.ed.gov/ncee/wwc/Docs/referenceresources/Final_WWC-HandbookVer5.0-0-508.pdf (accessed on 21 September 2023).
  87. Wiederholt, J. L., & Bryant, B. R. (2012). Gray oral reading test—Fifth edition: Examiner’s manual. Pro-Ed. [Google Scholar]
  88. Wilson, B. A., & Felton, R. H. (2004). Word identification and spelling test (WIST). Pro-Ed. [Google Scholar]
  89. Woodcock, R. W. (2011). Woodcock reading mastery test (3rd ed.). WRMT-III. Pearson. [Google Scholar]
  90. Youman, M., & Mather, N. (2013). Dyslexia laws in the USA. Annals of Dyslexia, 63(2), 133–153. [Google Scholar] [CrossRef] [PubMed]
Figure 1. The digital avatar is presented on a monitor adjacent to the interactive whiteboard in the instructional area.
Figure 2. Representation of curriculum components led by teacher and digital avatar. Components outlined in red are delivered by the co-teacher avatar and implemented by the live teacher. All elements of other components are led by the live teacher.
Table 1. Example of learning sequence for Code and Read Sentence Activity Across Traditional and Technology-Assisted Classrooms.

Lesson Emphasis
Target grapheme: trigraph [notation garbled in source]
Activity: Decoding sentences
Example: [diacritically marked practice sentence; special characters not recoverable from source]

| Activity Component | Traditional | Technology-Assisted |
| Preparation | Therapist provides review and preparation of target content for decoding sentences on the whiteboard (e.g., diacritical marking formulas for target concept and other relevant GPCs, affixes, syllable division patterns). | Avatar narrates while a simulated whiteboard presented on the monitor builds target decoding concepts, diacritic formulas, and other content to be applied in the practice sentences. |
| Monitoring and Feedback | Therapist monitors student engagement and understanding prior to assigning practice items, using Socratic questioning and clarification as needed. | Educator monitors student engagement and understanding prior to assigning practice items, repeating the presentation of concepts and providing additional clarification and guidance as needed. |
| Practice | Therapist assigns practice items to each child based on diagnostic teaching principles. While students complete the activity in their student workbook, the Therapist monitors for accuracy and provides immediate corrective feedback. | Educator assigns practice items to each child. While students complete the activity in their student workbook, the Educator monitors for accuracy by checking student responses against an answer key and provides corrective feedback. |
| Review and Activity Closure | Therapist concludes the activity by reviewing the target new learning concept for the lesson. | Educator concludes the activity by reviewing the target new learning concept for the lesson. |
Table 2. Substudy 1 sample demographics across groups.

| Characteristic | TRAD (n = 69) | TECH (n = 13) | Test |
| Age (y;m) | 9;7 (1;6) | 9;4 (1;1) | t(80) = 0.58, ns |
| Gender (%F) | 49.3 | 53.8 | χ2(1) = 0.09, ns |
| Race/Ethnicity (%) | | | χ2(3) = 1.75, ns |
|  White | 60.9 | 75.0 | |
|  Black | 11.6 | 0.0 | |
|  Hispanic | 17.4 | 16.7 | |
|  Other | 10.1 | 8.3 | |
| Maternal Education (% High) | 49.3 | 53.8 | χ2(1) = 0.09, ns |
| ADHD (%) | 33.3 | 30.8 | χ2(1) = 0.03, ns |
| SLI (%) | 8.7 | 0.0 | χ2(1) = 1.22, ns |
| Full Scale IQ * | 101.1 (8.5) | 102.5 (8.5) | t(75) = −0.54, ns |

* Full Scale IQ was not available for 5 TRAD participants, nTRAD = 64.
Table 3. Substudy 1 Mean Performance Across Groups. Cells show n; M (SD).

| Group / Measure | Diagnosis | Pre-Test | Mid-Test | Post-Test |
| TRAD PA | 69; 86.25 (8.58) | 65; 86.52 (8.66) | 69; 95.55 (9.8) | 69; 100.43 (12.13) |
| TRAD Word Reading | 64; 75.95 (9.91) | 64; 73.89 (8.33) | 69; 79.12 (7.79) | 69; 85.28 (9.58) |
| TRAD Spelling | 66; 77.91 (10.8) | 64; 76.91 (10.02) | 69; 78.3 (7.87) | 69; 79.83 (8.27) |
| TRAD Reading Rate | 56; 72.14 (10.21) | 64; 73.12 (10.40) | 69; 77.68 (10.34) | 69; 82.75 (11.23) |
| TECH PA | 10; 84.20 (7.76) | 13; 79.38 (8.43) | 13; 91.00 (10.82) | 13; 98.46 (12.06) |
| TECH Word Reading | 9; 79.78 (8.33) | 13; 73.38 (6.36) | 13; 82.12 (10.17) | 13; 88.15 (11.99) |
| TECH Spelling | 10; 77.70 (7.41) | 13; 75.46 (7.02) | 13; 76.85 (7.70) | 13; 79.38 (7.75) |
| TECH Reading Rate | 8; 72.50 (9.63) | 13; 74.23 (9.75) | 13; 78.84 (7.94) | 13; 81.15 (9.16) |
Table 4. Substudy 1 Linear Mixed Effects Model Parameters. Fixed effects are B (SE).

| Predictor | PA Null | PA Cond. | WR Null | WR Cond. | SP Null | SP Cond. | Rate Null | Rate Cond. |
| (Intercept) | 91.8 (0.98) *** | 91.14 (1.46) *** | 78.65 (0.87) *** | 76.23 (1.29) *** | 78.19 (0.94) *** | 76.45 (1.26) *** | 76.75 (1.08) *** | 71.97 (1.56) *** |
| Age | | −0.01 (0.05) | | −0.12 (0.05) * | | −0.19 (0.04) *** | | −0.05 (0.06) |
| Maternal Education | | 0.53 (1.87) | | 3.99 (1.69) * | | 3.06 (1.61) + | | 7.79 (2.06) ** |
| Waitlist Duration | | 4.51 (2.62) + | | −1.42 (2.28) | | −0.83 (2.28) | | −2.41 (2.74) |
| Waitlist Period | | 3.61 (2.12) + | | −2.36 (0.71) ** | | −0.98 (0.93) | | 0.40 (0.96) |
| Intervention Year 1 | | 9.07 (1.08) *** | | 5.4 (0.76) *** | | 1.53 (0.91) + | | 4.75 (0.99) *** |
| Intervention Year 2 | | 4.86 (1.40) *** | | 6.24 (0.9) *** | | 1.45 (0.74) * | | 5.01 (0.86) *** |
| Waitlist × Contrast1 | | 1.25 (5.52) | | −2.75 (2.03) | | −4.93 (2.62) + | | −3.64 (2.76) |
| Waitlist × Contrast2 | | −1.95 (2.93) | | −2.98 (2.09) | | −0.48 (2.52) | | −0.58 (2.75) |
| Waitlist × Contrast3 | | −1.45 (3.65) | | 2.38 (2.34) | | −1.99 (1.92) | | −1.92 (2.25) |
| Random Effects | | | | | | | | |
| σ2 | 157.28 | 14.11 | 59.77 | 5.84 | 33.70 | 6.9 | 92.64 | 7.93 |
| τ00 subject | 23.60 | 56.58 | 37.00 | 46.82 | 52.09 | 46.4 | 22.47 | 66.83 |
| τ11 subject.Contrast1 | | 222.20 | | 17.40 | | 38.96 | | 32.01 |
| τ11 subject.Contrast2 | | 45.96 | | 25.05 | | 38.32 | | 46.25 |
| τ11 subject.Contrast3 | | 100.50 | | 41.26 | | 21.64 | | 33.23 |
| Model Fit | | | | | | | | |
| AIC | 2097.79 | 1976.56 | 1932.98 | 1779.89 | 1843.54 | 1809.14 | 1922.68 | 1812.49 |
| BIC | 2108.49 | 2051.41 | 1943.73 | 1855.14 | 1854.31 | 1884.55 | 1933.34 | 1887.10 |
| Marginal R2 | | 0.28 | | 0.27 | | 0.17 | | 0.25 |
| Conditional R2 | | 0.92 | | 0.94 | | 0.92 | | 0.94 |

Note: AIC = Akaike Information Criterion, BIC = Bayesian Information Criterion, PA = Phonological Awareness, WR = Word Reading, SP = Spelling, Rate = Oral Reading Rate. *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.10.
Table 5. Substudy 1 Group Means Before and After PSM Balancing.

| | Unweighted TECH | Unweighted TRAD | Std. Mean Diff. | Weighted TECH | Weighted TRAD | Std. Mean Diff. |
| distance | 0.30 | 0.13 | 0.91 | 0.30 | 0.28 | 0.08 |
| Age | −2.55 | −0.02 | −0.20 | −2.55 | −4.28 | 0.13 |
| Maternal Education | 0.54 | 0.56 | −0.04 | 0.54 | 0.54 | 0.01 |
| White | 0.77 | 0.62 | 0.36 | 0.77 | 0.83 | −0.14 |
| Black | 0.00 | 0.12 | −0.40 | 0.00 | 0.03 | −0.09 |
| Hispanic | 0.15 | 0.16 | −0.02 | 0.15 | 0.10 | 0.16 |
| Other Race | 0.08 | 0.10 | −0.10 | 0.08 | 0.05 | 0.10 |
| PA | 79.38 | 86.31 | −0.82 | 79.38 | 79.74 | −0.04 |
| Word Reading | 73.38 | 73.43 | −0.01 | 73.38 | 72.27 | 0.18 |
| Spelling | 75.46 | 76.65 | −0.17 | 75.46 | 76.44 | −0.14 |
| Reading Rate | 74.23 | 72.65 | 0.16 | 74.23 | 73.72 | 0.05 |
Table 6. Substudy 2 Sample Demographic Comparisons Across Groups.

| Characteristic | TRAD (n = 78) | TECH (n = 79) | Test |
| Age (y;m) | 8;8 (1;2) | 8;6 (1;0) | t(155) = 0.73, ns |
| Grade (median) | 4 | 3 | |
| Sex (%F) | 52.6 | 54.4 | χ2(1) = 0.09, ns |
| Race/Ethnicity (%) | | | χ2(3) = 11.61, p = 0.009 |
|  White | 55.1 | 30.4 | |
|  Black | 12.8 | 20.3 | |
|  Hispanic | 20.5 | 39.2 | |
|  Other | 11.5 | 10.1 | |
| SES (% Eligible FRL) | 41.0 | 58.2 | χ2(1) = 4.64, p = 0.03 |
| ELL (%) | 9.0 | 27.8 | χ2(1) = 9.28, p = 0.002 |
| ADHD (%) | 12.8 | 3.8 | χ2(1) = 4.20, p = 0.04 |
| SLI (%) | 9.0 | 12.7 | χ2(1) = 0.55, ns |
| Full Scale IQ * | 105.4 (13.0) | 92.9 (12.3) | |

* nTRAD = 42, nTECH = 38.
Table 7. Substudy 2 Mean Performance Across Groups. Cells show n; M (SD).

| Measure | Pre-Test TRAD | Pre-Test TECH | Mid-Test TRAD | Mid-Test TECH | Post-Test TRAD | Post-Test TECH |
| PA | 78; 91.35 (14.69) | 79; 86.54 (14.42) | 78; 97.01 (14.11) | 77; 91.09 (13.34) | 65; 99.17 (12.89) | 64; 92.29 (13.47) |
| WR | 78; 83.81 (13.1) | 79; 78.23 (11.86) | 78; 85.35 (14.86) | 77; 79.62 (12.58) | 66; 89.27 (13.26) | 64; 86.26 (14.61) |
| SP | 78; 76.25 (9.79) | 79; 71.57 (7.49) | 77; 79.57 (11.82) | 77; 74.53 (10.47) | 66; 83.25 (14.38) | 64; 80.19 (15.38) |
| RATE | 78; 83.78 (13.53) | 79; 77.97 (11.69) | 78; 85.71 (12.34) | 77; 82.40 (11.08) | 65; 88.38 (11.46) | 64; 83.91 (10.59) |

PA = Phonological Awareness, WR = Word Reading, SP = Spelling, RATE = Oral Reading Rate.
Table 8. Substudy 2 Group Means Before and After PSM Balancing.

| | Unweighted TECH | Unweighted TRAD | Std. Mean Diff. | Weighted TECH | Weighted TRAD | Std. Mean Diff. |
| distance | 0.56 | 0.45 | 0.81 | 0.56 | 0.56 | 0.01 |
| Age | −0.67 | 0.85 | −0.13 | −0.67 | −2.20 | 0.13 |
| FRL | 0.58 | 0.41 | 0.35 | 0.58 | 0.62 | −0.07 |
| White | 0.70 | 0.76 | −0.13 | 0.70 | 0.67 | 0.07 |
| Black | 0.23 | 0.14 | 0.21 | 0.23 | 0.27 | −0.09 |
| Hispanic | 0.43 | 0.24 | 0.38 | 0.43 | 0.41 | 0.05 |
| Other | 0.08 | 0.10 | −0.10 | 0.08 | 0.07 | 0.03 |
| PA | 86.54 | 91.35 | −0.33 | 86.54 | 86.41 | 0.01 |
| Word Reading | 78.23 | 83.81 | −0.47 | 78.23 | 80.38 | −0.18 |
| Spelling | 71.57 | 76.25 | −0.63 | 71.57 | 71.94 | −0.05 |
| Reading Rate | 77.97 | 83.62 | −0.48 | 77.97 | 80.67 | −0.23 |
Table 9. Substudy 2 Linear Mixed Effects Model Parameters. Fixed effects are B (SE).

| Predictor | PA Null | PA Cond. | WR Null | WR Cond. | SP Null | SP Cond. | Rate Null | Rate Cond. |
| (Intercept) | 92.36 (1.05) *** | 96.77 (1.45) *** | 83.16 (1.03) *** | 85.91 (1.44) *** | 76.98 (0.88) *** | 78.48 (1.25) *** | 83.21 (0.88) *** | 85.18 (1.23) *** |
| Age | | −0.08 (0.07) | | −0.22 (0.07) ** | | −0.02 (0.05) | | −0.19 (0.06) ** |
| Black | | −13.33 (2.63) *** | | −5.38 (2.57) * | | −2.68 (1.86) | | −2.98 (2.23) |
| Other/Multiple Race | | −6.34 (3.35) + | | −3.66 (3.27) | | −0.66 (2.37) | | −2.12 (2.84) |
| Hispanic | | −1.84 (1.24) | | −1.56 (1.21) | | −2.06 (0.88) * | | −1.09 (1.05) |
| SES (low) | | −1.59 (1.16) | | −2.25 (1.13) * | | −1.12 (0.82) | | −2.59 (0.99) ** |
| Treatment Group | | −3.64 (1.9) + | | −3.32 (1.9) + | | −2.90 (1.68) + | | −2.91 (1.61) + |
| Intervention Year 1 | | 5.66 (1.22) *** | | 1.55 (0.72) * | | 3.32 (0.72) *** | | 2.69 (0.84) ** |
| Intervention Year 2 | | 0.85 (1.12) | | 2.92 (1.07) ** | | 2.43 (0.92) ** | | 1.82 (0.83) * |
| Group × Year 1 | | −0.95 (1.72) | | −0.05 (1.02) | | −0.37 (1.02) | | 1.70 (1.18) |
| Group × Year 2 | | −0.89 (1.59) | | 2.70 (1.52) + | | 2.08 (1.31) | | −0.62 (1.18) |
| Random Effects (conditional models) | | | | | | | | |
| σ2 | | 13.58 | | 9.69 | | 6.79 | | 8.12 |
| τ00 student | | 121.47 | | 122.36 | | 99.16 | | 87.92 |
| τ11 student:Year 1 | | 85.16 | | 20.16 | | 25.69 | | 35.43 |
| τ11 student:Year 2 | | 57.33 | | 55.20 | | 41.23 | | 28.39 |
| Model Fit | | | | | | | | |
| AIC | 3310.58 | 3310.58 | 3286.19 | 3168.39 | 3249.10 | 3022.20 | 3158.41 | 3069.99 |
| BIC | 3384.18 | 3384.18 | 3298.46 | 3242.029 | 3261.37 | 3095.81 | 3170.66 | 3143.47 |
| Marginal R2 | | 0.24 | | 0.22 | | 0.14 | | 0.23 |
| Conditional R2 | | 0.94 | | 0.95 | | 0.95 | | 0.94 |

Note: AIC = Akaike Information Criterion, BIC = Bayesian Information Criterion, PA = Phonological Awareness, WR = Word Reading, SP = Spelling, Rate = Oral Reading Rate. *** p < 0.001, ** p < 0.01, * p < 0.05, + p < 0.10.