1. Introduction
There is ample evidence to suggest that education has not changed dramatically over recent centuries. Even after the introduction of textbooks, students continue to spend most of their class time listening to lectures and taking notes. Why does education seem so immune to transformation? Labaree [
1] argues that education is a far more complex domain than most others. For example, he compares a typical nuclear power facility with a school. Since every component of a nuclear facility is causally interrelated with the others, it is much easier to trace the source of any deficiency and fix it accordingly. Schools, conversely, are composed of completely independent units: isolated classrooms. If one classroom performs well, it does not immediately produce an effect in parallel classrooms. Superintendents and principals generally track mean performance across classrooms, and, on average, good and bad performances cancel each other out. As a whole, a school therefore remains highly stable.
However, after several decades of experimental studies introducing ICT for math teaching and learning in K–12, there is still a wide range of impacts. For example, a meta-analysis of 71 evaluations in the United States reported effects by time of use [
2]. The study shows that for evaluated programs where students spent less than 30 min a week, the average effect was 0.06 SD; where students spent between 30 and 75 min, it was 0.20 SD; and where students spent more than 75 min, contrary to what one would expect, it was 0.14 SD. A more recent review [
3] reports on 14 studies that strongly emphasize the use of technology, most of which rotate students through technology and non-technology activities; the weighted mean effect size was +0.07 SD. A 2019 study in 26 municipalities in Sweden [
4] found no significant average impact of an ICT program on standardized tests in mathematics or language, but found that it could increase inequality in education. Further, a systematic review of 85 independent evaluations found that shorter ICT programs were much more effective in promoting mathematics achievement than longer ones, with a mean effect size of 0.35 SD [
5]. The theoretical framework for the study of ICT in schools highlights the importance of the implementation process and the context in which this implementation is situated [
6]. The integration and final adoption of technological tools rely heavily on these factors. This framework has been supported by empirical evidence on the effects of practice with immediate feedback from peers and teachers and of writing justifications for math problems [
7,
8]. In developing countries, results are also diverse. A review of experimental evaluations focused on mathematics [
9] reported effect sizes ranging from 0.14 SD for programs with 80 min of weekly computer time in China, to 0.35 SD for a program with 120 min of weekly practice in India, and 0.28 SD for a program with 300 min spent using computers during after-school sessions. However, another program in India with 300 min per week had an effect size of −0.48 SD. Cristia et al. [
10] and Beuermann et al. [
11] studied a randomized experiment with a 1:1 program in poor regions of rural Peru and found no significant impact on test scores in mathematics or language. De Melo, Machado, and Miranda [
12] found no effects on math or reading scores in the national implementation of a 1:1 program in primary schools in Uruguay. This wide variety of effect sizes points to a strong dependence on how the programs are implemented.
According to the UNESCO 2013 TERCE assessment, Chile has the highest national average in 6th grade mathematics in Latin America [
13]. However, the Programme for International Student Assessment (PISA) test for fifteen-year-old students places Chile 59th out of 78 participating countries. Further, its score is not statistically significantly different from the scores of countries and territories such as Kazakhstan, Moldova, Baku (Azerbaijan), Thailand, Uruguay, and Qatar [
14]. Araya et al. [
7] present evidence and theoretical reasons to back the claim that, in the Chilean context, guided technology programs focused on practice can be effective, efficient, and relatively easy to scale up. In [
15], the use of the same platform with its originally designed exercises was reported. The effect was computed in 15 fourth grade classes from 11 vulnerable Chilean schools where the platform was used during the full school year. Measured with the National Standardized Test, a paper-based assessment administered yearly in all schools by an independent government agency, the improvement over previous years was 0.26 SD higher than the national improvement.
Later, Araya et al. [
16] reported the results of three years of implementation of the same platform in 11 public schools from a low-SES urban district in Chile. This included 43 fourth grade classes and 1355 students. Improvement over previous years on the National Standardized fourth grade math test was 0.28 SD higher than the improvement made by a neighboring district with a similar population. Next, in [
17], eight years of use of the same platform and exercises were analyzed. The authors found that, on national standardized test scores, the 80 treated classes obtained results 0.30 SD higher than the 32 untreated classes. In a more recent study, Araya et al. [
18] experimentally evaluated the platform in an RCT with 48 classes from 24 low-performing primary schools in Chile, where one class at each school was randomly assigned to treatment. The program was implemented with two weekly sessions in a computer lab during the whole school year. The impact was measured with the Chilean National Standardized Exam using a multilevel model, and a positive effect on math learning of 0.27 SD was found.
Moreover, in 2011 and 2012 the Ministry of Education implemented the paper-based “Plan de Apoyo Compartido” (PAC) program, a standardized teaching material program that included the support of internal and external pedagogic teams. It was implemented in under-performing schools in Chile. In [
19], Bassi et al. conducted an RCT to estimate the effectiveness of PAC. The intervention improved math performance for the first cohort of students (a statistically significant effect size of 0.068) but not for the second cohort. Thus, in this paper we study the impact of exclusively using PAC exercises in the ConectaIdeas platform, instead of the standard ConectaIdeas exercises that were previously designed and tested. According to Bowen [
20], the need for platforms that allow teachers to customize materials is perhaps the largest obstacle to the widespread adoption of interactive online learning. This study can help determine whether the effect is due to the exercises themselves or to their implementation on an online platform.
Hill et al. [
21] reviewed experimental evaluations in education in the US and documented that the average effect on broad standardized tests was 0.07 SD, compared to an average effect of 0.23 SD for narrow standardized tests and to 0.44 SD for specialized tests developed for specific interventions. According to Cheung et al. [
22], effect sizes are roughly twice as large for published articles, small-scale trials, and experimenter-made measures as for unpublished documents, large-scale studies, and independent measures, respectively. In addition, effect sizes are significantly higher in quasi-experiments than in randomized experiments. Moreover, across seven WWC-accepted math studies, the mean effect size was +0.45 for treatment-inherent measures and −0.03 for measures used in the same studies that were not inherent to the treatment [
22].
In this paper, we explore the use of an online platform in an unforeseen environment. First, instead of using the originally designed and refined exercises for the platform, this study implements paper-based exercises designed by the Chilean Ministry of Education. These are valuable exercises, the product of an extensive compilation that was updated and upgraded in a previous program developed by the Ministry of Education. Moreover, this upgraded program and its exercises have been previously studied [
19] and showed positive results in math during the first year of implementation, but not during the second year. According to [
19], a possible explanation for this decline is a decrease in the rigor of implementation compared to the first year. The first research question is therefore whether the effect size is maintained, or even increased, in the online version. Second, from the middle of the semester until the end of the implementation, a huge social outbreak shook the country, and several schools closed due to teacher strikes and social unrest. Thus, the second research question is whether the online version of the platform could still impact student learning under these unstable conditions, which involved very high student absenteeism, and how much erosion occurred compared with the effects found in previous evaluations of the same online platform in math for fourth graders.
In this study, we used a large-scale test to estimate the impact of the program on students’ outcomes. The main contribution of this paper is to measure the effect of a platform when a completely new set of exercises is used exclusively or, from another point of view, when an existing set of exercise materials is delivered through an online platform. Moreover, the social turmoil during the second half of the implementation period had a huge impact on attendance, which turned this study into a rare opportunity to estimate the robustness of the intervention’s effect under difficult contextual circumstances. Missing data was one of the main unexpected challenges caused by the social turmoil. This paper illustrates the application of multilevel multiple imputation models to deal with missingness in the outcome variable, together with the use of multilevel regression models to estimate the program effect size. Finally, we analyzed the effect of including in each session at least one open-ended question with written answers and peer review.
3. Results
Relevant student-level covariates in this study include continuous variables such as the SEPA-math baseline score (SEPA Math Pre), overall attendance, grade point average (GPA), the total number of math exercises performed (NumberExercises), and the average length of students’ answers to open-ended math questions (AnswerLength), as well as sex and treatment group indicators. Descriptive statistics for each predictor are presented in
Table 4. On average, students scored 559.67 points in the pretest and had a mean GPA of 5.87. Mean attendance was 89.42 percent, and the average number of platform exercises performed was 202. Answers to open-ended questions were 7.2 words long on average. Further, the correlation between pre-test and post-test scores was 0.72.
Equation (3) specifies the linear mixed model used to impute missing SEPA-math post results. Further,
Figure 3 illustrates the distribution of SEPA-math post scores for observed (blue) and imputed (red) values after generating twenty multiply imputed datasets. Results suggest that the imputed SEPA-math post values follow a distribution similar to that of the observed values.
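As a rough illustration of this step, the sketch below fits a mixed model with a random classroom intercept to the observed outcomes and fills in missing post-test scores with stochastic draws. It is a simplified stand-in for the model specified in Equation (3), not the study’s actual pipeline; the data frame students and the column names (sepa_post, sepa_pre, gpa, attendance, sex, treatment, classroom) are hypothetical.

```python
# Minimal sketch of outcome imputation with a multilevel model.
# All variable and column names are hypothetical. A fully proper
# multiple imputation would also redraw model parameters for each
# imputed dataset; here only residual noise is added, a simplification.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

M = 20  # number of imputed datasets, as in the study

def impute_outcome(df: pd.DataFrame, rng: np.random.Generator) -> pd.DataFrame:
    observed = df[df["sepa_post"].notna()]
    missing = df[df["sepa_post"].isna()]

    # A random intercept per classroom mirrors the multilevel structure.
    fit = smf.mixedlm(
        "sepa_post ~ sepa_pre + gpa + attendance + sex + treatment",
        data=observed,
        groups=observed["classroom"],
    ).fit()

    # Stochastic regression imputation: fixed-effects prediction plus a
    # draw from the estimated residual distribution.
    preds = np.asarray(fit.predict(missing))
    noise = rng.normal(0.0, np.sqrt(fit.scale), size=len(missing))
    completed = df.copy()
    completed.loc[missing.index, "sepa_post"] = preds + noise
    return completed

rng = np.random.default_rng(2019)
imputed_sets = [impute_outcome(students, rng) for _ in range(M)]
```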
For each completed dataset, the implementation effect size was estimated by fitting the HLM shown in Equation (4). The estimates from each analysis were then combined following Rubin’s rules. Results showed a positive and significant effect of the treatment on SEPA-math post scores (t(78) = 2.802,
p = 0.035). The intervention effect size was estimated using the covariate-adjusted mean difference (the regression coefficient) and the unadjusted post-test standard deviation. Thus, the estimated treatment effect size was 0.13 SD, with a variance of 0.0016. Moreover, SEPA pre-test results (t(79) = 16.49,
p < 0.001) and overall GPA (t(35) = 3.85,
p < 0.001) were also significant and had a positive effect on students’ post-test scores. On the other hand, male students scored on average 2.6 points higher than female students, but this difference was not significant (t(186) = 1.34,
p = 0.182). Attendance appears to have had a negative effect on overall results (t(136) = −0.41,
p = 0.002).
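To make the pooling step concrete, the following sketch fits the Equation (4) HLM to each completed dataset, combines the treatment coefficients with Rubin’s rules, and standardizes the pooled estimate. It reuses the hypothetical imputed_sets and column names from the sketch above and assumes treatment is coded 0/1; it is illustrative rather than the study’s exact code.

```python
# Fit the HLM on each completed dataset and pool with Rubin's rules
# (hypothetical names; treatment assumed to be a 0/1 indicator).
import numpy as np
import statsmodels.formula.api as smf

est, var = [], []
for data in imputed_sets:
    fit = smf.mixedlm(
        "sepa_post ~ treatment + sepa_pre + gpa + attendance + sex",
        data=data,
        groups=data["classroom"],
    ).fit()
    est.append(fit.params["treatment"])    # covariate-adjusted mean difference
    var.append(fit.bse["treatment"] ** 2)  # its squared standard error

m = len(imputed_sets)
q_bar = np.mean(est)                 # pooled point estimate
u_bar = np.mean(var)                 # within-imputation variance
b = np.var(est, ddof=1)              # between-imputation variance
total_var = u_bar + (1 + 1 / m) * b  # Rubin's rules total variance

# Standardized effect size: pooled coefficient divided by the
# unadjusted post-test standard deviation, as described in the text.
effect_size = q_bar / imputed_sets[0]["sepa_post"].std()
```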
Table 5 summarizes the final effect size estimates from the HLM:
Similarly, we estimated the effect size of the average length of students’ answers to math problems by fitting the HLM presented in Equation (5) for each completed dataset and then pooling the results. Findings suggested a positive and significant effect of answer length on SEPA-math post scores (t(222) = 2.053,
p = 0.041). Moreover, the effects of SEPA pre-test results, overall GPA, attendance, and sex followed the same patterns as in Model 4 estimates.
Table 6 summarizes these results.
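Under the same hypothetical setup, the model in Equation (5) differs only in its fixed-effects formula, so the fitting and pooling loop sketched above can be reused unchanged with something like:

```python
# Equation (5) adds the answer-length covariate (hypothetical name).
formula_eq5 = ("sepa_post ~ treatment + answer_length"
               " + sepa_pre + gpa + attendance + sex")
```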
4. Discussion
The aim of this study was to estimate whether the impact of nationally developed math exercises is maintained, or even increased, when they are integrated into an online platform. Results show a positive effect size of 0.13 SD on students’ outcomes when measured with the large-scale SEPA-math test. This effect corresponds to almost two extra months of learning when translated using US year-long learning gains in math for fourth graders [
51]. Further, the effect size achieved by the online platform intervention was double the effect achieved by using the same exercises on paper for a whole year.
The vast majority of Chilean urban schools have fiber optic internet connections, which allowed us to convert a paper-based government program of math exercises to an online version. Further, the selection criteria applied in this project made it possible to ensure that all participating schools had the technological infrastructure required to use the online platform in the classroom. During the implementation, students were able to carry out all the activities without internet connection problems.
According to Gottfried [
26], chronic absenteeism not only has a negative effect on the students who miss excessive school days, but also has the potential to lower outcomes for other students in the same educational context. The results of this study shed light on the effectiveness of the ConectaIdeas platform under the unstable conditions created by the social outbreak, despite the increase in absenteeism during the application period. Moreover, the estimated effect size is almost half the impact achieved with two sessions per week for a complete year using the same online platform with its pre-designed exercises. Thus, the current implementation has shown itself to be promising when compared with the effects of previous evaluations of the same online tool for fourth graders.
Further, as discussed by Kuhfeld et al. [
52], in 4th grade in the US the effect size in the second semester is 0.01 SD lower than in the first semester, even though the second semester is longer; the average monthly RIT gain is 2.00 in the second semester versus 2.02 in the first. We can then estimate that the yearly overall effect size for this implementation would have been 0.27 SD. However, this estimation does not account for the extra absenteeism in the second semester due to the social outbreak; under normal conditions, the yearly overall impact of the implementation would likely have been even higher.
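One way to read this extrapolation, assuming the implemented semester behaves like a US second semester and that a first semester would have contributed an effect 0.01 SD larger, is the following back-of-the-envelope calculation (an illustration, not necessarily the authors’ exact computation):

```latex
% Illustrative extrapolation under the stated assumption:
\underbrace{0.13\,\mathrm{SD}}_{\text{observed semester}}
+ \underbrace{(0.13 + 0.01)\,\mathrm{SD}}_{\text{projected other semester}}
\approx 0.27\,\mathrm{SD}
```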
There is evidence that writing can improve learning. A meta-analysis of 48 writing-to-learn programs with 6th to 8th graders [
53] shows that writing can have a small, positive impact on conventional measures of academic achievement. According to the authors, writing can prompt and support the use of rehearsal strategies, elaboration strategies, organization strategies, and comprehension-monitoring strategies. In a more recent meta-analysis of 12 studies, Bicer et al. [
8] found an overall effect size of 0.42. Similarly, our findings show a significant positive effect of the average length of students’ answers to math problems on math learning. Likewise, recent studies show that incorporating real-time monitoring and feedback into online platforms can have a positive impact on overall student outcomes in math [
24,
25]. We argue that both components are an essential part of the positive results of the ConectaIdeas online implementation presented here.
Finally, this paper provides evidence of the positive impact of incorporating regular paper-based math exercises into an online platform, as well as of the robustness of the effect of an intervention under unique contextual circumstances. Furthermore, it exemplifies the use of multilevel multiple imputation models to handle missing data in the outcome variable, as opposed to complete case deletion, which would have reduced the power of our study and biased the estimates.
5. Conclusions
The implementation evaluated in this work has important practical implications. First, converting paper-based mathematics exercises, previously used and refined for years by the Ministry of Education, to an online platform proved to improve the effectiveness of those exercises. The effect doubled and remained significant despite the fact that the number of sessions was reduced from two to one per week and the intervention lasted only one semester. Moreover, the effect was achieved despite the social turmoil that hit the country in the middle of the semester and raised absenteeism to levels much higher than historical ones.
Second, in each session students were required to answer at least one open-ended question, explaining the procedures and the logic used to solve the problem. These written answers were shared with peers, who reviewed and commented on them. This activity was shown to have an effect on student learning. These results contribute to informing policy decisions regarding the use of existing math exercises on an online platform.
Although these findings are promising, several aspects require further study and will be addressed in future work. One is to study not only the length of the written answers but also how they relate to the type of question posed by the teacher. Araya et al. [
54] addressed this issue and found that the presence of certain keywords in the question proved to be relevant. However, it is necessary to further extend the study of question types using topic models or other natural language processing methods. It is also necessary to analyze the type and “quality” of the answers given by students and their relationship to learning.
A second aspect that needs further study is the effect of the peer collaboration strategy implemented in ConectaIdeas through student assistants. In each session, a platform module preselects students who are performing well as candidates for classroom assistants. The teacher then selects a couple of these students to be teaching assistants during the session. Students can request help from any assistant or from the teacher to solve an exercise. Once the help is given, the helped student can evaluate the quality of the help received, and the assistant can also evaluate how well he or she thinks the helped student understood the explanation. Evaluating the impact of this strategy will require a different experimental study.
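The workflow just described can be summarized in a short sketch. The class names, performance threshold, and rating scale below are hypothetical, intended only to make the preselection and mutual evaluation steps concrete; they are not ConectaIdeas internals.

```python
# Schematic sketch of the classroom assistant workflow
# (hypothetical names, threshold, and rating scale).
from dataclasses import dataclass
from typing import Optional

@dataclass
class Student:
    name: str
    correct_rate: float   # share of platform exercises solved correctly
    is_assistant: bool = False

def preselect_candidates(students: list, threshold: float = 0.8) -> list:
    """Platform step: students performing well become assistant candidates."""
    return [s for s in students if s.correct_rate >= threshold]

def appoint_assistants(candidates: list, chosen: set) -> None:
    """Teacher step: a couple of candidates become session assistants."""
    for s in candidates:
        s.is_assistant = s.name in chosen

@dataclass
class HelpEvent:
    requester: Student                   # student who asked for help
    helper: Student                      # assistant (or teacher) who helped
    help_quality: Optional[int] = None   # 1-5, rated by the helped student
    understanding: Optional[int] = None  # 1-5, assistant's rating of how well
                                         # the helped student understood

# Example session flow:
roster = [Student("Ana", 0.92), Student("Luis", 0.86), Student("Eva", 0.61)]
candidates = preselect_candidates(roster)
appoint_assistants(candidates, chosen={"Ana", "Luis"})
event = HelpEvent(requester=roster[2], helper=roster[0])
event.help_quality, event.understanding = 5, 4  # mutual evaluation
```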
A third important feature to address is the impact of the platform on teachers’ didactic strategies. In Araya et al. and Uribe et al. [
55,
56,
57], various classroom observation protocols are used to classify each moment of the session, and different machine learning algorithms are also used to perform automatic analysis of teaching discourse transcriptions. We have been using both methodologies to determine the impact of the use of platforms on teaching strategies. This is work in progress.
Finally, one of the main limitations of this implementation relates to its sustainability, and it was revealed this year during the COVID-19 quarantine. Although most urban underserved schools in Chile have fiber optic internet connections, students have very unstable internet at home. In addition, a large proportion of them rely on their parents’ smartphones for internet access. Even though the ConectaIdeas platform requires very little bandwidth, it does need a stable connection. Thus, the challenge is to adapt the platform to work offline and to adjust both the interface and the exercises for use on small-screen devices. In a future study, we will analyze an offline version of the platform for smartphones that is now being tested by students from vulnerable sectors in Chile and Peru.