Article

Early Gains, Fading Effects: A Quasi-Experimental Evaluation of Mathematical Thinking Workshops for the School-to-University Mathematics Transition in South Africa

by Mashudu Mokhithi 1,* and Anita Lee Campbell 2,3

1 Department of Mathematics and Applied Mathematics, University of Cape Town, Cape Town 7700, South Africa
2 Academic Support Programme for Engineering, University of Cape Town, Cape Town 7700, South Africa
3 Centre for Research in Engineering Education & Centre for Wellbeing and Flourishing, University of Cape Town, Cape Town 7700, South Africa
* Author to whom correspondence should be addressed.
Educ. Sci. 2026, 16(3), 378; https://doi.org/10.3390/educsci16030378
Submission received: 2 December 2025 / Revised: 3 February 2026 / Accepted: 7 February 2026 / Published: 2 March 2026
(This article belongs to the Special Issue Engaging Students to Transform Tertiary Mathematics Education)

Abstract

This study evaluates whether theory-informed, mathematically focused support can ease the school-to-university transition in an unequal South African STEM context. First-year students could voluntarily attend Mathematical Thinking Workshops (MTWs) grounded in constructivism, the zone of proximal development, APOS theory, and cognitive load theory, providing low-threat, collaborative practice with non-routine, representation-rich tasks. Because attendance was self-selected, we used a quasi-experimental design: participation was modeled from pre-university covariates (school-leaving Mathematics and English grades and standardized university preparedness tests in Mathematics and Quantitative Literacy), and MTW participants were matched to comparable non-participants using nearest-neighbor propensity-score matching. Average treatment effects on the treated were estimated for multiple assessments and for a composite score capturing performance on higher-order items within those assessments. MTW participants outperformed matched peers on early first-semester assessments, especially those containing the most higher-order items, indicating that workshops helped when cognitively demanding tasks first appeared. Effects on later, more distal assessments were positive but attenuated, producing an “early gains, fading effects” pattern. Although estimates were imprecise, benefits appeared largest for students who had scored 70–84% in school-leaving mathematics. Overall, the findings suggest that transitional workshops can deliver timely, assessment-visible gains, although these effects may weaken over time when they are not reinforced or well aligned with later summative assessment.

1. Introduction

1.1. Transition Challenges in University Mathematics

The transition from school to university mathematics is widely reported as one of the sharpest academic discontinuities students face, because it requires a move from familiar, procedure-heavy school mathematics to university courses centered on abstraction, formal definitions, and proof (Di Martino et al., 2023; Selden, 2012). This shift is not only cognitive but also affective: once previously reliable strategies no longer work, students often report a loss of confidence, alongside increased anxiety and a sense of not belonging in mathematics (Di Martino et al., 2023; Ellis et al., 2016; Solomon, 2007). These patterns are evident internationally, including in the UK, Australia, and North America, and are especially pronounced in introductory STEM gateway courses that often act as gatekeepers to progression (Rylands & Coady, 2009; Lawson, 2015). In the USA, such courses are commonly described as “weed-out” classes (Seymour & Hunter, 2019); in South Africa, they have been called “Courses Impeding Graduation” (Shay et al., 2020). In South Africa, the challenge is further compounded by historically entrenched inequalities in schooling, multilingual classrooms, and uneven access to experienced mathematics teachers (Böhmer & Wills, 2025; Khoza-Shangase & Kalenga, 2024; Msomi & Rzyankina, 2024), so students with identical admission scores may arrive with markedly different opportunities to learn.

1.2. Limits of Existing Support and the Design Gap

South African universities have invested in extended degrees (Garraway & Bozalek, 2019), foundation programs (Bernard, 2015), supplemental instruction (Nkonki et al., 2023), peer mentoring (Olivier & Burton, 2020), and bridging programs (Hay & Marais, 2004). These supports have value, but most of them privilege content remediation and throughput and pay less attention to the kinds of mathematical thinking, metacognitive regulation, and identity/belonging work that university mathematics demands. This creates a design gap: a student can be helped to pass the current topic but still lack the strategic, reflective, and collaborative habits that make later success more durable. Several authors have noted that many support programs also lack explicit theoretical grounding (e.g., Garraway & Bozalek, 2019), which makes it difficult to specify how they are supposed to work or for whom they work best. MTWs (Mokhithi et al., 2025) address this gap by making the design rationale explicit and by evaluating it rigorously.

1.3. A Theoretically Integrated Workshop Model

Together with the design team's experience and intuition, the MTWs were deliberately grounded in theories of learning to create workshop spaces where students could attempt higher-order, representation-rich tasks with calibrated facilitation and low threat. In broad terms, the zone of proximal development (ZPD) emphasizes socially mediated scaffolding within a learner's current zone (Vygotsky, 1980; Wood et al., 1976), cognitive load theory (CLT) focuses on managing intrinsic and extraneous load so that schema construction is possible (Sweller, 1988), and APOS theory conceptualizes students' mathematical growth as a progression from actions to processes, objects, and coordinated schemas (Arnon et al., 2014; Dubinsky, 1991). Constructivism serves as the overarching pedagogical orientation. The MTWs sit at the intersection of these perspectives: they aimed to create collaborative, low-threat spaces (ZPD) in which carefully designed tasks manage cognitive demand (CLT) while supporting students through APOS-style transitions in their mathematical thinking.
When planning the workshops, constructivism justified open, discussion-orientated tasks; ZPD justified mediated peer work; APOS guided task sequencing from actions to processes and objects; and CLT ensured that the tasks did not overwhelm working memory in already-stressed first-year students. Explicitly articulating and integrating such theoretical frameworks remains relatively uncommon in South African mathematics support programs, where designs are often implemented without an explicit statement of theory. In contrast, the MTWs were deliberately framed as a theory-led intervention, and in a related qualitative study, we show how student voices map onto the ZPD, CLT, and APOS frameworks (Mokhithi et al., 2026).

1.4. Aim and Research Questions

A theory-led design alone does not demonstrate impact. Because workshop attendance was not randomly assigned, the central empirical question becomes causal: did MTW participation actually improve mathematics performance once we account for pre-university achievement? A second, equally important question follows: were any effects uniform, or did some subgroups benefit more? We employed a quasi-experimental design using propensity score methods, cluster-robust inference, Holm correction, and sensitivity analysis to answer these questions.
Beyond establishing whether MTWs “worked,” this article foregrounds a second interpretive concern: whether any observed benefits persist as the semester unfolds. The quantitative analysis revealed a temporal pattern: MTW participants outperformed matched non-participants on early semester assessments and on higher-order items, but the size and certainty of the effect diminished on later, more summative assessments (final exams). In other words, the intervention produced early gains but fading effects. This pattern is important because many transition interventions are evaluated only on end-of-course grades; such an approach would have missed the MTWs’ strongest effects.
This paper reports on the quantitative strand of the MTW study. It determines the extent to which participation in MTWs was associated with improved first-year mathematics performance in an unequal South African STEM context, once selection bias is reduced through propensity score methods. The qualitative strand is already reported in our earlier work (Mokhithi et al., 2025). That study documents plausible mechanisms—increased confidence, time-on-task, peer support, and changes in task approach—that help interpret the quantitative patterns observed here.
Even after conditioning on pre-university performance and applying these relatively strong analytic tools, there remain hidden mechanisms that we cannot directly measure. Because attendance was self-selected, we cannot completely rule out that workshop attendees differed from non-attendees on these unmeasured factors. However, within the limits of the available data, the findings still provide credible, policy-relevant evidence that MTW participation was associated with meaningful, assessment-visible gains. The study addresses the following research questions:
Research Question 1 (RQ1). To what extent do Mathematical Thinking Workshops impact students’ academic performance when controlling for pre-university academic achievement?
Research Question 2 (RQ2). How does the impact of the workshops vary across different levels of pre-university mathematics achievement?
Research Question 3 (RQ3). To what extent do the Mathematical Thinking Workshops impact students’ performance on higher-order assessment items in first-year university mathematics?

2. Materials and Methods

2.1. Context and Setting

The study took place in the first-year mathematics course for science and actuarial science majors at a large, research-intensive South African university. Within the Bachelor of Science programs it serves, this course has long been identified as a bottleneck to graduation: roughly one third of students graduate on time, one third after at least one additional year, and one third leave without completing. Admission to this first-year mathematics course requires a minimum of 70% in high school Mathematics. This entry requirement means that the cohort is already academically selected relative to many first-year mathematics contexts, and the quantitative findings should be interpreted as pertaining to students who meet this threshold within a research-intensive institutional setting. In particular, students scoring below 70% for school-leaving Mathematics, typically served through extended or foundation programs, are not represented in this course and thus fall outside the study’s observable achievement range. The period of interest, 2023, is the post-pandemic return to in-person teaching, when many students had completed the final two years of high school during COVID-19 disruptions and arrived with uneven preparation. The MTWs were offered as additional support, not a replacement for lectures or tutorials. Importantly, MTW participants were fully embedded in the mainstream course: they attended the same lectures and tutorials as non-participants and wrote the same assessments.
The workshops were specifically recommended to students who had obtained below 85% in high school Mathematics, but enrolment was open to all students in the course, and some higher-scoring students chose to attend.

2.2. Design

Because students self-selected into the MTWs, we used a quasi-experimental design based on propensity score methods to approximate the counterfactual "What would these students have achieved without the workshops?" Let $T_i \in \{0,1\}$ denote workshop participation and $Y_i(1)$, $Y_i(0)$ the corresponding potential outcomes. Under the standard assumptions of conditional ignorability, $(Y_i(1), Y_i(0)) \perp T_i \mid X_i$, and the stable unit treatment value assumption (SUTVA) (Rosenbaum & Rubin, 1983), comparisons between participants and non-participants with similar propensity scores can recover the average treatment effect on the treated (ATT). Conditional ignorability requires that, once we condition on the observed covariates $X$, there remain no systematic differences between treated and control units in their potential outcomes; in other words, treatment assignment is as good as random conditional on $X$. This assumption is not directly testable but is made more plausible by strong covariate balance after matching. SUTVA requires that one student's participation does not affect another student's potential outcomes (no interference) and that the treatment is delivered in a well-defined way with no hidden versions. Violations of SUTVA or ignorability would bias ATT estimates, which is why we supplement matching with Rosenbaum sensitivity analysis to probe robustness to unmeasured confounding (see Section 2.5).
We estimated each student's propensity score $e(X_i) = \Pr(T_i = 1 \mid X_i)$ using a logistic regression of treatment status on pre-treatment covariates (school-leaving Mathematics and English marks and scores on standardized university preparedness tests in Mathematics and Quantitative Literacy):

$$\operatorname{logit}\{e(X_i)\} = \log\frac{e(X_i)}{1 - e(X_i)} = \beta_0 + \beta_1\,\mathrm{SLMath}_i + \beta_2\,\mathrm{SLEng}_i + \beta_3\,\mathrm{MathPrep}_i + \beta_4\,\mathrm{QLPrep}_i,$$

where $\mathrm{SLMath}_i$ and $\mathrm{SLEng}_i$ denote student $i$'s school-leaving Mathematics and English percentage scores, and $\mathrm{MathPrep}_i$ and $\mathrm{QLPrep}_i$ denote their percentage scores on standardized preparedness tests in Mathematics and Quantitative Literacy. We then implemented 1:1 nearest-neighbor matching with replacement, within a caliper on the logit of the propensity score and restricted to the region of common support, to construct a matched comparison group. Post-matching, ATTs were estimated by comparing outcomes between MTW participants and their matched counterparts using linear models with cluster-robust standard errors and Holm-adjusted p-values (Austin, 2011a; Rosenbaum & Rubin, 1983). In this study, we report student performance using percentage-based grades (commonly referred to as 'marks' in South Africa) to ensure clarity for an international audience.

2.3. Participants and Data Sources

The analytic sample consisted of first-year mathematics students enrolled in the target course in 2023. Data were drawn from (a) university admissions records, including school-leaving Mathematics and English marks and scores on standardized university preparedness tests in Mathematics and Quantitative Literacy; (b) workshop attendance registers; and (c) the course assessment database. In 2023, the course assessment structure comprised two tests and one final exam per semester (Test 1F, Test 2F, Exam F, Test 1S, Test 2S, and Exam S). Each assessment consisted of three sections: Section A (multiple-choice items), Section B (short-answer questions), and Section C (long-answer questions).
Students who had missing key covariates, fell outside common support on the propensity score, or attended fewer than 80% of the workshops were excluded from the matched analysis. Additionally, the analytic sample was restricted to first-time BSc entrants and excluded Bachelor of Commerce and Humanities students.
In the institutional dataset extracted for this evaluation (prior to matching restrictions), there were 31 MTW participants and 121 non-participants with the required pre-treatment covariates and assessment records. After applying the pre-specified exclusions (including ≥80% MTW attendance), common-support restrictions, and covariate completeness checks, the matched analytic sample used for ATT estimation consisted of 23 MTW participants and 17 distinct non-participants (with some controls matched to multiple treated students, yielding 23 matched control observations).

2.4. Higher-Order Item Coding

To align the quantitative outcomes with the aims of the MTWs, we conducted an item-level audit of the long-answer section (Section C) of each major assessment (Test 1F, Test 2F, Exam F, Test 1S, Test 2S, Exam S). For each assessment, we flagged as “higher-order” questions that required students to (a) construct or analyze mathematical proofs; (b) apply standard theorems in unfamiliar or non-routine contexts; (c) engage in multi-step conceptual reasoning; or (d) justify solutions with an explicit mathematical argument. Items closely mimicking lecture examples or past papers were excluded, even if they were proof-styled. This avoided conflating genuine higher-order performance with rehearsal or recall.
Within Section C, this process identified the number of higher-order questions in each assessment. Specifically, there were 3 of 7 in Test 1F, 6 of 7 in Test 2F, 3 of 6 in Exam F, 3 of 6 in Test 1S, 2 of 6 in Test 2S, and 3 of 7 in Exam S. The resulting higher-order subscore (HO%) for each assessment was defined as the percentage of available marks earned on the included higher-order items, capped at 100 where bonus marks were available.
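Concretely, the HO% subscore reduces to a marks ratio over the flagged items, capped at 100. The sketch below illustrates the arithmetic; the item counts and mark values are hypothetical, not drawn from the study's assessments.

```python
def ho_subscore(earned, available):
    """HO%: percentage of available marks earned on the flagged
    higher-order items, capped at 100 where bonus marks exist."""
    if sum(available) == 0:
        return None  # assessment had no flagged higher-order items
    pct = 100 * sum(earned) / sum(available)
    return min(pct, 100.0)

# e.g., three hypothetical higher-order items worth 10, 8, and 12 marks
print(ho_subscore([7, 8, 12], [10, 8, 12]))   # 27/30 of the marks -> 90.0
print(ho_subscore([12, 8, 14], [10, 8, 12]))  # bonus marks -> capped at 100.0
```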

2.5. Statistical Analysis

We first estimated each student’s propensity to attend the MTWs using logistic regression (Austin, 2011a; Rosenbaum & Rubin, 1983) with school-leaving Mathematics and English percentages and scores on standardized university-preparedness tests in Mathematics and Quantitative Literacy as predictors. Workshop participants were then matched to non-participants using 1:1 nearest-neighbor matching with replacement and a caliper set to 0.2 of the standard deviation of the logit of the propensity score (Austin, 2011a, 2011b); treated students without an adequately similar control were dropped. Because some control students were matched to more than one treated student, observations in the matched sample are not independent.
Covariate balance before and after matching was assessed using absolute standardized mean differences (ASMDs), and matching was deemed satisfactory when all ASMDs were reduced to below 0.10, indicating negligible residual imbalance (Austin, 2009; Stuart, 2010).
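The ASMD for one covariate is the absolute difference in group means divided by the pooled standard deviation. A minimal sketch, using synthetic marks whose values are purely illustrative:

```python
import numpy as np

def asmd(x_t, x_c):
    """Absolute standardized mean difference between treated and
    control samples of one covariate (pooled-SD denominator)."""
    pooled_sd = np.sqrt((np.var(x_t, ddof=1) + np.var(x_c, ddof=1)) / 2)
    return abs(x_t.mean() - x_c.mean()) / pooled_sd

rng = np.random.default_rng(1)
# Before matching: treated students have higher school marks on average
before_t = rng.normal(80, 5, 30)
before_c = rng.normal(74, 5, 120)
# After matching: controls drawn from the same distribution as the treated
after_c = rng.normal(80, 5, 30)

print(f"ASMD before matching: {asmd(before_t, before_c):.2f}")
print(f"ASMD after matching:  {asmd(before_t, after_c):.2f}")
```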
For each assessment outcome, we then estimated the Average Treatment Effect on the Treated (ATT) by fitting linear regression models to the matched sample and computing cluster-robust standard errors to account for the reuse of controls (Abadie & Imbens, 2006). We report ATTs for the total scores on six assessments (four class tests and two final exams). We also report ATTs for the corresponding higher-order (HO) item subscores; to control the familywise error rate across these correlated outcomes, we applied Holm’s step-down procedure.
To examine heterogeneity by prior achievement (RQ2), we did not re-run propensity score matching within each school-leaving mathematics band because the sample sizes within bands were small and overlap was limited. Instead, we estimated an average treatment effect in the overlap population (ATO) using overlap weighting (Orihara et al., 2024): propensity scores were estimated for the full analytic sample, overlap weights were constructed to emphasize students with comparable probabilities of MTW participation in each band, and band-specific effects and treatment × school-leaving mathematics band (70–84% vs. 85–100%) interactions were obtained from overlap-weighted regressions with cluster-robust standard errors.
Finally, we conducted Rosenbaum sensitivity analyses (Wilcoxon signed-rank) on the matched pairs to assess how strong an unobserved confounder would need to be to render the ATTs non-significant at α = 0.05. We interpreted the hidden-bias parameter Γ as the factor by which two matched students may differ in their odds of workshop participation due to unmeasured covariates (Rosenbaum, 1987, 2005).
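Rosenbaum's bound for the signed-rank test has a standard normal-approximation form: under hidden bias Γ, the probability that a matched pair's difference is positive is at most Γ/(1 + Γ), which gives an upper bound on the one-sided p-value; Γ* is the largest Γ at which that bound stays below 0.05. The sketch below assumes this construction, with synthetic matched-pair differences in place of the study's data.

```python
import numpy as np
from scipy.stats import norm, rankdata

def rosenbaum_upper_p(diffs, gamma):
    """Upper bound on the one-sided Wilcoxon signed-rank p-value when an
    unobserved confounder multiplies the odds of treatment by gamma
    (normal approximation to the bounding distribution)."""
    d = np.asarray(diffs, float)
    d = d[d != 0]                     # zero differences carry no sign
    ranks = rankdata(np.abs(d))
    t_plus = ranks[d > 0].sum()       # observed signed-rank statistic
    p_plus = gamma / (1 + gamma)      # max P(positive sign) under gamma
    mu = p_plus * ranks.sum()
    sigma = np.sqrt(p_plus * (1 - p_plus) * (ranks ** 2).sum())
    return 1 - norm.cdf((t_plus - mu) / sigma)

rng = np.random.default_rng(4)
diffs = rng.normal(8, 6, 23)          # hypothetical treated-minus-control gaps

# Gamma*: largest hidden-bias factor at which the effect stays significant
gammas = np.arange(1.0, 6.01, 0.05)
signif = [g for g in gammas if rosenbaum_upper_p(diffs, g) <= 0.05]
print(f"robust up to Gamma* ≈ {max(signif):.2f}" if signif else "not robust")
```

At Γ = 1 the bound reduces to the ordinary one-sided signed-rank p-value; the bound grows monotonically with Γ, so Γ* can be found by a simple grid search as above.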

2.6. Ethical Considerations

The study used secondary analysis of institutional data and workshop records, under ethics clearance from the university. Data were de-identified prior to analysis.

3. Results

This section presents the quantitative findings in three parts. We first report the overall estimated impacts of MTW participation on assessment performance across the semester using propensity score methods. We then examine whether effects vary across students with different levels of pre-university mathematics achievement. Finally, we report results for higher-order assessment performance to evaluate whether MTW participation is associated with gains on more conceptually demanding items and whether this pattern changes over time.

3.1. Early Gains

Propensity-score matching produced a comparison sample in which workshop participants and non-participants were closely aligned on all pre-treatment variables (school-leaving Mathematics and English percentages and scores on standardized university-preparedness tests in Mathematics and Quantitative Literacy). Absolute standardized mean differences (ASMDs) for these covariates were reduced to within commonly accepted thresholds (most ASMDs < 0.10, with preparedness-test Mathematics just above this cut-off), indicating that post-matching contrasts can be interpreted as differences associated with participation rather than pre-existing school advantages.
In Table 1, the post-matching mean for the control group is computed over 23 matched control observations (17 distinct students, with some controls reused across matches). The post-matching ASMD for NBT Mathematics is >0.10; researchers generally use 0.10 as the threshold indicating imbalance (Austin, 2009; Chang et al., 2022), although values up to 0.25 are sometimes considered acceptable (Stuart et al., 2013).
Against this balanced sample, participation in the MTWs was associated with clear, positive effects on the earliest course assessments (Table 2). Average Treatment Effects on the Treated (ATTs) for all main outcomes (four class tests, two exams, and the two-semester final marks) were estimated on the matched sample using cluster-robust standard errors and Holm-adjusted p-values.
On the first test of the first semester (Test 1F), workshop attendees obtained higher mean scores than their matched counterparts. The ATT remained positive after adjusting standard errors for the reuse of controls (cluster-robust SEs) and correcting for multiple tests (Holm’s procedure). In practical terms, the gain was large enough to move a student from barely passing to comfortably passing, or from a marginal pass to mid-range.
The pattern strengthened on the second early assessment (Test 2F): the ATT was even larger and remained highly significant after Holm adjustment, with effect sizes in the large range. These results suggest that the MTWs boosted performance at the very beginning of the course, where many students have historically begun to fall behind.
We assessed robustness to unobserved confounding using Rosenbaum bounds (Wilcoxon signed-rank) on the matched pairs (one-sided, testing for improvement among workshop attendees) (Table 3). For Test 1F, the estimated effect remained statistically significant for hidden-bias parameters up to Γ ≈ 4.2; for Test 2F, up to Γ ≈ 3.35, indicating that a relatively strong unmeasured advantage would be required to explain away these early gains. For most later outcomes, by contrast, Γ* was close to 1, signaling that those estimates are more sensitive to potential hidden bias.
These early-assessment results show that once selection bias is reduced, MTW participation is linked to measurable, assessment-visible improvement at the start of the course. These gains establish the “early” in the “early gains, fading effects” pattern developed in the remainder of Section 3.

3.2. Fading Effects on Later Outcomes

The strong early gains did not translate into equally large or statistically robust improvements on more distal outcomes. For the first semester exam (Exam F) and the first semester final mark (Final F), ATTs remained positive (0.91 and 2.20 percentage points, respectively) but were small in magnitude, with wide confidence intervals spanning zero and negligible effect sizes (Table 2). After Holm adjustment, neither outcome approached conventional significance thresholds, suggesting that the large boosts seen on Test 1F and Test 2F largely dissipated by the end of the semester.
A similar pattern emerged in the second semester. Point estimates for Test 1S, Test 2S, Exam S, and Final S were consistently positive (ranging from about 4.9 to 9.3 percentage points), with small-to-moderate effect sizes, but standard errors were large enough that none of these ATTs survived Holm correction. Test 2S showed the strongest signal among the later assessments (unadjusted p = 0.114; Hedges’ g ≈ 0.57–0.70), yet this still fell short of the more decisive gains observed in the first-semester tests.
Rosenbaum sensitivity analyses reinforce this attenuation. Whereas the early gains on Test 1F and Test 2F remained robust to relatively large hidden-bias parameters (Γ* ≈ 4.20 and 3.35, respectively; Table 3), most later outcomes had Γ* ≈ 1.00, indicating that even modest unmeasured advantages could account for the small, imprecise ATTs. Taken together, the pattern is one of fading effects: clear, sizeable benefits at the start of the course, followed by weaker, less certain advantages on end-of-semester and second-semester outcomes.

3.3. Stronger Effects for Mid-Range Prior Achievement and Higher-Order Items

To address RQ2, we examined whether workshop effects varied by pre-university mathematics achievement. Table 4 shows ATTs estimated separately for students who had scored 70–84% in school-leaving mathematics and those who had scored 85–100%. In both bands, MTW participation was associated with large and statistically robust gains on the early first-semester tests. For students in the 70–84% band, the estimated effects on Test 1F and Test 2F were 22.23 [SE 3.44] and 31.63 [SE 6.96] percentage points, respectively (both Holm-adjusted p-values < 10⁻⁵). For students in the 85–100% band, the corresponding gains were slightly smaller but still substantial—19.19 [5.05] and 25.42 [5.42] percentage points—with Holm-adjusted p-values on the order of 10⁻⁵–10⁻⁴, indicating very strong statistical evidence for a positive workshop effect.
For later outcomes (Exam F, Final F, and all second-semester assessments), point estimates remained generally positive in both school-leaving mathematics bands but were small and imprecise, with Holm-adjusted p-values close to 1. In other words, the “fading effects” pattern documented above held across prior achievement levels. Crucially, formal interaction tests provided no evidence that the early benefits differed systematically by band: p-values for the treatment × band interaction were all > 0.39 (Table 4). The data therefore suggest that students in both achievement bands benefited from the MTWs, with numerically larger but not statistically distinguishable effects for the 70–84% group.
To align more directly with the workshops’ emphasis on non-routine, representation-rich problem solving, we also analyzed higher-order (HO) item subscores (Table 5). Here, the clearest effect again appeared early in the course. On Test 2F, which also had the highest density of HO items in Section C (6 of 7 long-answer questions), MTW participants outperformed matched controls by an estimated 13.53 percentage points on HO items (SE 5.99), with a moderate-to-large effect size (Hedges’ g ≈ 0.74) and an unadjusted p-value of 0.034 (Holm-adjusted p = 0.205). For all other assessments, ATTs on HO items were positive but smaller, with wide confidence intervals spanning zero and no Holm-corrected significance.
These results indicate that the MTWs were most successful for students in the middle-to-high prior achievement band (70–84% in school-leaving mathematics) and that the clearest measurable impact lay in performance on higher-order items early in the course. However, given the modest subgroup sample sizes and non-significant interaction tests, these patterns should be read as suggestive rather than definitive: the workshops appear to support students across the full range of prior achievement, with particularly visible early gains on cognitively demanding tasks.

4. Discussion

In this section, we discuss the findings in relation to the three guiding questions of the study. RQ1 asked to what extent MTWs impact students’ academic performance when controlling for pre-university achievement. RQ2 asked whether these effects varied across different levels of pre-university mathematics achievement. RQ3 focused on performance in higher-order assessment items.

4.1. RQ1—Impact on Overall Academic Performance

Research Question 1 asked: To what extent do MTWs impact students’ academic performance when controlling for pre-university academic achievement? The quasi-experimental analyses provide a clear, but time-sensitive, answer. After matching MTW participants to non-participants on covariates, workshop attendance was associated with very large gains on the first two tests of the first semester and much smaller, statistically uncertain differences on later outcomes. Within a bottleneck first-year mathematics course, where prior work shows that a weak mathematical background is strongly associated with failure and attrition (Rylands & Coady, 2009), such early gains are educationally substantial.
These results resonate with classic work on intensive mathematics support and more recent accounts of high-impact educational practices. Treisman’s (1992) pioneering calculus workshops combined high challenge with collaborative, out-of-class problem-solving in small groups and produced substantial performance gains for under-represented students. MTWs share several of these features—regular small-group meetings, demanding non-routine tasks, and a strong social-academic community (Mokhithi et al., 2025)—and the large early ATTs suggest that this model can be productively adapted to a South African STEM context. At a broader level, the MTWs resemble what Kuh (2008) describes as “high-impact practices”: structured experiences that demand time and effort, foster deep engagement with challenging material, and are associated with improved grades and persistence when they are embedded early in students’ programs. The pattern here is similar: when students are given an early, high-engagement mathematical experience, their performance on proximal assessments improves markedly.
The absence of significant effects on final exam performances and cumulative course marks aligns with work showing that the impact of academic support interventions often diminishes as courses progress and content becomes more complex (Wan et al., 2021). This kind of “fade-out” is well documented in education more broadly, including university settings where early-term gains on quizzes or midterm tests do not always carry through to end-of-course examinations and final grades (Bailey et al., 2020; Freeman et al., 2007). Recent evidence from mathematics learning support post-COVID further suggests that students’ engagement with support is strongly shaped by convenience and workload pressures, which can reduce sustained participation as the semester intensifies (Walsh & Guerin, 2025).
Although alignment between workshop emphases and proximal assessments is a plausible contributor to the attenuation of effects, several other mechanisms could also produce an “early gains, fading effects” pattern. First, students’ study practices may shift as the year progresses: early participation can prompt more regular, structured problem-solving and help-seeking, whereas later in the year, intensified time constraints may lead students to revert to shorter, more instrumental study routines (Lorås & Aalberg, 2020). Second, competing academic demands typically intensify later in the year (e.g., overlapping assessments across courses) (Casey et al., 2023), reducing the time available to sustain the deliberate practice and reflection that MTWs aim to cultivate. Third, there may be shifts in instructional emphasis and pacing as content becomes more abstract and coverage pressures increase, which can reduce opportunities to rehearse the metacognitive and conceptual routines foregrounded in the workshops. None of these mechanisms were directly measured in the present study; we therefore treat them as plausible, complementary explanations that motivate future work tracking time-on-task, study strategies, and curriculum/assessment conditions across the year.
Finally, the quasi-experimental findings must be read in light of research on motivation and time-on-task in first-year mathematics. Lishchynska et al. (2023), for example, show that mathematical background and motivational factors (including time invested in independent learning) are both strong predictors of performance in service mathematics modules. Our matching strategy accounts for prior achievement but not for unmeasured variables such as intrinsic motivation, willingness to seek help, or simply the additional hours spent doing mathematics in workshops rather than studying alone. In our Rosenbaum sensitivity analysis (Table 3), an unmeasured confounder would need to be moderately strong to fully explain away the estimated early gains; however, as with any observational study, these analyses cannot eliminate the possibility of residual unmeasured confounding.
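To make the logic of the Rosenbaum bounds concrete, the following minimal sketch computes the worst-case (upper-bound) one-sided p-value for the Wilcoxon signed-rank statistic at a given Γ, under a normal approximation with no tie corrections. It is an illustration of the method, not the exact procedure used in the study, and the matched-pair differences shown are invented.

```python
import math

def rosenbaum_upper_pvalue(diffs, gamma):
    """Upper bound on the one-sided Wilcoxon signed-rank p-value when an
    unmeasured confounder could bias within-pair treatment-assignment odds
    by a factor of gamma (normal approximation; zero differences dropped,
    ties in |d| ignored)."""
    d = [x for x in diffs if x != 0]
    # Rank the absolute differences from smallest (rank 1) to largest.
    order = sorted(range(len(d)), key=lambda i: abs(d[i]))
    rank = [0] * len(d)
    for r, i in enumerate(order, start=1):
        rank[i] = r
    # Signed-rank statistic: sum of ranks over pairs favouring treatment.
    t_stat = sum(rank[i] for i in range(len(d)) if d[i] > 0)
    # Worst-case probability that a pair favours treatment under gamma.
    p_plus = gamma / (1 + gamma)
    mean = p_plus * sum(rank)
    var = p_plus * (1 - p_plus) * sum(r * r for r in rank)
    z = (t_stat - mean) / math.sqrt(var)
    return 1 - 0.5 * (1 + math.erf(z / math.sqrt(2)))

# Illustrative matched-pair differences (treated minus control, in marks).
pairs = [5, 3, 8, 2, 7, 4, 6, 1, 9, 2.5]
for g in (1.0, 1.5, 2.0):
    print(g, round(rosenbaum_upper_pvalue(pairs, g), 4))
```

At Γ = 1 this reduces to the usual signed-rank approximation; as Γ grows, the bound weakens, which is exactly the sense in which a "moderately strong" confounder could explain away an estimate.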
The literature and the present results support a cautiously optimistic answer to RQ1: MTWs, as a theoretically informed, workshop-style intervention, can deliver large improvements in early first-year mathematics performance, but these gains are fragile and require sustained, intentionally embedded high-impact practices if they are to persist across the full academic year.

4.2. RQ2—Variation by Pre-University Mathematics Achievement (School-Leaving Mathematics Bands)

RQ2 asked whether the impact of the MTWs differed across levels of pre-university mathematics achievement, operationalized here as two bands of school-leaving mathematics performance (70–84% and 85–100%). In the South African context, school-leaving Mathematics results and standardized preparedness scores function as key sorting mechanisms for entry into mathematically demanding programs, and they are consistently associated with performance in first-year university mathematics and related STEM courses (Mabizela & George, 2020; Mosia et al., 2025; Schoer et al., 2010). The present analysis therefore probes a subtle question: once students have crossed a relatively high school-leaving Mathematics threshold and are co-enrolled in the same first-year mathematics course, does an additional workshop-style intervention benefit those at the lower end of this high-achieving spectrum more, less, or about the same as those at the very top?
The interaction models showed that early MTW effects were positive and large in both bands of school-leaving mathematics performance, with somewhat larger point estimates in the 70–84% band on the first two tests. However, treatment × band interaction terms were not statistically significant, so we find no reliable evidence that early benefits differed systematically between bands. In other words, within the range of relatively strong school performers admitted to this course, the workshops acted more as a general accelerator than a targeted “rescue” for a particular subgroup. This complements system-level work showing that higher school-leaving mathematics thresholds substantially improve the odds of university success (Hunt et al., 2011; Schoer et al., 2010): among students who have already achieved those thresholds, additional, theoretically grounded support can still shift trajectories, with gains evident not only for students whose school-leaving mathematics marks lie within roughly fifteen percentage points of the entry cutoff but also across the broader range of prior achievement.
At the same time, the absence of significant interactions should not be over-interpreted as evidence that prior achievement is irrelevant to how students experience or use the workshops. International research increasingly shows that the factors associated with high and low mathematical performance differ across the achievement spectrum—motivational profiles, self-efficacy, and classroom affordances do not operate identically for high and low achievers (Saglam & Goktenturk, 2024). Even within relatively selective cohorts, high achievers often display more mature self-regulated learning patterns, while students closer to the threshold are more vulnerable to shifts in confidence and perceived competence (Hirt et al., 2021). The present findings suggest that a single workshop design, grounded in constructivism, ZPD, APOS and CLT, was sufficiently flexible to generate early benefits across these different profiles, at least in terms of assessment outcomes.
There is also a methodological caveat: because admission to the target mathematics course requires at least 70% in school-leaving mathematics, the analytic sample necessarily excludes students below this threshold, and sample sizes within each NSC band were modest. This means that the analysis cannot speak to the full “ability spectrum” often invoked in discussions of differentiated support, nor can it rule out practically meaningful but statistically undetected differences between the bands. Studies of differentiated instruction and targeted mathematics interventions routinely find that prior achievement moderates who benefits most, with some designs disproportionately supporting lower-achieving learners and others extending high achievers’ trajectories (Onyishi & Sefotho, 2021). In contrast, the MTWs appear to have functioned as a broadly beneficial, mixed-ability intervention for students who had already cleared demanding school-leaving mathematics thresholds. Future work with larger cohorts and a wider NSC range could test more finely tuned banding and explore whether adapting tasks or facilitation strategies yields differential gains without undermining the inclusive ethos of the workshops.

4.3. RQ3—Impact on Higher-Order Assessment Performance

RQ3 asked: To what extent do the MTWs impact students’ performance on higher-order assessment items in first-year university mathematics? The higher-order (HO) subscores give a more focused view of the “early gains, fading effects” pattern. Across all assessments, ATTs on HO items were directionally positive but generally modest and imprecise. The clearest signal appeared on Test 2F, where MTW participants outperformed matched controls by about 13.5 percentage points on HO items, with a moderate-to-large effect size and an unadjusted p-value below 0.05 (though not significant after Holm correction). This is also the assessment with the highest density of HO questions in Section C (6 of 7 long-answer items), making it structurally closest to the problems emphasized in the workshops.
This close correspondence between Test 2F and the workshop tasks is therefore notable. The HO items in Test 2F correspond closely to what the literature describes as high-level, cognitively demanding tasks—non-routine, representation-rich problems that require students to explain and justify their thinking rather than execute familiar procedures (Ni et al., 2017; Smith et al., 2008). The fact that the strongest HO effect emerges precisely on the assessment where such items are most concentrated suggests that at least some of the reasoning rehearsed in MTWs did transfer to individual, assessed performance, but mainly when assessment design and workshop design were pulling in the same direction.
By contrast, HO effects on later assessments were smaller and highly uncertain, with wide confidence intervals and no Holm-corrected significance. This is consistent with evidence that the cognitive demand of "high-level" tasks often erodes in practice and that high-stakes exams can push teaching and learning towards routinized past-paper preparation rather than sustained engagement with complex problems (Göloğlu Demir & Kaplan Keleş, 2021; Parrish & Byrd, 2022; Zakharov & Carnoy, 2021).
Overall, the RQ3 results suggest that MTWs can support higher-order performance, but only under fairly strict conditions: higher-order items must be prominent in the assessment, closely resemble the kinds of tasks used in the workshops, and appear relatively soon after workshop exposure. Where those conditions held—as in Test 2F—moderate gains were visible; where they did not, the workshops’ impact on HO subscores was much harder to detect in the quantitative results.
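The Holm step-down correction applied to the p-values discussed above can be implemented directly. Treating the four class-test p-values from Table 2 as a single family is our illustrative assumption; with the rounded inputs, the adjusted values approximately reproduce the reported p (Holm) column for those tests.

```python
def holm_adjust(pvals):
    """Holm step-down adjustment: sort p-values ascending, multiply the
    k-th smallest by (m - k + 1), enforce monotonicity, cap at 1."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_max = 0.0
    for rank, i in enumerate(order):
        running_max = max(running_max, (m - rank) * pvals[i])
        adjusted[i] = min(1.0, running_max)
    return adjusted

# Unadjusted p-values for the four class tests (Table 2), treated here as
# one family for illustration.
tests = [3.047e-5, 2.832e-4, 0.226, 0.114]
print([f"{q:.4g}" for q in holm_adjust(tests)])
```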

4.4. Theoretical Contributions

This study makes a theoretical contribution by examining a workshop model that was explicitly assembled from four strands: constructivism as the overarching orientation, with the ZPD, APOS theory, and CLT providing more fine-grained design guidance. The “early gains, fading effects” pattern suggests that this composite framework is applicable to a South African STEM context: when students encounter demanding, representation-rich tasks in a supportive setting closely aligned with early assessments, performance improves in ways that are both conceptually meaningful and assessment-visible. At the same time, the attenuation of effects on later assessments suggests that these theoretical resources are not sufficient on their own; their impact depends on sustained exposure and on how well the surrounding curriculum and assessment context take up the same principles.

4.5. Limitations and Future Work

Several limitations qualify these claims. First, treatment was not randomly assigned; even with propensity-score methods, balance diagnostics, and sensitivity analysis, unmeasured differences (e.g., motivation, help-seeking, time-on-task) may partly account for the observed advantages.
Second, the analytic sample is modest, drawn from a single research-intensive institution, and restricted to students with school-leaving Mathematics ≥ 70%; as a result, the findings generalize most directly to academically selected cohorts in mainstream first-year mathematics settings with similar entry thresholds, curricula, and assessment structures. This is an important boundary condition: students below this threshold (often enrolled in extended curriculum or foundation pathways) may respond differently to MTWs, and the present design cannot determine those effects. Moreover, it is not obvious how large the learning gains would be in an expanded sample. Lower-prepared students may show larger gains because they have more room to improve, yet they may also benefit less if workshop tasks presuppose prerequisite fluency that is not yet secure or if foundational gaps require a different sequencing of support. Similarly, transfer to other institution types (e.g., teaching-focused universities, historically disadvantaged institutions, or colleges) may be moderated by differences in intake, resourcing, class sizes, teaching emphases, and the local context of academic support. We therefore treat the present results as contextually situated evidence and prioritize replication across institutions and across a broader preparedness spectrum as a key direction for future work.
Third, higher-order performance was proxied by a limited number of long-answer items whose classification, while principled, necessarily involved judgment and was constrained by existing assessments. Fourth, outcomes were limited to marks within a single year; the study cannot speak to longer-term effects on retention, degree completion, or performance in subsequent courses. Fifth, the robustness checks themselves have caveats: the Rosenbaum sensitivity analysis relied on the Wilcoxon signed-rank statistic, which tends to report greater sensitivity to unmeasured bias than some alternative, equally reasonable test statistics, because it gives less weight to pairs with small absolute differences (Rosenbaum, 2010, 2012). In that sense, the Γ values reported here may somewhat overstate how vulnerable the estimates are to hidden bias, even though they do not remove the need for causal caution.
These limitations point directly to future work. A first priority is replication and extension: multi-cohort and multi-site studies with larger samples. Second, future studies should incorporate richer measures of mechanism, capturing not only marks but also non-cognitive elements, so that the pathways from participation to performance can be modeled explicitly; in particular, combining administrative indicators and survey measures of time-on-task and help-seeking with validated measures of affect (e.g., mathematics anxiety and self-efficacy), belonging, and mathematical identity would allow more direct tests of whether MTWs generate durable affective/identity shifts even when performance effects attenuate. Finally, there is scope for longitudinal tracking to examine whether early gains in higher-order reasoning and confidence, even if they fade in immediate exam scores, leave traces in later courses or in students’ persistence in mathematically demanding degrees. In practical terms, a next step would be to link MTW participation to administrative outcomes such as pass rates and marks in subsequent mathematics courses (e.g., second-year modules), year-to-year progression in mathematically demanding degrees, and longer-term retention/completion, to test whether early gains leave detectable traces beyond first-year assessments.

5. Conclusions

The quasi-experimental results provide the strongest support for the conclusion that MTWs are associated with meaningful improvements on proximal, early assessments, particularly where assessment tasks were structurally close to the kinds of representation-rich, non-routine problem solving emphasized in the workshops. In this sense, what can be inferred most directly from the quantitative evidence is a pattern of short-term academic benefit under conditions of tight alignment between workshop activity and the immediate assessment context.
At the same time, several claims remain more suggestive than conclusive on the basis of the present quantitative analyses. In particular, the weakening of effects on later outcomes should not be interpreted as definitive evidence that the workshops “stop working”; rather, it indicates that sustained or cumulative advantages were harder to detect in the measured outcomes under the evolving demands of the course and wider academic year. Moreover, because the design is observational, the estimates should be interpreted as consistent with a causal impact while still leaving room for unmeasured influences (e.g., shifting study practices, competing workload, and changes in instructional pacing) that could contribute to the observed fade-out. Accordingly, the most warranted interpretation is that MTWs can generate early performance gains, while claims about durable effects, the specific mechanisms driving attenuation, and the conditions needed for persistence require further longitudinal and process-focused investigation.
Practical recommendations follow from these findings for program designers and institutional leaders. First, alignment should be treated as an explicit design variable: departments can blueprint assessments to ensure that the higher-order competencies cultivated in workshops (interpretation, representation, justification, and non-routine reasoning) are sampled consistently across later tests and exams, rather than being concentrated early in the semester. Second, reinforcement should be built in deliberately through spacing and revisit structures (e.g., embedding “workshop-type” reasoning demands into subsequent tutorials, problem sets, and cumulative assessments), so that early gains are more likely to persist as content becomes more abstract and time pressures intensify. Third, implementation fidelity depends on institutional conditions: leaders should protect timetable space, provide suitable venues, and invest in facilitator preparation that supports productive struggle and psychological safety, since these features can shape whether students continue to use workshop-linked practices later in the year. Finally, programs should monitor participation and engagement (e.g., attendance, help-seeking, time-on-task proxies) and use this information to target encouragement and support, particularly during periods of peak assessment load when competing demands may otherwise erode sustained benefits.

Author Contributions

Conceptualization, M.M. and A.L.C.; Methodology, M.M.; Validation, M.M. and A.L.C.; Formal Analysis, M.M.; Investigation, M.M. and A.L.C.; Data Curation, M.M.; Writing—Original Draft Preparation, M.M.; Writing—Review & Editing, M.M. and A.L.C.; Supervision, A.L.C. All authors have read and agreed to the published version of the manuscript.

Funding

This research received no external funding.

Institutional Review Board Statement

The study was conducted in accordance with the Declaration of Helsinki and approved by the Ethics Committee of the University of Cape Town (approval code FSREC 089-2022, May 2022).

Informed Consent Statement

Institutional permission from the University of Cape Town and approval from the Faculty of Science Research Ethics Committee were obtained to use de-identified cohort assessment data under a waiver of individual student informed consent.

Data Availability Statement

A restricted version of the data presented in this study is available to qualified researchers upon reasonable request from the corresponding author, due to the presence of identifiable information.

Acknowledgments

The authors thank the Mathematical Thinking Workshops (MTWs) design and implementation team, with special thanks to Neil Eddy (lead facilitator) and colleagues Jonathan Shock, Kate le Roux, Lizelle Niit, and Harry Wiggins for their contributions to the development and delivery of the workshops. During the preparation of this manuscript, the authors used ChatGPT-4 (OpenAI) for editing and refining the structure and language. The authors reviewed, validated, and revised all content, and take full responsibility for the final manuscript.

Conflicts of Interest

The authors declare no conflicts of interest.

Abbreviations

The following abbreviations are used in this manuscript:
APOS: Action, Process, Object, Schema
ATO: Average Treatment Effect in the Overlap population
ASMD: Absolute Standardized Mean Difference
ATT: Average Treatment Effect on the Treated
CLT: Cognitive Load Theory
HO: Higher-Order (items/subscores)
MTW: Mathematical Thinking Workshop
SUTVA: Stable Unit Treatment Value Assumption
ZPD: Zone of Proximal Development

References

  1. Abadie, A., & Imbens, G. W. (2006). Large sample properties of matching estimators for average treatment effects. Econometrica, 74(1), 235–267. [Google Scholar] [CrossRef]
  2. Arnon, I., Cottrill, J., Dubinsky, E., Oktac, A., Roa-Fuentes, S., Trigueros, M., & Weller, K. (2014). APOS theory: A framework for research and curriculum development in mathematics education. Springer. [Google Scholar] [CrossRef]
  3. Austin, P. C. (2009). Balance diagnostics for comparing the distribution of baseline covariates between treatment groups in propensity-score matched samples. Statistics in Medicine, 28(25), 3083–3107. [Google Scholar] [CrossRef] [PubMed]
  4. Austin, P. C. (2011a). An Introduction to propensity score methods for reducing the effects of confounding in observational studies. Multivariate Behavioral Research, 46(3), 399–424. [Google Scholar] [CrossRef]
  5. Austin, P. C. (2011b). Optimal caliper widths for propensity-score matching when estimating differences in means and differences in proportions in observational studies. Pharmaceutical Statistics, 10(2), 150–161. [Google Scholar] [CrossRef] [PubMed]
  6. Bailey, D. H., Duncan, G. J., Cunha, F., Foorman, B. R., & Yeager, D. S. (2020). Persistence and fade-out of educational-intervention effects: Mechanisms and potential solutions. Psychological Science in the Public Interest, 21(2), 55–97. [Google Scholar] [CrossRef] [PubMed]
  7. Bernard, T. (2015). The discursive construction of foundation programmes in South African media texts. South African Journal of Higher Education, 29(1), 238–261. [Google Scholar]
  8. Böhmer, B., & Wills, G. (2025). COVID-19 and inequality in reading outcomes in South Africa: PIRLS 2016 and 2021. Large-Scale Assessments in Education, 13, 24. [Google Scholar] [CrossRef]
  9. Casey, R., Teichman, S., & Acker, L. (2023). When the world paused for COVID-19, we finally caught our breath: Students’ schedules, stress, and the busy time of the semester. The Canadian Journal for the Scholarship of Teaching and Learning, 14(2), 3. [Google Scholar] [CrossRef]
  10. Chang, T. H., Nguyen, T. Q., Lee, Y., Jackson, J. W., & Stuart, E. A. (2022). Flexible propensity score estimation strategies for clustered data in observational studies. Statistics in Medicine, 41(25), 5016–5032. [Google Scholar] [CrossRef]
  11. Di Martino, P., Gregorio, F., & Iannone, P. (2023). The transition from school to university in mathematics in different contexts: Affective and sociocultural issues in students’ crisis. Educational Studies in Mathematics, 111(3), 79–106. [Google Scholar] [CrossRef]
  12. Dubinsky, E. (1991). Constructive aspects of reflective abstraction in advanced mathematics. In L. P. Steffe (Ed.), Epistemological foundations of mathematical experience (pp. 160–202). Springer. [Google Scholar] [CrossRef]
  13. Ellis, J., Fosdick, B. K., & Rasmussen, C. (2016). Women 1.5 times more likely to leave STEM pipeline after calculus compared to men: Lack of mathematical confidence a potential culprit. PLoS ONE, 11(7), e0157447. [Google Scholar] [CrossRef] [PubMed]
  14. Freeman, S., O’Connor, E., Parks, J. W., Cunningham, M., Hurley, D., Haak, D., Dirks, C., & Wenderoth, M. P. (2007). Prescribed active learning increases performance in introductory biology. CBE—Life Sciences Education, 6(2), 132–139. [Google Scholar] [CrossRef]
  15. Garraway, J., & Bozalek, V. (2019). Theoretical frameworks and the extended curriculum programme. Alternation, 26(2), 8–35. [Google Scholar] [CrossRef]
  16. Göloğlu Demir, C., & Kaplan Keleş, Ö. (2021). The impact of high-stakes testing on the teaching and learning processes of mathematics. Journal of Pedagogical Research, 5(2), 119–137. [Google Scholar] [CrossRef]
  17. Hay, H., & Marais, F. (2004). Bridging programmes: Gain, pain or all in vain. South African Journal of Higher Education, 18(2), 59–75. [Google Scholar] [CrossRef]
  18. Hirt, C. N., Karlen, Y., Merki, K. M., & Suter, F. (2021). What makes high achievers different from low achievers? Self-regulated learners in the context of a high-stakes academic long-term task. Learning and Individual Differences, 92, 102085. [Google Scholar] [CrossRef]
  19. Hunt, K., Ntuli, M., Rankin, N., Schöer, V., & Sebastiao, C. (2011). Comparability of NSC mathematics scores and former SC mathematics scores: How consistent is the signal across time? Education as Change, 15(1), 3–16. [Google Scholar] [CrossRef]
  20. Khoza-Shangase, K., & Kalenga, M. (2024). English additional language undergraduate students’ engagement with the academic content in their curriculum in a South African speech-language and hearing training programme. Frontiers in Education, 9, 1258358. [Google Scholar] [CrossRef]
  21. Kuh, G. D. (2008). High-impact educational practices: What they are, who has access to them, and why they matter. Association of American Colleges and Universities. Available online: https://navigate.utah.edu/_resources/documents/hips-kuh-2008.pdf? (accessed on 3 February 2026).
  22. Lawson, D. (2015). Mathematics support at the transition to university. In M. Grove, T. Croft, J. Kyle, & D. Lawson (Eds.), Transitions in undergraduate mathematics education (pp. 39–56). The Higher Education Academy. [Google Scholar]
  23. Lishchynska, M., Palmer, C., Lacey, S., & O’Connor, D. (2023). Is motivation the key? Factors impacting performance in first year service mathematics modules. European Journal of Science and Mathematics Education, 11(1), 146–166. [Google Scholar] [CrossRef]
  24. Lorås, M., & Aalberg, T. (2020, October 21–24). First year computing study behavior: Effects of educational design. 2020 IEEE Frontiers in Education Conference (FIE) (pp. 1–9), Uppsala, Sweden. [Google Scholar] [CrossRef]
  25. Mabizela, S. E., & George, A. (2020). Predictive validity of the national benchmark test and national senior certificate for the academic success of first-year medical students at one South African university. BMC Medical Education, 20, 192. [Google Scholar] [CrossRef]
  26. Mokhithi, M., Campbell, A. L., Shock, J. P., & Padayachee, P. (2025). ‘I call it math therapy’: Student narratives of growth, belonging and confidence in mathematical thinking workshops. International Journal of Mathematical Education in Science and Technology, 56(12), 2353–2378. [Google Scholar] [CrossRef]
  27. Mokhithi, M., Campbell, A. L., Shock, J. P., & Padayachee, P. (2026). From Theory to Practice, and Back: Student Evidence Testing ZPD, APOS, CLT, and Constructivism in Mathematical Thinking Workshops. Education Sciences, 16(3), 385. [Google Scholar] [CrossRef]
  28. Mosia, M., Egara, F. O., Nannim, F. A., & Basitere, M. (2025). Factors influencing students’ performance in university mathematics courses: A structural equation modelling approach. Education Sciences, 15(2), 188. [Google Scholar] [CrossRef]
  29. Msomi, A., & Rzyankina, E. (2024). Bridging gaps: Enhancing holistic mathematics support in the transition from secondary school to university. Journal of Student Affairs in Africa, 12(2), 51–70. [Google Scholar] [CrossRef]
  30. Ni, Y., Zhou, D. H. R., Cai, J., Li, X., Li, Q., & Sun, I. X. (2017). Improving cognitive and affective learning outcomes of students through mathematics instructional tasks of high cognitive demand. The Journal of Educational Research, 111(6), 704–719. [Google Scholar] [CrossRef]
  31. Nkonki, V. J. J., Dondolo, V., & Mabece, K. (2023). The confluence of supplemental instruction (SI) programme factors on selected student outcomes in a historically disadvantaged university. Education Sciences, 13(11), 1145. [Google Scholar] [CrossRef]
  32. Olivier, C., & Burton, C. (2020). A large-group peer mentoring programme in an under-resourced higher education environment. International Journal of Mentoring and Coaching in Education, 9(4), 341–356. [Google Scholar] [CrossRef]
  33. Onyishi, C. N., & Sefotho, M. M. (2021). Differentiating instruction for learners’ mathematics self-efficacy in inclusive classrooms: Can learners with dyscalculia also benefit? South African Journal of Education, 41(4), 1938. [Google Scholar] [CrossRef]
  34. Orihara, S., Amamoto, Y., & Taguri, M. (2024). Simple and robust estimation of average treatment effects for the overlap population using model averaging. Biostatistics & Epidemiology, 8(1), e2378662. [Google Scholar] [CrossRef]
  35. Parrish, C. W., & Byrd, K. O. (2022). Cognitively demanding tasks: Supporting students and teachers during engagement and implementation. International Electronic Journal of Mathematics Education, 17(1), em0671. [Google Scholar] [CrossRef]
  36. Rosenbaum, P. R. (1987). Sensitivity analysis for certain permutation inferences in matched observational studies. Biometrika, 74(1), 13–26. [Google Scholar] [CrossRef]
  37. Rosenbaum, P. R. (2005). Sensitivity analysis in observational studies. In B. S. Everitt, & D. C. Howell (Eds.), Encyclopedia of statistics in behavioral science. Wiley. [Google Scholar] [CrossRef]
  38. Rosenbaum, P. R. (2010). Design sensitivity and efficiency in observational studies. Journal of the American Statistical Association, 105(490), 692–702. [Google Scholar] [CrossRef]
  39. Rosenbaum, P. R. (2012). Testing one hypothesis twice in observational studies. Biometrika, 99(4), 763–774. [Google Scholar] [CrossRef]
  40. Rosenbaum, P. R., & Rubin, D. B. (1983). The central role of the propensity score in observational studies for causal effects. Biometrika, 70(1), 41–55. [Google Scholar] [CrossRef]
  41. Rylands, L. J., & Coady, C. (2009). Performance of students with weak mathematics in first-year mathematics and science. International Journal of Mathematical Education in Science and Technology, 40(6), 741–753. [Google Scholar] [CrossRef]
  42. Saglam, M. H., & Goktenturk, T. (2024). Mathematically high and low performances tell us different stories: Uncovering motivation-related factors via the ecological model. Learning and Individual Differences, 114, 102513. [Google Scholar] [CrossRef]
  43. Schoer, V., Ntuli, M., Rankin, N., Sebastiao, C., & Hunt, K. (2010). A blurred signal? The usefulness of National Senior Certificate (NSC) Mathematics marks as predictors of academic performance at university level. Perspectives in Education, 28(2), 9–18. [Google Scholar]
  44. Selden, A. (2012). Transitions and proof and proving at tertiary level. In G. Hanna, & M. de Villiers (Eds.), Proof and proving in mathematics education (Vol. 15). New ICMI Study Series. Springer. [Google Scholar] [CrossRef]
  45. Seymour, E., & Hunter, A. B. (2019). Talking about leaving revisited: Persistence, relocation, and loss in undergraduate STEM education. Springer.
  46. Shay, S., Collier-Reed, B., Hendry, J., Marquard, S., Kefale, K., Prince, R., Steyn, S., Mpofu-Mketwa, T., & Carstens, R. (2020). From gatekeepers to gateways: Courses impeding graduation annual report 2019. University of Cape Town. Available online: http://hdl.handle.net/11427/35360 (accessed on 29 November 2025).
  47. Smith, M. S., Bill, V., & Hughes, E. K. (2008). Thinking through a lesson: Successfully implementing high-level tasks. Mathematics Teaching in the Middle School, 14(3), 132–138.
  48. Solomon, Y. (2007). Not belonging? What makes a functional learner identity in undergraduate mathematics? Studies in Higher Education, 32(1), 79–96.
  49. Stuart, E. A. (2010). Matching methods for causal inference: A review and a look forward. Statistical Science, 25(1), 1–21.
  50. Stuart, E. A., Lee, B. K., & Leacy, F. P. (2013). Prognostic score–based balance measures can be a useful diagnostic for propensity score methods in comparative effectiveness research. Journal of Clinical Epidemiology, 66(8), S84–S90.
  51. Sweller, J. (1988). Cognitive load during problem solving: Effects on learning. Cognitive Science, 12(2), 257–285.
  52. Treisman, U. (1992). Studying students studying calculus: A look at the lives of minority mathematics students in college. The College Mathematics Journal, 23(5), 362–372.
  53. Vygotsky, L. S. (1980). Mind in society: The development of higher psychological processes (M. Cole, V. John-Steiner, S. Scribner, & E. Souberman, Eds.). Harvard University Press.
  54. Walsh, R., & Guerin, A. (2025). Student approaches to in-person and online engagement with mathematics learning support post-COVID-19. Teaching Mathematics and Its Applications: An International Journal of the IMA, hraf011.
  55. Wan, S., Bond, T. N., Lang, K., Clements, D. H., Sarama, J., & Bailey, D. H. (2021). Is intervention fadeout a scaling artefact? Economics of Education Review, 82, 102090.
  56. Wood, D., Bruner, J. S., & Ross, G. (1976). The role of tutoring in problem solving. Journal of Child Psychology and Psychiatry, 17(2), 89–100.
  57. Zakharov, A., & Carnoy, M. (2021). Does teaching to the test improve student learning? International Journal of Educational Development, 84, 102422.
Table 1. Absolute Standardized Mean Difference (ASMD) for covariates.

| Covariate (Average Scores) | Pre-Matching Treatment Mean (n = 31) | Pre-Matching Control Mean (n = 121) | ASMD (Pre-Matching) | Post-Matching Treatment Mean (n = 23) | Post-Matching Control Mean (n = 17) | ASMD (Post-Matching) |
|---|---|---|---|---|---|---|
| School-leaving Mathematics | 84.35 | 90.42 | 1.016 | 85.61 | 85.39 | 0.051 |
| School-leaving English | 76.00 | 81.00 | 0.719 | 77.74 | 77.96 | 0.054 |
| Standardized preparedness: Mathematics | 64.81 | 77.78 | 0.925 | 67.83 | 66.00 | 0.115 |
| Standardized preparedness: Quantitative Literacy | 55.65 | 73.84 | 1.001 | 61.13 | 61.26 | 0.031 |
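The balance diagnostics in Table 1 can be reproduced from raw covariate scores. A minimal sketch, assuming the common pooled-SD convention (the paper does not state its exact denominator; some software instead standardizes post-matching differences by the pre-matching SD):

```python
from statistics import mean, variance

def asmd(treated, control):
    """Absolute standardized mean difference: |difference in group means|
    divided by the pooled standard deviation of the two groups.
    Note: conventions vary; this is one common choice of denominator."""
    pooled_sd = ((variance(treated) + variance(control)) / 2) ** 0.5
    return abs(mean(treated) - mean(control)) / pooled_sd
```

An ASMD below roughly 0.1, as in the post-matching column of Table 1, is a widely used rule of thumb for adequate balance (Stuart, 2010).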
Table 2. ATT estimates by outcome with cluster-robust Standard Errors (SEs).

| Outcome | ATT (SE), t-Statistic | p-Value | 95% Confidence Interval | Hedges' g, Glass's Δ | p (Holm) |
|---|---|---|---|---|---|
| Test 1F | 21.71 (3.78), 5.74 | 3.047 × 10⁻⁵ | [13.69, 29.73] | 1.81, 2.44 | 1.219 × 10⁻⁴ |
| Test 2F | 26.75 (5.79), 4.62 | 2.832 × 10⁻⁴ | [14.48, 39.03] | 1.49, 1.69 | 8.497 × 10⁻⁴ |
| Exam F | 0.91 (4.99), 0.18 | 0.857 | [−9.69, 11.52] | 0.06, 0.06 | 0.8575 |
| Final F | 2.2 (3.76), 0.58 | 0.567 | [−5.78, 10.18] | 0.18, 0.21 | 0.5892 |
| Test 1S | 6.39 (5.07), 1.26 | 0.226 | [−4.36, 17.05] | 0.40, 0.54 | 0.2270 |
| Test 2S | 9.26 (5.53), 1.67 | 0.114 | [−2.46, 20.99] | 0.57, 0.70 | 0.2270 |
| Exam S | 4.91 (5.38), 0.91 | 0.375 | [−6.50, 16.33] | 0.28, 0.35 | 0.7500 |
| Final S | 4.93 (4.55), 1.08 | 0.295 | [−4.71, 14.57] | 0.34, 0.44 | 0.5892 |
Note: ATT units are percentage-point differences; effect sizes are reported as Hedges' g and Glass's Δ (standardized by the control-group SD). F denotes the first semester and S the second semester. Final F and Final S are the end-of-semester final marks for each semester, combining the two class tests and the exam. The final mark is 0.3 × Class Record + 0.7 × Exam or 0.4 × Class Record + 0.6 × Exam, whichever is higher, calculated individually for each student. The class record is the average of the two semester class tests and the WebAssign score; WebAssign exercises are weekly online homework.
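The final-mark rule described in the note above can be expressed directly. A minimal sketch, assuming all components are percentages on a 0–100 scale:

```python
def final_mark(class_record, exam):
    """Final mark per the rule in the Table 2 note: the better of
    0.3*CR + 0.7*Exam and 0.4*CR + 0.6*Exam, chosen per student."""
    return max(0.3 * class_record + 0.7 * exam,
               0.4 * class_record + 0.6 * exam)
```

Because the two weightings differ only in how 10% of the weight shifts between components, the rule always favors the student's stronger component: the exam-heavy weighting wins exactly when the exam mark exceeds the class record.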
Table 3. Rosenbaum Sensitivity (Wilcoxon Signed-Rank) on matched pairs.

| Outcome | S (Wilcoxon T+) | p at Γ = 1 | Γ* at p = 0.05 |
|---|---|---|---|
| Test 1F | 265.5 | 5.3 × 10⁻⁵ | 4.20 |
| Test 2F | 237.0 | 1.67 × 10⁻⁴ | 3.35 |
| Exam F | 150.5 | 0.3518 | 1.00 |
| Final F | 160.0 | 0.2517 | 1.00 |
| Test 1S | 176.0 | 0.1239 | 1.00 |
| Test 2S | 193.0 | 0.0154 | 1.30 |
| Exam S | 171.0 | 0.1577 | 1.00 |
| Final S | 175.0 | 0.1302 | 1.00 |
Note: In this framework, Γ ≥ 1 indexes the magnitude of possible hidden bias: Γ = 1 corresponds to a randomized experiment (no unmeasured bias), while Γ > 1 allows matched students to differ in their odds of treatment by up to a factor of Γ because of unobserved covariates. Γ* is the smallest level of hidden bias at which the treatment effect would no longer be statistically significant at p = 0.05 (Rosenbaum, 2005).
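The upper p-value bounds underlying Table 3 can be approximated with the standard large-sample formula for the Wilcoxon signed-rank statistic under hidden bias Γ. A sketch only, assuming untied absolute pair differences and a normal approximation; the authors' exact computation may differ:

```python
import math

def rosenbaum_upper_p(t_plus, n_pairs, gamma):
    """Upper bound on the one-sided p-value for the Wilcoxon signed-rank
    statistic T+ when hidden bias can shift treatment odds by a factor
    of gamma (normal approximation, ties ignored).
    At gamma = 1 this reduces to the usual Wilcoxon null moments."""
    lam = gamma / (1.0 + gamma)                     # max P(pair favors treated)
    mu = lam * n_pairs * (n_pairs + 1) / 2.0        # lam * sum of ranks
    var = lam * (1.0 - lam) * n_pairs * (n_pairs + 1) * (2 * n_pairs + 1) / 6.0
    z = (t_plus - mu) / math.sqrt(var)
    return 1.0 - 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))  # 1 - Phi(z)
```

Γ* can then be found by increasing γ (e.g., by bisection) until the bound first exceeds 0.05, mirroring the Γ* column of Table 3.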
Table 4. Workshop effects by pre-university mathematics achievement.

| Outcome | Band 70–84% Effect [SE] (pHolm) | Band 85–100% Effect [SE] (pHolm) | p (Interaction) |
|---|---|---|---|
| Test 1F | 22.23 [3.437] (1.97 × 10⁻¹⁰) | 19.19 [5.05] (1.46 × 10⁻⁴) | 0.618 |
| Test 2F | 31.63 [6.96] (5.50 × 10⁻⁶) | 25.42 [5.42] (5.38 × 10⁻⁶) | 0.482 |
| Exam F | 2.66 [5.54] (1.00) | −1.86 [4.32] (1.00) | 0.521 |
| Final F | 3.33 [4.66] (0.95) | −0.57 [3.36] (0.95) | 0.498 |
| Test 1S | 0.59 [8.05] (0.94) | 7.66 [4.59] (0.19) | 0.455 |
| Test 2S | 5.63 [5.61] (0.63) | 4.02 [6.29] (0.63) | 0.849 |
| Exam S | 5.64 [8.12] (0.97) | −2.96 [6.10] (0.97) | 0.397 |
| Final S | 4.56 [6.69] (0.99) | −0.27 [5.10] (0.99) | 0.567 |
Table 5. Average Treatment Effects on the Treated (ATT) for Higher-Order (HO) items.

| Outcome | ATT (Clustered SE), t-Statistic | p-Value | 95% Confidence Interval | Hedges' g, Glass's Δ | p (Holm) |
|---|---|---|---|---|---|
| Test 1F | 2.536 (6.774), 0.374 | 0.712 | [−11.51, 16.59] | 0.1091, 0.1115 | 1.000 |
| Test 2F | 13.527 (5.992), 2.258 | 0.034 | [1.10, 25.95] | 0.7440, 0.8065 | 0.205 |
| Exam F | 2.341 (5.893), 0.397 | 0.695 | [−9.88, 14.56] | 0.1301, 0.1320 | 1.000 |
| Test 1S | 7.880 (7.737), 1.019 | 0.319 | [−8.164, 23.93] | 0.3294, 0.3505 | 1.000 |
| Test 2S | 8.184 (5.699), 1.436 | 0.165 | [−3.64, 20.00] | 0.3539, 0.3226 | 0.825 |
| Exam S | 4.537 (5.976), 0.759 | 0.456 | [−7.86, 16.93] | 0.219, 0.2454 | 1.000 |
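The two effect-size columns in Tables 2 and 5 can be computed from per-group summary statistics. A sketch using the usual definitions: Hedges' g applies the small-sample correction J ≈ 1 − 3/(4·df − 1) to Cohen's d, while Glass's Δ standardizes by the control-group SD only (the exact variance estimators the authors used are not stated):

```python
import math

def hedges_g(mean_t, mean_c, sd_t, sd_c, n_t, n_c):
    """Hedges' g: mean difference over the pooled SD, times the
    small-sample correction factor J = 1 - 3/(4*df - 1)."""
    df = n_t + n_c - 2
    pooled_sd = math.sqrt(((n_t - 1) * sd_t**2 + (n_c - 1) * sd_c**2) / df)
    d = (mean_t - mean_c) / pooled_sd
    return d * (1.0 - 3.0 / (4.0 * df - 1.0))

def glass_delta(mean_t, mean_c, sd_c):
    """Glass's delta: mean difference standardized by the control SD,
    useful when the treatment may also change score variability."""
    return (mean_t - mean_c) / sd_c
```

Reporting both, as the tables do, shows whether conclusions depend on the choice of standardizer; the pairs of values in Table 5 are close, so they do not.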
Share and Cite

MDPI and ACS Style

Mokhithi, M.; Campbell, A.L. Early Gains, Fading Effects: A Quasi-Experimental Evaluation of Mathematical Thinking Workshops for the School-to-University Mathematics Transition in South Africa. Educ. Sci. 2026, 16, 378. https://doi.org/10.3390/educsci16030378
