3.2. Afternoon Tests
For each test, we report the fraction of correct answers given by participants in the two groups. Figure 7 shows the performance of each individual (small yellow dots for control subjects, big blue dots for experimental subjects) as well as the average performance (light yellow solid line for the control group, light blue solid line for the experimental group). The shaded regions span the mean ± SEM. The differences in performance between the two groups are significant for all stimulus conditions (MW-U test, WS: p ; IS: p ; GD: p = 0.004; GW: p = 0.006; GI: p ).
It is clear from this figure that participants in the experimental group benefited from being exposed to the mnemonic strategies in the seminar: for every test, we find many experimental subjects at ceiling (45% for word sequences, 68% for image sequences, 64% for both the digit and word grids, and 68% again for image grids, with 62% overall). In contrast, fewer control subjects were at ceiling in each test: none for word sequences, 14% for image sequences, and 28% for each of the three grid tests, for an average of 20% overall. Note that almost half of the experimental group is at ceiling in the word sequence test, against no one in the control group. In general, this first test appears to be the most challenging for both groups, and the one with the largest difference between them.
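The summary quantities used above (mean ± SEM of the fraction of correct answers, and the share of subjects at ceiling) can be sketched as follows. This is a minimal illustration, not the authors' analysis code: the function name `summarize` is hypothetical, the SEM is assumed to be the sample standard deviation divided by the square root of the group size, and the scores are made up.

```python
from math import sqrt
from statistics import mean, stdev

N_ITEMS = 16  # items per test, as stated in the text


def summarize(n_correct_per_subject):
    """Return mean, SEM and ceiling share for one group in one test."""
    fractions = [n / N_ITEMS for n in n_correct_per_subject]
    # Assumption: SEM = sample standard deviation / sqrt(number of subjects).
    sem = stdev(fractions) / sqrt(len(fractions))
    # A subject is "at ceiling" when all 16 items are correct.
    at_ceiling = sum(1 for n in n_correct_per_subject if n == N_ITEMS)
    return {
        "mean": mean(fractions),
        "sem": sem,
        "ceiling_share": at_ceiling / len(fractions),
    }


# Made-up scores for illustration only (not data from the study).
example_scores = [16, 16, 12, 8]
summary = summarize(example_scores)
```

Calling `summarize` once per group and per test would give the values behind the solid lines, the shaded bands, and the ceiling percentages.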
Figure 8 shows the group-averaged fraction of correct answers in each test, with the items grouped into quadruplets according to their presentation position or order. Each test was in fact designed with a total of 16 items, partly to facilitate this analysis. For the two sequence tests (left panel), the four quadruplets were presented successively, while for the three grid tests (right panel), they were presented simultaneously, in the top and bottom, left and right quadrants of the grid.
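The grouping into quadruplets can be sketched as below. Several details are assumptions made only for illustration: a 4 x 4 grid with row-major cell indices, the numbering of the quadrants (top-left, top-right, bottom-left, bottom-right), and all function names.

```python
N_ITEMS = 16
GRID_SIDE = 4  # assumption: the 16 grid items form a 4 x 4 grid


def sequence_quadruplet(index):
    """Quadruplet (0-3) of an item presented at 0-based position `index`."""
    return index // 4


def grid_quadruplet(index):
    """Quadrant (0-3) of a grid cell, assuming row-major cell indices.

    Assumed order: 0 top-left, 1 top-right, 2 bottom-left, 3 bottom-right.
    """
    row, col = divmod(index, GRID_SIDE)
    return 2 * (row // 2) + (col // 2)


def fraction_correct_by_quadruplet(correct, quadruplet_of):
    """Fraction of correct answers in each quadruplet for one participant.

    `correct` is a list of 16 booleans, one per item.
    """
    hits = [0, 0, 0, 0]
    for index, is_correct in enumerate(correct):
        hits[quadruplet_of(index)] += int(is_correct)
    return [h / 4 for h in hits]


# Made-up response pattern, for illustration only.
example = [True] * 4 + [True, False, False, True] + [False] * 4 + [True] * 4
seq_fractions = fraction_correct_by_quadruplet(example, sequence_quadruplet)
grid_fractions = fraction_correct_by_quadruplet(example, grid_quadruplet)
```

Averaging these per-participant fractions within each group would give the group-level values per quadruplet.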
We expect primacy and recency effects, possibly suppressed by close-to-ceiling performance in the experimental group. Moreover, we expect to see primacy and recency in the sequences, but less so in the grids, where the ordering is not unique and different participants could be intuitively sectioning the grid in different ways (e.g., by columns, by rows, center versus periphery…), so that the effect would be washed out in the average.
The control group performed significantly better on the first and last quadruplets of sequentially presented stimuli (MW-U test, p(1&4 vs 2&3, SW&SI) = 0.0016), indicating robust primacy and recency effects. When analyzed in more detail, the effect was salient with words, p(1&4 vs 2&3, SW) = 0.003, but only marginal with images, p(1&4 vs 2&3, SI) = 0.058. It was largely due to differences in performance indicating primacy, in particular between the first and third quadruplets (p(1vs2, SW) = 0.009 and p(1vs3, SW) = 0.003 with words, while p(1vs2, SI) = n.s. and p(1vs3, SI) = 0.01 with images; all other quadruplet comparisons were not significant).
For grids, the primacy and recency effects in the control group are not prominent (MW-U test, p(1&4 vs 2&3, GD&GW&GI) = 0.48). When the three grid conditions are considered separately, significance is not reached in any of them, and there is only a weak trend for the digits (MW-U test, p(1&4 vs 2&3, GD) = 0.14, p(1&4 vs 2&3, GW) = 0.39, p(1&4 vs 2&3, GI) = 0.62).
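A comparison of this kind (outer quadruplets 1&4 against inner quadruplets 2&3) can be sketched with a plain Mann-Whitney U test (MW-U in the text). The text does not say whether exact or asymptotic p-values were used; the sketch below assumes the tie-corrected normal approximation without continuity correction, so its p-values may differ from the reported ones, especially for small samples. The function name and the data are illustrative only.

```python
from math import erfc, sqrt


def mann_whitney_u(x, y):
    """Two-sided Mann-Whitney U test, tie-corrected normal approximation.

    Returns (U statistic of sample x, approximate two-sided p-value).
    """
    # Pool both samples, remembering which sample each value came from.
    pooled = sorted([(v, 0) for v in x] + [(v, 1) for v in y])
    n1, n2 = len(x), len(y)
    n = n1 + n2
    rank_sum_x = 0.0
    tie_term = 0.0
    i = 0
    while i < n:
        # Find the run of tied values starting at position i.
        j = i
        while j < n and pooled[j][0] == pooled[i][0]:
            j += 1
        t = j - i
        midrank = (i + 1 + j) / 2  # average of ranks i+1 .. j
        rank_sum_x += midrank * sum(1 for k in range(i, j) if pooled[k][1] == 0)
        tie_term += t ** 3 - t
        i = j
    u = rank_sum_x - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0  # all values identical: no evidence of a difference
    z = (u - mean_u) / sqrt(var_u)
    return u, erfc(abs(z) / sqrt(2))  # two-sided normal tail probability


# Made-up per-subject fractions of correct answers, for illustration only.
outer = [0.875, 1.0, 0.75, 0.875, 1.0]   # quadruplets 1 and 4 pooled
inner = [0.5, 0.625, 0.75, 0.375, 0.5]   # quadruplets 2 and 3 pooled
u_stat, p_value = mann_whitney_u(outer, inner)
```

In practice a statistics library would normally be used instead; the hand-written version is shown only to make the ranking and tie handling explicit.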
Such primacy and recency effects were suppressed, presumably by ceiling effects, in the experimental group, where the only surviving (weakly) significant difference was between the first and third quadruplets in the sequential test with words, p(1vs3, SW) = 0.04.
There were no clear trends when the items were ordered in quadruplets according to the retrieval sequence.
We do not try to quantify these effects further, as their main feature, evident in the figure, is that they are strongly suppressed by the ceiling after mnemonic training.
The distribution across participants of the number of errors in each test is plotted in Figure 9. The distributions for control and experimental subjects are well separated, irrespective of the test, suggesting a generalized benefit of the training seminar, which is not confined to only some of the tests, nor to only some of the participants.
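A distribution of this kind can be tabulated as in the following sketch, which returns the share of participants making each possible number of errors in one test. The function name and the example values are hypothetical.

```python
from collections import Counter

N_ITEMS = 16  # items per test, as stated in the text


def error_distribution(n_errors_per_subject):
    """Share of participants making 0, 1, ..., 16 errors in one test."""
    counts = Counter(n_errors_per_subject)
    n_subjects = len(n_errors_per_subject)
    return [counts.get(k, 0) / n_subjects for k in range(N_ITEMS + 1)]


# Made-up error counts for illustration only (not data from the study).
example_errors = [0, 0, 2, 5]
distribution = error_distribution(example_errors)
```

Computing one such list per group and per test allows the control and experimental distributions to be compared side by side.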
We also analyzed, for all tests except the one with digits, how the errors were distributed between position errors (items that were among the 16, but in a different position) and retrieval errors (other items in the pool of 48 that were not among the 16).
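The split between the two kinds of error can be sketched as follows. The sketch assumes that every position receives an answer drawn from the pool and that targets and responses are stored as equal-length lists indexed by position; the function names and item labels are hypothetical.

```python
def classify_errors(target, response):
    """Count position and retrieval errors for one participant in one test.

    A wrong answer is a position error if the chosen item belongs to the
    target set but sits elsewhere, and a retrieval error if it is a pool
    item that is not among the targets.
    """
    target_set = set(target)
    position_errors = 0
    retrieval_errors = 0
    for expected, given in zip(target, response):
        if given == expected:
            continue  # correct answer, not an error
        if given in target_set:
            position_errors += 1
        else:
            retrieval_errors += 1
    return position_errors, retrieval_errors


def retrieval_fraction(position_errors, retrieval_errors):
    """Retrieval errors as a fraction of all errors (None if no errors)."""
    total = position_errors + retrieval_errors
    return retrieval_errors / total if total else None


# Shortened made-up example (4 items instead of 16), for illustration only.
target = ["apple", "boat", "chair", "drum"]
response = ["apple", "chair", "boat", "xylophone"]
n_position, n_retrieval = classify_errors(target, response)
fraction = retrieval_fraction(n_position, n_retrieval)
```

Subjects with no errors yield no fraction at all, so how they enter the group averages is a separate choice that the sketch leaves open.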
Across tests, retrieval errors were less frequent than position errors, accounting for between 15% and 42% of total errors, once averaged for each test and subject group. We did not find any significant difference in this proportion between the tests based on sequences and those based on grids, suggesting that, contrary to what one might expect, the simultaneous arrangement does not make it much easier to remember the relative positions. In detail, for experimental subjects, averaging between the W and I tests, f(retr_err(S)) = 0.2, f(retr_err(G)) = 0.3, MW-U test p = 0.44, while for control subjects f(retr_err(S)) = 0.25, f(retr_err(G)) = 0.36, MW-U test p = 0.27.
This pattern fits with the facts that (i) all words were highly familiar, whereas those particular images were (presumably) new to the subjects; and (ii) there may be some interference between remembering items in the target grid and seeing them in the pool, also a grid, which may facilitate the false memory for items not in the target grid. Figure 10 details these proportions.