3. Results
All data and analysis scripts can be found on the paper’s OSF page (https://osf.io/cnr2a/, accessed on 11 May 2021). We report results on the rank-order stability, mean-level change, and individual differences in change for each of the three main diffusion model parameters (drift rate, boundary separation a, and non-decision time). For all these analyses, we used Bayesian methods to obtain our results. We also conducted all analyses using a frequentist, p-value-based approach; this did not alter the interpretation of our findings. Finally, we report findings on the profile stability of the three parameters across time.
Table 1 shows the descriptive statistics of the individual posterior medians for the three diffusion model parameters for each of the four time points across the entire sample.
Tables A2, A3, A4, A5, and A6 in Appendix A contain the corresponding information, split up for each of the five sub-groups.
3.1. Rank-Order Stability
Table 2 shows the rank-order stability estimates of the diffusion model parameters for the entire sample. We report Bayesian correlation estimates, using a uniform prior for the correlation (see
Table A1) and individual posterior medians as variables. Rank-order stability was high for drift rates (
; all
) across the entire time span, with correlations getting slightly smaller for larger time periods (e.g.,
from T1 to T2, but only
from T1 to T4). We found the same pattern for boundary separation (
a): Rank-order stability was high (all
), with correlations getting slightly smaller across larger time periods (e.g.,
from T2 to T3, but only
from T1 to T3). For non-decision times (
), stability was again high (all
) across the entire time span, with correlations once more getting smaller for larger time periods (e.g.,
from T2 to T3, but only
from T1 to T4). All correlations showed Bayes factors >999 when compared to a null-model.
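To illustrate what a Bayesian correlation estimate under a uniform prior involves, the following is a minimal sketch rather than the analysis code used for Table 2. It assumes that both variables are z-standardized and can be treated as bivariate normal with known means and variances, and it approximates the posterior on a grid. The function names and the simulated data are hypothetical, and no Bayes factor is computed here.

```python
import numpy as np

def correlation_posterior(x, y, grid_size=1999):
    """Grid posterior for a correlation rho under a uniform prior on (-1, 1).

    Simplifying assumption (not taken from the paper): after z-standardizing,
    the data are bivariate normal with known means 0 and variances 1.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    zx = (x - x.mean()) / x.std()
    zy = (y - y.mean()) / y.std()
    n = zx.size
    sxx, syy, sxy = zx @ zx, zy @ zy, zx @ zy

    rho = np.linspace(-0.999, 0.999, grid_size)
    # Log-likelihood of the standardized bivariate normal model.
    log_lik = (-0.5 * n * np.log1p(-rho**2)
               - (sxx - 2.0 * rho * sxy + syy) / (2.0 * (1.0 - rho**2)))
    # A uniform prior only adds a constant, so the posterior is the
    # normalized likelihood on the grid.
    post = np.exp(log_lik - log_lik.max())
    post /= post.sum()
    return rho, post

def posterior_median(rho, post):
    """Grid value at which the cumulative posterior mass first reaches 0.5."""
    cdf = np.cumsum(post)
    return float(rho[np.searchsorted(cdf, 0.5)])

# Hypothetical data: one parameter measured at two time points.
rng = np.random.default_rng(2021)
t1 = rng.normal(size=500)
t2 = 0.8 * t1 + np.sqrt(1 - 0.8**2) * rng.normal(size=500)

rho, post = correlation_posterior(t1, t2)
print("posterior median of rho:", round(posterior_median(rho, post), 3))
```

With a flat prior and a sample of this size, the posterior median lies very close to the sample Pearson correlation, which is in line with the frequentist analyses leading to the same interpretation.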
Tables A7, A8, and A9 show the estimates of rank-order stability separately for the three diffusion model parameters, split up across the five sub-groups studied. Generally, the interpretation of the pattern of results did not differ across groups, although within-group correlations were often slightly smaller than correlations for the total sample. Mainly because of the smaller sample sizes, Bayes factors were also sometimes lower, for example, as low as
for the correlation of drift rates at T2 to the ones at T4 in Group 3 (
,
).
3.2. Mean Level Change and Individual Differences in Change
Figure 5 shows the group-level posterior distributions (i.e., across participants) for the three diffusion model parameters across the four time points. As can be seen, drift rates seem to rise after T1 (with the corresponding 95% highest density interval (HDI) showing no overlap with those of the other time points) and to a lesser degree also after T2 and T3. The pattern reverses for the boundary separation parameter, with a decline from T1 to the later time points. For non-decision times, no clear pattern of mean level change is evident. It should be noted that the group-level posterior distributions are not equivalent to the means of individual parameter posterior medians, due to the hierarchical modelling approach and due to the exclusion of individual parameter estimates with non-converged traces. However, the general pattern of results was the same for both group-level posteriors and means of individual posterior medians.
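The comparison of 95% HDIs described above can be sketched as follows. This is an illustration under stated assumptions, not the code behind Figure 5: the posterior draws are simulated, and the helper names `hdi` and `intervals_overlap` are hypothetical.

```python
import numpy as np

def hdi(samples, mass=0.95):
    """Narrowest interval containing `mass` of the posterior draws."""
    x = np.sort(np.asarray(samples, dtype=float))
    n = x.size
    k = int(np.ceil(mass * n))            # number of draws inside the interval
    if k < 1 or k > n:
        raise ValueError("mass must lie in (0, 1] and samples must be non-empty")
    # Width of every window of k consecutive sorted draws.
    widths = x[k - 1:] - x[: n - k + 1]
    i = int(np.argmin(widths))
    return float(x[i]), float(x[i + k - 1])

def intervals_overlap(a, b):
    """True if the closed intervals a and b share at least one point."""
    return a[0] <= b[1] and b[0] <= a[1]

# Hypothetical group-level posterior draws at two time points.
rng = np.random.default_rng(1)
draws_t1 = rng.normal(1.0, 0.05, size=4000)
draws_t2 = rng.normal(1.4, 0.05, size=4000)

print("T1 HDI:", hdi(draws_t1))
print("T2 HDI:", hdi(draws_t2))
print("overlap:", intervals_overlap(hdi(draws_t1), hdi(draws_t2)))
```

Non-overlapping HDIs are a conservative visual criterion for a difference between time points; overlapping intervals do not by themselves show that two posteriors are equivalent.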
Table 3 shows the parameter estimates and fit indices for the Bayesian growth curve model of drift rates. The latent intercept and latent slope exhibited only a very weak estimated correlation, indicating that drift rates at T1 did not relate to the developmental patterns of drift rates. As the 95% CI of the covariance between intercept and slope included zero, we fixed this parameter to zero to help model convergence. All estimated parameters had effective sample sizes >5000 and
values below
, indicating that the chains had converged. Furthermore, model fit was good according to the mean Bayesian GammaHat estimate >0.99 and the mean Bayesian CFI estimate >0.99.
Latent slope loadings at T3 and T4 were estimated as and . Both the mean level (intercept) of the latent intercept parameter and of the latent slope parameter were estimated as positive and their 95% credibility intervals (CIs) did not include zero. This indicates that drift rates were generally positive at T1 (as would be expected) and tended to increase over time. The latent intercept showed considerable variance, indicating that people differed in their speed of information accumulation at T1. The latent slope parameter also indicated variance, meaning that people differed in their developmental patterns of drift rates across time—the 95% CI did not include zero.
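For readers less familiar with this model class, the structure described above can be written as a latent basis growth curve model. The notation is introduced here for illustration and does not appear in the original text. The sketch assumes that the slope loadings at T1 and T2 were fixed to 0 and 1 (the text only states that the loadings at T3 and T4 were estimated) and that the latent factors and residuals are normally distributed.

```latex
\begin{aligned}
y_{it} &= \eta_{0i} + \lambda_t\,\eta_{1i} + \varepsilon_{it},
  \qquad t = 1,\dots,4,\\
(\lambda_1,\lambda_2,\lambda_3,\lambda_4) &= (0,\;1,\;\lambda_3,\;\lambda_4),\\
\eta_{0i} &\sim \mathcal{N}(\mu_0,\sigma_0^2),\qquad
\eta_{1i} \sim \mathcal{N}(\mu_1,\sigma_1^2),\qquad
\operatorname{Cov}(\eta_{0i},\eta_{1i}) = 0,\\
\mathbb{E}[y_{it}] &= \mu_0 + \lambda_t\,\mu_1 .
\end{aligned}
```

Under these assumptions, a positive slope mean together with positive loadings implies rising expected values over time, and a slope variance whose credibility interval excludes zero corresponds to individual differences in change.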
Table 4 shows the parameter estimates and fit indices for the Bayesian growth curve model of boundary separations. The latent intercept and latent slope exhibited only a very weak estimated correlation, indicating that boundary separation at T1 did not relate to the developmental patterns of boundary separation. As the 95% CI of the covariance between intercept and slope included zero, we fixed this parameter to zero to help model convergence. As the variance of the slope factor was also estimated to be zero and the model showed divergent transitions when estimating it, we also fixed this parameter. All estimated parameters had effective sample sizes >5000 and
values below
, indicating that the chains had converged. Model fit was good, with the mean Bayesian GammaHat estimate >0.99 and the mean Bayesian CFI estimate >0.99.
Latent slope loadings at T3 and T4 were estimated as and . The mean level (intercept) of the latent intercept parameter was estimated as positive, while the mean level (intercept) of the latent slope parameter was estimated as negative. Both their 95% CIs did not include zero. This indicates that boundary separations were generally positive at T1 (as would be expected) and tended to decrease over time. The latent intercept showed considerable variance, indicating that people differed in their decision criteria at T1. As was already mentioned, the latent slope parameter was estimated and then fixed to be zero.
Table 5 shows the parameter estimates and fit indices for the Bayesian growth curve model of non-decision times. Latent intercept and latent slope showed a very low estimated correlation, indicating that non-decision time at T1 did not relate to the developmental patterns of non-decision times. As the 95% CI of the covariance between intercept and slope included zero, we fixed this parameter to zero to help model convergence. As the variance of the slope factor was also estimated to be zero and the model showed divergent transitions when estimating it, we also fixed this parameter.
All estimated parameters had effective sample sizes >5000 and values below , indicating that the chains had converged. Model fit was good, with the mean Bayesian GammaHat estimate >0.97 and the mean Bayesian CFI estimate >0.98.
Latent slope loadings showed an unclear pattern, with loadings at T3 and T4 estimated as −0.358 and . The mean level (intercept) of the latent intercept parameter was estimated as positive, while the mean level (intercept) of the latent slope parameter was estimated as negative. Both their 95% CIs did not include zero. This indicates that non-decision times were generally positive at T1 (as would be expected). Given the unclear pattern of loadings on the slope factor, no clear interpretation of the negative intercept of the latent slope factor emerged. The latent intercept showed considerable variance, indicating that people differed in their non-decision time at T1. As was already mentioned, the latent slope parameter was estimated and then fixed to be zero.
In summary, we found notable individual differences in growth curve model intercepts for drift rates, boundary separations, and non-decision times. Regarding growth curve model slopes (i.e., rates of change), we only found individual differences for drift rates, but not for boundary separations or non-decision times.
3.3. Profile Stability
We estimated q correlations of the z-standardized individual posterior medians for the three diffusion model parameters across all possible combinations of time points (T1 with T2/T3/T4, T2 with T3/T4, T3 with T4).
Table 6 shows the means, standard deviations, and medians across participants. Profile stability was generally high, with all median
q correlations >0.85. However, there was also considerable variance in correlations across participants (all
SDs
), with lower mean correlations than median correlations.
Figure 6 shows density plots of the individual
q correlations for all six periods. As can be seen, a large part of the densities lies close to
, but there are also much lower coefficients of stability and also participants showing negative
q correlations.
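The profile stability computation can be sketched as follows. This is a minimal illustration that assumes a q correlation is the Pearson correlation, computed within each participant, between that person's three z-standardized parameter values at two time points. The array layout (participants in rows, parameters in columns), the function names, and the simulated data are hypothetical.

```python
import numpy as np

def zscore(params):
    """Standardize each parameter (column) across participants (rows)."""
    params = np.asarray(params, dtype=float)
    return (params - params.mean(axis=0)) / params.std(axis=0, ddof=1)

def q_correlations(params_a, params_b):
    """Per-participant Pearson correlation between two parameter profiles."""
    za, zb = zscore(params_a), zscore(params_b)
    # Center each participant's profile across the three parameters.
    za_c = za - za.mean(axis=1, keepdims=True)
    zb_c = zb - zb.mean(axis=1, keepdims=True)
    num = (za_c * zb_c).sum(axis=1)
    den = np.sqrt((za_c**2).sum(axis=1) * (zb_c**2).sum(axis=1))
    return num / den

# Hypothetical posterior medians: 200 participants x 3 parameters.
rng = np.random.default_rng(6)
time_a = rng.normal(size=(200, 3))
time_b = time_a + 0.3 * rng.normal(size=(200, 3))

q = q_correlations(time_a, time_b)
print("mean q:", round(float(q.mean()), 3))
print("median q:", round(float(np.median(q)), 3))
print("SD of q:", round(float(q.std(ddof=1)), 3))
```

Because each profile contains only three values and the coefficient is bounded at 1, the distribution of q across participants is typically skewed, which is one reason to report medians alongside means.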
4. Discussion
In this article, we studied stability and change of cognitive processes as measured by the three main diffusion model parameters, namely processing speed (i.e., drift rates), decision caution (i.e., boundary separations), and speed of encoding and motor response (i.e., non-decision times), using four different indices of stability and development. To our knowledge, this is the first study to analyse diffusion model parameters (i) over such a long time period, (ii) across more than two time points, and (iii) in such a large, heterogeneous sample ( at Time 1). Moreover, our main statistical analyses relied on modern Bayesian estimation methods, which offer multiple advantages compared to traditional methods. Overall, our analyses aimed to investigate whether the cognitive constructs encoded by diffusion model parameters exhibit a measurable trait-like nature. In the following, we briefly summarize the gist of our results.
Regarding rank-order stability, we found robust temporal stability of the main diffusion model parameters. Generally speaking, temporal correlations were high for all three parameters. This held true even when the entire period of the study (i.e., two years) was considered. The correlations we found were in many cases markedly higher than those previously reported in the literature (
Lerche and Voss 2017;
Schubert et al. 2016;
Yap et al. 2012). Especially for non-decision times, previous studies had sometimes found rank-order stability to be low (
across one week in
Lerche and Voss 2017). In contrast, our results indicate that non-decision times show even higher correlations across long time periods (
) than drift rates. This finding is worth discussing, since drift rates have so far been considered as the most “trait-like” parameters of the diffusion model (
Schubert et al. 2016).
The latter difference might be attributable to several features of our study. First, in contrast to previous studies, we employed Bayesian hierarchical diffusion model estimation methods that in the past have been found to provide more robust results in correlational studies (
Ratcliff and Childers 2015;
Wiecki et al. 2013). Bayesian methods incorporate prior knowledge on probable parameter values. Hierarchical Bayesian methods make use of shrinkage of the individual parameter estimates towards the group-level posteriors, balancing out extreme individual parameter estimates that might reflect noise in the data (
Kruschke 2015).
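The shrinkage mechanism can be illustrated with a deliberately simple example. The sketch below uses a conjugate normal model with known variances, not the hierarchical diffusion model fitted in this study; the function name and all numerical values are hypothetical.

```python
import numpy as np

def shrink(individual_means, n_trials, sigma, mu, tau):
    """Posterior means of individual parameters in a normal-normal model.

    individual_means : per-person sample means
    n_trials         : number of observations per person
    sigma            : known within-person standard deviation
    mu, tau          : known group-level mean and standard deviation
    """
    individual_means = np.asarray(individual_means, dtype=float)
    n_trials = np.asarray(n_trials, dtype=float)
    # Weight on the person's own data; it approaches 1 as n_trials grows.
    weight = tau**2 / (tau**2 + sigma**2 / n_trials)
    return weight * individual_means + (1.0 - weight) * mu

# Two people with few trials and extreme means, one with many trials.
means = [0.5, 3.5, 3.5]
trials = [20, 20, 600]
print(shrink(means, trials, sigma=2.0, mu=2.0, tau=1.0))
```

In this toy setting, the two estimates based on 20 observations are pulled noticeably towards the group mean of 2.0 (to 0.75 and 3.25), whereas the estimate based on 600 observations barely moves. Extreme values that rest on little data are thus tempered, while well-measured individuals keep estimates close to their own data.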
Second, we used a comparatively large number of response times for each participant at each time point (600 trials), which necessarily leads to more precise estimates. Finally, our sample included a large number of participants and exhibited greater heterogeneity, especially in relation to age. The resulting larger variance of parameter estimates might account for the higher correlations. However, it must be noted that correlations remained strong, though sometimes notably lower, even within sub-groups as small as around 20 participants (see
Appendix A). Thus, the present results cannot be attributed solely to sample size and sample heterogeneity. In the end, our estimates of (correct) non-decision times might be more reliable than the ones reported in previous studies, while boundary separation values might have already been estimated very reliably there. Conversely, drift rates might not show greater stability than in previous studies because of the specific content of the task: differences in drift rates also reflect differences in implicit personality, as their developmental patterns were the original focus of the study.
When looking at the raw data, rank-order stabilities of mean accuracies and median correct and error response times are also quite high (
r posterior means between
and
, see
Table A10), which speaks in favour of the assumption that our high number of trials per person enables us to obtain reliable parameter estimates. At the same time, it is interesting to note that the stabilities of the diffusion model parameters might jointly contribute to the very high across-time stability of the raw data summary statistics.
Regarding mean-level stability and change, we found evidence for systematic changes in both drift rates and boundary separations. Group-level drift rates increased from the first time point to the second time point six months later. The pattern of increase continued throughout the next two time points, but the posterior distributions showed much overlap there. The increase in drift rates might be interpreted as a practice effect. People tended to process the information needed to solve the IAT tasks more efficiently after they had completed the first time point. Conversely, group-level boundary separations decreased from the first to the second time point and to a lesser degree (once more marked by overlap in the posteriors) thereafter. That is, people tended to apply more liberal decision criteria and gathered less information until they made their decisions in the second to fourth time points. We suppose that participants reduce their decision caution at later time points mainly in response to the increased drift rate: that is, participants notice that they may lower their response criteria without deteriorating accuracy. Additionally, a decrease in accuracy motivation over time might also contribute to the reduction of decision caution.
In the literature on the diffusion model, practice effects in the form of increasing drift rates and decreasing boundary separations (but sometimes also non-decision times and shifting starting points) have repeatedly been reported (
Dutilh et al. 2009,
2011;
Evans and Brown 2017;
Lerche and Voss 2017;
Petrov et al. 2011). However, these previous studies primarily investigated within-session training effects; none of them focused on training effects across time periods as long as those in our study. It is interesting to note that training effects seem to be stable over months.
Evans and Brown (
2017) found that people often first adopt non-optimal decision criteria when working on a new task; that is, they are overly cautious and try to avoid mistakes, as is mirrored in high boundary separation in the diffusion model. Having practiced the task many times, people then adopt more lenient decision criteria that are closer to the optimum. Thus, a possible interpretation of our results is that people tend to keep the more lenient decision criterion when returning to the task months or even a year later.
Finally, we did not find systematic changes in non-decision times. Group-level posterior distributions remained roughly the same across the two-year period studied. This is in contrast to earlier studies on training effects, which sometimes found decreasing non-decision times (
Dutilh et al. 2009,
2011). Task-specific aspects of the IAT might be responsible for our findings. For instance,
Dutilh et al. (
2011) found that the effects on non-decision times were partly task-specific as well as item-specific.
Regarding inter-individual differences in intra-individual change, our growth curve models indicate that inter-individual differences are mainly based on across-time intercepts: We found substantial variance in the latent intercepts of all three diffusion model parameters. For boundary separation and non-decision times, people varied in their intercepts (which contribute equally to all time points) but not in their slope parameters, which reflect the rate of change across time. The slope parameter for boundary separation showed a negative trend; this means that the decrease in boundary separation, that is, the use of more liberal decision criteria, is close to universal in our data. As the estimated slope factor loadings in the non-decision time model mirror the unclear and mostly stable group-level trends found for this parameter, the slope factor is hard to interpret. In any case, its variance was estimated to be zero. The slope factor in the drift rate growth curve model was the only slope factor to show substantial inter-individual differences.
Thus, people seem to differ in the ways they profit from training effects in terms of task-related information processing. In post-hoc analyses, we regressed the slope factor on age and found a clear and strong positive correlation. This means that older people tended to increase their drift rates more than their younger counterparts. As older adults did not show lower mean level drift rates (
Ratcliff et al. 2004;
Schubert et al. 2020;
von Krause et al. 2020), this implies that they generally profited more from practice. Of course, these post-hoc analyses must be interpreted cautiously and warrant further developmental research. To sum up, people tended to show great inter-individual differences in their overall levels of drift rates, boundary separations, and non-decision times, but differed little in their developmental patterns, with the exception of drift rates. It would be interesting to follow up on these results in a longitudinal study with a stronger focus on training effects, as these were only of peripheral interest here.
Regarding profile stability, the estimated q correlations were strongly positive across time in the majority of cases, but not in all. We also found considerable across-participant variance in correlations, with some people showing q values close to zero or even negative. Correlations tended to get lower across larger periods of time. The profiles comprising the relative strengths of drift rate, boundary separation, and non-decision time might be seen as a configuration of process components that together lead to certain empirical response time distributions and accuracy rates. For example, the same accuracy data could be the result of high drift rates and low boundary separation, or vice versa. In a similar way, some people might show low boundary separation in combination with high drift rates, others in combination with low drift rates. It seems that, for most participants in the study, this parameter configuration remained very much the same across time.
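The point that different parameter configurations can produce the same accuracy can be made concrete with the closed-form predictions of a simplified diffusion model. The sketch below assumes an unbiased starting point (halfway between the boundaries), a diffusion constant of s = 1, and no across-trial variability; these are simplifications chosen for illustration, not the model specification used in this study, and the function names are hypothetical.

```python
import math

def accuracy(drift, boundary, s=1.0):
    """Probability of reaching the correct boundary (unbiased start)."""
    return 1.0 / (1.0 + math.exp(-drift * boundary / s**2))

def mean_decision_time(drift, boundary, s=1.0):
    """Mean decision time, excluding non-decision time (unbiased start)."""
    return boundary / (2.0 * drift) * math.tanh(drift * boundary / (2.0 * s**2))

# Two hypothetical configurations with the same product of drift and boundary.
fast_and_liberal = (2.0, 1.0)    # high drift rate, low boundary separation
slow_and_cautious = (1.0, 2.0)   # low drift rate, high boundary separation

for label, (v, a) in [("high v, low a", fast_and_liberal),
                      ("low v, high a", slow_and_cautious)]:
    print(label,
          "accuracy:", round(accuracy(v, a), 3),
          "mean decision time:", round(mean_decision_time(v, a), 3))
```

Under these assumptions, accuracy depends only on the product of drift rate and boundary separation, so both configurations predict the same accuracy (about 0.881), while their predicted decision times differ considerably. This is why a profile of parameters carries information that accuracy alone does not.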
All in all, we found that the three main diffusion model parameters are broadly consistent across time, thus fulfilling a central prerequisite of being identified as traits. This is particularly interesting as the diffusion model can be applied to a large range of binary decision tasks (not just from the cognitive domain). Our results reveal positive change in drift rates and negative change in boundary separation, but little individual differences in change, with the exception of drift rates. Profiles of the three parameters were also quite stable.
4.1. Limitations
While our study has a number of unique features, for instance, the distinction between the four forms of stability and change, the four time points over a period of two years, and the relatively large sample size, it also has some limitations. First, the variety of tasks was rather restricted. While we used five different IATs and combined them to obtain task-general parameter estimates, we did not use any other tasks. It is known that diffusion model parameters obtained in different tasks sometimes show only weak correlations with one another (
Lerche et al. 2020;
Ratcliff et al. 2010;
Schubert et al. 2016). Thus, some of the results presented here might be specific to the tasks studied.
Second, it must be noted that the posterior predictive checks did not perfectly recover the error response time distributions. Several different factors might contribute to this. First of all, due to the small number of errors, the empirical quantiles are numerically unstable and thus may not be a good representation of the actual (latent) distribution. Additionally, due to the low number of error responses per person, the group-level parameter of error non-decision times greatly influenced the estimates of individual error non-decision times (because of hierarchical shrinkage). This means that individual deviations in error non-decision times might sometimes have been underestimated. In turn, this might have led to a situation where our approach of modelling error response times with a separate non-decision time parameter was less successful among the very slow errors. Nevertheless, as the focus of this paper is on the psychometric properties and developmental patterns of diffusion model parameters, the relative misfit of this small proportion of trials is of secondary importance.
Finally, there are alternative plausible ways to analyse the present data within a purely Bayesian framework. Intuitively, the most straightforward way to approach the question would have been to formulate and fit a full hierarchical model with time included as an additional level. However, although intuitive from a Bayesian perspective, such an approach involves an enormous computational cost due to the large number of posteriors that need to be estimated simultaneously. In fact, estimating the full hierarchical model turned out to be practically infeasible using the available software. Thus, our two-step approach using posterior medians as summary statistics might underestimate the epistemic uncertainty around parameter estimates. However, we deem our approach a reasonable trade-off, since it incorporates more information than the frequentist approaches used in most of the diffusion model literature. Further, it also utilizes hierarchical shrinkage within each time point, thereby rendering point and uncertainty estimates more robust than a non-hierarchical approach.
4.2. Conclusions
We examined four different forms of stability and change in the three main diffusion model parameters: drift rate, boundary separation, and non-decision time. Our main aim was to study whether and in which way the assumption of temporal stability that is inherent in the interpretation of model parameters as traits holds. Across a time period of up to two years, all three diffusion model parameters showed strong rank-order stability. Group-level drift rates tended to increase, whereas group-level boundary separations decreased and group-level non-decision times exhibited no clear change. These findings could be interpreted as practice effects, which is remarkable given the long time intervals between the sessions (up to one year). People differed from one another in their baseline levels of all three main diffusion model parameters (intercepts in the growth curve models), but only drift rates showed inter-individual differences in change across time (slopes). Profiles of the three parameters mostly stayed stable across time, but some participants showed strong deviations from this pattern. We believe our study makes a strong case that, with regard to temporal aspects, the three core diffusion model parameters have trait-like qualities. In the light of our results, the use of diffusion model parameters in individual differences research seems warranted and promising.