Response Mixture Modeling of Intraindividual Differences in Responses and Response Times to the Hungarian WISC-IV Block Design Test

Response times may constitute an important additional source of information about cognitive ability, as they make it possible to distinguish between different intraindividual response processes. In this paper, we present a method to disentangle interindividual variation from intraindividual variation in the responses and response times of 978 subjects to the 14 items of the Hungarian WISC-IV Block Design test. We find that faster and slower responses differ in their measurement properties, suggesting that there are intraindividual differences in the response processes adopted by the subjects.


Introduction
The cognitive strategies or cognitive processes that underlie problem solving have been well studied for various intelligence tests. For instance, in the Raven test [1], it was shown that subjects use an incremental, reiterative strategy for encoding and inducing regularities in each problem. Some, but not all, subjects use an abstract induction strategy and/or a dynamic working memory process (e.g., [2]). Inferences about processes underlying problem solving have mainly been based on behavioral data like verbal protocols, eye tracking, and direct observations. Interestingly, such inferences have all been based on interindividual differences. In the present study, we will use the quantitative response times to reveal possible intraindividual differences in the cognitive processes adopted to the block design test.
The idea that response time is an important variable in intelligence testing dates back to Francis Galton [3,4] who assessed intelligence using the time that subjects needed to respond to a stimulus. Due to the lack of methods to accurately assess response time and due to the lack of statistical methods to analyze the data, this idea did not receive a lot of attention at that time (see e.g., [5]). However, nowadays, it is generally acknowledged that there is a clear interrelation between response time and intelligence (e.g., [6], chapter 11). In Reference [7] (see also [8]), it is shown that this interrelation can be well explained by the diffusion model [9,10], a sequential sampling model that assumes that differences in responses and response times arise because of differences in an underlying cognitive information accumulation process.
As the response times seem to incorporate important information about the underlying process that resulted in the responses, it has been argued that the response times are an important source of information in studying the validity of intelligence and ability tests [11,12]. Specifically, focusing on response times may provide insight in the relation between cognitive abilities as operationalized by latent variables on the one side and cognitive processing theories on the other side. Such a connection may shed light on the relation between interindividual differences as modeled by latent cognitive abilities and models for intraindividual processes [13][14][15][16].
Intraindividual processes can be hypothesized to underlie intelligence test scores from a dual processing framework [17,18]. In this framework, faster responses are assumed to reflect more automated processes that are proceduralized, parallel, and do not require active control, while slower responses are assumed to reflect more controlled processes that are serial and require attentional control. This distinction has been shown applicable to cognitive abilities. For instance, in arithmetic tests it has been postulated that subjects use both a more automated, memory retrieval process, and a more controlled calculation process to solve the problems of an arithmetic test [19]. Similarly, it has been shown that decision-making may involve a slower selective search strategy or a faster pattern recognition strategy in medical decisions [20] and in solving chess puzzles [21].
The dual processing framework may also be applicable to the block design test, which is the focus of the present paper. That is, for the block design test two processes have been distinguished: a more analytic process and a more global process. The global process involves trial and error: the subject rearranges the blocks until the pattern matches the design. In the analytic process, the subject first infers the position of the blocks from the design and subsequently places the blocks in the right position to match the design (e.g., [22][23][24]). Up until now, these processes have been assumed to exist interindividually. However, it might be that subjects alternate between the different processes, or that subjects use both processes simultaneously.
Partchev and De Boeck [25] showed that intraindividual differences in the response process to intelligence test items indeed exist for a Matrix Reasoning test and a Verbal Analogies test. In their approach, different latent variables were postulated to underlie the faster responses and the slower responses. Next, it was tested whether these variables can be psychometrically distinguished in terms of their correlation (i.e., whether the correlation between the two latent variables deviates from 1.0) and in terms of their measurement properties (i.e., the item difficulty). Partchev and De Boeck found that the latent variables underlying faster and slower responses are separate but correlated variables characterized by different item difficulty parameters. Note that this approach accounts for both inter- and intraindividual differences, as the subjects differ in their overall position on the latent variables (interindividual differences) and in the exact latent variable that underlies the response on a given item (intraindividual differences). That is, the same subject can respond according to the fast latent variable on one item and according to the slow latent variable on the next.
To operationalize the fast and slow latent variables, Partchev and De Boeck [25] used both an item median split and a person median split of the response times. Next, using one-parameter item response theory models, the responses in the fast category are specified as indicators of a fast latent ability variable and the responses in the slow category are specified as indicators of a slow latent ability variable. This procedure faces four challenges. First, the assignment to the fast and slow categories is deterministic rather than stochastic, which does not allow for measurement error in the category assignment. Second, as the splits are carried out at the median, it is either assumed that the subjects use the fast category as much as the slow category (person median split) or that, for each item, the fast category is used as much as the slow category (item median split). Third, as the continuously distributed response times are dichotomized, the information concerning individual differences within each category (fast/slow) is not taken into account. Fourth, possible differences in item discrimination are not taken into account (although this is possible, see [26,27]), even though such differences can be expected due to the worst performance rule [7,8,28].
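The two median-split rules can be illustrated with a short sketch. The array `log_rt` (subjects in rows, items in columns), the toy data, and the function names are hypothetical and serve only as an illustration; this is not the code used by Partchev and De Boeck [25].

```python
import numpy as np

def item_median_split(log_rt):
    """Mark a response as slow (1) if it exceeds the median time of its item (column)."""
    return (log_rt > np.median(log_rt, axis=0, keepdims=True)).astype(int)

def person_median_split(log_rt):
    """Mark a response as slow (1) if it exceeds the median time of its subject (row)."""
    return (log_rt > np.median(log_rt, axis=1, keepdims=True)).astype(int)

rng = np.random.default_rng(1)
log_rt = rng.normal(size=(6, 4))          # toy data: 6 subjects, 4 items
slow_by_item = item_median_split(log_rt)
slow_by_person = person_median_split(log_rt)
print(slow_by_item.mean(axis=0))          # share of slow responses per item
print(slow_by_person.mean(axis=1))        # share of slow responses per subject
```

With an even number of subjects and items and no ties, both rules force exactly half of the responses in each column (item split) or row (person split) into the slow category, which is the equal-class-size assumption described as the second challenge.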
To accommodate the above challenges in the Partchev and De Boeck [25] study, Molenaar and De Boeck [29] proposed the method of response mixture modeling. In this method, each response by each subject is classified into a faster response class or a slower response class (corresponding to the fast and slow categories of Partchev and De Boeck). This approach contrasts with traditional mixture modeling, where all responses by a given subject are classified into the same (faster or slower) class, see Figure 1. In the model, the above challenges in the Partchev and De Boeck study are addressed explicitly, as response class assignment is stochastic, response times are treated as continuous variables, the faster and slower classes are not necessarily of equal size, and differences in item discrimination are allowed. However, some challenges remain. That is, the response mixture model by Molenaar and De Boeck [29] assumes invariance of the response class sizes across items, and the model does not incorporate separate latent variables for the faster and slower responses. In addition, the model was applied to a chess dataset and not to an intelligence dataset. The assumed invariance of the response class sizes across items means that, although the proportions of faster and slower responses can be different (e.g., 0.3 for slow and 0.7 for fast), these proportions are assumed to be equal for all items (i.e., 0.3-0.7 for all items). This assumption may be violated, as some items may be more suitable for a faster response process than other items.
Therefore, the present paper has two aims. First, we present a generalization of the model by Molenaar and De Boeck [29] that includes two separate latent variables for faster and slower responses, and in which the assumption of class size invariance is relaxed. Second, we apply this model to the item responses and item response times of the block design subtest of the Hungarian standardization sample of the Wechsler Intelligence Scale for Children-IV (WISC-IV [30,31]). This application adds to the application by Partchev and De Boeck [25] in that (1) we study whether the findings by Partchev and De Boeck replicate for the Wechsler test; (2) using the new model, we can investigate whether the faster and slower response class sizes are unequal (i.e., different from 0.5) and whether they differ across items; and (3) we test whether there are differences in item discrimination between faster and slower responses.
The outline of the present paper is as follows: First, we introduce a commonly accepted interindividual latent variable approach to the analysis of responses and response times. Next, we extend this approach to include intraindividual differences in the faster and slower latent variables. Then, we apply the model to a simulated dataset to investigate parameter bias. Finally, we apply the model to the block design subtest of the Hungarian WISC-IV standardization data to make inferences about the structure of cognitive ability for faster and slower responses.

Item Response Theory Modeling of Response Times
Arguably the most popular approach to the analysis of responses and response times is the so-called hierarchical model of Van der Linden [32,33] (see also [34][35][36][37][38]). In this approach, a measurement model is first specified for the responses. Here, we follow the authors of [35] and use a two-parameter model, that is:
$$\mathrm{logit}\left[P(X_{pi} = 1 \mid \theta_p)\right] = \alpha_i \theta_p - \beta_i, \tag{1}$$
where $X_{pi}$ is the score of subject $p$ on item $i$, $\theta_p$ is the latent ability variable, $\alpha_i$ is the item discrimination parameter, and $\beta_i$ is the item difficulty parameter. Next, the response time information is added to this model. Some authors have introduced the response times to the model above as a covariate (e.g., [39,40]). However, as argued by Van der Linden [33], this precludes separation of item differences and interindividual differences in the response time distribution. Pragmatically, such a separation can be accomplished by centering the response times within each subject and within each item, but then no inferences about interindividual differences in speed or item differences in time intensity can be made. A more explicit statistical approach is therefore adopted here. We follow Van der Linden and specify a linear measurement model for the log-transformed response times, that is,
$$\ln T_{pi} = \lambda_i + \phi_i \tau_p + \varepsilon_{pi}, \tag{2}$$
where $\ln T_{pi}$ is the log-transformed response time of subject $p$ on item $i$. The time intensity parameter, $\lambda_i$, models the item effects on the response times. Some items require more time to be completed than others irrespective of ability, for example, because more background information needs to be processed (reading a text) or more operations need to be conducted (3 + 3 − 2 versus 3 + 3). The latent speed variable, $\tau_p$, models the interindividual differences in the response time distributions. That is, some subjects are overall faster than others. Note that $\tau_p$ should be interpreted as a slowness factor: due to its positive sign in Equation (2), subjects with higher levels of $\tau_p$ will have larger response times. Finally, $\phi_i$ and $\varepsilon_{pi}$ are, respectively, the slope parameter (time discrimination parameter) and the residual in the linear regression of $\ln T_{pi}$ on $\tau_p$, with homoscedastic variance $\mathrm{VAR}(\varepsilon_{pi}) = \sigma_{\varepsilon i}^2$.
In the above, the log-transformation is used because the raw response times, $T_{pi}$, are bounded by zero and skewed [41], which makes the assumptions of linearity and homoscedasticity implausible. Therefore, we assume that the raw response times have a log-normal distribution, such that the log-transformed response times are normal (see also [32]). To connect the separate models for the responses and response times in Equations (1) and (2), we follow the authors of [35][36][37][38][42] and allow $\tau_p$ and $\theta_p$ to be correlated, that is:
$$\begin{bmatrix} \theta_p \\ \tau_p \end{bmatrix} \sim \mathrm{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}), \tag{3}$$
with mean vector $\boldsymbol{\mu}$ and covariance matrix $\boldsymbol{\Sigma}$. As discussed by [35,36], the resulting model is a two-factor model with categorical indicators (the responses) for the first factor (latent ability) and continuous indicators (the response times) for the other factor (latent speed). Note that this model assumes conditional independence, that is, conditional on the latent speed and ability variables, the responses and response times are independent.
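As a minimal sketch of the data-generating process implied by this two-factor model, the following Python code simulates responses and log response times. It assumes the parameterization logit P(X = 1) = α_i θ_p − β_i for the responses, unit variances for θ_p and τ_p, and purely illustrative parameter values; the function name `simulate_hierarchical` and all argument names are hypothetical and not part of the original analysis.

```python
import numpy as np

def simulate_hierarchical(n_subj, alpha, beta, lam, phi, sigma_eps, rho, rng):
    """Draw responses x and log response times ln_t from the two-factor model."""
    cov = np.array([[1.0, rho], [rho, 1.0]])   # unit variances, correlation rho
    theta, tau = rng.multivariate_normal([0.0, 0.0], cov, size=n_subj).T
    # Responses: logit P(X = 1) = alpha_i * theta_p - beta_i (assumed form)
    p_correct = 1.0 / (1.0 + np.exp(-(np.outer(theta, alpha) - beta)))
    x = rng.binomial(1, p_correct)
    # Log response times: lnT = lambda_i + phi_i * tau_p + eps_pi
    eps = rng.normal(0.0, sigma_eps, size=(n_subj, len(alpha)))
    ln_t = lam + np.outer(tau, phi) + eps
    return x, ln_t, theta, tau

rng = np.random.default_rng(0)
n_items = 5
x, ln_t, theta, tau = simulate_hierarchical(
    n_subj=500,
    alpha=np.ones(n_items),
    beta=np.linspace(-1.0, 1.0, n_items),
    lam=np.full(n_items, 2.0),
    phi=np.ones(n_items),
    sigma_eps=0.5,
    rho=-0.5,
    rng=rng,
)
print(x.mean(axis=0))      # proportion correct per item
print(ln_t.mean(axis=0))   # mean log response time per item
```

Because the residuals are drawn independently of the responses, the simulated data satisfy conditional independence by construction.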

An Extension to Model Intraindividual Differences
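Here, we retain the idea of separate measurement models for the responses and log-response times. That is, for the response times we use Equation (2). However, for the responses, we follow Partchev and De Boeck [25] and use a separate two-parameter measurement model for the faster responses ("faster measurement model") and for the slower responses ("slower measurement model"). We avoid the necessity of median splitting the response times by formulating the overall model for the responses as a mixture of the faster and slower measurement models, that is:
$$P(X_{pi} = 1 \mid \theta_p^{(s)}, \theta_p^{(f)}) = \pi_{pi}\, P(X_{pi} = 1 \mid \theta_p^{(s)}) + (1 - \pi_{pi})\, P(X_{pi} = 1 \mid \theta_p^{(f)}), \tag{4}$$
where $\pi_{pi}$ is the mixing proportion, which denotes the probability that subject $p$ answers item $i$ using the slower latent variable, $\theta_p^{(s)}$, as opposed to the faster latent variable, $\theta_p^{(f)}$. The faster and slower measurement models are then given by, respectively:
$$\mathrm{logit}\left[P(X_{pi} = 1 \mid \theta_p^{(f)})\right] = \alpha_i^{(f)} \theta_p^{(f)} - \beta_i^{(f)} \tag{5}$$
and
$$\mathrm{logit}\left[P(X_{pi} = 1 \mid \theta_p^{(s)})\right] = \alpha_i^{(s)} \theta_p^{(s)} - \beta_i^{(s)}, \tag{6}$$
where
$$\begin{bmatrix} \theta_p^{(s)} \\ \theta_p^{(f)} \\ \tau_p \end{bmatrix} \sim \mathrm{MVN}(\boldsymbol{\mu}, \boldsymbol{\Sigma}) \tag{7}$$
with
$$\boldsymbol{\Sigma} = \begin{bmatrix} \sigma_{\theta^{(s)}}^2 & & \\ \sigma_{\theta^{(s)}\theta^{(f)}} & \sigma_{\theta^{(f)}}^2 & \\ \sigma_{\theta^{(s)}\tau} & \sigma_{\theta^{(f)}\tau} & \sigma_{\tau}^2 \end{bmatrix}. \tag{8}$$
That is, the latent variables are multivariate normal and are allowed to covary.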
To specify the faster responses as indicators in the faster measurement model and the slower responses as indicators in the slower measurement model, the mixing proportion $\pi_{pi}$ is made a function of the residual response time, $\varepsilon_{pi}$. That is:
$$\mathrm{logit}(\pi_{pi}) = \zeta_{1i}\,(\varepsilon_{pi} - \zeta_{0i}), \tag{9}$$
where parameter $\zeta_{1i}$ is the faster-slower slope parameter, which models the degree to which item $i$ can distinguish between the faster and slower classes. That is, for larger $\zeta_{1i}$, the item contains more information about the different measurement models for faster and slower responses. This parameter is constrained to be non-negative to ensure that the parameters in the slower measurement model (i.e., $\alpha_i^{(s)}$ and $\beta_i^{(s)}$) correspond to the measurement properties of the slower responses (i.e., larger $\varepsilon_{pi}$).
The parameter $\zeta_{0i}$ is a threshold parameter, which indicates how many responses are classified as slower for item $i$. The larger $\zeta_{0i}$ is, the fewer responses are classified as "slower". That is, due to this parameterization, $\zeta_{0i}$ can be interpreted as the "difficulty" of responding slowly. From the parameter estimates for $\zeta_{1i}$, $\zeta_{0i}$, and $\sigma_{\varepsilon i}^2$, the class size of the slower class, $\pi_i^{(s)}$, can be calculated for each item. Specifically, $\pi_i^{(s)}$ is obtained by integrating out $\varepsilon_{pi}$ from Equation (9), that is,
$$\pi_i^{(s)} = \int_{-\infty}^{\infty} \omega\left[\zeta_{1i}\,(\varepsilon_{pi} - \zeta_{0i})\right] \frac{1}{\sigma_{\varepsilon i}}\, \phi\!\left(\varepsilon_{pi} / \sigma_{\varepsilon i}\right) d\varepsilon_{pi},$$
where $\omega(\cdot)$ is the logistic function and $\phi(\cdot)$ is the standard normal density function. Note that the item median split procedure used by Partchev and De Boeck [25] assumes that $\pi_i^{(s)} = 0.5$, which corresponds to $\zeta_{0i} = 0$.
In Equation (9), note that the residual response time, $\varepsilon_{pi}$, does not include the main effect of the subjects and the main effect of the items on the log-response times. Therefore, a given response is more likely to be classified into the faster/slower measurement model if the response is faster/slower than expected for that subject on that item. For instance, a response of 2 s to a given item can be classified as "faster" for one subject but as "slower" for another subject, depending on the subjects' overall level of speed, $\tau_p$. Similarly, a response of 2 s for a given subject can be classified as "faster" for one item but as "slower" for another item, depending on the time intensity of the items, $\lambda_i$. This is the key difference with interindividual mixture modeling, which classifies subjects and not the separate responses into the mixture components. As the model above is a generalization of the response mixture model by Molenaar and De Boeck [29] (as we discuss below), we also refer to it as a response mixture model. Note that the model assumes that the faster and slower classes hold within all subjects in the population. However, the posterior class probability for each subject-item combination ($\pi_{pi}$) may reveal that a given subject hardly uses one of the classes, suggesting that there is only a single process for this subject.
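The mixing proportion and the class size of the slower class can be sketched numerically as follows. The sketch assumes the form logit(π_pi) = ζ_1i (ε_pi − ζ_0i) and residuals ε_pi that are normal with mean 0 and standard deviation σ_εi; the integral is approximated by Gauss-Hermite quadrature, which is a choice made here for illustration and not necessarily how the class sizes were computed in the original analysis. All function names and parameter values are hypothetical.

```python
import numpy as np
from numpy.polynomial.hermite_e import hermegauss

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

def mixing_proportion(eps, zeta0, zeta1):
    """Probability that a response with residual log-time eps belongs to the slower class."""
    return logistic(zeta1 * (eps - zeta0))

def slow_class_size(zeta0, zeta1, sigma_eps, n_nodes=41):
    """Average the mixing proportion over eps ~ N(0, sigma_eps^2) by Gauss-Hermite quadrature."""
    nodes, weights = hermegauss(n_nodes)       # nodes/weights for weight function exp(-x^2 / 2)
    values = mixing_proportion(sigma_eps * nodes, zeta0, zeta1)
    return float(np.sum(weights * values) / np.sqrt(2.0 * np.pi))

# The same 2 s response is judged relative to what is expected for the subject and item
ln_t = np.log(2.0)
lam, phi = 1.0, 1.0                            # hypothetical item parameters
for tau in (-1.0, 1.0):                        # a generally fast and a generally slow subject
    eps = ln_t - lam - phi * tau
    print(tau, float(mixing_proportion(eps, zeta0=0.0, zeta1=2.0)))

print(slow_class_size(zeta0=0.0, zeta1=2.0, sigma_eps=1.0))   # symmetric case
print(slow_class_size(zeta0=0.5, zeta1=2.0, sigma_eps=1.0))   # higher threshold
```

In this illustration, the 2 s response has a high probability of belonging to the slower class for the generally fast subject and a low probability for the generally slow subject, and a positive threshold shrinks the slower class below one half.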
To identify the full model above, some constraints are necessary. To identify the scale and location of the latent variables, we fix $\sigma_{\tau}^2 = \sigma_{\theta^{(s)}}^2 = \sigma_{\theta^{(f)}}^2 = 1$ and $\boldsymbol{\mu} = \{0, 0, 0\}$, respectively. If restricted versions of the model are considered, for instance a model with $\alpha_i^{(s)} = \alpha_i^{(f)}$, some of these constraints can be dropped; that is, $\sigma_{\theta^{(s)}}$ or $\sigma_{\theta^{(f)}}$ can be freed, allowing for a scalar effect of the faster/slower response class on the discrimination parameters. Similarly, in a model with $\beta_i^{(s)} = \beta_i^{(f)}$, the mean of the faster or slower latent variable, $\mu^{(s)}$ or $\mu^{(f)}$, can be freed, allowing for a scalar effect on the difficulty parameters.

Special Cases
As discussed, the intraindividual differences model above is a generalization of the response mixture model by Molenaar and De Boeck [29]. That is, if in the model above one uses the same latent variable in the faster and slower measurement models (i.e., $\mathrm{cor}(\theta^{(s)}, \theta^{(f)}) = 1$), and if $\zeta_{0i}$ and $\zeta_{1i}$ are fixed to be equal across items (i.e., $\zeta_{0i} = \zeta_0$ and $\zeta_{1i} = \zeta_1$), the model is equivalent to the model by Molenaar and De Boeck.
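The traditional model in Equations (1)-(3) can be obtained from the full model by specifying $\theta^{(s)}$ and $\theta^{(f)}$ to be psychometrically fully equivalent. Equivalence of $\theta^{(s)}$ and $\theta^{(f)}$ requires $\mathrm{cor}(\theta^{(s)}, \theta^{(f)}) = 1$, $\alpha_i^{(s)} = \alpha_i^{(f)}$, $\beta_i^{(s)} = \beta_i^{(f)}$, and $\zeta_{1i} = 0$. Given these restrictions, $\pi_{pi}$ will cancel out of Equation (4) because it will hold that $P(X_{pi} = 1 \mid \theta_p^{(s)}) = P(X_{pi} = 1 \mid \theta_p^{(f)})$. As a result, parameter $\zeta_0$ will no longer be in the model.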
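The cancellation can be written out in one line. The derivation below assumes that the response model is the two-component mixture with weight π_pi on the slower component, and writes P* (a symbol introduced here only for illustration) for the common value of the two class-specific probabilities:

```latex
\begin{aligned}
P(X_{pi} = 1)
  &= \pi_{pi}\, P(X_{pi} = 1 \mid \theta_p^{(s)})
     + (1 - \pi_{pi})\, P(X_{pi} = 1 \mid \theta_p^{(f)}) \\
  &= \pi_{pi}\, P^{*} + (1 - \pi_{pi})\, P^{*} \\
  &= P^{*}.
\end{aligned}
```

The response probability thus no longer depends on the mixing proportion, so the parameters that govern it are not identified from the responses and drop out of the model.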

Conditional Dependence
As discussed above, in the traditional interindividual differences model outlined by Equations (1)-(3), it is assumed that, conditional on the latent speed and the latent ability variable, $\tau_p$ and $\theta_p$, the responses and response times of item $i$ are independent. In the intraindividual extension of this model outlined by Equations (4)-(9), conditional dependence is explicitly modeled: if $\zeta_{1i}$ in Equation (9) deviates from 0, the measurement properties of the responses depend on the length of the response times. That is, there are intraindividual differences in the measurement model that applies to the responses, reflecting the use of different response processes.
Any phenomenon that leads to fluctuations in ability and speed throughout the test administration will result in violations of conditional independence between responses and response times [43]. Therefore, if subjects make more (or fewer) errors and respond faster (or slower) throughout the test irrespective of the difficulty and the time intensity of the items, conditional independence will be violated. This may, for instance, happen due to practice, decreased motivation, or increased fatigue throughout the test administration. Such a trend will be captured by the faster and slower classes in the present approach, and it will be evident from the results. That is, in the case of learning, the faster response class will be larger (larger $\zeta_{0i}$) and less difficult (smaller $\beta_i^{(f)}$) for later items. In the case of decreased motivation (faster responses with more errors for later items), the faster response class will be larger but the items will be more difficult in this class. We return to this point in the discussion.

Estimation
We implemented a Bayesian Markov Chain Monte Carlo procedure (MCMC; see, e.g., [44]) within the OpenBUGS software package [45]. To this end, we specified N(0,10) prior distributions for the parameters $\beta_i^{(s)}$, $\beta_i^{(f)}$, $\phi_i$, and $\lambda_i$. In addition, for the parameters $\alpha_i^{(s)}$ and $\alpha_i^{(f)}$ we specified N(0,10) prior distributions that are truncated below 0.01 to ensure that the parameter values are strictly positive. For $\zeta_{1i}$, we truncated the N(0,10) prior below 0. We specified a N(0,10) prior on $\log(\sigma_{\varepsilon}^2)$. We scaled the latent variables by fixing their variances to equal 1; therefore, $\boldsymbol{\Sigma}$ is a 3 × 3 correlation matrix, which we decompose as follows:
$$\boldsymbol{\Sigma} = \boldsymbol{\Gamma}\boldsymbol{\Gamma}^{\top} + \mathrm{diag}\left(\mathbf{1}_3 - \boldsymbol{\Gamma} \circ \boldsymbol{\Gamma}\right), \tag{10}$$
where $\boldsymbol{\Gamma}$ is a 3 × 1 column vector with parameters $\gamma_k$ for $k = 1, 2, 3$, $\mathbf{1}_3$ is a 3 × 1 column vector of 1s, and $\circ$ denotes the elementwise product. For each element of $\boldsymbol{\Gamma}$ we specified a uniform prior from −1 to 1. Finally, in Equation (9), $\varepsilon_{pi}$ does not need to be estimated from the data, as it can be calculated by:
$$\varepsilon_{pi} = \ln T_{pi} - \lambda_i - \phi_i \tau_p, \tag{11}$$
that is, the residual response time is the log-response time corrected for the main effect of the subjects and the main effect of the items. See the website of the first author for the OpenBUGS script.
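The two computations in this section can be sketched in Python. The sketch assumes that the decomposition gives correlations γ_j γ_k off the diagonal and 1 on the diagonal, which is one reading of a correlation matrix parameterized by a 3 × 1 vector Γ; it is not taken from the OpenBUGS script. The function names and numerical values are hypothetical.

```python
import numpy as np

def correlation_from_loadings(gamma):
    """Correlation matrix with off-diagonal entries gamma_j * gamma_k and unit diagonal."""
    gamma = np.asarray(gamma, dtype=float)
    return np.outer(gamma, gamma) + np.diag(1.0 - gamma**2)

def residual_log_rt(ln_t, lam, phi, tau):
    """Residual log response time: lnT_pi - lambda_i - phi_i * tau_p."""
    return ln_t - lam - np.outer(tau, phi)

sigma = correlation_from_loadings([0.9, 0.9, -0.8])
print(sigma)                                   # unit diagonal, products off the diagonal
print(np.linalg.eigvalsh(sigma))               # all positive when every |gamma_k| < 1

rng = np.random.default_rng(3)
tau = rng.normal(size=4)                       # toy speed values for 4 subjects
lam, phi = np.array([2.0, 2.5]), np.array([1.0, 1.5])
eps_true = rng.normal(size=(4, 2))
ln_t = lam + np.outer(tau, phi) + eps_true
print(np.allclose(residual_log_rt(ln_t, lam, phi, tau), eps_true))
```

A practical property of this parameterization is that any choice of γ_k strictly between −1 and 1 yields a valid (positive definite) correlation matrix, so independent uniform priors on the elements of Γ never produce an inadmissible Σ.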

Data Generation
To show the viability of the present approach, we applied the response mixture model to a simulated data set. Data were simulated using 2000 subjects and 30 items. The following true parameter values were used. First, for the discrimination parameters we created a vector of increasing and equally spaced values that were subsequently assigned randomly to the items 1 to 30. For $\alpha_i^{(f)}$ we used values between 1 and 2 and for $\alpha_i^{(s)}$ we used values between 2 and 3 to reflect that slower responses discriminate better than faster responses (i.e., according to the "worst performance rule" [28]). For the difficulty parameters we used increasing, equally spaced values between −2 and 1 for $\beta_i^{(f)}$ and between −1 and 2 for $\beta_i^{(s)}$ to reflect that faster responses are associated with fewer errors as compared to slower responses [25]. Next, for $\zeta_{0i}$ we used increasing and equally spaced values between −0.5 and 0.5. Note that the items are thus ordered on both their difficulty ($\beta_i^{(s)}$ and $\beta_i^{(f)}$) and their threshold parameter ($\zeta_{0i}$), reflecting that the harder the item, the more likely subjects are to resort to slower processes. This relation between item difficulty and the slower process has been chosen based on the real data application below. For the true values of the slope parameters, $\zeta_{1i}$, we created a vector of increasing and equally spaced values between 1 and 3 that were subsequently assigned randomly to the items 1 to 30. In addition, the correlations in $\boldsymbol{\Sigma}$ were fixed to $\mathrm{cor}(\theta^{(s)}, \theta^{(f)}) = 0.8$, $\mathrm{cor}(\tau, \theta^{(s)}) = -0.6$, and $\mathrm{cor}(\tau, \theta^{(f)}) = -0.7$. Note that although these correlations are large, they are not exceptional: in the Block Design application below, we found even larger correlations. Finally, the other parameters were fixed to $\phi_i = 2$, $\sigma_{\varepsilon i}^2 = 3$, and $\lambda_i = 2$ for all $i$.
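The simulation design described above can be sketched as follows. Several details are assumptions made for this illustration and are not stated in the text: the random seed, the use of independent random permutations for the three randomly assigned parameter vectors, the response parameterization logit P(X = 1) = αθ − β, and the form logit(π_pi) = ζ_1i (ε_pi − ζ_0i) for the class membership. The variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(2016)              # arbitrary seed
n_subj, n_items = 2000, 30

# True item parameters as described in the text
alpha_f = rng.permutation(np.linspace(1.0, 2.0, n_items))
alpha_s = rng.permutation(np.linspace(2.0, 3.0, n_items))
beta_f = np.linspace(-2.0, 1.0, n_items)
beta_s = np.linspace(-1.0, 2.0, n_items)
zeta0 = np.linspace(-0.5, 0.5, n_items)
zeta1 = rng.permutation(np.linspace(1.0, 3.0, n_items))
phi, lam, sigma_eps = 2.0, 2.0, np.sqrt(3.0)

# Latent variables ordered as (theta_s, theta_f, tau), all with unit variance
corr = np.array([[1.0, 0.8, -0.6],
                 [0.8, 1.0, -0.7],
                 [-0.6, -0.7, 1.0]])
theta_s, theta_f, tau = rng.multivariate_normal(np.zeros(3), corr, size=n_subj).T

def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))

# Log response times: lnT = lambda + phi * tau + eps
eps = rng.normal(0.0, sigma_eps, size=(n_subj, n_items))
ln_t = lam + phi * tau[:, None] + eps

# Class membership: slower with probability logistic(zeta1 * (eps - zeta0))
is_slow = rng.random((n_subj, n_items)) < logistic(zeta1 * (eps - zeta0))

# Responses from the class-specific two-parameter models
p_slow = logistic(alpha_s * theta_s[:, None] - beta_s)
p_fast = logistic(alpha_f * theta_f[:, None] - beta_f)
x = rng.binomial(1, np.where(is_slow, p_slow, p_fast))

print(is_slow.mean(axis=0).round(2))           # share of slower responses per item
print(x.mean(axis=0).round(2))                 # proportion correct per item
```

Because the thresholds increase over items, the share of responses generated by the slower class decreases from the first to the last item in this sketch.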

Results
Using our OpenBUGS implementation of the model, we drew 150,000 samples from the posterior parameter distribution, of which we discarded the first 75,000 samples as burn-in. Preliminary data simulations showed that this is sufficient to ensure convergence of the chain to its stationary distribution. In Figure 2, trace plots are depicted for the parameters $\alpha_i^{(f)}$, $\alpha_i^{(s)}$, $\zeta_{0i}$, and $\zeta_{1i}$ for item 1; trace plots for the other parameters were similar. As can be seen, after the burn-in samples (solid grey line), the samples scatter randomly around a stable average.
In Figure 3, the true parameter values are plotted against the estimated values. As can be seen, for $\beta_i^{(s)}$, $\beta_i^{(f)}$, $\alpha_i^{(f)}$, and $\zeta_{0i}$ parameter recovery is good. For the $\alpha_i^{(s)}$ and $\zeta_{1i}$ parameters, estimates are characterized by larger variability as compared to the other parameters. In addition, they tend to be slightly biased. However, given the large variability in the parameter estimates, the bias does not seem to be severe. We investigate this possible bias in more depth in the application section, where we simulate data given the results of the block design data. With respect to the variability in the parameter estimates, it can be concluded that the power to reject the restriction $\alpha_i^{(s)} = \alpha_i^{(f)}$ will generally be small. In practice, it thus seems advisable to consider restricting $\alpha_i^{(s)} = \alpha_i^{(f)}$. In addition, $\zeta_{1i}$ could be constrained to be equal across items to decrease the variability in this parameter. We will consider these possibilities in the real data application next.

Data
Here we present an analysis of the Hungarian standardization data of the Block Design subtest from the Wechsler Intelligence Scale for Children-Fourth Edition (WISC-IV [30,31]). This test involves arranging colored blocks as fast as possible into a displayed pattern. There are 14 items that are scored 1 (correct) and 0 (false). Response times are recorded from the moment that the subject was presented with the item until the moment that the subject indicated that he or she was finished, gave up, or the time was up. The number of subjects equals 978, with an age range between 6 and 15. Preliminary analysis indicated that measurement invariance of the response and response time measurement model in Equations (1)-(3) is tenable for older and younger subjects. The older subjects are overall more accurate and faster, with more variance in their responses and response times (as reflected in a significant difference in the mean and variance of both θ and τ; results are available upon request). These differences between the younger and older subjects may reflect differences in response processes across ages. For instance, as shown by Rozencwajg and Corroyer [46], children of different ages use the same processes (global and analytic), but older children use the analytic process more often than the global process. Therefore, we think that for our modeling approach, the differences between young and old as reported above are not necessarily a problem, as these differences will be captured in the response class membership of the subjects (i.e., the responses of the older subjects, which are overall faster and more accurate, will be classified more often in a fast and accurate response class and less often in a slow and inaccurate response class). Proportions

Conditional Independence
As violations of conditional independence indicate intraindividual differences in the response measurement properties across response times, we started by testing for conditional independence between the responses and response times of the items. Specifically, we fit the traditional model in Equations (1)-(3) to the data using weighted least squares estimation in Mplus [47], with and without residual correlations between the responses and response times of the items. Note that the model with residual correlations between the responses and response times is identical to the model proposed by the authors of [36]. For other, more Bayesian-oriented approaches, see References [48,49].
For each item we consulted (1) the univariate modification index of the correlation between the responses and the response times in the traditional model (i.e., the model without residual correlations); and (2) the size of the residual correlation between the responses and response times in the model with residual correlations. The modification indices are related to the Lagrange multiplier test [50] proposed by the authors of [43] to test the assumption of conditional independence of responses and response times. Modification indices are chi-square distributed with 1 degree of freedom and indicate the impact of a parameter constraint on the log-likelihood of the model. Note that the modification indices are estimated under the assumption that conditional independence holds for all other items. See Table 1 for both the modification indices and the residual correlations. As can be seen, conditional independence is clearly violated for most of the items, although the effect sizes are small. This finding is in line with the authors of [43], who found this type of conditional independence to be violated for most of the items in a large-scale computerized test. Interestingly, the residual correlations are negative for all items but item 1, indicating that slower responses (higher ε pi ) are associated with a smaller probability of a correct response. This is not due to the brighter subjects being faster, as the residual correlation is separate from the main ability and speed effects (as these are captured by the respective latent variables).
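Because a modification index is referred to a chi-square distribution with 1 degree of freedom, its p-value can be obtained in closed form. The following is a minimal sketch of that step only; the function name and the example values are hypothetical and are not the values reported in Table 1.

```python
import math


def chi2_1df_pvalue(mi: float) -> float:
    """Upper-tail p-value of a chi-square statistic with 1 degree of freedom.

    A chi-square(1) variable is a squared standard normal variable, so
    P(X > x) = P(|Z| > sqrt(x)) = erfc(sqrt(x / 2)).
    """
    if mi < 0:
        raise ValueError("a modification index cannot be negative")
    return math.erfc(math.sqrt(mi / 2.0))


# Hypothetical modification indices, for illustration only.
example_mis = {"item A": 1.20, "item B": 6.50, "item C": 15.00}

for item, mi in example_mis.items():
    p = chi2_1df_pvalue(mi)
    verdict = "violated" if p < 0.05 else "tenable"
    print(f"{item}: MI = {mi:5.2f}, p = {p:.4f}, conditional independence {verdict}")
```

Note that each index is evaluated item by item, so this sketch applies no correction for testing 14 items.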

Response Mixture Models
To investigate how faster responses differ in their measurement properties as compared to slower responses, we resorted to the response mixture models outlined in this paper. We fit different models to the data. First, the traditional model from Equations (1)-(3) was fit as a baseline model (Model #0; i.e., a model without mixtures, as discussed above). For this model, we ran 10,000 iterations with 5000 as burn-in, as this was already sufficient for convergence. Table 2 depicts the discrimination parameters (α i ) and the difficulty parameters (β i ) for the baseline model. In this model, no distinction is made between a faster and a slower measurement model. That is, this model is misspecified with respect to conditional dependence. As can be seen, the items increase in their difficulty and the discrimination parameters differ notably between items. In addition, the parameters of items 1 and 2 are associated with relatively large uncertainty (reflected by the large standard deviations of the parameters). This is due to the large proportion correct for these items (i.e., only 1 and 8 subjects failed on these items, respectively). Next, a series of response mixture models was fit to the data. For the response mixture models, we ran two chains of 500,000 iterations each. We omitted the first 250,000 as burn-in. To check whether the MCMC sampling scheme converged to its stationary distribution, we considered trace plots of the parameters and the Gelman and Rubin statistic on the equality of the variance across the two chains [51]. First, from the trace plots all chains seemed to vary randomly around a stable average. Second, the Gelman and Rubin statistic was close to 1.00 for all parameters. Table 3 depicts the Deviance Information Criterion (DIC [52]), and a modified version of Akaike's Information Criterion [53] and the Bayesian Information Criterion [54], referred to as mAIC and mBIC, respectively.
The mAIC and mBIC have been proposed by Molenaar and De Boeck [29] for model comparison in response mixture models. In addition, Table 4 contains the correlations between the latent variables in the different models together with σ θ (s) and µ θ (s) . As can be seen from the tables, a full model was considered with all parameters free (Model #1). In this model, the faster and slower latent variables are highly correlated, as was also found by Partchev and De Boeck [25]. We do not compare the full model to the baseline model (Model #0) in terms of the DIC, mAIC, and mBIC fit indices, as the models contain a different number of random effects (two in the baseline model and three in the full model), which distorts model comparison (see, e.g., [55,56]). However, given that we found that conditional independence was violated for most of the items (see above), it is suggested that the full model should be preferred over the baseline model. Next, we considered Model #2 with equal discrimination parameters across the faster and slower measurement models (α i (s) = α i (f) ) but with the standard deviation of the slower latent variable, σ θ (s) , free to allow for a uniform effect on the discrimination parameters. This model fit the data better in terms of the DIC, mAIC, and mBIC as compared to the full model (Model #1). We therefore proceeded by retaining the restriction on the discrimination parameters and additionally constraining the difficulty parameters for the faster and slower responses to be equal (β i (f) = β i (s) ). We freed the mean of the slower latent variable, µ θ (s) , to allow for a scalar effect on the difficulty parameters. This resulted in Model #3. This model fit the data worse than Model #2 in terms of the DIC, mAIC, and mBIC. We therefore concluded that the difficulty parameters differ across faster and slower responses.
In Model #4, we allowed the difficulty parameters to differ again between the faster and slower responses; however, we constrained the slope parameters, ζ 1i , to be equal across items (ζ 1i = ζ 1 ). This restriction resulted in an improved DIC, mAIC, and mBIC as compared to Model #3. Additionally constraining ζ 0i to be equal across items (ζ 0i = ζ 0 ; Model #5) resulted in a deterioration in model fit.
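The convergence check described above, in which the variance within chains is compared with the variance between chains, can be sketched for a single parameter as follows. This is a basic form of the Gelman and Rubin statistic under the assumption of equal-length chains after burn-in; the simulated chains are illustrative stand-ins and not the authors' MCMC output.

```python
import random
from statistics import mean, variance


def gelman_rubin(chains):
    """Potential scale reduction factor for equal-length chains of one parameter."""
    m = len(chains)
    n = len(chains[0])
    if m < 2 or any(len(c) != n for c in chains):
        raise ValueError("need at least two chains of equal length")
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])   # W: average within-chain variance
    between = n * variance(chain_means)            # B: between-chain variance
    pooled = (n - 1) / n * within + between / n    # pooled estimate of the posterior variance
    return (pooled / within) ** 0.5


random.seed(2016)
# Two chains sampling the same distribution: the statistic should be close to 1.
chain_a = [random.gauss(0.0, 1.0) for _ in range(2000)]
chain_b = [random.gauss(0.0, 1.0) for _ in range(2000)]
rhat_same = gelman_rubin([chain_a, chain_b])

# A chain stuck in a different region: the statistic is clearly above 1.
chain_c = [random.gauss(3.0, 1.0) for _ in range(2000)]
rhat_shifted = gelman_rubin([chain_a, chain_c])

print(f"same target: {rhat_same:.3f}, shifted chain: {rhat_shifted:.3f}")
```

Values near 1.00, as reported for all parameters here, indicate that the chains are indistinguishable with respect to their variance.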
As Model #4 is the best fitting model, we conclude that the slope parameters ζ 1i are equal across items [ζ 1i = ζ 1 ], that the discrimination parameters are equal for the faster and slower classes [α i (s) = α i (f) ], but that the difficulty parameters are different for the faster and slower responses. In addition, the faster and slower response class sizes are unequal across items [ζ 0i ≠ ζ 0 ]. In Model #4, the estimate for ζ 1 equaled 10.72 (sd: 1.296; 95% Highest Posterior Density region, HPD: 8.46; 13.52), which is large because its scale depends on the scale of ε pi . That is, for all items, σ εi 2 ranged between 0.20 and 0.45.
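A 95% HPD region is the shortest interval that contains 95% of the posterior mass. A minimal sketch of how such a region can be obtained from MCMC draws is given below, assuming a unimodal posterior. The function name and the simulated draws are illustrative assumptions and do not reproduce the authors' samples.

```python
import math
import random


def hpd_interval(draws, mass=0.95):
    """Shortest interval containing a fraction `mass` of the posterior draws."""
    xs = sorted(draws)
    n = len(xs)
    if n == 0:
        raise ValueError("no draws supplied")
    k = min(n, max(1, math.ceil(mass * n)))  # number of draws inside the interval
    # Slide a window of k consecutive sorted draws and keep the narrowest one.
    start = min(range(n - k + 1), key=lambda i: xs[i + k - 1] - xs[i])
    return xs[start], xs[start + k - 1]


random.seed(1)
# Simulated stand-in for posterior draws of a standard deviation parameter.
fake_draws = [random.gauss(1.18, 0.14) for _ in range(20000)]
lower, upper = hpd_interval(fake_draws)
print(f"95% HPD: {lower:.2f}; {upper:.2f}")
print("interval contains 1:", lower <= 1.0 <= upper)
```

Checking whether a reference value such as 1 falls inside the region is the same kind of check that is applied to σ θ (s) in this section.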
As Model #3 is rejected in favor of Model #4, it can be concluded that the differences in the faster and slower difficulty parameters do not only reflect an overall effect (i.e., a difference in µ θ (s) and µ θ (f) ) but also additional item-specific effects. For the discrimination parameters, which are shown to be equal for the faster and slower classes, there may still be an overall effect in Model #4, as σ θ (s) is estimated freely. That is, in Table 4, it can be seen that the estimate for σ θ (s) in Model #4 equaled 1.18. It is interesting to see whether this parameter deviates from 1, as this would indicate that the slower responses discriminate overall better (as σ θ (s) > 1, see Table 4) than the faster responses (as σ θ (f) = 1 for identification reasons). However, the 95% HPD region for σ θ (s) runs from 0.93 to 1.48; as this interval contains 1, there is no clear evidence against the assumption that faster and slower responses discriminate equally well in the WISC data. For the estimates of α i , β i (f) , β i (s) , and ζ 0i in Model #4, see Table 5. From the 95% HPD regions it appeared that only ζ 0i departs from 0 for all items except items 1 and 2, indicating that the faster and slower response class sizes are unequal (i.e., they depart from 0.5). In addition, Table 5 contains the size of the slower response class (π i (s) ) on each item, which can be determined from the estimates of ζ 0i , ζ 1 , and σ εi 2 . As can be seen, the slower response class is used more often for the items at the end of the test. As the items are administered in order of difficulty (as can also be seen in Table 2, the difficulty increases with increasing item number), this result suggests that for more difficult items, respondents resort more to slower response processes. With respect to the β i (s) and β i (f) parameter estimates, it can be seen that they differ substantially. In Figure 5 this difference is graphically displayed.
For all items the difficulty parameters are larger for the slower responses, except for item 1 (which explains the negative residual correlations in Table 1).
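The way the size of the slower response class follows from the mixing parameters and the residual variance can be illustrated numerically. The sketch below rests on an assumed parameterization, because the mixing equation is not reproduced in this section: the probability that a response belongs to the slower class is taken to be a logistic function of ζ 1 (ε pi − ζ 0i ), with ε pi normally distributed with mean 0 and variance σ εi 2 . Under that assumption, the class size is the average of this probability over the residual distribution. The threshold values used below are hypothetical.

```python
import math


def logistic(x: float) -> float:
    """Numerically stable logistic function."""
    if x >= 0:
        return 1.0 / (1.0 + math.exp(-x))
    z = math.exp(x)
    return z / (1.0 + z)


def slow_class_size(zeta0, zeta1, sigma2_eps, n_grid=4001):
    """Marginal size of the slower class under the ASSUMED mixing function.

    Integrates logistic(zeta1 * (eps - zeta0)) against a normal density with
    mean 0 and variance sigma2_eps, using the trapezoid rule on +/- 8 sd.
    """
    sigma = math.sqrt(sigma2_eps)
    lo, hi = -8.0 * sigma, 8.0 * sigma
    step = (hi - lo) / (n_grid - 1)
    total = 0.0
    for j in range(n_grid):
        eps = lo + j * step
        density = math.exp(-0.5 * (eps / sigma) ** 2) / (sigma * math.sqrt(2.0 * math.pi))
        weight = 0.5 if j in (0, n_grid - 1) else 1.0  # trapezoid end points
        total += weight * density * logistic(zeta1 * (eps - zeta0)) * step
    return total


# zeta1 = 10.72 is the Model #4 estimate; the thresholds are hypothetical.
for zeta0 in (-0.3, 0.0, 0.3):
    size = slow_class_size(zeta0, zeta1=10.72, sigma2_eps=0.30)
    print(f"zeta0 = {zeta0:+.1f}: slower class size = {size:.3f}")
```

Under this assumed form, a threshold of 0 gives two classes of equal size, which matches the interpretation that a departure of ζ 0i from 0 implies class sizes that depart from 0.5.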
Table 5. Model #4: Posterior parameter means and standard deviations of the parameters in the measurement model for faster and slower responses and the mixing threshold parameters together with the size of the slower response class (π i (s) ).
To investigate whether the results above are stable and do not depend too much on the exact choice concerning the parameter restrictions, we also present the results for the model with equal mixing parameters across items (i.e., ζ 0i = ζ 0 ; ζ 1i = ζ 1 ). In this model, ζ 0 was estimated to be 0.32 (95% HPD: 0.27; 0.37) and ζ 1 was estimated to be 10.00 (95% HPD: 7.61; 13.08). From the results (not displayed) it appeared that the differences between the faster and slower difficulty parameters are somewhat smaller; however, the general effect from Figure 5 is still visible.
Simulation. As parameters α i (s) and ζ 1i tend to be slightly biased in the simulation study, we investigated whether this is also the case given α i = α i (s) = α i (f) and ζ 1i = ζ 1 as found above. That is, we simulated data using the exact setup as in the application above. We used the final model, Model #4, to generate the data and fit Model #4 to these simulated data to see whether the α i , σ θ (s) , and ζ 1 parameters are adequately recovered. See Figure 6 for a plot of the true values of α i against the estimated values. As can be seen, the true values are adequately recovered. In addition, the estimate for σ θ (s) equaled 1.42 (95% HPD: 1.09; 1.78) where the true value is 1.18 (see Table 4), and the estimate of ζ 1 equaled 9.22 (95% HPD: 7.61; 11.14) where the true value is 10.72 (see above). Thus, we think that the true values are adequately recovered given the parameter variability.
True parameter values (x-axis) and estimated parameter values (y-axis) for the discrimination, difficulty, and mixing parameters in the simulated data application. The solid lines denote a 1 to 1 correspondence and r is the correlation between the estimated and true values.

Discussion
This paper was motivated by the finding of Partchev and De Boeck [25] that for a Matrix Reasoning and a Verbal Analogy intelligence subtest, faster responses load on a different latent variable with different item difficulty parameters than slower responses. In this paper, we followed up on this finding by (1) replicating the Partchev and De Boeck findings in the Block Design subtest of the WISC; (2) by additionally testing whether item discrimination parameters differ across faster and slower responses; and (3) by testing whether the faster and slower response class sizes are unequal and whether they differ across items.
The results show that slower responses discriminate about equally well as compared to the faster responses. This finding is in contrast with the "worst performance rule" [28], which predicts that slower responses contain more information about individual differences in ability than faster responses. This contrasting result might be due to a lack of power to detect a difference. That is, in the simulation study it was shown that the slower discrimination parameters are associated with relatively large uncertainties. Although, in the application, we constrained the slower and faster discrimination parameters to be equal, which would be expected to increase the power to detect an overall effect, it is still possible that the power was too small to detect a difference. An in-depth study of the power to detect unequal discrimination parameters across faster and slower responses given realistic effect sizes seems valuable therefore.
On the other hand, the failure to find differences in discrimination between faster and slower responses may suggest that the worst performance rule, which is typically established using experimental tasks with very fast responses (500-1000 ms), does not hold for cognitive ability tasks with response times that are substantially longer. For instance, the authors of [57] tested the worst performance rule by correlating measures of fluid intelligence with the response times on a psychomotor vigilance task. It was found that the higher quantiles of the response times correlated more strongly with fluid intelligence. However, response times on the psychomotor vigilance task were on average 274 ms for the first quantile and 484 ms for the fifth quantile, while in the present study, response times on the block design task are between 1 and 360 s.
A second finding in the application is that slower responses are associated with less accuracy than faster responses. This finding is in line with both [18,25]. It suggests that some responses take longer because these responses are based on processes that are more error prone and inefficient as compared to the faster responses. A final finding was that the faster and slower response class sizes are unequal and differ across items, with the slower response class being larger for the more difficult items. Overall, the most important finding of the present study and the Partchev and De Boeck study [24] is that faster responses differ in their measurement properties from the slower responses. This result indicates that a different intraindividual process underlies the responses to the different items. In establishing this, both in the present study and in the Partchev and De Boeck study, it was assumed that the response times can be separated into the binary categories of faster and slower responses. This binary distinction can reflect a true dichotomy underlying the response times, as is the case in, for instance, a dual processing framework [17,18]. For the present application to the WISC block design subtest, the faster process might be a more analytic process and the slower process might be a more global process (e.g., [24]). The exact substantive interpretation of the faster and slower response classes, however, needs to be established using additional covariates. For instance, covariates like test-taking behavior ("trial and error" or "pattern memorization"), brain-imaging data, or eye tracking information can be used to establish the nature of the two classes.
However, our results do not mean that there needs to be a true dichotomy underlying the response processes of the block design subtest. That is, in our application we have used a dichotomy, but in reality there may be multiple processes. For instance, Rozencwajg and Corroyer [46] argued for a third, syntactic process to underlie the block design. This does not invalidate the present results, as in the case of multiple processes (e.g., A, B, C, D, E, and F), the faster and slower latent variables will capture the variation in the item responses by the faster processes (e.g., A, B, and C) on the one hand and the variation by the slower processes (e.g., D, E, and F) on the other hand. Adopting a modeling approach with more than two classes is possible in principle, but identification and estimation of such a model are highly challenging. Other possibilities might be that a continuous mode of processing, or a sequential use of processes (i.e., subjects use the slow process whenever the fast process was unsuccessful), exists in reality. Then again, our approach can still capture the most important patterns in the data (i.e., differences between the faster and slower responses).
Another result that needs discussion is the almost perfect correlation that was found between the fast and slow latent ability variables. This result indicates that there is only one dimension of interindividual differences underlying the block design test. As the item difficulties associated with the latent abilities differed across the faster and slower classes, a dimension of intraindividual differences can be distinguished as well. That is, the slower responses have different measurement properties as compared to the faster responses, suggesting different response processes, but the underlying dimension of interindividual differences is the same. This result is similar to what is found by DiTrapani, Jeon, De Boeck, and Partchev [27].
As we showed that intelligence subtests may be heterogeneous with respect to the intraindividual processes that are measured, the results imply that for a given test administration one should decide whether this source of intraindividual variation is considered desirable or a confound. That is, does the intraindividual variation contribute to the validity of the test (e.g., [13]) or does it provide a confound that should be eliminated (e.g., [58])? In the case of arithmetic tests, for instance, the heterogeneity in intraindividual processes may be considered desirable as the test taps into both memory retrieval and problem solving, which are both aspects of arithmetic [19].
In other settings, differences in response times might be seen as a confound. For instance, undesirable strategies such as faking on some of the items of a test [59], the use of item preknowledge [60], learning and practice effects [2], post-error slowing [61], and fatigue and motivation issues [62] will all result in intraindividual variation that harms the validity of an intelligence test. In addition, the authors of [58] discuss how different speed-accuracy compromises can be seen as a confound. That is, subjects differ in the amount of time they use to solve a given item. Therefore, an incorrect response may indicate either that the subject did not use enough time or that the subject is unable to solve that item. As a solution, the authors of [63] proposed the response signal paradigm. In this paradigm, a subject can only respond to a test item at a given moment after the item is presented (e.g., after 20 s). Therefore, all response times will be equal for all subjects, which standardizes the speed-accuracy tradeoff within and between subjects. Heterogeneity may therefore be decreased because some (but not all) subjects are not allowed enough time to solve a given item using a more accurate but more time-consuming response process.