A Practical Cross-Sectional Framework to Contextual Reactivity in Personality: Response Times as Indicators of Reactivity to Contextual Cues

Contextual reactivity refers to the degree to which personality states are affected by contextual cues. Research into contextual reactivity has mainly focused on repeated measurement designs. In this paper, we propose a cross-sectional approach to study contextual reactivity. We argue that contextual reactivity can be operationalized as different response processes which are characterized by different mean response times and different measurement properties. We propose a within-person mixture modeling approach that adopts this idea and enables studying contextual reactivity in cross-sectional data. We applied the model to data from the Revised Temperament and Character Inventory. Results indicate that we can distinguish between two response-specific latent states. We interpret these states as a high contextual reactivity state and a low contextual reactivity state. The results suggest that the low contextual reactivity state is generally associated with shorter response times and larger discrimination parameters than the high contextual reactivity state. The utility of this approach in personality research is discussed.


Introduction
For the past decade, there has been an increasing interest in within-person variance in personality behavior [1,2]. Although personality traits are relatively stable over time [3-7], personality-related behavior, emotions, and affects (i.e., states) tend to be variable [8-11]. This variability arises because personality states are susceptible to contextual cues in a given situation [2,10-12]. That is, people behave, think, or feel differently in different situational contexts. In addition, the amount of within-person variance in states also varies between persons and measured traits [9,13,14]. These findings suggest that the contextual cues, which are a source of within-person variance in states, do not affect every person or every state in the same way. This effect of contextual cues on the within-person variance in states has been referred to in the literature as situational sensitivity (e.g., [9]), context effects [15], personality strength (e.g., [16]), stable/instable personality (e.g., [17]), and personality by situation interaction (e.g., [18]). In this paper, we adopt the term contextual reactivity.
Contextual reactivity has been argued to be an important predictor of personality-related behaviors [16,19] and of other psychological phenomena or psychopathologies [20,21]. However, the assessment of contextual reactivity is challenging, as it requires relatively complex repeated measurement designs (e.g., the within and across context approach by [14]). Although experience sampling apps make gathering such longitudinal data much easier than a few decades ago, a repeated measurement design can still be considered challenging.
Psych 2020, 2 254
In this paper, we derive a cross-sectional operationalization of contextual reactivity using the response times on personality items. The purpose of this cross-sectional operationalization is twofold. First, it is intended to contribute to the theory on traits and states in individual differences research in general, and on contextual reactivity in particular. Second, it provides a practical framework for researchers who want to study within-subject differences in personality but for whom administering a repeated measurement design is not feasible.

Theoretical Background
The framework draws theoretically on (1) the distinction between traits and states (e.g., [22]) and the notion of within-person variance in personality [1,2], and (2) information accumulation decision processes for personality [23,24] and the distance-difficulty hypothesis (e.g., [25]), which we elaborate on below.

Traits, States, and Within-Person Variance
The personality trait-state distinction has been highlighted within the personality literature throughout the years [8,22,26-28]. A personality trait is defined as the relatively stable, time-invariant, mean tendency of behavior [26,29]. Behavior in this sense does not only refer to the enactment of personality, but also to the cognitions, emotions, and affects that are related to it. Personality states have been defined as the deviations of a person's behavior around this personality trait.
Both traits and states can show variation throughout a person's lifespan. However, trait variation is defined as slow, irreversible change, where persons may slightly change over a time span of years or decades. This slow trait change is assumed to occur due to maturation and life experiences [30,31]. Because the change is so slow, we observe traits to be relatively stable over time. State variation is defined as the more fluctuating day-to-day variation in personality behaviors. State variation mostly occurs due to reactivity to daily contextual cues in different situations [28,32,33]. For example, a person who is highly extraverted may not act extraverted in every situation. The individual's behavior may be influenced by the context of a specific situation. Because these contextual cues vary in day-to-day life, personality states also tend to be variable. Thus, trait change is the within-person mean change that happens over years, while state fluctuations are the within-person variations in behavior, mainly caused by differences in contextual cues in different situations [22]. As a result, high contextual reactivity is characterized by large fluctuations in behavior and thus by large within-person variance, whereas low contextual reactivity is characterized by small within-person variance [9] (see Figure 1).

Figure 1. State variation and trait change (upper plots), and the within-person distribution of states (lower plots) for high contextual reactivity (left) and low contextual reactivity (right).

Information Accumulation Decision Processes in Personality and the Distance-Difficulty Hypothesis
Recently, it has been argued that the response process underlying personality inventories follows an information accumulation decision process [23,24,34]. In this process, the person accumulates evidence supporting or opposing the statement of a personality item by internally assessing past behavior and comparing that to the item statement. If enough evidence has been gathered (i.e., a decision boundary is reached), the person will make a decision based on the direction of the evidence. The time it takes to reach a decision is therefore an important indicator for this decision process.
In this information accumulation process, high contextual reactivity will be characterized by a relatively indecisive accumulation process: because of the large within-person variance in personality behavior, there will be evidence both supporting and opposing the item statement. In contrast, low contextual reactivity will be characterized by evidence that is mostly directed towards one of the options (supporting or opposing the item statement). Hence, high contextual reactivity will be characterized by longer response times than low contextual reactivity, due to a longer period of evidence accumulation (see Figure 2 for a graphical representation of this idea). In addition, as the decision follows a different process for high and low contextual reactivity, the measurement parameters of the responses will be different (due to a different rate of information accumulation and, possibly, a different distance between the boundaries). Note that this is related to the notion of measurement invariance (e.g., [35-37]), according to which different response processes result in different measurement parameters. Moreover, the idea above is in line with the distance-difficulty hypothesis from personality research [17,25,38,39], in which it has been shown that personality scores that are stable over time (i.e., small within-person variance and thus low contextual reactivity) are associated with more decision certainty and faster response times than scores that show more variability over time.
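The intuition behind this argument can be illustrated with a small simulation. The sketch below is not the authors' model: it assumes a simple discrete random walk in which every sampled memory adds one unit of evidence for or against the item statement, and the names `p_support` and `boundary` as well as all numerical values are hypothetical. When the evidence is mixed (`p_support` close to 0.5, as would be expected under high contextual reactivity), the walk needs more samples to reach a decision boundary than when the evidence is one-sided.

```python
import random


def decision_time(p_support, boundary, rng):
    """Number of evidence samples until the accumulated evidence hits +/- boundary."""
    evidence, steps = 0, 0
    while abs(evidence) < boundary:
        # Each sample either supports (+1) or opposes (-1) the item statement.
        evidence += 1 if rng.random() < p_support else -1
        steps += 1
    return steps


def mean_decision_time(p_support, boundary=10, n_trials=2000, seed=1):
    """Average number of samples needed over many simulated decisions."""
    rng = random.Random(seed)
    total = sum(decision_time(p_support, boundary, rng) for _ in range(n_trials))
    return total / n_trials


# Assumed values: mixed evidence (high reactivity) vs. one-sided evidence (low reactivity).
mean_hc = mean_decision_time(p_support=0.55)
mean_lc = mean_decision_time(p_support=0.80)
print(f"mean samples, mixed evidence:     {mean_hc:.1f}")
print(f"mean samples, one-sided evidence: {mean_lc:.1f}")
```

In this toy setting the number of samples plays the role of the response time; the simulation says nothing about the measurement parameters, which are addressed by the statistical model later in the paper.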
As was shown by Geukes et al. [14], people may vary in their contextual reactivity across different traits. That is, Geukes et al. found that the amount of within-person variability was higher for neuroticism and openness. Here, we postulate that contextual reactivity may also depend on specific facets of a trait. For instance, a person may only show high contextual reactivity to socializing facets of extraversion, but not to other facets. In addition, it is also possible that a person only shows high contextual reactivity for specific aspects within a facet. For example, the socializing behavior of a person who is very social may be highly affected by contextual cues specific to a work environment, but not by contextual cues specific to leisure activities. This idea is also supported by the empirical work on the distance-difficulty hypothesis (discussed above), in which it has been shown that the stability of a response is highly dependent on the person (i.e., the specific position on the trait) and the item (i.e., the specific aspect of the trait/facet; e.g., [40]).

Practical Framework
Taking the above information together, the cross-sectional definition of contextual reactivity implies that high contextual reactivity responses are characterized by longer response times and by different measurement properties as compared to low contextual reactivity responses. However, using the raw response times to make inferences about response-specific contextual reactivity is challenging, as the raw response times are confounded by item and person effects. That is, some people are on average slower than others and some items require more time than others. To make inferences about contextual reactivity, it is therefore important to distinguish between these effects and the effects of contextual reactivity on the response times.
To separate person effects, item effects, and the effects of contextual reactivity, we adopted a within-person mixture modeling approach for responses and response times [41-43]. In this approach, a mixture of two latent states is assumed to underlie the responses and response times of each item of a personality inventory. Each response is classified into a high contextual reactivity state or a low contextual reactivity state, based upon the response and the response time, taking the main person and main item effects into account.
Such a practical modeling framework for contextual reactivity is valuable as it provides researchers with statistical tools to enable inferences about within-subject differences in contextual reactivity while explicitly accounting for differences between persons and items. That is, the present approach will provide researchers with measures for contextual reactivity which can be used in empirical research to test hypotheses concerning personality. In addition, the approach will give insight into within-person trajectories concerning contextual reactivity underlying the responses to personality test items. Such trajectories cannot be studied using more conventional modeling approaches. In this paper, we will adopt this approach to illustrate the above advantages of distinguishing between high and low contextual reactivity. Specifically, we will apply the modeling approach to the responses and response times of the Revised Temperament and Character Inventory (TCI-R; [44]). Below, we first present the within-person mixture modeling approach and discuss how it can be used to investigate contextual reactivity. We then apply it to the TCI-R and end with a general discussion.

A Within-Person Mixture Modeling Account of Contextual Reactivity
To derive a suitable model for contextual reactivity, we first specify the hierarchical model of Van der Linden [45] as a baseline model to account for the main effects of the persons and the items in the responses and response times. To this end, the responses of person $p = 1, \ldots, N$ to item $i = 1, \ldots, n$, denoted $X_{pi}$, are regressed on a latent trait variable $\theta_p$, resulting in item parameters $\alpha_i$ (discrimination parameter) and $\beta_{ic}$ (category attractiveness parameter; for $c = 1, \ldots, C$) and person parameters $\theta_p$ (latent trait parameters), which account for the main item effects and the main person effects, respectively. This regression can be any suitable mixed effects regression model. In this paper we use the generalized partial credit model, that is,
$$P(X_{pi} = c \mid \theta_p) = \frac{\exp(c\,\alpha_i \theta_p + \beta_{ic})}{\sum_{k=1}^{C} \exp(k\,\alpha_i \theta_p + \beta_{ik})}.$$
In this model, the item effects $\alpha_i$ and $\beta_{ic}$ account for differences in the measurement properties of the items. That is, the discrimination parameters $\alpha_i$ account for the amount of variation in the trait that is captured in the item scores, and the category attractiveness parameters $\beta_{ic}$ account for some categories being used more often than other categories for a given item.
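To make the roles of the discrimination and attractiveness parameters concrete, the following minimal sketch computes category probabilities for a single item. It assumes the intercept parameterization of the generalized partial credit model, in which the probability of category c is proportional to exp(c α θ + β_c); the function name `gpcm_probs` and all parameter values are hypothetical and are not estimates from the paper.

```python
import math


def gpcm_probs(theta, alpha, betas):
    """Category probabilities for one item under the assumed parameterization.

    betas[c - 1] is the attractiveness of category c, for c = 1, ..., C.
    """
    logits = [c * alpha * theta + beta for c, beta in enumerate(betas, start=1)]
    top = max(logits)  # subtract the maximum for numerical stability
    weights = [math.exp(v - top) for v in logits]
    total = sum(weights)
    return [w / total for w in weights]


# Hypothetical item with five categories and equal attractiveness.
betas = [0.0, 0.0, 0.0, 0.0, 0.0]
low = gpcm_probs(theta=-1.0, alpha=1.2, betas=betas)
high = gpcm_probs(theta=1.0, alpha=1.2, betas=betas)
flat = gpcm_probs(theta=1.0, alpha=0.0, betas=betas)
print([round(p, 3) for p in low])
print([round(p, 3) for p in high])
print([round(p, 3) for p in flat])
```

With a discrimination of zero the trait no longer affects the response, so all categories are equally likely; this is the sense in which a small discrimination parameter means that an item separates poorly between persons with low and high trait levels.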
Similarly, the response times of person $p$ on item $i$, $T_{pi}$, are regressed on a latent speed factor, resulting in item parameters $\nu_i$ (time intensities) and $\lambda_i$ (factor loadings) and person parameters $\tau_p$ (latent speed parameters). A suitable regression model for the response times ideally takes into account the skewness of the response time distribution and its natural lower bound of zero. A practical solution is to log-transform the response times and to submit these to a linear model, i.e.,
$$\ln T_{pi} = \nu_i + \lambda_i \tau_p + \varepsilon_{pi}, \qquad \varepsilon_{pi} \sim N(0, \sigma^2_{\varepsilon i}),$$
where $\sigma^2_{\varepsilon i}$ is the residual variance. Note that all factor loadings are fixed to $-1$ in accordance with [45]. The negative sign ensures that the latent speed variable $\tau_p$ retains its interpretation as a speed variable, with larger log-response times corresponding to smaller levels on the speed variable $\tau_p$. In addition, the time intensity parameter $\nu_i$ accounts for differences in the average log-response time to an item. That is, some items require overall more time to be processed by the respondents, resulting in a larger mean response time for those items, for instance because the item consists of a relatively long text. To finalize the baseline model, the models for the responses and response times are connected by allowing a covariance $\sigma_{\theta\tau}$ between the latent variables $\theta_p$ and $\tau_p$. See the left panel of Figure 3 for a graphical representation of this baseline model.
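As an illustration of the response time part of the baseline model, the sketch below simulates log-response times with the factor loading fixed to -1, so that ln T = ν − τ + ε. The normal residual and all numerical values (`nu_item`, `sigma_item`, the two speed levels) are assumptions chosen for illustration only.

```python
import math
import random


def simulate_log_rt(nu, tau, sigma, rng):
    """Draw one log-response time: ln T = nu - tau + eps, eps ~ N(0, sigma^2)."""
    return nu - tau + rng.gauss(0.0, sigma)


rng = random.Random(7)
nu_item = 1.5      # hypothetical time intensity (mean log-seconds)
sigma_item = 0.4   # hypothetical residual standard deviation

# A fast respondent (tau = 0.5) and a slow respondent (tau = -0.5) on the same item.
fast = [simulate_log_rt(nu_item, 0.5, sigma_item, rng) for _ in range(5000)]
slow = [simulate_log_rt(nu_item, -0.5, sigma_item, rng) for _ in range(5000)]

mean_fast = sum(fast) / len(fast)
mean_slow = sum(slow) / len(slow)
print(f"mean log RT, fast respondent: {mean_fast:.2f}")
print(f"mean log RT, slow respondent: {mean_slow:.2f}")
# Under a log-normal model the median response time equals exp(nu - tau).
print(f"model-implied median RT, fast respondent: {math.exp(nu_item - 0.5):.2f} s")
```

The example shows why the loading is negative: a higher value on the speed variable lowers the expected log-response time, while the time intensity shifts the expected time for everyone answering that item.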
The baseline model above is a static model, as it only models between-person differences. That is, the latent variables $\theta_p$ and $\tau_p$ capture differences between persons in their trait and speed levels, respectively. As discussed, we are interested in modeling within-person differences, as we expect within-person variance due to a person giving high contextual reactivity responses to some items and low contextual reactivity responses, which are characterized by shorter response times and different measurement properties, to others. To test this notion, we need to adapt the baseline model to account for these within-person differences. As discussed above, we use a within-person mixture model [41]. Specifically, in the baseline model, we specify item-specific latent state variables, $C_{pi}$, with two states. The first state ($C_{pi} = 0$) is the high contextual reactivity (HC) state, which is characterized by slower responses, and the second state ($C_{pi} = 1$) is the low contextual reactivity (LC) state, which is characterized by faster responses, that is:
$$\ln T_{pi} = \nu_i - \Delta\nu_i C_{pi} - \tau_p + \varepsilon_{pi},$$
where $\Delta\nu_i$ is a shift parameter which models the difference in time intensity (i.e., mean log-response time) between a response from the LC-state and a response from the HC-state on item $i$. To identify the LC-state as the faster state, $\Delta\nu_i$ is assumed to be larger than zero. Note that the latent state variable $C_{pi}$ is item and person dependent, meaning that a person $p$ can be in state zero (HC) on item $i$, but in state one (LC) on item $i + 1$. In this paper, we do not consider the possibility of state-specific residual variances $\sigma^2_{\varepsilon i}$ (although we note that estimating state-specific variances is possible in principle). As argued above, the HC- and LC-states will not only differ in their mean response times, but also in their measurement properties (as modelled by $\alpha_i$ and $\beta_{ic}$), to reflect that both are distinct psychological variables. Therefore, in the model, these differences are accounted for by
$$P(X_{pi} = c \mid \theta_p, C_{pi}) = \frac{\exp\{c\,(\alpha_i + \Delta\alpha_i C_{pi})\,\theta_p + \beta_{ic} + \Delta\beta_{ic} C_{pi}\}}{\sum_{k=1}^{C} \exp\{k\,(\alpha_i + \Delta\alpha_i C_{pi})\,\theta_p + \beta_{ik} + \Delta\beta_{ik} C_{pi}\}},$$
where $\Delta\alpha_i$ and $\Delta\beta_{ic}$ are shift parameters which denote the difference in, respectively, the item discrimination and the category attractiveness between the LC-state and the HC-state. Note that $\Delta\alpha_i$ can be interpreted as the parameter accounting for the interaction between $C_{pi}$ and $\theta_p$.
Besides the parameters modeling the differences in discrimination, attractiveness, and time intensity between the two states, the within-person mixture model contains a state size parameter $\pi_i$, which models the proportion of responses that is given from the LC-state. That is:
$$P(C_{pi} = 1) = \pi_i,$$
e.g., a state size $\pi_i$ of 0.9 indicates that 90% of the responses to item $i$ are given from the LC-state. See the right panel of Figure 3 for a graphical representation of the within-person mixture model outlined above. Note that both the baseline and the mixture models are joint modeling approaches. That is, the models are fit to the responses and response times simultaneously (i.e., not in a two-stage or separate approach).
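How the state size and the time-intensity shift combine to classify a single response can be sketched with Bayes' rule. The example below is a simplification and not the estimation procedure used in the paper: it uses only the continuous log-response time (ignoring the response itself), assumes normal residuals with equal variance in both states, and assumes that the LC-state mean equals the HC-state mean minus a positive shift `delta_nu`. All names and values are hypothetical.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution with the given mean and standard deviation."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def posterior_lc(log_rt, nu, tau, delta_nu, sigma, pi_lc):
    """P(C = 1 | log RT): posterior probability of the low contextual reactivity state."""
    mean_hc = nu - tau              # slower state, C = 0
    mean_lc = nu - delta_nu - tau   # faster state, C = 1 (delta_nu > 0)
    like_lc = pi_lc * normal_pdf(log_rt, mean_lc, sigma)
    like_hc = (1.0 - pi_lc) * normal_pdf(log_rt, mean_hc, sigma)
    return like_lc / (like_lc + like_hc)


# Hypothetical parameter values for one person-item combination.
params = dict(nu=1.5, tau=0.0, delta_nu=0.8, sigma=0.4, pi_lc=0.75)
p_quick = posterior_lc(log_rt=0.6, **params)
p_slow = posterior_lc(log_rt=1.8, **params)
print(f"P(LC | quick response) = {p_quick:.3f}")
print(f"P(LC | slow response)  = {p_slow:.3f}")
```

A response time exactly halfway between the two state means is equally likely under both states, so its posterior probability simply equals the state size; faster responses move the probability towards the LC-state and slower responses towards the HC-state.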

Data
The data used for this paper comprise responses and response times of the Revised Temperament and Character Inventory (TCI-R; [44]), collected through computerized testing. There were 1904 participants who completed the TCI-R. Participants who failed one of the five validity items were removed from the data (N = 100). Three participants were removed due to erroneous registration of negative response times. The data of the remaining 1801 participants were included in the analysis. The age of the participants ranged between 18 and 60.
The TCI-R consists of 240 personality items. Participants indicate the degree to which they agree with the content of each item on a 5-point Likert scale, where "5" stands for "strongly agree". The items are divided over 7 domains and 29 subscales. For each scale, the characteristics of the participants, as measured through the TCI-R, can be found in Table 1 [46]. In the test administration, the item order was mixed across the domains and subscales of the TCI-R but was the same for all participants. The data can be requested from the second author.
Table 1 (final rows only; each row lists a subscale followed by its two descriptors).
Revengefulness | Compassion
C5 | Self-serving | Pure conscience
Self-Transcendence
ST1 | Self-conscious | Self-forgetful
ST2 | Self-differentiation | Transpersonal identification
ST3 | Rational materialism | Spiritual acceptance

Categorizing Response Times
In the mixture model in the right panel of Figure 3, it is commonly assumed that the response times within each state are log-normally distributed (e.g., [41,43]). However, Molenaar et al. [47] showed that violations of this assumption can lead to spurious state detection (i.e., detecting two latent states where there is only one) and parameter bias. Molenaar et al. demonstrated that categorizing the continuous response times is a suitable solution for this problem and leads to better performing models (i.e., no or less bias and no spurious state detection). They found robust results for three, five, and seven response time categories. In the current study, we therefore use five response time categories. We categorized the response time data following the recommendations by [48]. That is, we transformed the response times into categories using the cumulative probabilities within the standard normal distribution at −2, −2/3, 2/3, and 2. This implies cut-off values at the percentiles 2.28, 25.25, 74.75, and 97.73 of the observed response time distribution. This procedure is desirable as it results in relatively uniform information across the latent speed variable [48]. Note that due to this categorization approach, the within-person effects also become more robust to arbitrary scale properties. That is, it has been shown that traditional measures of within-person variability depend on arbitrary scale properties like the mean and the minimum/maximum of the scale (see e.g., [49]). As we use categorized response times and explicitly model each response time category (see below), these properties of the scale are taken into account and do not affect the modeling results (see [47]). Figure 4 contains the distribution of the raw and categorized response times for an example item.
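The categorization step can be sketched as follows. This is a minimal illustration, assuming that the cut-offs are taken as empirical quantiles of the response times of a single item; the helper names, the interpolation rule, and the simulated response times are hypothetical and are not taken from the paper or from [48].

```python
import bisect
import random
from statistics import NormalDist

# Cumulative standard-normal probabilities at -2, -2/3, 2/3, and 2.
Z_CUTS = (-2.0, -2.0 / 3.0, 2.0 / 3.0, 2.0)
CUM_PROBS = [NormalDist().cdf(z) for z in Z_CUTS]


def empirical_quantile(sorted_values, prob):
    """Quantile by linear interpolation between order statistics."""
    position = prob * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    weight = position - lower
    return sorted_values[lower] * (1.0 - weight) + sorted_values[upper] * weight


def categorize(response_times):
    """Map raw response times to categories 1..5 using the four cut-offs."""
    ordered = sorted(response_times)
    cut_points = [empirical_quantile(ordered, p) for p in CUM_PROBS]
    return [bisect.bisect_right(cut_points, t) + 1 for t in response_times]


# Simulated (not real) response times in seconds for one item.
rng = random.Random(3)
times = [rng.lognormvariate(1.5, 0.5) for _ in range(10000)]
categories = categorize(times)
shares = [categories.count(c) / len(categories) for c in range(1, 6)]
print([round(100 * p, 2) for p in CUM_PROBS])  # percentile cut-offs
print([round(s, 3) for s in shares])           # share of responses per category
```

Because the cut-offs are defined through percentiles of the observed distribution, the resulting categories do not depend on the unit or the shape of the raw response time scale, which is the robustness property referred to above.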

Model Specification, Estimation, and Fit
The two models from Figure 3 were fit to each of the 29 TCI-R subscales separately. For the state size parameter $\pi_i$, we assume homogeneity of the states over items, which is a common assumption in within-person mixture modeling (e.g., [50-52]). For the present mixture model, this assumption implies that the state sizes are equal over the items, that is, $\pi_i = \pi$. For the purposes of the present study, we think that invariant state sizes will capture the most important patterns in the data (i.e., the overall state size). Relaxing this assumption is possible in principle but results in a complex model that requires (very) large sample sizes. Congruently, we specify $\Delta\alpha_i$, $\Delta\beta_{ic}$, and $\Delta\nu_i$ to be item invariant, that is, $\Delta\alpha_i = \Delta\alpha$, $\Delta\beta_{ic} = \Delta\beta$, and $\Delta\nu_i = \Delta\nu$, although we note that it is possible to estimate item-specific effects (see [41]). In addition, for the categorized response times, we used a partial credit model [53]. Note that the response time model thus does not include a discrimination parameter (following [45]), while the model for the responses $X_{pi}$ above does include a discrimination parameter (as the responses follow a generalized partial credit model).
Thus, the mixture model that we fit to the TCI-R data consists of a partial credit model for the categorized response times $T_{pi}$ and a generalized partial credit model for the responses $X_{pi}$, both with the item-invariant state shifts specified above. Finally, the state sizes are given by:
$$P(C_{pi} = 1) = \pi.$$
All models were specified in the statistical software program LatentGOLD [54]. The LatentGOLD scripts are available from www.dylanmolenaar.nl. To evaluate which model fits the data best, we consulted the Bayesian information criterion (BIC) and the Akaike information criterion (AIC). These measures are comparative fit indices, where lower values represent better model fit. Molenaar et al. [47] studied the true and false positive rates of these fit indices in selecting among mixture models similar to those adopted here. It was found that the BIC performs acceptably, whereas the AIC is associated with an increased false positive rate in some situations. Thus, if the AIC and BIC agree on which model is the best fitting model, this result can be trusted.
Table 2 contains the AIC and BIC fit indices of the baseline and the mixture model for the subscales of the TCI-R. As can be seen, the baseline model is rejected for all subscales in favor of the mixture model. Table 2 also includes the estimated correlation between the latent trait and latent speed variables in the two models for each scale. As can be seen, the correlations differ only slightly, with 20 out of the 29 scales showing a slightly larger absolute correlation in the baseline model as compared to the mixture model. This is to be expected, as the main difference between the models is how the residuals are modeled. That is, in the baseline model the residuals are unmodeled, while in the mixture model a two-state structure is imposed. The slight difference between the baseline model and the mixture model is due to the differences in measurement properties of the latent trait and latent speed variable across the states.
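The decision rule used in the model comparison (trust the selection when the AIC and BIC point to the same model) can be sketched with the standard definitions of the two criteria. The function names and the log-likelihoods and parameter counts in the example are hypothetical; they are not the values reported in Table 2, and the sample size of 1801 is taken from the Data section.

```python
import math


def aic(log_likelihood, n_params):
    """Akaike information criterion: -2 log L + 2 k."""
    return -2.0 * log_likelihood + 2.0 * n_params


def bic(log_likelihood, n_params, n_obs):
    """Bayesian information criterion: -2 log L + k ln(N)."""
    return -2.0 * log_likelihood + n_params * math.log(n_obs)


def preferred_model(fits, n_obs):
    """Return the model name only if AIC and BIC agree; otherwise None."""
    best_aic = min(fits, key=lambda name: aic(*fits[name]))
    best_bic = min(fits, key=lambda name: bic(*fits[name], n_obs))
    return best_aic if best_aic == best_bic else None


# Hypothetical (log-likelihood, number of parameters) pairs, not values from Table 2.
fits = {"baseline": (-21500.0, 73), "mixture": (-21200.0, 77)}
choice = preferred_model(fits, n_obs=1801)
print(choice)
```

Returning no decision when the two criteria disagree mirrors the cautious reading of the fit indices described above, since the AIC on its own may favor the more complex model too often.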
Therefore, in the remainder, we focus on the results of the mixture model to see how the measurement properties differ across states. First, Table 3 contains the parameter estimates for the shift parameters and the state size for the LC-state for the mixture model. As discussed above, the shift parameters $\Delta\alpha$, $\Delta\beta$, and $\Delta\nu$ represent the overall difference between the LC-state and the HC-state in the discrimination, attractiveness, and time intensity parameters. That is, a positive estimate of a shift parameter indicates that overall the corresponding parameter (e.g., attractiveness) is larger in the LC-state, while a negative estimate indicates that the corresponding parameter is overall smaller in the LC-state. Note that $\Delta\nu$ is restricted to be positive to identify the LC-state as the faster state.
For the shift parameters of the discrimination parameters, we observe that for all subscales the items are significantly more discriminative in the LC-state. This suggests that the relationship between the measured trait and the item responses is stronger within the LC-state than within the HC-state. As is evident from the large estimates of $\Delta\alpha$, the item discrimination parameters in the HC-state are small, indicating that the responses in this state discriminate poorly between people with high and low trait values. This result is understandable, as the responses in the HC-state depend highly on the context and therefore contain few systematic differences. For the attractiveness parameters, we observe significant differences between the states for most of the subscales. However, the direction of the difference depends on the subscale considered. For the novelty seeking (NS), harm avoidance (HA), and reward dependence (RD) domains, half of the subscales show larger attractiveness parameters in the LC-state and the other half show larger attractiveness parameters in the HC-state. For the persistence (PS), self-directedness (SD), and cooperativeness (C) domains, all but one scale show larger attractiveness in the LC-state as compared to the HC-state. Finally, for the self-transcendence (ST) domain, two subscales show larger attractiveness in the HC-state and one subscale shows larger attractiveness in the LC-state.
For all but two subscales, the LC-state is the largest state, with a state size between roughly 0.7 and 0.8 for most of the subscales, indicating that most responses are in the LC-state. For the RD3 and C3 subscales, the LC-state size is small (respectively, 0.041 and 0.198). However, for these two subscales, the shift parameters for the time intensities are close to zero, indicating that there is hardly a difference in mean response time between the states. This complicates the interpretation of the results in terms of an HC-state and an LC-state for these two subscales.
To illustrate what individual within-person trajectories in state membership underlying the responses to the TCI-R look like, we depict the posterior state probability estimates of the LC-state in Figure 5 for some example participants. As can be seen, state membership depends highly on the person, the subscale, and the item. For instance, for participant one, the responses seem to fluctuate more between HC and LC, while for participant two, most of the responses are within the LC-state. In addition, for participant three, the responses seem to be mainly within the LC-state for the NS scales and to fluctuate more between LC and HC for the PS scales.
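The posterior state probability of a single response can be sketched with Bayes' rule for a two-state mixture. The sketch below is a simplified illustration, not the estimation procedure used in this paper: it assumes that the response and the response time are conditionally independent given the state, that response times are lognormal within a state, and that the state-specific response probabilities are already known. All function names and numerical values are hypothetical.

```python
import math

def lognormal_pdf(t, mu, sigma):
    """Density of a lognormal distribution at response time t > 0."""
    z = (math.log(t) - mu) / sigma
    return math.exp(-0.5 * z * z) / (t * sigma * math.sqrt(2.0 * math.pi))

def posterior_lc(state_size_lc, lik_lc, lik_hc):
    """Bayes' rule for a two-state mixture: P(LC-state | response, response time)."""
    numerator = state_size_lc * lik_lc
    return numerator / (numerator + (1.0 - state_size_lc) * lik_hc)

# Hypothetical probabilities of the observed response in each state.
p_resp_lc, p_resp_hc = 0.80, 0.55
t = 2.0  # observed response time in seconds (hypothetical)

# Joint likelihood per state, assuming conditional independence of response and time.
lik_lc = p_resp_lc * lognormal_pdf(t, mu=0.8, sigma=0.4)  # faster state
lik_hc = p_resp_hc * lognormal_pdf(t, mu=1.6, sigma=0.4)  # slower state

print(round(posterior_lc(0.75, lik_lc, lik_hc), 3))
```

Repeating this calculation for every item of a participant yields a trajectory of LC-state probabilities of the kind plotted per participant.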

Discussion
In this paper, we presented a cross-sectional account of contextual reactivity in personality, where larger response times are indicative of responses that are more subject to contextual reactivity. We applied this idea to 29 personality dimensions, measured by the TCI-R [44]. We found two latent states to underlie the responses and response times of the TCI-R. The two latent states differ in overall response time, item discrimination, and item attractiveness, suggesting that the responses within each state measure different psychological variables. We interpreted the latent state containing the faster responses as the low contextual reactivity state (LC-state) and the latent state containing the slower responses as the high contextual reactivity state (HC-state).
We observed that for all but one of the personality subscales of the TCI-R, the items have larger discrimination parameters in the low contextual reactivity state. This is in line with the idea that smaller response times indicate more stable and more precise responses [17,25,38,39] and that they have smaller underlying within-person variance in personality [1]. For the attractiveness parameters we found no systematic differences between the low and high contextual reactivity states. That is, we found the attractiveness parameters to differ across the low and high contextual reactivity states, but the direction of the difference depends on the exact subscale. Thus, contextual reactivity does not systematically relate to the item scores: for some subscales, contextually reactive responses are associated with higher item scores, and for other subscales, contextually reactive responses are associated with lower item scores. These findings therefore suggest that contextual reactivity (i.e., amount of within-person variance in personality) is theoretically distinct from the trait level (i.e., within-person mean in personality), with no systematic difference between high and low contextual reactivity on the exact item scores.
In the theoretical derivation of the practical framework for contextual reactivity, we build upon the assumption that there is an explicit distinction between personality traits and states (e.g., [22]). It should be noted that such a distinction has not been without criticism. For instance, Allen and Potkay [55] argued that the distinction between trait and state is arbitrary for five reasons: (1) well-established trait and state questionnaires (e.g., the Adjective Check List, or Profile Of Mood States) contain a considerable number of overlapping items; (2) in research practice, a personality measure is "declared" to measure a trait or a state by the researcher without empirical evidence; (3) researchers are unable to indicate where the state stops and the trait starts; (4) in daily life, people hardly distinguish (verbally) between trait and state; and (5) research intended to measure a trait is sometimes post hoc reported as research into a state. The critique by Allen and Potkay has been refuted by Zuckerman [56], who pointed to empirical studies showing that questionnaires for traits and states differ in their psychometric properties in a theoretically expected way (e.g., the test-retest reliability for traits is generally much larger than for states) and that, as would theoretically be expected, state questionnaires are vulnerable to experimental manipulation, while trait questionnaires are not. In addition, Fridhandler [57] pointed out that the trait-state distinction consists of four aspects: (1) short-term versus long-term; (2) a continuous versus a reactive manifestation; (3) a concrete versus an abstract entity; and (4) the result of situational causality versus personal causality, while Allen and Potkay solely focus on the former two. As a result, the trait-state distinction may seem arbitrary, but when all four aspects are considered, it is not.
In the framework proposed in this paper, we have come up with a formal distinction between traits and states that touches mainly upon aspects one and four by Fridhandler [57]. That is, in the proposed mixture model, we assume that short-term situation-specific effects (C pi in the model; i.e., effects that underlie specific items but not all items) characterize states and that long-term person-specific effects (θ p in the model; i.e., the effect that underlies the data on all items of a given person) characterize the trait. As our formal framework is questionnaire independent (i.e., the formal definitions for states and traits do not depend on the exact instruction of the questionnaire), we address critiques one and two by Allen and Potkay [55] above. In addition, as traits and states are formally defined by, respectively, a latent trait variable θ p and a latent state variable C pi , critique three is also addressed. That is, it is made explicit what patterns in the data correspond to trait effects θ p and what part corresponds to state effects C pi . A key issue for the validity of our proposed framework is whether our distinction into traits and states is in line with the theory. However, as we have argued in the introduction section, we see a basis for this.
The idea of using the response times to operationalize contextual reactivity is intended to aid the assessment of personality in general, and the assessment of contextual reactivity in particular. That is, as illustrated in the real data application, the framework we have presented in this paper can be used to obtain measures for contextual reactivity using cross-sectional data. These measures, that is, the posterior state membership of each response, can be used in research to predict behavior more accurately. Contextual reactivity may for instance be used to predict other psychological constructs, such as psychopathology, or it may serve as a predictor of psychological resilience. In addition, contextual reactivity may be predicted from other (personality) variables by adding covariates to the mixture models as presented in this paper. Extending the model in this way is straightforward and may add to our understanding of (the distinction between) within-person variance, contextual reactivity, and personality traits in general and their relation to other variables. Either way, for these hypotheses to be tested, the assessment of contextual reactivity should be made easier, more comprehensible, and less time consuming. Our proposed approach is a step forward in this direction.
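As a hedged illustration of how posterior state memberships might be turned into a person-level predictor, the sketch below averages the posterior HC-state probabilities over a participant's responses. The choice of a simple mean, the function name `reactivity_score`, and the example values are assumptions made for illustration; no particular aggregation is prescribed here.

```python
from statistics import mean

def reactivity_score(posterior_lc_probs):
    """Average posterior probability of the HC-state across a person's responses.

    posterior_lc_probs: iterable of posterior LC-state probabilities, one per item.
    Higher scores mean that more responses are attributed to the HC-state.
    """
    return mean(1.0 - p for p in posterior_lc_probs)

# Hypothetical posterior LC-state probabilities for two participants.
posteriors = {
    "participant_1": [0.35, 0.80, 0.20, 0.60],
    "participant_2": [0.95, 0.90, 0.85, 0.92],
}

# One score per participant, usable as a covariate in a follow-up analysis.
scores = {pid: reactivity_score(probs) for pid, probs in posteriors.items()}
for pid, score in scores.items():
    print(pid, round(score, 3))
```

Such a score could then enter a regression as a predictor of an external criterion, although adding covariates directly to the mixture model, as mentioned above, avoids treating estimated memberships as if they were observed.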
In our modeling approach, we took the hierarchical modeling framework of Van der Linden [45] as a point of departure. However, there are many other approaches to analyze responses and response times (see [58] for a recent overview). Our choice was mainly a pragmatic one: in the hierarchical model, item and person effects on responses and response times are separated in a way that is characteristic for item response theory. As a result, it is relatively straightforward to account for item and person effects to extract effects due to different item-specific latent states, which was the main aim of the present study. However, the ideas put forward in this paper are certainly amenable to other statistical modeling frameworks.
In psychometrics, most approaches to simultaneously analyze responses and response times have originated from ability measurement. As discussed for instance in [59], in ability measurement, there are (scientifically) two main reasons to add the response times to the analysis of the responses (see [60] for other, more practical reasons): (1) to increase the measurement accuracy of the latent ability underlying the responses and (2) to make inferences about differences in response processes underlying the test. In the case of personality measurement, the increase in measurement accuracy using the response times is not likely to be large, as the correlations between responses and response times are commonly (much) lower in personality assessment than in ability measurement. However, response times can still be valuable to make inferences about differences in the response process underlying personality tests. For instance, response times have been used to detect faking on personality questionnaires [61] and differences in self-schema [62]. Similarly, in this paper, we have used response times to enable inferences about differences in low and high contextual reactivity. Using the responses only, these inferences are challenging. Therefore, we think that response times can also be a valuable source of information in personality assessment.
In this paper, we focused on two models: a between-person baseline model, and a mixture model with both between- and within-person effects. Whether to use the baseline model or the mixture model depends on the research question. That is, the aim of this paper was to present a modeling framework which enables inferences about within-person differences in personality. If a research question is not about within-person differences in personality, the researcher is free to use a between-person modeling approach. However, the presence of within-person differences in the data may bias the between-person differences. The biasing effect of neglecting within-person differences due to a mixture of latent states is an interesting topic on its own, but it was not the focus of the present paper.
There are some limitations to the present study that should be taken into consideration. First, we have not explicitly manipulated the context of the items. Ideally, our cross-sectional approach to assess contextual reactivity would be validated by comparing the results of this approach to the results of repeated measurements. For this, one could think of a design where personality states are examined over time, and where states are measured for different contexts (e.g., experience sampling). For each person, the observed within-person variance within a specific state can then be compared to the classifications of the related responses that are measured in a cross-sectional personality inventory.
Second, we considered only two different response states to be underlying the responses. This does not, however, imply that contextual reactivity truly is a dichotomous construct. Contextual reactivity may have more than two categories or there could be a continuous latent construct underlying the responses. Here, we pragmatically use two states to facilitate interpretation (i.e., comparing a high level of contextual reactivity with a low level). We think that only two levels of contextual reactivity constitute a good interpretable approximation to contextual reactivity as a continuous variable. That is, we think that with two states, we capture the most important patterns in the data (e.g., that discrimination parameters are smaller for higher levels of contextual reactivity). Using more than two categories, or using a continuous operationalization of contextual reactivity is possible (see e.g., [63,64] for a viable approach), but these models require large sample sizes and may be more challenging to interpret.