Inferring Cognitive Abilities from Response Times to Web-Administered Survey Items in a Population-Representative Sample

Monitoring of cognitive abilities in large-scale survey research is receiving increasing attention. Conventional cognitive testing, however, is often impractical at the population level, highlighting the need for alternative means of cognitive assessment. We evaluated whether response times (RTs) to online survey items could be useful to infer cognitive abilities. We analyzed >5 million survey item RTs from >6000 individuals administered over 6.5 years in an internet panel together with cognitive tests (numerical reasoning, verbal reasoning, task switching/inhibitory control). We derived measures of mean RT and intraindividual RT variability from a multilevel location-scale model as well as an expanded version that separated intraindividual RT variability into systematic RT adjustments (variation of RTs with item time intensities) and residual intraindividual RT variability (residual error in RTs). RT measures from the location-scale model showed weak associations with cognitive test scores. However, RT measures from the expanded model explained 22–26% of the variance in cognitive scores and had prospective associations with cognitive assessments over lag-periods of at least 6.5 years (mean RTs), 4.5 years (systematic RT adjustments) and 1 year (residual RT variability). Our findings suggest that RTs in online surveys may be useful for gaining information about cognitive abilities in large-scale survey research.


Introduction
The monitoring of cognitive abilities in large population surveys is receiving increasing attention in social science, epidemiology, and health policy research. It is widely recognized that cognitive abilities are powerful predictors of important life outcomes, including educational and work performance (Clark et al. 2010;Nye et al. 2022;Robertson et al. 2010;Wai et al. 2018), earnings and financial wellbeing (Furnham and Cheng 2016;Murnane et al. 2000), life satisfaction (Enkvist et al. 2013;St John and Montgomery 2010), health (Luciano et al. 2009;Stilley et al. 2010), successful aging (Castro-Lionard et al. 2011), and mortality (Batty et al. 2007;Duff et al. 2009). Large-scale monitoring of people's cognitive abilities allows for the investigation of protective and risk factors of cognitive impairment associated with aging (Finkel et al. 2003) and chronic medical conditions (Schagen et al. 2014).

Conventional and Alternative Approaches to Cognitive Testing
Cognitive assessments have traditionally been conducted face-to-face in clinic settings using standardized tests and trained health professionals who oversee their administration (Woodford and George 2007). In this setting, cognitive testing is not routine and is conducted infrequently, which has limited opportunities for tracking intra-individual changes in cognitive abilities (Seelye et al. 2015). For pragmatic reasons, cognitive functioning tests in population survey research have often been conducted over the telephone, especially when in-person assessments were not viable (Lachman et al. 2014;Langa et al. 2009). With the advancement of online data collection, efforts have increasingly been underway to conduct cognitive testing over the internet (Bissig et al. 2020;Feenstra et al. 2018;Tsoy et al. 2021). All of these modes of administration, however, have been shown to be costly, time-intensive, and/or burdensome to respondents.
Recently, alternative approaches to infer individuals' cognitive abilities from other behaviors have been proposed that do not require the use of cognitive tests and that can overcome some of their practical limitations. The core assumption of these approaches is that unobtrusive monitoring of people's day-to-day behaviors during routine but cognitively challenging activities can provide pertinent and ecologically valid information about an individual's cognitive status and change over time. For example, Kaye and colleagues examined the utility of a home-installed activity assessment system consisting of different types of sensors that were installed in older people's homes and computers that were provided to participants for daily use (Kaye et al. 2011). The activity system was installed for 33 months, on average. The derived metrics included computer usage, time spent outside of the home, walking speed, and overall daily activity. Annual assessments included physical examinations, neuropsychological testing, and questions about health and functioning. Results showed that passive monitoring can give insight into functioning and performance difficulties in near real-time. The study further demonstrated the feasibility of implementing the technology and engaging older adults in its use. Similarly, passively monitored computer mouse movement patterns during routine home computer use have been shown to be sensitive to detecting mild cognitive impairment (MCI) in older adults (Seelye et al. 2015). Older adults who were cognitively intact or had MCI participated in a longitudinal study that examined in-home monitoring technology. Computer mouse movement patterns during a week of routine at-home computer use were derived. Metrics included total mouse moves, movements with greater variability and less efficiency, and movements with longer pauses. All of these metrics were significantly associated with MCI. 
The results of both of these studies demonstrate the potential of unobtrusive monitoring of routine but cognitively challenging activities.

The Role of Response Times for Measuring Cognitive Abilities
In the present paper, we examine whether participants' response times (RTs) to questions in online surveys can be used to infer their cognitive abilities. The assessment of RTs has a long history in cognitive testing. For example, RTs are routinely used to measure perceptual speed in standardized cognitive tests, and distributional characteristics of RTs in laboratory-based response latency tasks have been used to measure higher-order cognitive abilities (Kyllonen and Zu 2016). It is important to acknowledge that an individual's RT does not necessarily have a uniform relationship with cognitive abilities but instead that RTs interact with a respondent's cognitive skill level and the complexity of the item or task (Goldhammer et al. 2014;Hunt 1980). For tasks that involve more automated, lower-order cognitive processes some research has found a positive relationship between RTs and a respondent's skill level. In contrast, higher-order tasks that require controlled cognitive processes have shown an inverse relationship between RTs and respondent skill level, suggesting a calibration of response latency with item difficulty (Dodonova and Dodonov 2013;Goldhammer et al. 2014;Naumann and Goldhammer 2017;Naumann 2019). This calibration holds across a range of cognitive tasks and supports a dual process model of cognition (Coomans et al. 2016;Evans and Stanovich 2013). The dual process model suggests a conceptual dichotomy of processes that are either automatic, rapid, and unconscious, or controlled, slow, and conscious. Both automatic and controlled processes are used in a range of tasks including reasoning, judgment, and social decision making (Evans and Stanovich 2013). For example, Dodonova and Dodonov (2013) assessed RTs and response accuracy during a cognitive task that consisted of items with changing difficulty levels. 
They examined changes in the RT-accuracy, accuracy-ability, and RT-ability relationships as a function of increasing task difficulty. Their results showed that overall respondents with greater cognitive ability had faster RTs and higher accuracy rates compared to respondents with lower cognitive abilities. With increasing item difficulty, the accuracy-ability relationship strengthened, whereas the speed-ability relationship tended to weaken. In another example, Goldhammer et al. (2014) examined whether the "time on task effect" in computer-based reading and problem-solving tasks is moderated by respondent skill level and task difficulty. Results showed that the time on task effect was positive and amplified with greater task difficulty for problem-solving tasks. For reading tasks, the opposite results were found. In addition, the positive time on task effect lessened with greater respondent skill level for problem-solving tasks, whereas the negative time on task effect amplified for respondents with greater skill level for reading tasks. In sum, these studies demonstrate that the relationship between RTs and a respondent's cognitive skills is complex and may differ depending upon the complexity of the task (requiring more controlled versus routine cognitive processing).
To date, very limited research has examined the possibility that RTs captured as a byproduct of online survey responses could be useful to infer people's cognitive abilities. We focused on RTs in online surveys because web-based data collection has become a mainstay of large-scale survey research, opening the door to innovative ways of measuring cognitive abilities apart from standardized cognitive testing. Response latencies to questionnaire items are routinely collected in most online surveys and are already a standard feature of many web-based data collection platforms, making them a cost-effective and readily available source of paradata in online studies.

Response Times and Cognitive Abilities in Survey Research
To date, the use of RTs in survey research has focused on evaluating the quality of survey questions (Yan and Tourangeau 2008), identifying which items are effortful and which are not (Lenzner et al. 2010), detecting survey satisficing and participants engaging in careless responding (Meade and Craig 2012;Schneider et al. 2018), and studying survey fatigue during lengthy questionnaire assessments (Galesic and Bosnjak 2009). An important yet understudied topic in survey research is the question of whether completing an online survey may be a particularly good venue for assessing cognitive function. RTs to items in surveys are indeterminate as to whether responses are objectively accurate, unlike those in many cognitive tasks, such as simple or complex RT, memory or executive attention, reading comprehension or information search on the internet. Nevertheless, surveys can be complex and cognitively demanding tasks and there are clear individual differences in response latency to survey items (Park and Schwarz 2012;Tourangeau et al. 2000). Moreover, survey items are often heterogeneous in contents and demands such that respondents need to adjust their attentional focus and adapt their responses as they navigate through different sets of questions. Thus, responding to survey items arguably requires many different cognitive processes, some of which are automatic, but others of which involve higher-level cognitive abilities including attentional control, planning, organization, and mental flexibility. For this reason, we hypothesize that RTs in online surveys might be particularly well suited for inferring respondents' higher-order cognitive abilities.

The Present Study
The goal of the present study was to evaluate whether it is possible to glean information about people's inductive reasoning skills (quantitative and verbal) and task switching/inhibitory control from their response times to survey questions in a nationally representative online panel study. In order to capture a range of information relevant to higher-order cognitive functioning, we examined multiple variance components inherent in question RTs in the completion of multi-item surveys, including a person's mean RT and patterns of intraindividual RT variability.
We expected that faster mean RTs for survey items would be associated with greater inductive reasoning skills and task switching/inhibitory control (Schmiedek et al. 2007). Furthermore, we carefully considered the role of intraindividual variability in RTs to survey items. Greater intraindividual RT variability in standardized reaction time tasks has been associated with less efficient neural transmission and lower intelligence (Deary and Caryl 1997;Hanes and Schall 1996;Jensen 1992;Slifkin and Newell 1998). Accordingly, we expected that greater RT variability in survey item responses may also be associated with lower cognitive abilities. However, given the heterogeneous nature of survey items, it is also the case that greater RT variability in online surveys may in part indicate that respondents systematically adjust their RT to the demands of different survey items, which may reflect greater mental flexibility and greater executive functioning (Gehring et al. 1993;Holroyd and Coles 2002). We therefore speculated that two different components of intraindividual RT variability can be distinguished: one reflecting "systematic RT adjustments" (i.e., variability in RTs in response to variation in item demands) and one reflecting "residual RT variability" (i.e., spontaneous RT fluctuations that are not explained by systematic RT adjustments), both of which may be associated with cognitive abilities but in opposite directions.
A second goal was to evaluate prospective associations between RT components in survey item responses and people's cognitive abilities. Impairments in higher order cognitive functioning are early indicators of neurodegenerative disease and dementia (Gallassi et al. 2002;McKhann et al. 2011). A major advantage of RTs from longitudinal online surveys is that they are available repeatedly over time, potentially facilitating early detection of declines in cognitive abilities. We examined the maximal time lag for which the different RT components in survey item responses would allow for the longitudinal prediction of subsequent scores from standardized cognitive (inductive reasoning skills and task switching/inhibitory control) tests. If it were possible to gain information about a respondent's cognitive abilities from their RTs in web-based surveys, this would set the stage for larger scale monitoring of cognitive abilities in the general population in addition to standard cognitive testing.

Participants
The data analyzed were drawn from the Understanding America Study (UAS), a probability-based internet panel initiated in 2014 (Alattar et al. 2018). The panel is housed at the University of Southern California and currently has approximately 10,000 adult panel members. In contrast to convenience (opt-in) panels, UAS panel members are recruited through nation-wide address-based sampling, which tends to reduce many biases in population parameters estimated from convenience panels where members self-select to participate (Yeager et al. 2011). UAS panelists without internet access are equipped with a tablet and broadband internet to achieve representativeness, given that internet access tends to be lower among older and less educated Americans (Couper et al. 2007). As is typical for large-scale internet panels, UAS respondents complete about 1-2 web-based surveys per month. Response rates are routinely high (75-95%), and attrition rates are modest (7-8% per year).

Survey and Item Selection
Survey items were drawn from 42 UAS surveys administered between 2014 and 2021 (administered on average in about 2-month intervals) on a wide variety of topics, including perceived wellbeing, retirement planning, financial decision making, personality, and health behaviors (for an overview of survey contents see https://uasdata.usc.edu/, accessed on 12 December 2022). Most surveys cover more than one topic. Since respondents entered the UAS at different times (the panel is still growing), the number of surveys for which RTs were available as paradata differed across respondents. For each respondent, only surveys administered before each of the formal cognitive tests (see below) were included, and the analysis included only UAS respondents who had at least 5 of the 42 surveys completed at the time of analysis.
Within each of the UAS surveys, items were eligible for the analysis regardless of their content; however, we specified several criteria for item inclusion: (1) items needed to be shown individually on a page because the RT timestamps recorded as paradata in the UAS were recorded per page; this excluded survey items presented together on the same screen in grid or matrix format; (2) open-ended questions were excluded; and (3) 75% of respondents or more needed to have completed an item to reduce potential selection biases for items involving skip patterns (item nonresponse tends to be low in the UAS, but skip patterns are relatively common). The mean number of items analyzed per survey was 28.34 (median = 25 items, SD = 14.92, range = 10 to 65 items); a total of 1173 survey items were included (for a sample of 50 survey items illustrating the heterogeneity of item contents, see Table S1 in the Supplementary).

Recording of Response Times
The UAS administers surveys using the NubiS data collection tool. NubiS creates HTML-based question screens and sends these to the browser for display. Respondents provide their answers on their computers, tablets, or smartphones using navigational buttons to move through the survey. The NubiS tool uses Hypertext Preprocessor code to record RTs as the number of seconds spent on each question screen, defined as the time from the moment NubiS sends a question screen to the browser to the moment it receives a signal that the respondent has exited the screen. RTs encompass respondents' reading and answering time and exclude any time on the server for processing answers and creating question screens. For each survey item, RTs were trimmed at the 99th percentile of respondents to eliminate extreme outliers (e.g., respondents stepping away from their computer) (Ratcliff 1993). RTs were log transformed, as is customary, to normalize the distribution of RT data (Thissen 1983). Henceforth, we refer to the log-transformed RTs as RTs for simplicity.
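The preprocessing described above can be illustrated with a minimal sketch. It assumes that "trimmed" means RTs above an item's 99th percentile are dropped (the text does not say whether they were removed or capped), uses a linear-interpolation percentile, and guards against non-positive RTs before taking the natural log. The function names and the `(respondent_id, item_id, rt_seconds)` record layout are hypothetical and are not part of NubiS or the UAS data files.

```python
import math
from collections import defaultdict


def percentile(values, q):
    """Linear-interpolation percentile (q in [0, 100]) of a non-empty list."""
    ordered = sorted(values)
    pos = (len(ordered) - 1) * q / 100.0
    lower = math.floor(pos)
    upper = math.ceil(pos)
    frac = pos - lower
    return ordered[lower] + (ordered[upper] - ordered[lower]) * frac


def trim_and_log(records, q=99.0):
    """Trim RTs per item at the q-th percentile, then log-transform.

    records: list of (respondent_id, item_id, rt_seconds) tuples.
    Assumption: RTs above the item-specific cutoff are dropped, not capped.
    """
    by_item = defaultdict(list)
    for _, item_id, rt in records:
        by_item[item_id].append(rt)
    cutoffs = {item: percentile(rts, q) for item, rts in by_item.items()}
    return [
        (resp, item, math.log(rt))
        for resp, item, rt in records
        if 0 < rt <= cutoffs[item]  # guard: log is undefined for rt <= 0
    ]
```

With 100 respondents on one item, 99 of whom take 1 to 99 s and one of whom leaves the screen open for an hour, only the hour-long RT is removed before the log transform.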

Quantitative Reasoning
The Number Series task was used to measure quantitative reasoning, a type of inductive reasoning skill that involves the ability to solve problems that depend upon mathematical relationships (Mather and Jaffe 2016). Respondents are presented a series of numbers with one number missing from the series (e.g., 4, 7, 10, ?). The task is to determine the numerical pattern in the series and to provide the missing number. The UAS administers Number Series items that had previously been implemented as self-administered online tests in the Cognition and Aging in the USA (CogUSA) study (McArdle et al. 2015), in 2-year intervals. Two parallel forms with 15 items each are rotated across biennial assessments to reduce practice effects. For participants who had completed the online task more than once, the last assessment completed was used in the present analyses. Items are scored using Item Response Theory (IRT) using Samejima's Graded Response Model (Samejima 1969), and test scores are scaled in T-scores, where 50 is the mean and 10 is the SD of a census-weighted sample of the general adult US population. Higher scores indicate better quantitative reasoning.
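As a small illustration of the T-score metric, the sketch below rescales an ability estimate against a weighted reference sample so that the reference mean maps to 50 and the reference SD to 10. The IRT estimation itself is not shown, and the function names, the use of post-stratification weights, and the example values are assumptions made for illustration; the UAS norming procedure may differ in its details.

```python
import math


def weighted_mean_sd(values, weights):
    """Weighted mean and (population-style) weighted SD of a reference sample."""
    total = sum(weights)
    mean = sum(w * v for v, w in zip(values, weights)) / total
    var = sum(w * (v - mean) ** 2 for v, w in zip(values, weights)) / total
    return mean, math.sqrt(var)


def to_t_score(theta, ref_mean, ref_sd):
    """Rescale an ability estimate so the reference mean is 50 and SD is 10."""
    return 50.0 + 10.0 * (theta - ref_mean) / ref_sd


# Hypothetical reference sample of ability estimates with census weights.
ref_mean, ref_sd = weighted_mean_sd([-1.0, 0.0, 1.0], [1.0, 2.0, 1.0])
one_sd_above = to_t_score(ref_mean + ref_sd, ref_mean, ref_sd)
```

A respondent one reference SD above the weighted mean receives a T-score of 60 under this scaling.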

Verbal Reasoning
The Verbal Analogies task was used to measure verbal reasoning, an inductive reasoning skill involving the comprehension of concepts expressed through language (Mather and Jaffe 2016). In this task, respondents need to recognize a relationship between two words and successfully apply it to two other words (e.g., "Night" is to "Dark" as "Day" is to ?). The UAS also administers Verbal Analogies from the CogUSA study (McArdle et al. 2015) as an online self-administered test, where two 15-item parallel forms are counterbalanced across occasions. For each participant, the last assessment was analyzed in the present study. The test is scored using IRT, with scores normed on a T-score metric (mean = 50 and SD = 10 in the general adult US population); higher scores indicate better verbal reasoning skills.

Task Switching/Inhibitory Control
We used the Stop-and-Go Switch task as a measure of task switching and inhibitory control (Lachman et al. 2014). Participants are presented with the word red or green and are asked to respond with either stop or go (i.e., for the color red respond with stop; for the color green respond with go). The test includes three conditions (baseline, reverse baseline, mixed conditions) that are administered sequentially and that include reversals of the instructions (i.e., stop for green; go for red). A switch trial is defined as the first response after a participant is asked to change from one instruction to another. A nonswitch trial does not involve a change in instructions. The Stop-and-Go Switch task was originally developed for telephone administration, implemented in the Midlife in the United States National Longitudinal Study (MIDUS) (Lachman et al. 2014;Tun and Lachman 2008). The UAS developed an adapted version for self-administered web administration that has been validated in prior research (Liu et al. 2022). Latencies were measured, in milliseconds lapsed, between the presentation of the cue and the correct response. Participants needed to have at least 70% correct trials to be scored, which was deemed an acceptable threshold to exclude respondents with invalid or careless answer behavior (Liu et al. 2022). The baseline conditions are administered to measure choice reaction time and were not examined here. Latencies in the mixed condition are considered an assessment of task switching and inhibitory control (Lachman et al. 2014). We followed the scoring procedures used in MIDUS: median latencies were first calculated for the switch and nonswitch trials of the mixed condition in order to eliminate the effects of outliers, and the average of the median latencies for switch and nonswitch trials was used as a measure of task switching/inhibitory control (Hughes et al. 2018;Lachman et al. 2014). 
For the present analyses, the median latencies were reverse scored such that higher scores on the variable indicate better functioning.
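The scoring steps for the mixed condition can be sketched as follows. Several details are assumptions made for illustration: accuracy is computed over the mixed-condition trials passed in (the text does not specify which trials enter the 70% criterion), latencies are taken only from correct responses, and reverse scoring is implemented by negating the averaged median latency. The trial dictionary layout and function name are hypothetical.

```python
from statistics import median


def score_stop_and_go(trials, min_accuracy=0.70):
    """Score mixed-condition trials of a Stop-and-Go Switch task.

    trials: list of dicts with keys 'switch' (bool), 'correct' (bool),
            and 'latency_ms' (float).
    Returns the reverse-scored average of the switch and nonswitch median
    latencies, or None if the respondent cannot be scored.
    """
    if not trials:
        return None
    accuracy = sum(1 for t in trials if t["correct"]) / len(trials)
    if accuracy < min_accuracy:
        return None  # below the accuracy threshold: treated as invalid
    switch = [t["latency_ms"] for t in trials if t["correct"] and t["switch"]]
    nonswitch = [t["latency_ms"] for t in trials if t["correct"] and not t["switch"]]
    if not switch or not nonswitch:
        return None
    # Medians limit the influence of outlying latencies within each trial type.
    mean_of_medians = (median(switch) + median(nonswitch)) / 2.0
    # Assumption: reverse scoring by negation, so higher = faster = better.
    return -mean_of_medians
```

Negation is only one way to reverse a latency score; any monotone decreasing transformation would preserve the ordering of respondents.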

Data Analysis
The analyses were conducted in multiple steps. First, we derived measures of RT components (mean and intraindividual variability in each person's RTs) from the log-transformed survey item RT data. Second, we examined associations between the derived RT measures and participants' cognitive abilities. Third, we examined the temporal stability of the relationships between the RT measures and subsequent tests of cognitive abilities.

Step 1: Deriving Survey Item RT Component Measures
We used a multilevel structural equation modeling (MSEM) approach to calculate the RT component measures. MSEM accommodates the nested structure of the RT data, with RTs for multiple survey items nested within respondents, accounts for measurement error in observed RTs, and has proven useful for capturing intraindividual RT dynamics and quantitative differences in these dynamics between individuals (Hamaker and Wichers 2017;McNeish and Hamaker 2020). Two different models were estimated in an attempt to isolate relevant RT component measures from respondents' RT patterns.
Our first model was a so-called "location-scale" multilevel model, an extension of the traditional multilevel model that allows for random effects (i.e., individual differences) in means or intercepts (referred to as "location" in statistical terms) and in intraindividual variability (referred to as "scale") in the same model (Hedeker et al. 2012;McNeish and Hamaker 2020). The model captures respondents' average RT and amount of intraindividual variability in RTs as two latent variables. Based on prior research on RT variability in standardized (laboratory-based) reaction time tests, we hypothesized that slower average RTs and greater intraindividual RT variability would be associated with worse cognitive functioning (Haynes et al. 2017;Rutter et al. 2020;Tam et al. 2015). The location-scale model can be described in multilevel terms as follows. At Level 1 (the within-person level), the observed (log-transformed) response time $RT_{ij}$ for item $i$ and participant $j$ equals the sum of a person-specific mean RT $\alpha_j$ and a residual deviation from that mean RT, $r_{ij}$. The residual deviations are assumed to follow a normal distribution with mean 0 and variance $\sigma_j^2$. In contrast to traditional multilevel models, where the variance of these residuals is assumed to be the same for all individuals, the location-scale multilevel model allows this variance to differ between individuals (as indicated by the individual-specific subscript $j$). Level 2 (the between-person level) captures these individual differences as random effects (i.e., latent variables). The random effects in person-specific RT means $u_{0j}$ and in the log of each person's intraindividual RT variance $u_{1j}$ are assumed to follow a multivariate normal distribution with mean vector $\mathbf{0}$ and covariance matrix $\tau$ (McNeish and Hamaker 2020).
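Written out, the verbal description of the location-scale model corresponds to the equations below. This is a reconstruction from the text rather than the authors' original typesetting. The fixed-effect symbols $\gamma_0$ (average of the person-specific mean RTs) and $\omega_0$ (average of the log residual variances) are labels assumed here for illustration; all other symbols follow the description above.

```latex
\begin{align}
\text{Level 1:}\quad & RT_{ij} = \alpha_j + r_{ij}, \qquad r_{ij} \sim N\!\left(0, \sigma_j^2\right) \\
\text{Level 2:}\quad & \alpha_j = \gamma_0 + u_{0j} \\
                     & \log\!\left(\sigma_j^2\right) = \omega_0 + u_{1j} \\
                     & \begin{pmatrix} u_{0j} \\ u_{1j} \end{pmatrix}
                       \sim \mathrm{MVN}\!\left(\mathbf{0}, \tau\right)
\end{align}
```

Modeling the log of $\sigma_j^2$ rather than the variance itself keeps every person-specific variance positive while allowing the random effect $u_{1j}$ to be normally distributed.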
Our second model was an expanded version of the location-scale model, based on the assumption that different components of intraindividual RT variability can be distinguished from each other that may relate to cognitive abilities in opposite ways. Specifically, survey items vary considerably in difficulty and the cognitive demands associated with them (Schneider et al. forthcoming). In RT modeling, a concept analogous to item difficulty is the "time intensity" of an item, defined as "the amount of time an item tends to require" (Kyllonen and Zu 2016, p. 14). We speculated that one component of intraindividual RT variability consists of systematic adjustments whereby a person adjusts their RTs to the time intensity levels of the items. Such RT adjustments may reflect greater responsiveness to changing stimuli and greater mental flexibility, which are important aspects of fluid intelligence and executive functioning (Gehring et al. 1993;Holroyd and Coles 2002). This component may be distinguished from residual intraindividual RT variability that is unrelated to the time intensity of the items and might reflect attentional lapses and neural noise (Deary and Caryl 1997). Accordingly, the second model expanded the location-scale model by distinguishing systematic RT adjustments and residual RT variability as two between-person latent variables. At Level 1 of the expanded model, the observed (log-transformed) response time for item $i$ and participant $j$ is regressed on the time intensity (TI) for item $i$, such that $RT_{ij}$ equals the sum of a person-specific intercept $\alpha_j$, the item's time intensity $TI_i$ multiplied by a person-specific slope parameter $\beta_j$, and a residual $r_{ij}$. Consistent with prior RT models (Kyllonen and Zu 2016;van der Linden 2006), we obtained the TIs of the survey items from a cross-classified multilevel model of the log RTs with crossed random effects at the subject and item levels, where the item-level random effect indicates latent differences in TIs between items. Estimated TIs were saved and centered at 10 s (approximately the average TI across all items, see Section 3) when entered in the expanded location-scale model. This means that the intercept of the expanded model captures the person's predicted RT for an item with a TI of 10 s (which we refer to as a person's mean RT hereafter), the slope captures the predicted increase and decrease in the person's RT for items with higher and lower TIs (which we refer to as "systematic RT adjustments"), and the residual captures the deviations of the observed RT from the predicted RT for each person and item (referred to as "residual RT variability"). At Level 2, random effects represent latent individual differences in mean RTs $u_{0j}$, in systematic RT adjustments $u_{1j}$, and in residual RT variability $u_{2j}$. All multilevel models were estimated in Mplus version 8.8 (Muthén and Muthén 2017) using Bayesian parameter estimation with software default diffuse priors. A graphical representation of the MSEM approach to estimating the latent variables involved in the expanded version of the location-scale model is shown in Figure S1 (for Mplus code, see Figure S2 in the Supplementary).
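To make the three components concrete, the sketch below fits the Level 1 relationship, $RT_{ij} = \alpha_j + \beta_j\,(TI_i - c) + r_{ij}$, separately for one person by ordinary least squares. This is a simplified two-stage stand-in and not the Bayesian MSEM estimated in Mplus: it ignores the Level 2 distribution, treats the item TIs as known, and returns a residual SD rather than a random effect on the log residual variance. Centering at $c = \ln(10)$ assumes that TIs are expressed in log seconds; the function name and inputs are hypothetical.

```python
import math


def person_rt_components(log_rts, log_tis, center=math.log(10.0)):
    """Per-person OLS of log RTs on centered item time intensities.

    Returns (intercept, slope, residual_sd):
      intercept   ~ predicted log RT for an item with a TI of 10 s ("mean RT")
      slope       ~ change in log RT per unit of log TI ("systematic RT adjustment")
      residual_sd ~ spread of RTs around the fitted line ("residual RT variability")
    """
    n = len(log_rts)
    if n != len(log_tis) or n < 3:
        raise ValueError("need at least 3 paired observations")
    x = [ti - center for ti in log_tis]
    x_bar = sum(x) / n
    y_bar = sum(log_rts) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("time intensities must vary across items")
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, log_rts))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    sse = sum((yi - (intercept + slope * xi)) ** 2 for xi, yi in zip(x, log_rts))
    residual_sd = math.sqrt(sse / (n - 2))  # n - 2: two estimated coefficients
    return intercept, slope, residual_sd
```

A respondent who speeds up on quick items and slows down on demanding ones gets a larger slope, whereas a respondent whose RTs scatter regardless of item demands gets a larger residual SD. A joint multilevel model, as used in the study, additionally shrinks noisy person-level estimates toward the sample average.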

Step 2: Associations between RT Component Measures and Cognitive Abilities
The RT component measures described above were estimated based on all survey response times from all included UAS surveys available for each person prior to each of the cognitive assessments. To examine the relationships between the RT component measures and respondents' cognitive abilities, we examined bivariate correlations and performed multiple regressions in which the RT component measures served as multiple predictor variables and a cognitive functioning variable (quantitative reasoning, verbal reasoning, and task switching/inhibitory control, in separate models) served as dependent variable. Additional multiple regressions controlled for demographics of age, gender, race, ethnicity, education, and income, entered as covariates. We also explored whether the relationships between the RT component measures and cognitive abilities differed between younger (less than 40 years of age) and relatively older (40 years or older) participants using moderated regression models with age as a moderator. Separate regression models were estimated for the RT measures derived from the location-scale model and from the expanded location-scale model.
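For the two-predictor case (e.g., mean RT and RT variability from the location-scale model), standardized regression coefficients and the variance explained can be computed directly from the three pairwise correlations. The sketch below illustrates that textbook identity only; it does not handle covariates, the three-predictor expanded model, or survey weights, and the function names are hypothetical rather than taken from the study's analysis code.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def standardized_betas_two_predictors(x1, x2, y):
    """Standardized coefficients and R^2 for y regressed on x1 and x2."""
    r12 = pearson(x1, x2)
    ry1 = pearson(x1, y)
    ry2 = pearson(x2, y)
    denom = 1.0 - r12 ** 2  # zero only if the predictors are perfectly collinear
    beta1 = (ry1 - ry2 * r12) / denom
    beta2 = (ry2 - ry1 * r12) / denom
    r_squared = beta1 * ry1 + beta2 * ry2
    return beta1, beta2, r_squared
```

When the two predictors are nearly uncorrelated, each standardized coefficient is close to the corresponding bivariate correlation.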

Step 3: Stability of Lagged Relationships with Cognitive Abilities
To examine the temporal stability of prospective (i.e., lagged) relationships between the RT component measures and cognitive functioning outcomes, we first determined the time intervals between the administration of each of the 42 surveys and the time point of cognitive testing (i.e., lag times) for each participant. Next, we estimated the (expanded) multilevel location-scale model separately for each of the 42 surveys and created an average score of each RT component measure for every half-year interval before the cognitive test (i.e., 0-0.5 years, >0.5-1 years, and so on, up to >6-6.5 years before the test). We then estimated the lagged associations between the RT component measures and each specific cognitive test for increasingly longer lag times, using bivariate correlations and multiple regression models (entering the RT components in combination as predictors of each cognitive test). To compare the regression coefficients of the RT components across the different lag times, we used a "lag as moderator" approach (Selig et al. 2012) whereby the RT components for all lag time periods were used as time-varying predictors of the cognitive scores and the RT component by lag time (used as categorical variable) interactions were tested. Regression analyses were conducted using the SURVEYREG procedure in SAS 9.4 (Cary, NC, USA) with cluster-robust standard errors.
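The half-year binning of lag times can be sketched as below. The sketch assumes lags are expressed in years before the cognitive test, that the first bin is the closed interval 0 to 0.5 and later bins are half-open on the left (matching ">0.5-1 years" in the text), and that survey-level scores within a bin are combined by a simple unweighted mean. Function names and the input layout are hypothetical.

```python
import math
from collections import defaultdict


def lag_bin(lag_years, width=0.5, max_lag=6.5):
    """1-based index of the half-year interval containing the lag.

    Bin 1 covers [0, 0.5]; bin k covers (0.5 * (k - 1), 0.5 * k].
    Returns None for lags outside [0, max_lag].
    """
    if lag_years < 0 or lag_years > max_lag:
        return None
    return max(1, math.ceil(lag_years / width))


def average_by_lag_bin(survey_scores):
    """Average one RT component within each lag bin for one respondent.

    survey_scores: list of (lag_years, score), one entry per survey.
    """
    sums = defaultdict(lambda: [0.0, 0])
    for lag, score in survey_scores:
        b = lag_bin(lag)
        if b is None:
            continue  # survey falls outside the analyzed lag window
        sums[b][0] += score
        sums[b][1] += 1
    return {b: total / count for b, (total, count) in sorted(sums.items())}
```

For example, surveys completed 0.2 and 0.4 years before the test fall into the first bin and are averaged, while a survey completed 0.9 years before the test falls into the second.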
The maximum time lag for which the RT components were significantly associated with cognitive test scores was determined by inspecting the longest consecutive lag for which the 95% confidence interval of the regression coefficient did not include 0. To quantify the magnitude of relationships between the RT component measures and cognitive test scores, we considered correlations and standardized regression coefficients of .10, .30, and .50 as small, medium, and large effects, respectively (Cohen 1988).
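The decision rule for the maximum lag and the effect-size benchmarks can be sketched as follows. The sketch reads "longest consecutive lag" as the unbroken run of lag bins, starting from the shortest lag, whose 95% confidence intervals exclude 0; this reading is an assumption. The label "negligible" for values below .10 is added here for completeness and is not one of the cited benchmarks.

```python
def max_significant_lag(intervals, width=0.5):
    """Upper bound (in years) of the last lag bin in the initial unbroken run
    of confidence intervals that exclude 0.

    intervals: list of (ci_lower, ci_upper), ordered from the shortest lag
               bin to the longest. Returns 0.0 if the first interval
               already includes 0.
    """
    run = 0
    for lower, upper in intervals:
        if lower <= 0.0 <= upper:
            break  # first non-significant lag ends the consecutive run
        run += 1
    return run * width


def cohen_label(coefficient):
    """Classify a correlation or standardized coefficient by magnitude."""
    size = abs(coefficient)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"  # assumed label for values below the smallest benchmark
```

Under this rule, a coefficient that is significant at lags of 0-0.5 and >0.5-1 years, not significant at >1-1.5 years, and significant again afterwards would yield a maximum lag of 1 year.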

Descriptive Characteristics
The demographic characteristics of the participant sample are presented in Table 1. The sample composition differed somewhat across analyses for the three cognitive tests because UAS respondents joined the panel and completed the cognitive assessments at various time points. Across all three samples, the majority of respondents were between the ages 18 and 54 years, and about one third were between the ages of 55 and 74 years. The racial and ethnic composition of respondents was largely representative of the general US population. Slightly more than half of the respondents were female. With regard to educational attainment, respondents with high school graduation or less comprised about a quarter of the sample, respondents with some college education comprised about one third of the sample, and respondents with a college degree comprised almost half of the sample. For annual household income, slightly less than half of the respondents had incomes up to $49,999, about one quarter had incomes up to $99,999, and about a quarter had incomes of $100,000 or more. The number of UAS surveys and items included in the analyses also differed by cognitive test. For number series, the average number of surveys per respondent was 19.76 (SD = 7.48, range = 5 to 36 surveys) and the average number of items per respondent was 548.10 (SD = 194.25, range = 84 to 1109 items). For verbal analogies, the average number of surveys was 19.57 (SD = 8.09, range = 5 to 37) and the average number of items was 549.26 (SD = 204.96, range = 85 to 1109 items). For the stop-and-go switch task, the average number of surveys was 21.51 (SD = 9.25, range = 5 to 38) and the average number of items was 572.10 (SD = 247.96, range = 85 to 1064 items). In total, 5,004,187 survey item RTs were analyzed for number series, 5,052,106 item RTs for verbal analogies, and 4,362,812 item RTs for the stop-and-go switch task.
The mean log RT across all participants and survey items was 2.33 log seconds (SD = 0.79, range = .00 to 6.38 log seconds) for analyses involving the number series test, 2.33 log seconds (SD = 0.79, range = .00-6.38 log seconds) for verbal analogies, and 2.32 log seconds (SD = 0.78, range = .00-6.38 log seconds) for the stop-and-go switch task. Item TIs (i.e., the expected item-level RTs) were calculated from all available RTs and are therefore not specific to each cognitive test. The distribution of the item TIs is shown in Figure 1 (in log seconds and back-transformed median seconds per item). The mean of the items' TIs was 2.29 log seconds (11.53 s when back-transformed), with a range of 1.14 to 4.04 log seconds (3.14 to 57.00 s when back-transformed). Scores from the three cognitive tests showed moderate to large positive intercorrelations. Number Series and Verbal Analogies tests correlated at r = .64 (p < .001). Stop-and-Go task scores correlated r = .22 (p < .001) with Number Series and r = .23 (p < .001) with Verbal Analogies scores.

Prediction of Cognition from RT Components Derived from the Location-Scale Model
The two RT components derived from the location-scale model were each participant's mean RT and each participant's RT variability (whereby intraindividual RT variability was not decomposed into subcomponents) across survey items. The two RT components were weakly correlated with each other (r = −.03, p = .01, for the Number Series sample; r = −.03, p = .004, for the Verbal Analogies sample; r = −.04, p = .002, for the Stop-and-Go Switch task sample). Table 2 shows the correlations between the RT components and cognitive tests, and results from multiple regressions in which the two RT components were entered simultaneously as predictors of each cognitive test. As expected, higher mean RTs were significantly negatively associated with each of the cognitive tests, indicating that people with slower mean RTs had lower cognitive scores. Standardized regression coefficients were small in magnitude for Number Series (β = −.06) and Verbal Analogies (β = −.16), and medium to large for the Stop-and-Go Switch task (β = −.41). Contrary to our expectation, greater intraindividual variability in RTs was significantly positively associated with each of the cognitive tests with small effect sizes (regression coefficients ranging from β = .14 for Verbal Analogies and the Stop-and-Go Switch task to β = .19 for Number Series, ps < .001), indicating that more variable RTs were predictive of higher cognitive scores. The two RT components in combination explained 4% (Number Series), 5% (Verbal Analogies) and 19% (Stop-and-Go Switch task) of the variance in the cognitive test scores. The pattern of results remained similar after demographic covariates were controlled; however, the relationship between mean RTs and Number Series scores became nonsignificant (see Table 2).
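As a rough consistency check on these figures, the variance explained by a set of standardized predictors can be recovered from the standardized coefficients and the correlations among the predictors, using the identity R² = β′Rβ. The sketch below applies this identity to the rounded values reported in the text; the function name is hypothetical, and because the inputs are rounded, the results are only approximate.

```python
from typing import Sequence


def r_squared_from_standardized(betas: Sequence[float],
                                predictor_corr: Sequence[Sequence[float]]) -> float:
    """R^2 = b' R b for standardized coefficients b and the
    correlation matrix R of the predictors."""
    k = len(betas)
    return sum(betas[i] * predictor_corr[i][j] * betas[j]
               for i in range(k) for j in range(k))


# Rounded values from the text:
# (beta for mean RT, beta for RT variability, r between the two components)
reported = {
    "Number Series": (-0.06, 0.19, -0.03),
    "Verbal Analogies": (-0.16, 0.14, -0.03),
    "Stop-and-Go Switch": (-0.41, 0.14, -0.04),
}

implied = {}
for test, (b_mean, b_var, r12) in reported.items():
    corr = [[1.0, r12], [r12, 1.0]]
    implied[test] = r_squared_from_standardized([b_mean, b_var], corr)
    print(f"{test}: implied R^2 = {implied[test]:.3f}")
```

The implied values round to 4%, 5%, and 19%, which agrees with the proportions of variance reported for the three tests.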
As shown in Table 3, the relationships between a person's mean RTs and the cognitive test scores were significantly stronger (i.e., more negative) for older (40+ years of age) compared to younger (less than 40 years of age) participants. Specifically, for Number Series and Verbal Analogies, the negative relationships between mean RTs and cognitive test scores were only evident among older participants (β = −.17 and −.32, respectively) but were nonsignificant among younger participants (β = .03 and .01, respectively). For the Stop-and-Go Switch task, even though mean RTs were significantly negatively associated with Stop-and-Go Switch task scores in both age groups, the association was significantly more pronounced at older ages (β = −.48) compared to younger ages (β = −.25). Relationships between RT variability and cognitive test scores did not show pronounced age differences (for Number Series, the association was significantly more pronounced among older compared to younger participants, p = .01, but the difference in standardized regression coefficients was small; Table 3).

Prediction of Cognition from RT Components Derived from the Expanded Location-Scale Model
In the expanded location-scale model, in addition to deriving an estimate of respondents' mean RT, intraindividual RT variability was decomposed into two subcomponents: one component represented the variation in RTs with variation in item TIs ("systematic RT adjustments"), and the second component reflected variation in RTs that was unrelated to variation in the TIs of the items ("residual RT variability"). The three RT components were moderately intercorrelated: mean RTs and systematic RT adjustments were positively correlated at r = .29 (for the Verbal Analogies sample) to r = .38 (for the Stop-and-Go Switch task sample); mean RTs and residual RT variability were negatively correlated at r = −.27 (for the Number Series sample) to r = −.31 (for the Stop-and-Go Switch task sample); systematic RT adjustments and residual RT variability were negatively intercorrelated at r = −.09 (for the Verbal Analogies sample) to r = −.11 (for the Stop-and-Go Switch task sample), all ps < .001. The modest size of these correlations among the RT components indicated no multicollinearity problems when entering them simultaneously as predictor variables in regression models. Table 4 shows the correlations between these RT components and the cognitive tests, as well as results from multiple regressions predicting each cognitive test from these RT components in combination. Slower mean RTs were significantly associated with lower scores on each of the cognitive tests, with medium to large effect sizes (standardized regression coefficients ranging from β = −.28 for Number Series to β = −.48 for the Stop-and-Go Switch task, ps < .001). As hypothesized, distinguishing between systematic RT adjustments and residual RT variability yielded effects of intraindividual variability in opposite directions.
More pronounced RT adjustments were significantly positively associated with each of the cognitive tests, with large effects for Number Series (β = .50, p < .001) and Verbal Analogies (β = .42, p < .001) and a small to medium effect for the Stop-and-Go Switch task (β = .22, p < .001). Greater residual RT variability was negatively associated with Number Series (β = −.15, p < .001) and Verbal Analogies (β = −.16, p < .001) scores, with small effects in the expected direction; unexpectedly, greater residual RT variability was very weakly positively associated with performance on the Stop-and-Go Switch task (β = .04, p < .001). The three RT components in combination explained 26% (Number Series), 22% (Verbal Analogies) and 22% (Stop-and-Go Switch task) of the variance in the cognitive test scores.
The pattern of results remained similar after demographic covariates were controlled (see Table 4). However, the effect of the residual RT variability component on the Stop-and-Go Switch task became very weakly negative (β = −.04, p < .001), consistent with the originally predicted direction of the effect.
Note: Regression coefficients for models with demographic covariates are statistically controlled for age, gender, race, ethnicity, education, and income. a df = 3, 8768; b df = 9, 8733; c df = 3, 8943; d df = 9, 8912; e df = 3, 6505; f df = 9, 6463.
Moderator analyses by age showed that the relationships between mean RTs and the cognitive test scores were significantly more negative for older compared to younger participants. As shown in Table 5, the effects of mean RTs were between 1.5 and 2 times larger among older (βs ranging between −.30 and −.53) compared to younger (βs ranging between −.19 and −.25) participants. No significant age differences were evident for the relationships of the cognitive tests with people's systematic RT adjustments or with residual RT variability, respectively.
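The three RT components can be illustrated with a simplified per-person sketch. This is not the multilevel location-scale model used in the analyses: it is an ordinary least-squares approximation for a single respondent, in which the intercept stands in for mean RT, the slope on item TI for systematic RT adjustments, and the residual standard deviation for residual RT variability. The function name, the simulated values, and the choice to center TIs at log(10 s) are assumptions made for illustration.

```python
import math
import random


def rt_components(log_rts, item_tis, center=math.log(10.0)):
    """Per-person OLS of log RT on centered item TI (both in log seconds).

    Returns (mean_rt, adjustment, residual_sd):
      mean_rt     - predicted log RT for an item whose TI equals `center`
      adjustment  - slope of log RT on item TI
      residual_sd - SD of the log RTs around the fitted line
    """
    x = [ti - center for ti in item_tis]
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(log_rts) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, log_rts))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residuals = [yi - (intercept + slope * xi) for xi, yi in zip(x, log_rts)]
    residual_sd = math.sqrt(sum(e * e for e in residuals) / (n - 2))
    return intercept, slope, residual_sd


# Simulate one hypothetical respondent answering 500 items.
rng = random.Random(0)
true_mean, true_adjust, true_sd = 2.2, 0.8, 0.3
tis = [rng.uniform(1.14, 4.04) for _ in range(500)]  # item TIs in log seconds
rts = [true_mean + true_adjust * (ti - math.log(10.0)) + rng.gauss(0.0, true_sd)
       for ti in tis]

mean_rt, adjustment, residual_sd = rt_components(rts, tis)
print(f"mean RT = {mean_rt:.2f}, adjustment = {adjustment:.2f}, "
      f"residual SD = {residual_sd:.2f}")
```

A multilevel model estimates these quantities jointly across respondents and borrows strength between them, so per-person estimates of this kind would be noisier, especially for respondents with few items.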

Stability of Lagged Relationships with Cognitive Abilities
We next present results for the lagged associations between RT component measures and the cognitive tests for increasingly longer lag times. Given that the RT components from the location-scale model produced only weak effects as shown above, we limit the presentation to results involving RT components from the expanded location-scale model (for results from the location-scale model, see Figures S3 and S4 in the Supplementary).
Lagged effects in half-year intervals before each cognitive test are shown in Tables 6-8 (with tests of overall model fit, main effects of RT components, and interactions by lag period) and graphically illustrated in Figure 2. Slower mean RTs prospectively predicted lower scores on each of the three cognitive tests over the full lag period of 6.5 years with small to medium effects; the magnitude of the associations significantly differed across lag periods (p < .001 for all mean RT by time-period interactions) but did not show clear monotonic trends for increasingly longer time lags for any cognitive test.
Note: a Overall model fit: F(51, 6508) = 28.23 (p < .001); mean RT: F(1, 6508) = 227.50 (p < .001); systematic RT adjustments: F(1, 6508) = 84.03 (p < .001); residual RT variability: F(1, 6508) = .00 (p = .99); main effect for time: F(12, 6508) = 2.60 (p = .002); mean RT by lag period interaction: F(12, 6508) = 6.73 (p < .001); systematic RT adjustments by lag period interaction: F(12, 6508) = 5.53 (p < .001); residual RT variability by lag period interaction: F(12, 6508) = 3.76 (p < .001).
The effects of systematic RT adjustments significantly varied across time lags (ps < .001) with decreasing trends in the magnitude of associations with the cognitive tests. More pronounced RT adjustments predicted significantly better Number Series scores for a lag period of up to 6 years with consistently medium effects; a small statistically significant effect was evident for a lag period of 6.5 years. The effect of systematic RT adjustments in predicting Verbal Analogies steadily decreased in magnitude from medium/large to small effects over the years, but remained significant for up to 6 years. Finally, systematic RT adjustments significantly predicted better scores on the Stop-and-Go Switch task, with generally small effects, for a lag period of up to 4.5 years.
Finally, the effects of residual RT variability were weak in magnitude and less temporally stable (ps < .001 for residual RT variability by time period interactions); greater residual RT variability predicted significantly worse scores on Number Series over a period of up to 3.5 years, on Verbal Analogies over 3 years, and on the Stop-and-Go Switch task over a period of 1 year, with small to very small effect sizes. The pattern of results remained similar with overall weaker effects of the mean RT component after demographic covariates were controlled (see Figure S5 in the Supplementary).

Summary of the Results
In sum, our results showed that the RT components derived from the initial location-scale model explained only small proportions of the variance in the inductive reasoning tests (between 4% and 5%) and 19% of the variance in task switching/inhibitory control, whereas the RT components derived from the expanded location-scale model explained between 22% and 26% of the variance in each of the three cognitive tests. Out of the three RT components considered in the expanded model, the strongest relationships with cognitive test scores were evident for respondents' mean RTs (especially among older respondents and for task switching/inhibitory control) and systematic RT adjustments (especially for inductive reasoning tests), whereas the residual RT variability showed weak relationships with cognitive tests. The RT components further demonstrated moderate stability in prospective associations with cognitive assessments.

Discussion
Surveys are ubiquitous in research and clinical practice. The usefulness of paradata from online surveys for inferring respondents' cognitive abilities has received little scientific attention. We examined whether response times (RTs) to survey items could be used to infer inductive reasoning skills and task switching/inhibitory control in a large probability-based longitudinal internet panel. Because little is known to date about which specific RT components in survey responses may be most relevant to cognitive functioning, we explored the utility of two different models to derive information about a person's mean RT and patterns of intraindividual RT variability.

Success of the Location-Scale Model RT Components
The RT components derived from the first model, a multilevel location-scale model that captures individual differences in mean RTs and in intraindividual RT variability, explained only a small share of the variance (4% to 5%, with small effect sizes) in the inductive reasoning measures (numerical and verbal reasoning scores) and 19% of the variance (a medium to large combined effect) in task switching/inhibitory control scores. Contrary to our hypothesis, a greater amount of intraindividual variability in RTs across survey items showed small positive associations with cognitive abilities. This result may seem counterintuitive and surprising in light of a substantial body of research that has viewed greater intraindividual RT variability in elementary speed tasks as detrimental to the successful solving of complex intelligence test items (Jensen 1992;Joly-Burra et al. 2018;Rammsayer and Troche 2010;Schmiedek et al. 2007;Schulz-Zhecheva et al. 2016) and as associated with developmental cognitive decline (Haynes et al. 2017). However, in contrast to the expanded model discussed below, the "naïve" location-scale model did not distinguish between components of intraindividual variability attributable to inconsistencies in response speed (e.g., neural "noise", which has often been found to be related to lower cognitive abilities) (Haynes et al. 2017) and components attributable to systematic adjustments of the speed of responding to variations in the task demands (e.g., switching strategies in accordance with the demands of different survey items, which is related to better cognitive abilities) (Hunt 1980).
In view of the small positive correlations between intraindividual RT variability and cognitive test scores, it is possible that the variability measure predominantly captured people's ability to adapt their response speed to differing task demands associated with the survey items. Arguably, however, conflating these two aspects of intraindividual RT variability in responses to survey items yields an ambiguous blend of RT components that relate to cognitive functioning in opposite directions. This highlights that results from experimental RT research based on simple choice reaction time tasks and other elementary speed tasks may not directly translate to RTs found in survey settings and that ignoring the differences between the tasks yields largely uninterpretable results. This conclusion further aligns with prior research that has shown that the relationship between RTs and a respondent's cognitive skills is not necessarily uniform but is instead a function of whether a task requires more controlled, higher-order cognitive processes (e.g., problem-solving) or more routine automated cognitive processes (e.g., reading) (Dodonova and Dodonov 2013;Goldhammer et al. 2014;Naumann and Goldhammer 2017;Naumann 2019).
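The blending of the two aspects can be made explicit with a short worked equation. This is a sketch under an assumed linear form, in which respondent i's log RT to item j depends on the item's time intensity; the symbols are illustrative and are not taken from the model specification, and the residual is assumed to be independent of the item TIs.

```latex
\log \mathrm{RT}_{ij} = \alpha_i + \beta_i \,\mathrm{TI}_j + \varepsilon_{ij},
\qquad \varepsilon_{ij} \sim \mathcal{N}\!\left(0, \sigma_i^{2}\right)

\operatorname{Var}_j\!\left(\log \mathrm{RT}_{ij}\right)
  = \beta_i^{2}\,\operatorname{Var}_j\!\left(\mathrm{TI}_j\right) + \sigma_i^{2}
```

Under this sketch, a single variability measure reflects the sum of a term driven by the respondent's adjustment to item demands (β_i) and a term driven by residual noise (σ_i). If the first term relates positively and the second negatively to cognitive ability, their sum can show a weak net association in either direction.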

Success of the Expanded Location-Scale Model RT Components
Our second model used an expanded version of the multilevel location-scale model in an attempt to explicitly acknowledge that survey items vary considerably in the cognitive demands associated with them, operationalized as differences in time intensity (TI). We argued that an individual's item-to-item RT variability can be separated into two distinct components: (a) systematic RT adjustments whereby a person adjusts their response speed to the TI levels of the items, and (b) residual intraindividual RT variability that might be attributed to random noise in responding. This distinction is in line with Fiske and Rice's (1955) seminal early work that stressed the importance of considering different types of short-term fluctuations including Type III variability, which they defined as "variability in response with variation in the stimulus or in the situation" (p. 236) and Type I variability, defined as spontaneous variability that is not a response to stimulus variation.
As expected, the TIs of the items in the UAS surveys varied widely, from 3 s to about 60 s across items. When the item TIs were incorporated as predictors in the expanded multilevel location-scale model, the resulting RT components together explained between 24% and 26% of the variance in the inductive reasoning measures and 22% of the variance in task switching/inhibitory control, a considerable improvement over the RT components derived from the original location-scale model. While these represent large effect sizes by common conventions (Cohen 1988), the proportions of variance explained are of course not near values that would suggest that the RT components can be viewed as interchangeable with any of the three cognitive tests. As discussed next, however, all three RT components from the expanded location-scale model showed unique associations with the cognitive test scores in theoretically expected directions.
We found that slower mean RTs derived from the expanded model showed moderate negative associations with cognitive functioning that consistently exceeded those from the original location-scale model in magnitude. A likely reason for this is that the expanded model controlled for differences in TIs across surveys, that is, the mean RTs represented the person's RT for an item with a TI of 10 s. As such, the expanded model may have been better able to capture individual differences in average response speed under conditions that would have been expected if the survey items were homogeneous or interchangeable in time intensity (and perhaps in underlying cognitive demands). In prospective analyses, mean RTs derived from the expanded location-scale model remained robust over a time period of more than 6 years, suggesting that the mean RTs captured relatively stable, "trait-like" individual differences relating to higher-order cognitive abilities.
Although mean RTs showed expected relationships with all three cognitive tests, they showed the strongest relationship with the Stop-and-Go Switch task, both in the original as well as the expanded location-scale model. This finding is noteworthy as it points to the differences in cognitive functions that were assessed with the three cognitive tasks. The Stop-and-Go Switch task, as a measure of task switching/inhibitory control, appears to capture somewhat lower-order and more automated cognitive functions compared to the quantitative and verbal analogies tests that assess inductive reasoning, a more complex task that requires more controlled higher-order cognitive functions. Prior research has shown a positive speed-accuracy relationship for lower-order abilities, such as reading speed and attention (Goldhammer et al. 2014;Naumann and Goldhammer 2017). This suggests that mean RTs might more strongly relate to more automated and routine cognitive functions and tasks. It should also be noted that out of the three cognitive tests, the Stop-and-Go Switch task was the only one that was itself based on RTs, whereas the Number Series and Verbal Analogies tests are based on the accuracy of responses.
The second RT component of the expanded location-scale model, labeled systematic RT adjustments, explicitly considered the extent to which individuals' RTs varied with the TI levels of the items. Greater systematic RT adjustments in response to changing TI levels were positively associated with greater cognitive abilities for all three tests. Among the different RT components, this component had the strongest associations with the inductive reasoning scores, with large effect sizes. These results are in line with prior research showing that the time spent on a given task might be moderated by the difficulty of the task and a respondent's skill level (Dodonova and Dodonov 2013;Goldhammer et al. 2014). Specifically, studies have shown a positive correlation between a respondent's skill level (determined through success on a task) and RTs for more complex tasks, such as reasoning (Goldhammer and Entink 2011;Klein Entink et al. 2009). In contrast, for tasks involving basic skills, such as reading, a negative correlation was observed (Richter et al. 2012). Our results of stronger associations with tests involving inductive reasoning compared to task switching/inhibitory control further corroborate these prior studies and suggest that systematic RT adjustments might be uniquely suited for better understanding higher-order, controlled cognitive processes where taking more time on a task or item is not only expected but might yield better (more accurate) results.
Moreover, we found that respondents' RT adjustment scores showed stable prospective relationships with subsequent cognitive functioning scores over 4 or more years, in line with prior literature suggesting that strategic adjustments in response caution are reliable person-characteristics that are replicable across time and tasks (Hedge et al. 2019).
The third RT component of the expanded location-scale model reflected a respondent's residual intraindividual RT variability not accounted for by systematic RT adjustments. In contrast to the original location-scale model, where greater intraindividual RT variability related to higher cognitive scores, higher levels of residual RT variability were associated with lower scores on the inductive reasoning tests (and with task switching/inhibitory control after controlling for demographic covariates), consistent with the idea that more random variation in thinking impedes the ability to solve complex intelligence tasks (Jensen 1992;Joly-Burra et al. 2018;Rammsayer and Troche 2010;Schmiedek et al. 2007;Schulz-Zhecheva et al. 2016). Theoretically, greater intraindividual RT variability represents transient fluctuations in behavioral performance and has been linked with attentional lapses and fluctuating executive control (West et al. 2002). Neuroimaging research supports the idea that RT variability is an indicator of lower neurobiological integrity, including reduced white matter volume (Jackson et al. 2012) and increased white matter hyperintensity volume (Bunce et al. 2007). Lifespan cognitive research has further suggested that age-related dopamine depletion reduces the neural signal-to-noise ratio, such that more intermittent brain signaling leads to more behavioral variability and reductions in a wide range of cognitive abilities (MacDonald et al. 2012). Out of the three RT components from the expanded location-scale model, this component had the weakest associations with cognitive test scores, and the least durable prospective associations in lagged analyses. One possible explanation is that even after removing intraindividual RT variation in relation to differences in the items' TIs, the residual RT variability component did not consist of purely spontaneous variability (Type I variability in Fiske and Rice's terminology) in RTs.
Exploratory analyses examining age differences in the relationships between the RT components and cognitive test scores showed that slower mean RTs were more strongly associated with worse performance on all three cognitive tests for older compared to younger participants. Prior psychometric work has indicated that associations between performance across various cognitive and sensory tasks generally strengthen over the adult lifespan, suggesting that the structure of individuals' cognitive abilities becomes less differentiated in older age (Hülür et al. 2015;Li et al. 2004). While speculative, our finding that mean RTs in surveys relate more strongly to inductive reasoning and task switching/inhibitory control at older compared to younger ages is perhaps consistent with this dedifferentiation hypothesis of cognitive aging. Previous research has also shown that survey satisficing and insufficient effort responding are more prevalent at younger compared to older ages (Schneider et al. 2018); thus, it is also possible that insufficient effort responding (which can manifest in fast RTs in survey responding, Bowling et al. forthcoming) among younger participants diluted the relationships with cognitive test scores in this age group.

Implications for Research
To date, survey research has predominantly utilized RTs for evaluating the quality of survey questions (Yan and Tourangeau 2008), detecting careless responding and survey fatigue (Galesic and Bosnjak 2009;Meade and Craig 2012;Schneider et al. 2018), and studying attitude strength (Fazio 1990). The results of the present study have implications for cognitive functioning research in that the scope of RT applications could be broadened to aid in the monitoring of cognitive abilities in large population surveys. With the increase in web-based data collection, RTs are now routinely assessed and have become a readily available byproduct in most online studies. Our study using more than 5 million RTs from survey items in a nationally representative sample demonstrates the potential of harnessing this type of data and corroborates the feasibility of utilizing survey item RTs for intelligence research. Our longitudinal (lagged) results further illustrate potential implications for epidemiology and health policy research in that they suggest avenues for the prospective prediction of cognitive abilities in survey studies when standard cognitive tests are not available.
Our findings also have implications for web-based survey design. Even though survey researchers are well aware of the response burden inherent in completing lengthy questionnaires, relatively little research has examined the linkages between survey response burden and participants' level of cognitive ability in general population surveys. Our evidence that lower intellectual abilities are systematically related to longer RTs in survey items strongly suggests that lower intellectual ability is associated with greater respondent burden, as it will take people with lower cognitive abilities longer to complete a questionnaire. The cumulative effects of longer RTs may result in survey fatigue, especially in these individuals. Moreover, in survey development, care should be taken when considering the nature and demands of survey items on the respondent. Surveys consisting of very heterogeneous sets of items place greater demand on respondents in that they need to adjust their attentional focus and adapt their responses as they navigate through different sets of questions. This might be particularly challenging for respondents with lower cognitive abilities and could potentially lead to greater attrition and missing data, particularly for repeated assessments of the same survey.

Limitations
Several study limitations should be noted. First, even though the cognitive tests examined in this study had been specifically developed for online administration (Liu et al. 2022;McArdle et al. 2015), they were self-administered and completed on the web at participants' own convenience, which reduces the level of standardization often seen in tests administered by trained professionals in precisely controlled test environments.
Similarly, participants completed the web-based surveys at their convenience, and the RTs recorded as paradata may be impacted by many environmental influences that were uncontrolled, including participants' current location, the time of day on which a survey was completed, differences in the device used when completing the questions, and momentary environmental distractions.
We also assumed that the contents of the particular surveys and the domain-specific demands posed by an item did not affect the results. The extent to which a given question poses higher or lower domain-specific demands is likely individual-specific in that it will depend, among other things, upon the person's familiarity with the specific content domains tapped by the questions. That is, a respondent may find questions on issues and topics with which they have limited experience particularly challenging to answer, compared to questions on topics they are deeply familiar with. For example, individuals who follow politics closely may already have well-formed opinions about political issues, and therefore, they may respond more quickly than individuals who do not follow politics closely. The same holds true for other topics. In this study, we have used items covering a broad range of topics, which should average out such interactions between item content and individual characteristics, mitigating this concern. In future research with more clearly defined sets of items, one potential research strategy could be to assess participants' familiarity with different item domains (e.g., in a separate set of self-report items) and to control RTs for participants' familiarity with content domains at the item and person levels.
Furthermore, there are additional aspects of intraindividual RT variability that could potentially be isolated from survey RT paradata and were not examined here, such as parameters of descriptive (e.g., ex-Gaussian, shifted Wald) RT distributions (Matzke and Wagenmakers 2009;Schmiedek et al. 2007). Even though these parameters have been linked to intelligence (Schmiedek et al. 2007), they are arguably most meaningfully derived from relatively homogeneous reaction time tasks involving quick decision making, and we speculate that they may not be well suited for capturing cognitive processes in responses to heterogeneous questionnaire items. As another example, Joly-Burra et al. (2018) applied dynamic structural equation modeling to reaction times in a classical Go/NoGo task to measure "coherence" in a person's RT pattern (in addition to measures of mean RT and RT variability), operationalized as the autocorrelation of consecutive RTs (i.e., the extent to which momentary deviations from a person's mean RT carry over to the next item). The expanded location-scale model used in the present study could be further expanded to incorporate individual-specific autoregressive effects in RTs (i.e., to measure "coherence" in RTs). However, this model would benefit from having response times for consecutive survey items available, which was not the case in our study because RTs of items administered together on a page (presented in grid or matrix format) were not time-stamped separately as paradata and could not be included in the analyses.
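To make the notion of "coherence" concrete, the following sketch computes a simple descriptive lag-1 autocorrelation for one person's sequence of consecutive RTs. It is not the dynamic structural equation modeling approach of Joly-Burra et al. (2018); the function name and the example sequences are hypothetical.

```python
from typing import Sequence


def lag1_autocorrelation(values: Sequence[float]) -> float:
    """Descriptive lag-1 autocorrelation: the extent to which a deviation
    from the person's mean carries over to the next observation."""
    n = len(values)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(values) / n
    deviations = [v - mean for v in values]
    denominator = sum(d * d for d in deviations)
    if denominator == 0.0:
        raise ValueError("sequence has no variability")
    numerator = sum(deviations[t] * deviations[t + 1] for t in range(n - 1))
    return numerator / denominator


# Hypothetical sequences of consecutive log RTs for illustration.
drifting = [1.0, 2.0, 3.0, 4.0, 5.0]             # deviations carry over
alternating = [1.0, -1.0, 1.0, -1.0, 1.0, -1.0]  # deviations flip each item

drift_r = lag1_autocorrelation(drifting)
alternating_r = lag1_autocorrelation(alternating)
```

Positive values indicate that momentary slowing or speeding tends to persist into the next item, whereas negative values indicate that deviations tend to reverse. As noted above, such an estimate is only meaningful when the RTs come from consecutively administered items.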
Finally, it should be kept in mind that even though we used a wide range of survey items, the predictions may not generalize to all survey items and content domains, and they were derived from a single internet panel. Our results require replication using RT paradata in other online survey studies.

Conclusions
The present study found that RTs to online items in survey research have the potential to provide information about people's cognitive abilities, including inductive reasoning skills and task switching/inhibitory control. The amount of variance shared between the RT components and cognitive test scores was nowhere near values that would suggest that RTs from surveys can replace formal cognitive testing. Nevertheless, our findings demonstrate the utility of harnessing this type of readily available paradata for intelligence research and open the door to innovative ways of measuring cognitive abilities on a population level. Our longitudinal (lagged) results further illustrate that RTs in survey research may be useful for the prospective prediction of cognitive abilities. The expanded location-scale model outperformed the original multilevel location-scale model in the prediction of cognitive abilities. It did so by separating item-to-item RT variability into two distinct components: a person's systematic RT adjustments to changing time intensities of the items, and residual intraindividual RT variability reflecting random noise in responding. Whereas the pool of survey items selected for the present study was heterogeneous in nature, future research could benefit from examining sets of survey items that differ in well-defined domains and cognitive demand characteristics; this would create additional opportunities for studying how item contents and demands may be differentially related to specific cognitive processing domains. We encourage future population and cognitive research to continue investigating the multiple uses of survey item RTs, and we hope that the potential theoretical and empirical benefits of applying an expanded version of the location-scale model for understanding respondents' cognitive abilities will be explored further.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/jintelligence11010003/s1, Table S1: A sample of 50 survey items in the Understanding America Study used for the measurement of survey item response times. Figure S1: Path diagram of the multilevel structural equation model used to estimate the parameters of the expanded version of a "location-scale" multilevel model. Figure S2: Mplus code of the multilevel structural equation model to decompose the item response time data into three latent between-person components using the expanded version of the multilevel location-scale model. Figure S3: Standardized regression coefficients for the prediction of cognitive test scores from time-lagged survey item response time (RT) components derived from the naive location-scale model. Figure S4: Standardized regression coefficients for the prediction of cognitive test scores from time-lagged survey item response time (RT) components derived from the naive location-scale model. Regression coefficients control for age, gender, race, ethnicity, education, and income. Figure S5: Standardized regression coefficients for the prediction of cognitive test scores from time-lagged survey item response time (RT) components derived from the expanded location-scale model.

Funding: This research was funded by the National Institute on Aging, grant numbers R01AG068190 and U01AG054580, and by the Social Security Administration.

Institutional Review Board Statement: The study was conducted in accordance with the Declaration of Helsinki, and approved by the Institutional Review Board of the University of Southern California (UP-14-00148).