Patient-Reported Outcome Measures for Assessing Dual-Task Performance in Daily Life: A Review of Current Instruments, Use, and Measurement Properties

The patient perspective of dual-task (DT) impairment in real life is unclear. This review aimed (i) to identify patient-reported outcome measures (PROMs) on DT and evaluate their measurement properties and (ii) to investigate the usage of PROMs for the evaluation of DT difficulties. A systematic literature search was conducted using PubMed and Web of Science from inception to March 2022. Methodological quality was evaluated using the COSMIN checklist. Six studies examined the measurement properties of DT PROMs. Nine studies used DT PROMs as the outcome measure. Five PROMs were identified, including the Divided Attention Questionnaire (DAQ), Dual-Task-Impact on Daily-life Activities Questionnaire (DIDA-Q), a Questionnaire by Cock et al. (QOC), Dual-Tasking Questionnaire (DTQ), and Dual-Task Screening-List (DTSL). Fourteen measurement properties were documented: five (35.7%) were rated “sufficient”, six (42.9%) “insufficient”, and three (21.4%) “indeterminate”. The quality of evidence for each measurement property ranged from very low to high. While DT performance is investigated in many populations, the use of PROMs is still limited, although five instruments are available. Currently, due to insufficient data, it is not possible to recommend a specific DT PROM in a specific population. An exception is the DIDA-Q, which has the highest quality of measurement properties in people with multiple sclerosis.


Introduction
Daily life activities generally require performing a secondary cognitive or motor task [i.e., dual-task (DT)], such as walking while talking or eating while listening. Recent studies showed that DT walking speed measured in the laboratory was lower than single-task walking speed but similar to the walking speed most commonly used in daily life [1,2]. Therefore, the measurement of DT walking performance has received much attention, as it is thought to better reflect everyday life conditions and thereby offer greater ecological validity.
An increasing amount of research has examined DT performance, using different cognitive tasks and various walking and balance tasks in many populations, predominantly in neurological diseases (e.g., Parkinson's disease, multiple sclerosis, stroke, mild cognitive impairment, dementia, traumatic brain injury), elderly, children, and healthy adults [3][4][5][6][7]. Studies generally show that when the motor and cognitive tasks are combined, it can lead to worse performance in one or both tasks, particularly in the elderly and people with neurological diseases [3,8,9]. The association between a decline in performance and aspects of daily life, such as an elevated risk of falling and a lower quality of life, led to increased interest in intervention strategies [6]. Furthermore, several previous studies examined the psychometric (also called measurement) properties of DT assessments to provide accurate measurement, with some reliable and valid outcome measures identified for use in research and clinical settings [10][11][12].
However, beyond the objective measurements of DT walking collected in highly controlled lab settings, there is limited knowledge about perceived dual-task difficulties in daily life. Everyday life typically differs from traditional lab conditions, with varying and unpredictable distractors. In addition to walking and thinking, a wide variety of motor and cognitive DT activities are required in daily life. Thus, the generalizability of DT lab outcomes to real-life situations may be questioned and is poorly understood. Patient-reported outcome measures (PROMs) measuring different daily living activities are likely to capture a wider dimension than a standardized DT walking test alone. Therefore, PROMs focusing on DT performance are essential [13].
To the best of our knowledge, there is no systematic review that has summarized the measurement properties (reliability, validity, and responsiveness) of PROMs for DT walking difficulties. This information is critical for research and clinical work in various populations. Therefore, the purpose of our systematic review is to explore: (i) the measurement properties, methodological quality, and descriptive characteristics of PROMs specific to DT activities and (ii) the use of PROMs for evaluation (or screening) of DT difficulties.

Materials and Methods
This systematic review was performed according to the Consensus-based Standards for the Selection of Health Measurement Instruments (COSMIN) methodology for systematic reviews of measurement properties (2018) [14]. COSMIN provides an updated and appropriate methodology for systematic psychometric reviews. It thereby supports the selection of instruments for research or clinical practice and identifies gaps in knowledge on the quality of measurement properties. This review was reported following the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA) guidelines [15]. The protocol was registered at the International Prospective Register of Systematic Reviews (PROSPERO Reference: CRD42022325230).

Eligibility Criteria
The inclusion criteria for study selection were: (1) studies that developed patient-reported outcome measures (PROMs) to evaluate perceived DT difficulties in daily life and/or reported at least one measurement property according to the COSMIN terminology and definitions, or that used PROMs as outcome measures or as a screening method; (2) English language; and (3) full text available.
Conference proceedings, editorials, (systematic) reviews, meta-analyses, practice guidelines, letters, and animal studies were excluded from the study.

Search Strategy
A systematic search was performed using the MEDLINE and Web of Science databases on 23 March 2022 without date restrictions. The Medical Subject Headings (MeSH) terms and keywords were selected based on their relevance to the research question. By adding the Boolean operators AND and OR accordingly, the following complete search strategy was constructed: questionnaire AND dual task OR "cognitive motor interference" OR "divided attention". In addition, reference lists of all included studies were thoroughly examined to detect any other potentially eligible papers for inclusion.

Study Selection
Two reviewers (A.F. and C.V.G.) independently screened the articles by title and abstract. The reviewers extracted all potentially eligible articles from the title and abstract review and retrieved the full text. In cases where the full text was unavailable, a request was sent to the corresponding author. The two reviewers then discussed the findings and reached a consensus on the final articles to be included for further analysis. In case of inconsistency and/or disagreement, a third reviewer was consulted (Z.A.).

Data Extraction
The following data were extracted from each included study: basic characteristics of the study (authors, year of publication, etc.), details of the study design (sample size, aims, type of the study, etc.), participant characteristics (population used for the validation process, age, gender, etc.), details of the PROM (name, number of items, subscales, response options, scoring methodology, and range of scores), and measurement properties according to the COSMIN guideline. The main results of the included studies were also retrieved.
Additionally, for each study included in the final analysis, we evaluated the content validity, the internal structure (including structural validity, internal consistency, and cross-cultural validity), reliability, measurement error, criterion validity, hypothesis testing for construct validity, and responsiveness when available. Definitions of these measurement properties and taxonomy can be found in the COSMIN manual for systematic reviews of PROMs [14].

Quality Assessment
The methodological quality of each eligible study was independently ranked by two researchers (A.F. and C.V.G.) utilizing the COSMIN Risk of Bias checklist. The quality of the methodology employed in each study to determine the measurement property was then independently graded on a four-point scale: very good, adequate, doubtful, and inadequate. The lowest rating of any standard within a box was used as the rating for that measurement property (worst score counts principle) [16].
The results from each study on a measurement property were assigned a quality rating as sufficient (+), insufficient (−), or indeterminate (?) [14].
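The "worst score counts" principle described above can be illustrated with a short sketch. It assumes only the four-point scale named in the text; the function name `worst_score` and the example ratings are hypothetical and are not taken from any included study or from COSMIN software.

```python
# Four-point scale from the text, ordered from best to worst.
RATING_ORDER = ["very good", "adequate", "doubtful", "inadequate"]


def worst_score(ratings):
    """Return the worst rating among the standards rated within one box."""
    if not ratings:
        raise ValueError("at least one rated standard is required")
    # The rating with the highest position in RATING_ORDER is the worst one.
    return max(ratings, key=RATING_ORDER.index)


# Hypothetical ratings for the standards of a single box (illustration only).
example_box = ["very good", "very good", "doubtful", "adequate"]
print(worst_score(example_box))  # doubtful
```

A single "doubtful" standard therefore determines the rating of the whole box, even when every other standard is rated "very good".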

Summary and Grading of the Quality of Evidence
This section refers to rating the quality of the PROM as a whole. PROMs were qualitatively summarized and assigned a four-point quality rating. A modified Grading of Recommendations Assessment, Development and Evaluation (GRADE) approach (omitting publication bias) was used to assign evidence quality as high, moderate, low, or very low [14].
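As a rough sketch of how such a grading can be expressed, the snippet below starts from "high" and moves down one level per downgrade step. It assumes that the four factors remaining after publication bias is omitted (risk of bias, inconsistency, imprecision, and indirectness) each contribute a whole number of steps; the function name and the step values in the example are hypothetical, and the actual downgrading rules are those of the COSMIN manual [14].

```python
# Evidence levels from the text, ordered from best to worst.
LEVELS = ["high", "moderate", "low", "very low"]


def grade_evidence(risk_of_bias=0, inconsistency=0, imprecision=0, indirectness=0):
    """Start at 'high' and downgrade one level per step, never below 'very low'."""
    steps = (risk_of_bias, inconsistency, imprecision, indirectness)
    if any(s < 0 for s in steps):
        raise ValueError("downgrade steps must be non-negative")
    return LEVELS[min(sum(steps), len(LEVELS) - 1)]


# Hypothetical example: one step for risk of bias and one for imprecision.
print(grade_evidence(risk_of_bias=1, imprecision=1))  # low
```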

Results
A total of 6864 records were identified, of which 4765 articles were screened on title and abstract following the removal of duplicates. Of those screened, 2577 full-text publications were assessed for eligibility, and six articles providing measurement properties were included in the outcome measure evaluation in the systematic review. Additionally, nine studies using PROMs as outcome measures or as a screening method were included for documenting current use. Figure 1 shows the PRISMA flow diagram. No meta-analysis was performed due to the heterogeneity of the outcomes and study designs of included studies.

Description of PROMs Assessing Dual-Task Difficulties
Five PROMs were identified: Divided Attention Questionnaire (DAQ), Dual-Task-Impact on Daily-life Activities Questionnaire (DIDA-Q), Questionnaire by Cock et al. [17] (QOC), Dual-Tasking Questionnaire (DTQ), and Dual-Task Screening-List (DTSL). The DTQ, DIDA-Q, QOC, and DTSL were developed for persons with neurological diseases (traumatic brain injury and stroke, multiple sclerosis, acquired brain injury, and Parkinson's disease, respectively). The DAQ was developed for older and young adults. The DIDA-Q was developed in Italian, and the DTSL in Dutch, but translated English versions have since been published [18,19]. While the DTQ, DTSL, and QOC were developed and immediately used as outcome measures in planned research (intervention and observational studies), the studies on the DAQ and DIDA-Q mainly served to introduce a new instrument. Table 1 provides a full description of the PROMs.

Methodological Quality of the Included Studies on Measurement Properties
The methodological quality of included studies is presented in Table 2. A total of 22 measurement properties were evaluated in the included studies. Six measurement properties (27.3%) were rated as having "very good", five (22.7%) "adequate", eight (36.4%) "doubtful", and three (13.6%) "inadequate" methodological quality. The quality of the PROM development process for the DAQ (in young and older adults), DIDA-Q (in persons with multiple sclerosis), and QOC (in persons with acquired brain injury) is presented in Table 3 [17,18,23]. It was unclear whether the study involving the QOC was a PROM development study. Nevertheless, we rated it because it provided information on PROM development [17]. Although the DIDA-Q is generally scored as "very good" in PROM design items and a pilot test was performed [18], all three questionnaires are rated as "doubtful" overall due to the "worst score counts" principle of COSMIN.
Table 4 presents the overall evidence for each measurement property against the COSMIN GRADE assessment. Table 5 details the measurement properties separately for each included study.

Validity Measures of PROMs Assessing DT Difficulties
Five studies investigated the validity of three PROMs (DAQ, DTQ, and DIDA-Q). The structural validity of the DIDA-Q was rated "sufficient" (with moderate quality of evidence) in people with multiple sclerosis (MS). In contrast, the structural validity of the DAQ and QOC was rated as "insufficient" (with low and very low quality of evidence) in adults and persons with brain injury, respectively.
Four studies reported correlations between DT PROMs and other outcome measures (i.e., hypothesis testing) for the construct validity of DAQ (in young and older adults), DTQ (in older adults), DIDA-Q (in MS), and QOC (in acquired brain injury) [17,18,20,21]. The overall rating was "insufficient" for DAQ, "indeterminate" for DTQ and QOC, and "sufficient" for DIDA-Q. The quality of evidence was rated very low (DAQ, DTQ, and QOC) to moderate (DIDA-Q). The comparison between subgroups (i.e., known groups or discriminative validity) was documented only for DIDA-Q, and significant differences were noted for different disability levels in persons with MS [18].
Cross-cultural adaptation was only performed by Sertel et al. and Amini et al. [20,21]. The DTQ has been translated from English to Turkish and adapted for older adults, and DAQ has been translated into Persian. Cross-cultural validity (differential item functioning between languages) has not been investigated in any study.

Reliability Measures of PROMs Assessing DT Difficulties
Five studies reported the internal consistency of three PROMs: DAQ, DTQ, and DIDA-Q. The overall rating was "insufficient" with moderate quality of evidence for DAQ, "insufficient" with very low quality for DTQ, and "sufficient" with high quality for DIDA-Q. Cronbach's alpha scores are provided in Table 5. Cronbach's alpha was ≥0.70 (statistically acceptable value) for individuals with MS on the DIDA-Q and for young and older adults on the DAQ.
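For readers less familiar with the statistic, the following minimal sketch computes Cronbach's alpha from item-level scores using the standard formula, alpha = k/(k − 1) × (1 − sum of item variances / variance of total scores), and compares it with the 0.70 threshold mentioned above. The function name and the response data are hypothetical and do not come from any of the included studies.

```python
from statistics import variance


def cronbach_alpha(items):
    """Cronbach's alpha for a list of items, each a list of scores
    given by the same respondents in the same order."""
    k = len(items)
    if k < 2:
        raise ValueError("at least two items are required")
    n = len(items[0])
    if n < 2 or any(len(scores) != n for scores in items):
        raise ValueError("each item needs the same number (>= 2) of respondents")
    # Total score per respondent across all items.
    totals = [sum(per_person) for per_person in zip(*items)]
    total_var = variance(totals)
    if total_var == 0:
        raise ValueError("total scores have zero variance")
    item_var_sum = sum(variance(scores) for scores in items)
    return k / (k - 1) * (1 - item_var_sum / total_var)


# Hypothetical responses: three items answered by five respondents.
responses = [
    [2, 3, 3, 4, 5],
    [1, 3, 4, 4, 5],
    [2, 2, 3, 5, 5],
]
alpha = cronbach_alpha(responses)
print(round(alpha, 2), alpha >= 0.70)
```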
Test-retest reliability was reported in five studies for DAQ, DTQ, DIDA-Q, and QOC. The overall rating was "insufficient" with moderate quality of evidence for DAQ, "sufficient" with very low quality for DTQ, "sufficient" with moderate quality for DIDA-Q, and "indeterminate" with very low quality for QOC. For QOC, Cohen's coefficient was calculated instead of the intraclass correlation coefficient (ICC) in six persons. ICC scores for each PROM are provided in Table 5.
Table 6 details the nine studies that used PROMs as an outcome measure or for screening purposes. The most commonly studied population was MS (n = 5), followed by Parkinson's disease (n = 1), mild cognitive impairment (n = 1), acquired brain injury (n = 1), and spinal cord injury (n = 1). DAQ and DTQ were used as outcome measures, while DTSL was used for the inclusion criteria of an intervention study. In four studies, DTQ scores were compared between persons with MS and healthy controls, finding significant differences between groups that support the discriminative validity of the DTQ in MS [12,24,25,26]. Three studies used PROMs (DTQ, DAQ, DTSL) as an experimental outcome measure in intervention studies. The DTQ improved following a cognitive-motor DT exercise training program in individuals with acquired brain injury but not in persons with MS [27,28]. Improved scores on the DAQ were found following a training program based on divided attention (cognitive-cognitive) in persons with mild cognitive impairment [29]. DTSL was used as an inclusion criterion in four studies (n = 3 in MS, n = 1 in Parkinson's disease) [12,24,28,30], two of which were intervention studies [28,30].

Discussion
This review provides an overview of the use of PROMs assessing DT difficulties and synthesizes the current evidence from six studies that aimed to evaluate the measurement properties of these PROMs. To the best of our knowledge, this is the first review to investigate PROMs assessing perceived DT difficulties. A total of 14 measurement properties were documented, with 5 (35.7%) rated as "sufficient", 6 (42.9%) "insufficient", and 3 (21.4%) "indeterminate" quality. Five PROMs were identified that were developed for different populations and used as outcome measures. No measurement properties of the DTSL were examined, and the quality of evidence of DTQ, DAQ, and QOC for reliability and validity was generally rated very low to low. The DIDA-Q obtained the highest number of positive criteria for measurement properties, although solely in MS.
The DIDA-Q, which has the most items and includes subscales of cognition, upper extremity, and balance-mobility, has been investigated only in persons with MS from Italy [18]. The DIDA-Q presents sufficient internal consistency, reliability, structural validity, and construct validity. Still, the PROM development was rated "doubtful" in our review. Furthermore, the measurement error and cross-cultural validity of the DIDA-Q have yet to be investigated. Given the moderate to high-quality evidence for many measurement properties, we recommend its use in the MS population. Nevertheless, we encourage future studies to continue investigating its measurement properties in other populations and languages.
According to our findings, all studies using PROMs for perceived DT difficulties as an outcome measure were conducted in people with neurological diseases, highlighting the necessity of such PROMs in these populations. Only two PROMs (DIDA-Q and DAQ) have provided detailed information on the development procedures. The low methodological quality scores of the other PROMs are likely because their development processes were not conducted systematically. Therefore, we recommend that future PROM development and cross-cultural adaptation studies on perceived DT difficulties follow COSMIN tools.
Although the purpose of all five DT PROMs is to evaluate DT difficulties, instrument properties differ among PROMs. The DTSL was designed as a checklist (yes/no choice); others were Likert-type scales to determine the difficulty level. There is no study on the measurement properties of DTSL. However, it was utilized to show the presence of DT impairment as an inclusion criterion in some intervention and cross-sectional studies in persons with MS and Parkinson's disease [12,24,28,30]. Although we think that the use of a checklist for DT training and assessment studies is relevant, there is a need for a study to explore discriminative and other measurement properties.
Cross-cultural adaptation studies are essential, allowing researchers and health professionals in different societies to acquire comparable data for DT difficulties. We observed that only DTQ and DAQ had been culturally adapted for use in other languages (Turkish and Persian) [20,21]. We recommend performing cross-cultural adaptation studies with rigorous methodologies.
Identified PROMs generally showed deficiencies regarding responsiveness, measurement error, cross-cultural validity, and discriminative validity. It is important to emphasize these flaws for future studies in clinical and research contexts. While criterion validity is not applicable, it is essential to determine how the PROMs relate to the lab-based DT performance tests commonly used to detect DT impairment so far. Only one study has examined the relationship between the Timed-Up and Go test with a cognitive task and the DTQ, and the authors found small but significant correlations between perceived DT impairment and the lab-based DT test in older adults [20].

Methodological Considerations
A major strength of this review is the use of the updated version of the COSMIN methodology for systematic reviews of measurement properties.
No meta-analysis was performed due to the heterogeneity of the outcomes and study designs of included studies. Only a limited number of questionnaires were found. This is likely because DT assessment and treatment is a relatively new domain of investigation.

Conclusions
This review highlights the importance of understanding the quality of PROM development and measurement properties of PROMs for proper use and interpretation in a particular population. Based on the evidence from this review, we recommend utilizing the DIDA-Q to assess perceived DT difficulties in persons with MS. The measurement properties of the DTSL were not investigated, and the quality of evidence of the DTQ, DAQ, and QOC was usually rated as very low to low. The responsiveness, measurement error, and cross-cultural validity of the identified PROMs have yet to be studied. Further studies focusing on measurement error, cross-cultural validity, and comparison with lab-based DT walking assessments are warranted.

Conflicts of Interest:
The authors declare no conflict of interest.