Psychometric Characteristics of the Patient-Reported Outcome Measures Applied in the CENTER-TBI Study

Traumatic brain injury (TBI) may lead to impairments in various outcome domains. Since most instruments assessing these are only available in a limited number of languages, psychometrically validated translations are important for research and clinical practice. Thus, our aim was to investigate the psychometric properties of the patient-reported outcome measures (PROMs) applied in the CENTER-TBI study. The study sample comprised individuals who completed the six-month assessments (GAD-7, PHQ-9, PCL-5, RPQ, QOLIBRI/-OS, SF-36v2/-12v2). Classical psychometric characteristics were investigated and compared with those of the original English versions. The reliability was satisfactory to excellent, and the instruments were comparable to each other and to the original versions. Validity analyses demonstrated medium to high correlations with well-established measures. The original factor structure was replicated in all translations except for the RPQ, the SF-36v2/-12v2, and some language samples of the PCL-5, most probably owing to the factor structure of the original instruments. The translation of one to two items of the PHQ-9, RPQ, PCL-5, and QOLIBRI in three languages could be improved in the future to enhance scoring and application at the individual level. Researchers and clinicians now have access to reliable and valid instruments to improve outcome assessment after TBI in national and international health care.


Introduction
Traumatic brain injury (TBI) causes alterations in brain function, as a result of an external force [1], for example, due to falls, road traffic accidents, sports, assaults, or violence. It is a considerable source of disability and death worldwide. The sequelae of TBI not only impact the lives of those affected and their relatives on many different levels [2], but they can also result in high direct and indirect costs [3,4].
The data analyzed in this study were collected in the international Collaborative European NeuroTrauma Effectiveness Research in TBI observational study (CENTER-TBI; clinicaltrials.gov NCT02210221), which has been conducted since 2014 in 18 European countries and Israel; the six-month outcome assessments of the enrolled participants were completed in 2018. The study aimed to capture a contemporary picture of TBI across all severity groups, including its care and outcome, in order to develop precision medicine approaches and to apply comparative effectiveness research to identify best practices. It provides insights into the longitudinal assessment of somatic, functional, behavioral, psychiatric, cognitive, psychological, and psychosocial sequelae after TBI and can serve as a basis for the development of a new multidimensional assessment approach [15,16].
An important criterion when selecting instruments for research and clinical practice is their psychometric quality. For most patient-reported outcome measures (PROMs) administered in the CENTER-TBI study, this had not yet been examined in the field of TBI, nor had the newly translated versions of the instruments been psychometrically investigated. Hence, the present study aims to investigate, in the field of TBI, the classical psychometric properties of the newly and previously translated PROMs administered in the CENTER-TBI study.
In research and clinical contexts, instruments offer insights into outcome after TBI. The comparability of the translated instruments with their original version and the validation in the field of TBI enables the reliable and valid aggregation of data in multi-center national and international studies on outcomes after TBI.
The study aims are to investigate:
1. the reliability (total score, scale, and item level) of the PROMs, comparing them with the values of the original instrument versions to ascertain the quality and comparability of the translations and their applicability in the field of TBI; and
2. the factorial validity of the translated instruments, using confirmatory factor analyses (CFA) to replicate the original factorial structure.

Participants
Participants were recruited at 63 centers across 18 countries, from 19 December 2014 to 17 December 2017. Ethical approval was secured for each site and informed consent was obtained from all patients or from their legal representatives. The inclusion criteria for the core study were a clinical diagnosis of TBI, presentation within 24 h after injury, and an indication for a computed tomography (CT) scan. Patients were differentiated into three strata: emergency room (ER; patients primarily evaluated at an ER), admission (ADM; patients admitted primarily to a hospital ward), and intensive care unit (ICU; patients who were primarily admitted to an ICU). Further details can be found elsewhere [16]. Data were retrieved from the core 2.1 of the CENTER-TBI database using the data access tool Neurobot.
The core study sample included 4509 individuals. In the present study, we focused on participants aged 16 years and above who had completed at least one outcome measure at the six-month assessment after TBI. Data were collected on-site at the hospital by study personnel, in face-to-face or telephone interviews (clinical ratings), or via mail (PROMs), and were entered centrally using a web-based electronic case report form.

Sample Characteristics
Language, sex, age, education, employment, marital status, and living situation were selected as sociodemographic characteristics. Samples were then aggregated by language. More specifically, individuals from German-speaking communities in Austria, Belgium, and Germany were integrated into the German sample, individuals from French-speaking communities in Belgium and France into the French sample, and individuals from Dutch-speaking communities in Belgium or the Netherlands into the Dutch sample. Only a few participants (N = 20) received the outcome questionnaires in a language other than the local language of the participating site. These individuals were assigned to their respective language group: Dutch (7), English (8), German (1), Romanian (3), and Swedish (1).
The following variables were used to characterize extracranial and brain injuries: the individuals' mental health status before the injury, clinical care pathways, cause of injury, loss of consciousness (LOC), post-traumatic amnesia (PTA), TBI severity (GCS), abnormalities on computed tomography (CT) scans, total injury severity score (ISS), and brain injury severity score from the Abbreviated Injury Scale (AIS) [17].

Patient-Reported Outcome Measures (PROMs)
Since most instruments applied in the CENTER-TBI study only existed in English, they had to be translated into the languages of the participating countries following a formalized approach (i.e., linguistic validation) to ensure their linguistic, cultural and conceptual comparability in the respective languages [18,19]. For more details, see von Steinbuechel et al. [20].
The selection of the outcome measures was informed by the Common Data Elements (CDE) recommendations [21,22]. For six of the eight PROMs (marked with an asterisk * in the instrument descriptions below), at least one translation had to be performed. In this study, we report the psychometric properties of all eight PROMs, that is, of both the newly translated instruments and the previously translated instruments that had not yet been validated in the field of TBI.
The Generalized Anxiety Disorder 7 Item Scale (GAD-7)* [23] measures the level of generalized anxiety disorder using seven items and a four-point Likert scale (from 0 "not at all" to 3 "nearly every day"). The total score ranges from 0 to 21 with values of 10 and above indicating impairment and cut-offs of 5, 10, and 15 representing mild, moderate, and moderately severe to severe anxiety, respectively [23].
The Patient Health Questionnaire (PHQ-9)* [24] assesses self-reported symptoms of major depression using nine items and a four-point Likert scale (from 0 "not at all" to 3 "nearly every day"). The PHQ-9 total score ranges from 0 to 27 with a score of 10 and above indicating clinically relevant impairment and cut-offs of 5, 10, 15, and 20 indicating mild, moderate, moderately severe, and severe depression, respectively [24,25].
Translations of both the GAD-7 and the PHQ-9 were already available in almost all target languages, except for Latvian (GAD-7 and PHQ-9) and Serbian (GAD-7 only). Nevertheless, we analyzed both instruments to examine their psychometric properties in individuals after TBI.
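The scoring rules described for the GAD-7 and PHQ-9 (summing the item responses and mapping the total onto severity bands) can be sketched as follows. This is a minimal illustration, not the official scoring procedure: the function names are hypothetical, each cut-off is assumed to be the inclusive lower bound of its band, and totals below 5 are simply labelled as falling below the mild cut-off.

```python
from typing import List, Sequence, Tuple

# Severity bands as (inclusive lower bound, label); treating each cut-off as an
# inclusive lower bound is an assumption of this sketch.
GAD7_BANDS: List[Tuple[int, str]] = [
    (0, "below mild cut-off"),
    (5, "mild"),
    (10, "moderate"),
    (15, "moderately severe to severe"),
]
PHQ9_BANDS: List[Tuple[int, str]] = [
    (0, "below mild cut-off"),
    (5, "mild"),
    (10, "moderate"),
    (15, "moderately severe"),
    (20, "severe"),
]


def total_score(responses: Sequence[int], n_items: int, max_item: int = 3) -> int:
    """Sum item responses after checking the item count and the 0..max_item range."""
    if len(responses) != n_items:
        raise ValueError(f"expected {n_items} responses, got {len(responses)}")
    if any(not 0 <= r <= max_item for r in responses):
        raise ValueError(f"responses must lie between 0 and {max_item}")
    return sum(responses)


def band(score: int, bands: List[Tuple[int, str]]) -> str:
    """Return the label of the highest band whose lower bound the score reaches."""
    label = bands[0][1]
    for lower, name in bands:
        if score >= lower:
            label = name
    return label


def score_gad7(responses: Sequence[int]) -> Tuple[int, str, bool]:
    total = total_score(responses, n_items=7)
    # A total of 10 or above is taken to indicate impairment.
    return total, band(total, GAD7_BANDS), total >= 10


def score_phq9(responses: Sequence[int]) -> Tuple[int, str, bool]:
    total = total_score(responses, n_items=9)
    # A total of 10 or above is taken to indicate clinically relevant impairment.
    return total, band(total, PHQ9_BANDS), total >= 10


print(score_gad7([1, 2, 1, 2, 1, 2, 1]))  # (10, 'moderate', True)
```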
The Posttraumatic Stress Disorder Checklist-5 (PCL-5)* [26] comprises 20 symptoms of post-traumatic stress disorder (PTSD) based on the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) [27], using a five-point Likert scale (from 0 "not at all" to 4 "extremely"). The total score ranges from 0 to 80 with higher values indicating greater impairment. For clinical screening, either a cut-off score of 31 [28] or 33 is applied [29].
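A sketch of PCL-5 scoring is shown below: it computes the total score, the four DSM-5 cluster scores that correspond to the four-factor model, and a screening flag. The cluster-to-item assignment (B: items 1 to 5, C: items 6 to 7, D: items 8 to 14, E: items 15 to 20) follows the DSM-5 structure of the checklist; the function name, the default cut-off of 33, and reading the cut-off as "total at or above the cut-off" are assumptions of this illustration.

```python
from typing import Dict, Sequence, Union

# DSM-5 symptom clusters of the PCL-5, given as 1-based item numbers.
PCL5_CLUSTERS: Dict[str, range] = {
    "B_intrusion": range(1, 6),
    "C_avoidance": range(6, 8),
    "D_negative_cognition_mood": range(8, 15),
    "E_arousal_reactivity": range(15, 21),
}


def score_pcl5(responses: Sequence[int], cutoff: int = 33) -> Dict[str, Union[int, bool]]:
    """Return the cluster scores, the total score (0-80), and a screening flag."""
    if len(responses) != 20 or any(not 0 <= r <= 4 for r in responses):
        raise ValueError("the PCL-5 requires 20 responses scored 0 to 4")
    result: Dict[str, Union[int, bool]] = {
        name: sum(responses[i - 1] for i in items)
        for name, items in PCL5_CLUSTERS.items()
    }
    total = sum(responses)
    result["total"] = total
    # Screening flag: total at or above the chosen cut-off (31 or 33 in the text).
    result["screen_positive"] = total >= cutoff
    return result


print(score_pcl5([2] * 20))
```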
The Rivermead Post-Concussion Symptoms Questionnaire (RPQ)* [30] uses a five-point Likert scale (from 0 "not experienced at all" to 4 "a severe problem") to evaluate the following 16 post-concussion symptoms: headaches, dizziness, nausea and/or vomiting, noise sensitivity, sleep disturbance, fatigue, irritability, depression, frustration, forgetfulness and poor memory, poor concentration, slow thinking, blurred vision, light sensitivity, double vision, and restlessness. Participants rate how much they have been suffering from these symptoms during the past 24 h compared with their condition before the accident. The RPQ total score ranges from 0 to 64 with cut-offs of 13, 25, and 33 indicating mild, moderate, and severe symptoms, respectively [31].
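The RPQ total score and severity bands described above can be sketched as follows. The sketch sums the raw 0 to 4 ratings exactly as described in the text; the function name is hypothetical, and treating 13, 25, and 33 as inclusive lower bounds of the mild, moderate, and severe bands is an assumption.

```python
from typing import List, Sequence, Tuple

# (inclusive lower bound, label); inclusivity is an assumption of this sketch.
RPQ_BANDS: List[Tuple[int, str]] = [
    (0, "below mild cut-off"),
    (13, "mild"),
    (25, "moderate"),
    (33, "severe"),
]


def score_rpq(responses: Sequence[int]) -> Tuple[int, str]:
    """Sum the 16 RPQ ratings (0-4 each) and return (total, severity label)."""
    if len(responses) != 16 or any(not 0 <= r <= 4 for r in responses):
        raise ValueError("the RPQ requires 16 ratings scored 0 to 4")
    total = sum(responses)  # possible range 0 to 64
    label = RPQ_BANDS[0][1]
    for lower, name in RPQ_BANDS:
        if total >= lower:
            label = name
    return total, label


print(score_rpq([1] * 16))  # (16, 'mild')
```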
The Quality of Life after Brain Injury Scale (QOLIBRI)* [32,33] measures TBI-specific health-related quality of life (HRQoL) in individuals after TBI. It consists of six domains comprising 37 items rated on a five-point Likert scale (from 0 "not at all" to 4 "very"). The six domains are cognition, self, daily life and autonomy, social relationships, emotions, and physical conditions. The total score is transformed linearly to range from 0 to 100, whereby higher values indicate better TBI-specific HRQoL [34]. Patients after TBI with a score below 60 may be assumed to display impaired HRQoL [34]; country-specific reference values can be found elsewhere [35]. For the QOLIBRI, psychometric criteria of almost all target language versions involved in the present study (except for Swedish) had already been published [32,36]. The Spanish translation was published after CENTER-TBI had started [37]. To be congruent with the analyses of the other PROMs, we replicated the psychometric analyses for the nine language versions of the QOLIBRI.
The Quality of Life after Brain Injury-Overall Scale (QOLIBRI-OS)* [38] is the short version of the QOLIBRI, measuring satisfaction with physical condition, cognition, emotions, daily life and autonomy, social relationships, and current and future prospects using six items. The items are answered on a five-point Likert scale (from 0 "not at all" to 4 "very"). Patients after TBI with a score below 52 may be assumed to display impaired HRQoL [34]; country-specific reference values can be found elsewhere [39]. For the QOLIBRI-OS, too, psychometric properties had already been examined in almost all languages, except for Spanish and Swedish [38]. Here, again, psychometric analyses were replicated in all languages to be congruent with the other PROMs.
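The linear transformation to a 0 to 100 metric and the impairment cut-offs for the QOLIBRI (below 60) and QOLIBRI-OS (below 52) can be illustrated with the sketch below. It assumes that the transformed score is the mean item score divided by the maximum item score of 4 and multiplied by 100, and that negatively worded ("bothered") items are reverse-scored beforehand so that higher values always indicate better HRQoL; these conventions and the function names are assumptions of this sketch, not the official scoring algorithm.

```python
from typing import AbstractSet, Sequence

MAX_ITEM = 4  # items are rated from 0 ("not at all") to 4 ("very")


def transform_0_100(responses: Sequence[int], reverse: AbstractSet[int] = frozenset()) -> float:
    """Mean item score rescaled to 0-100; 0-based indices in `reverse` are reverse-scored."""
    if not responses or any(not 0 <= r <= MAX_ITEM for r in responses):
        raise ValueError("responses must be non-empty and lie between 0 and 4")
    scored = [MAX_ITEM - r if i in reverse else r for i, r in enumerate(responses)]
    return sum(scored) / len(scored) / MAX_ITEM * 100


def impaired_hrqol(score: float, cutoff: float) -> bool:
    """Scores below the cut-off (60 for the QOLIBRI, 52 for the QOLIBRI-OS) suggest impairment."""
    return score < cutoff


os_score = transform_0_100([2, 2, 2, 2, 2, 2])  # six QOLIBRI-OS items
print(os_score, impaired_hrqol(os_score, cutoff=52))  # 50.0 True
```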
The 36-item Short Form Health Survey-Version 2 (SF-36v2) [40,41] measures subjective health status using 36 items with various response formats across its eight scales (from dichotomous "yes/no" to polytomous five-point Likert scale responses). The scales can be aggregated into the physical component score (PCS) and the mental component score (MCS), measuring physical and mental functioning, respectively. Both scores range from 0 to 100 with higher values indicating better HRQoL. The values can be transformed into T-scores (M = 50, SD = 10) based on a normative U.S. sample. A value below 47 on a single health domain scale or component summary score is indicative of functional impairment in comparison to the U.S. population [40].
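The T-score transformation mentioned above is a linear rescaling of a raw score against normative data, T = 50 + 10 × (x − M_norm) / SD_norm. The sketch below applies it together with the cut-off of 47. The normative mean and SD used in the example are placeholders, not the published U.S. norms, and the official PCS/MCS algorithm, which weights the eight standardized scale scores, is not reproduced here.

```python
def t_score(raw: float, norm_mean: float, norm_sd: float) -> float:
    """Linear T-score: mean 50 and SD 10 in the normative sample."""
    if norm_sd <= 0:
        raise ValueError("normative SD must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


def below_norm(t: float, cutoff: float = 47) -> bool:
    """A T-score below 47 is read as functional impairment relative to the norm."""
    return t < cutoff


# Placeholder normative values, for illustration only.
t = t_score(raw=60, norm_mean=70, norm_sd=20)
print(t, below_norm(t))  # 45.0 True
```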
The 12-Item Short Form Survey-Version 2 (SF-12v2) [42] is a short, 12-item version of the SF-36v2. The scores range from 0 to 100 with higher values indicating better HRQoL. The raw values can be transformed into T-scores (M = 50, SD = 10) based on a normative U.S. sample. However, the authors recommend using country- and group-specific cut-off values, as not every country or group has a mean health score of 50 [42,43]. In the CENTER-TBI study, the SF-12v2 was found to have more missing data than the SF-36v2. Therefore, to increase the power for the calculation of the PCS and MCS of the SF-12v2, missing values were replaced by values derived from the respective items of the SF-36v2 and combined with the reported data. For the analyses at the item level, only reported data were used.
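The replacement of missing SF-12v2 responses by the corresponding SF-36v2 responses, used only for computing the PCS and MCS, can be sketched as below. The item identifiers and the mapping dictionary are hypothetical placeholders, since the actual item correspondence is defined by the instrument documentation. The function returns a new dictionary, so the reported-only data remain available for the item-level analyses.

```python
from typing import Dict, Mapping, Optional

Responses = Mapping[str, Optional[int]]


def fill_sf12_from_sf36(
    sf12: Responses, sf36: Responses, sf12_to_sf36: Mapping[str, str]
) -> Dict[str, Optional[int]]:
    """Return a copy of the SF-12v2 responses with missing items taken from the SF-36v2."""
    filled: Dict[str, Optional[int]] = dict(sf12)
    for item12, item36 in sf12_to_sf36.items():
        if filled.get(item12) is None:  # only fill items that were not reported
            filled[item12] = sf36.get(item36)  # stays None if missing in both
    return filled


# Hypothetical item identifiers, for illustration only.
mapping = {"sf12_a": "sf36_x", "sf12_b": "sf36_y", "sf12_c": "sf36_z"}
reported = {"sf12_a": None, "sf12_b": 2, "sf12_c": None}
sf36 = {"sf36_x": 3, "sf36_y": 5}
print(fill_sf12_from_sf36(reported, sf36, mapping))
# {'sf12_a': 3, 'sf12_b': 2, 'sf12_c': None}
```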
The SF-36v2 and SF-12v2 translations were already available in the target languages and had to be purchased from Optum for one-time use [44]. However, since most translated versions of both the SF-36v2 and the SF-12v2 were not subjected to psychometric analyses in the field of TBI, they were included in the analyses of the present study. Both instruments were also used for validity analyses.

Clinician-Reported Outcome (ClinRo) and a Clinical Scale
The instruments listed below were used to analyze convergent and discriminant validity.
Missing values of the Glasgow Outcome Scale-Extended (GOSE) were centrally replaced by values derived from the GOSE-Q. Since the GOSE-Q is not able to differentiate between vegetative state and lower severe disability, GOSE levels 2 and 3 were collapsed into one category. Missing values at the six-month outcome assessment were imputed using a multi-state model; the imputation procedure is described elsewhere [47]. The GOSE was not subjected to reliability analyses, as these would require data from independent raters to establish interrater reliability, which were not available in the CENTER-TBI database.
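The two GOSE preprocessing steps described here, replacing missing GOSE values with GOSE-Q values and merging levels 2 and 3, can be sketched as follows. The function name is hypothetical, coding the merged category as 3 is an assumption, and the multi-state imputation model is not reproduced.

```python
from typing import Optional


def gose_for_analysis(gose: Optional[int], gose_q: Optional[int]) -> Optional[int]:
    """Prefer the GOSE, fall back to the GOSE-Q, and merge levels 2 and 3."""
    value = gose if gose is not None else gose_q
    if value is None:
        return None  # left to the separate imputation step
    if not 1 <= value <= 8:
        raise ValueError("GOSE levels range from 1 to 8")
    # The GOSE-Q cannot separate vegetative state (2) from lower severe
    # disability (3), so both are coded as one category (here: 3).
    return 3 if value in (2, 3) else value


print(gose_for_analysis(None, 2), gose_for_analysis(5, 7))  # 3 5
```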
The Glasgow Coma Scale (GCS) [48] allows healthcare professionals to consistently evaluate the level of consciousness of individuals after TBI, also classifying the severity of TBI. The GCS scores range from 3 (no response) to 15 (normal level) with higher values indicating less impaired consciousness and lower TBI severity. Scores of 13 to 15 indicate mild TBI, 9 to 12 moderate TBI, and 3 to 8 severe TBI.
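The GCS severity classification given above maps directly onto a small function. This is only an illustration of the stated score ranges, and the function name is hypothetical.

```python
def gcs_severity(gcs: int) -> str:
    """Classify TBI severity from a GCS score between 3 and 15."""
    if not 3 <= gcs <= 15:
        raise ValueError("GCS scores range from 3 to 15")
    if gcs >= 13:
        return "mild"
    if gcs >= 9:
        return "moderate"
    return "severe"


print([gcs_severity(s) for s in (15, 12, 8)])  # ['mild', 'moderate', 'severe']
```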

Statistical Analyses
The present study focuses on the analyses of reliability and of convergent and discriminant validity of eight PROMs in nine TBI language samples with sufficient participants (i.e., at least 50 participants in the Dutch, English, Finnish, French, German, Italian, Norwegian, Spanish, and Swedish samples), as well as on factorial validity in six samples (i.e., at least 150 participants in the Dutch, English, Finnish, Italian, Norwegian, and Spanish samples). Figure 1 provides an overview of our psychometric analyses according to the criteria of classical test theory (CTT) with the respective cut-off values [49].
Figure 1. Criteria of classical test theoretical psychometric analyses and their application in this study. The white boxes indicate analyses performed in this study; the grey boxes describe psychometric properties investigated either during instrument development (i.e., content validity), or alternative methods of retest reliability or parallel form reliability, or analyses deferred to further studies (i.e., measurement invariance and interpretation).
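The sample-size rules above (at least 50 participants per language sample for reliability and convergent/discriminant validity, at least 150 for CFA) amount to a simple filter applied per instrument and language. In the sketch below, the function name is hypothetical and the sample sizes are made-up placeholders, not CENTER-TBI figures.

```python
from typing import Dict, List

MIN_N_RELIABILITY_VALIDITY = 50
MIN_N_CFA = 150


def plan_analyses(sample_sizes: Dict[str, int]) -> Dict[str, List[str]]:
    """Map each language sample to the analyses its size permits."""
    plan: Dict[str, List[str]] = {}
    for language, n in sample_sizes.items():
        analyses: List[str] = []
        if n >= MIN_N_RELIABILITY_VALIDITY:
            analyses += ["reliability", "convergent/discriminant validity"]
        if n >= MIN_N_CFA:
            analyses.append("CFA")
        plan[language] = analyses
    return plan


# Placeholder sample sizes, for illustration only.
print(plan_analyses({"A": 40, "B": 80, "C": 200}))
```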

Descriptive Statistics
Descriptive statistics include information on the sample sizes, percentage of missing data, mean (M), standard deviation (SD), skewness (SK), and kurtosis (KU) for each item per language version of an instrument and an average of the item characteristics across all languages. For skewness, values less than −1 or greater than 1 indicate a highly skewed distribution; values between −1 and −0.5 or between 0.5 and 1 indicate a moderately skewed distribution; and values between −0.5 and +0.5 denote an approximately symmetrical distribution. For both skewness and kurtosis, values between −2 and +2 are considered acceptable [50].
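The classification rules for skewness and kurtosis can be sketched with simple moment-based estimators (skewness m3 / m2^1.5 and excess kurtosis m4 / m2² − 3). Statistical packages offer several bias-corrected variants of these estimators, so values may differ slightly from those produced by, for example, the psych package; how the boundary values ±0.5 and ±1 are assigned is also an assumption of this sketch.

```python
from typing import Sequence, Tuple


def skew_kurtosis(x: Sequence[float]) -> Tuple[float, float]:
    """Moment-based skewness and excess kurtosis (population moments)."""
    n = len(x)
    mean = sum(x) / n
    m2 = sum((v - mean) ** 2 for v in x) / n
    m3 = sum((v - mean) ** 3 for v in x) / n
    m4 = sum((v - mean) ** 4 for v in x) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def classify_skew(sk: float) -> str:
    """Apply the skewness bands described in the text."""
    if abs(sk) <= 0.5:
        return "approximately symmetrical"
    if abs(sk) <= 1:
        return "moderately skewed"
    return "highly skewed"


def acceptable(value: float) -> bool:
    """Skewness or kurtosis between -2 and +2 is considered acceptable."""
    return -2 <= value <= 2


sk, ku = skew_kurtosis([0, 0, 0, 0, 0, 0, 0, 0, 1, 3])  # floor effect, as in many symptom items
print(classify_skew(sk), acceptable(sk), acceptable(ku))
```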

Reliability
For reliability analyses, researchers often accept data of 30 participants as being sufficient to detect a required minimal effect of 0.70 as a cut-off value for reliability coefficients [51]. However, some researchers argue that larger sample sizes are required to avoid bias [51,52]. In the present study, reliability coefficients were therefore only calculated if the sample size comprised at least 50 individuals per language, to provide more robust results.
To examine the reliability of each instrument, we report Cronbach's alpha, split-half reliability with the Spearman-Brown correction (odd vs. even items), and Cronbach's alpha if an item is omitted. Both the split-half reliability and Cronbach's alpha if item omitted were calculated for scales with at least three items. Although recommended cut-off points for Cronbach's alpha differ, there is agreement that in group comparisons Cronbach's alpha should reach at least 0.70, implying acceptable internal consistency [53]; an alpha above 0.90 indicates excellent internal consistency [54]. Cronbach's alpha if an item is omitted should not exceed the total Cronbach's alpha of a scale. A value higher than the total Cronbach's alpha indicates that the excluded item decreases the reliability of the instrument and requires further revision [55].
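The three reliability coefficients named above can be computed with the standard formulas: Cronbach's alpha α = k / (k − 1) × (1 − Σσ²ᵢ / σ²ₓ), the Spearman-Brown-corrected correlation 2r / (1 + r) between the sums of odd and even items, and alpha recomputed with each item left out. The sketch below is written from these formulas and is not the psych package code used in the study; data are passed as one list of responses per item, and complete data are assumed.

```python
from statistics import variance
from typing import List, Sequence

Items = List[Sequence[float]]  # items[j][i] = response of person i to item j


def _totals(items: Items) -> List[float]:
    return [sum(person) for person in zip(*items)]


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(items: Items) -> float:
    k = len(items)
    if k < 2:
        raise ValueError("alpha needs at least two items")
    item_var = sum(variance(col) for col in items)
    return k / (k - 1) * (1 - item_var / variance(_totals(items)))


def split_half_spearman_brown(items: Items) -> float:
    odd = _totals(items[0::2])   # items 1, 3, 5, ... (1-based)
    even = _totals(items[1::2])  # items 2, 4, 6, ...
    r = _pearson(odd, even)
    return 2 * r / (1 + r)


def alpha_if_item_deleted(items: Items) -> List[float]:
    if len(items) < 3:
        raise ValueError("only computed for scales with at least three items")
    return [
        cronbach_alpha([col for i, col in enumerate(items) if i != j])
        for j in range(len(items))
    ]


items = [[1, 2, 3, 4], [2, 1, 4, 3]]  # toy data: 2 items, 4 persons
print(round(cronbach_alpha(items), 2), round(split_half_spearman_brown(items), 2))  # 0.75 0.75
```

An item whose alpha-if-deleted value exceeds the scale's overall alpha would be the one flagged for revision under the criterion above.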
To evaluate the discriminating ability of the items, item-total correlations either at the scale or at the total score level, or both were calculated. A correlation coefficient of 0.30, corresponding to a medium effect size, was chosen as the cut-off criterion, based on the guidelines for effect size proposed by Cohen [56,57]. An item-total correlation below 0.30 implies that the item cannot discriminate well between high-performing and low-performing individuals. Furthermore, low item-total correlations, especially at the scale level, may identify irregularities of the factorial structure of an instrument.
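Item-total correlations can be computed as below. This sketch uses the corrected variant, correlating each item with the sum of the remaining items so that an item does not inflate its own correlation; whether the corrected or uncorrected variant was used is not stated here, so this choice is an assumption. Items falling below the 0.30 criterion are flagged, and the function names are hypothetical.

```python
from typing import List, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(items: List[Sequence[float]]) -> List[float]:
    """Correlate each item with the total of all other items."""
    totals = [sum(person) for person in zip(*items)]
    return [_pearson(col, [t - c for t, c in zip(totals, col)]) for col in items]


def weak_items(items: List[Sequence[float]], cutoff: float = 0.30) -> List[int]:
    """0-based indices of items whose corrected item-total r is below the cut-off."""
    return [j for j, r in enumerate(corrected_item_total(items)) if r < cutoff]


print(weak_items([[1, 2, 3, 4], [1, 2, 3, 4], [2, 1, 1, 2]]))  # [2]
```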

Convergent and Discriminant Validity
All language samples analyzed in this study included at least 50 observations, which is recommended for validity analyses [58].
Convergent validity was examined using Spearman correlation coefficients between the GOSE, the physical (PCS) and mental (MCS) component scores of the SF-36v2 and SF-12v2, and the total or domain-specific scores of all other measures.
Discriminant validity was investigated by calculating Spearman correlation coefficients for the GCS and the total and scale scores of all instruments, to be in line with analyses already provided in the field of TBI [59]. To evaluate the strength of correlations, the Cohen criteria [56,57] were applied to identify small (0.10), medium (0.30), and large (0.50) effect sizes.
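Spearman's rho equals the Pearson correlation of the ranks, with tied values receiving their average rank. The sketch below implements that definition and applies the Cohen thresholds (0.10, 0.30, 0.50) to the absolute coefficient; the function names and the "negligible" label for values below 0.10 are assumptions, and in practice a statistics library routine would normally be used instead.

```python
from typing import List, Sequence


def _pearson(x: Sequence[float], y: Sequence[float]) -> float:
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def average_ranks(x: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(x)), key=lambda i: x[i])
    ranks = [0.0] * len(x)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and x[order[end + 1]] == x[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    return _pearson(average_ranks(x), average_ranks(y))


def cohen_label(r: float) -> str:
    """Cohen effect-size label for the absolute correlation."""
    a = abs(r)
    if a < 0.10:
        return "negligible"  # label below 0.10 is an assumption
    if a < 0.30:
        return "small"
    if a < 0.50:
        return "medium"
    return "large"


rho = spearman([1, 2, 2, 3], [1, 2, 3, 4])
print(round(rho, 3), cohen_label(rho))  # 0.949 large
```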

Factorial Validity
Factorial validity was examined by means of confirmatory factor analyses (CFA) with a robust weighted least squares estimator (WLSMV) for ordinal data; only the original factor structure of each instrument was analyzed. Accordingly, one-factor solutions were estimated for the GAD-7 [23], the PHQ-9 [24], the RPQ [30], and the QOLIBRI-OS [38]. For the other instruments, the respective multiple-scale models were inspected: a four-factor model for the PCL-5 [26], a five-factor model for the QOLIBRI [32], an eight-factor model with two second-order factors for the SF-36v2 [40], and, finally, a two-factor model for the SF-12v2 [42]. In CFA, samples should comprise at least 150 observations to provide stable results [60]. Therefore, only language samples fulfilling this criterion were analyzed.
Analyses were performed in R version 4.0.2 [67], using the psych package [65] for the psychometric characteristics and the lavaan package [66] for the factorial validity analyses.
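The factor models listed above can be written in lavaan's model syntax, in which `=~` reads "is measured by". The sketch below is a small, hypothetical Python helper that generates such syntax strings, for instance for the four-factor PCL-5 model with DSM-5 clusters B to E; the item names pcl1 to pcl20 are placeholders. The resulting string would then be passed to lavaan's `cfa()` in R, declaring the items as ordinal through the `ordered` argument and selecting `estimator = "WLSMV"`.

```python
from typing import Dict, List, Optional


def lavaan_model(
    factors: Dict[str, List[str]],
    second_order: Optional[Dict[str, List[str]]] = None,
) -> str:
    """Build lavaan measurement syntax: one 'factor =~ indicators' line per factor."""
    lines = [f"{name} =~ {' + '.join(items)}" for name, items in factors.items()]
    # Optional higher-order factors measured by first-order factors
    # (e.g., physical and mental components in an SF-36v2-type model).
    for name, lower in (second_order or {}).items():
        lines.append(f"{name} =~ {' + '.join(lower)}")
    return "\n".join(lines)


# Four-factor PCL-5 model with DSM-5 clusters (placeholder item names).
clusters = {"B": range(1, 6), "C": range(6, 8), "D": range(8, 15), "E": range(15, 21)}
pcl5 = lavaan_model({f: [f"pcl{i}" for i in idx] for f, idx in clusters.items()})
print(pcl5)
```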

Comparability of the Translated Versions
To evaluate the quality of the translated versions of the eight PROMs, psychometric criteria obtained from the CENTER-TBI language samples were compared with those reported for the original English instrument versions. For this purpose, a systematic literature search was carried out. Psychometric characteristics were compared with those obtained from the original validation studies in the original populations, for which the respective instrument was developed. If available, they were also compared with the validation studies in the field of TBI. If the original articles did not provide information on all coefficients, these were retrieved from more recent studies.
These comparisons were confined to the reliability coefficients (i.e., Cronbach's alpha coefficients, split-half or test-retest reliability), as validity testing in the original studies was performed using instruments not applied in the CENTER-TBI study. Instruments showing reliability within the same ranges (i.e., <0.70, acceptable; 0.70 to 0.89, good; ≥0.90, excellent) or higher in both the original and the PROMs applied in the CENTER-TBI study were considered comparable.
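The comparability rule (the translated version falls in the same reliability range as the original, or in a higher one) can be expressed as below. The function names are hypothetical, the coefficients in the usage line are invented for illustration, and values exactly at 0.70 and 0.90 are assigned to the higher range, following the range boundaries stated in the text.

```python
def reliability_range(r: float) -> int:
    """0: below 0.70, 1: 0.70 to 0.89, 2: 0.90 and above."""
    if r >= 0.90:
        return 2
    if r >= 0.70:
        return 1
    return 0


def comparable(translated: float, original: float) -> bool:
    """True if the translation's range is the same as or higher than the original's."""
    return reliability_range(translated) >= reliability_range(original)


print(comparable(0.86, 0.92), comparable(0.93, 0.85))  # False True (invented values)
```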

Sample Characteristics
For the CENTER-TBI study, the eight PROMs were translated or already available in 20 target languages (Figure 2). As some countries withdrew from the project early (Bulgarian and Czech centers) or no participants were recruited (Arabic and Russian), data were available for 16 language samples. Seven of these 16 language samples (i.e., Danish, Hungarian, Hebrew, Lithuanian, Latvian, Romanian, and Serbian) were not psychometrically analyzed due to a low number of observations (N < 50). Additionally, three language samples (French, Norwegian, Swedish) had to be excluded from the reliability analyses of the SF-12v2, also because of insufficient sample sizes. For the factorial validity, six language samples comprising at least N = 150 observations (i.e., Dutch, English, Finnish, Italian, Norwegian, and Spanish) were investigated for all instruments except for the SF-12v2, as only three SF-12v2 language samples (i.e., Dutch, Finnish, and Spanish) fulfilled the sample size criterion.
The number of participants varied between PROMs, since not every participant filled in each instrument at the six-month outcome assessment. Sample characteristics for each instrument and language are provided in the Online Supplement (OS 1: Sample characteristics, Tables S1-S8). A brief overview of the sample compositions used for the analyses is presented in Figure 2. Appendix A (Table A1) provides additional information on the number of participants for the validity analyses using the GOSE and the GCS.

Reliability and Comparability of the PROMs
Reliability coefficients for the total and scale scores of the PROMs are shown below. Item characteristics as well as reliability coefficients on the item level are reported in the respective tables in the Online Supplement 2 (OS-2 Reliability, Tables S1-S8).

GAD-7
All translations analyzed were available prior to the CENTER-TBI study. Item scores for the GAD-7 were not normally distributed (SK: M = 1.64, SD = 0.51; KU: M = 2.40, SD = 2.22) across all languages. At the item level, most items were moderately to strongly correlated with the total score of the GAD-7 in most languages (0.36 to 0.89). When calculating Cronbach's alpha if item omitted, all values were smaller than the total Cronbach's alpha across all languages. The values of the split-half reliability ranged from 0.70 to 0.90 across all languages. At the total score level, all translations revealed Cronbach's alpha and split-half reliability values comparable to the results of the original English version in a non-TBI population (i.e., patients from 15 primary care sites [23]), except for the Finnish, German, Spanish, and Swedish versions, which showed Cronbach's alpha values slightly lower than 0.90 but above 0.80. The reliability results were within the same or a higher range (0.70 to 0.89 and ≥0.90) compared with the validation in an English TBI sample [68] (see Table 1).
Note. 1 Reliability coefficients obtained from the CENTER-TBI study sample. 2 Reliability coefficients from the original English validation of the GAD-7 in a non-TBI sample [23], and from the first English validation in a TBI sample [68]. 3 Split-half reliability (CENTER-TBI data), test-retest reliability provided by original studies; N = number of cases; values in bold represent at least satisfactory reliability (≥0.70).

PHQ-9
All analyzed PHQ-9 translations were available prior to the CENTER-TBI study. The items of the PHQ-9 were not normally distributed (SK: M = 1.66, SD = 0.80; KU: M = 2.71, SD = 3.85) across all languages. At the item level, all items were moderately to highly correlated with the total scores of the PHQ-9 across all languages, except for Swedish. Here, the item "Moving or speaking so slowly that other people could have noticed" had a low correlation (r = 0.18) with the total score. At the total score level, the Cronbach's alpha values were above 0.70 (0.78 to 0.89) in every language. When calculating Cronbach's alpha if item omitted, no value exceeded the total Cronbach's alpha. The values of the split-half reliability ranged from 0.85 to 0.90. Reliability coefficients were comparable (i.e., ranged from 0.70 to 0.89 and above) with those obtained from the original English publication in a non-TBI population (i.e., primary care patients from five general health clinics and three family practice clinics) [24]. The Cronbach's alpha coefficients calculated from CENTER-TBI data were slightly lower compared with the results from the first English validation study in a TBI sample [69], whereas the results of the split-half reliability were within a comparable range [70] (see Table 2).
Note. 1 Reliability coefficients (CENTER-TBI study sample). 2 Reliability coefficients (original English validation of the PHQ-9 in a non-TBI sample [24] and from two English validations in TBI samples) † Cronbach's alpha [69] and ‡ test-retest reliability [70]. 3 Split-half reliability (CENTER-TBI data), test-retest reliability provided by original studies; N = number of cases; values in bold represent at least satisfactory reliability (≥0.70).

PCL-5
All but the Norwegian version of the PCL-5 were translated for the CENTER-TBI study. The items of the PCL-5 were not normally distributed (SK: M = 1.75, SD = 0.66; KU: M = 2.70, SD = 2.99) across all languages. At the scale (i.e., DSM-5 cluster) level, most items had medium to high correlations with the cluster total scores of the PCL-5 across all languages. Only the item "Trouble remembering important parts of the stressful experience" displayed borderline correlations with the total cluster scores in the French (r = 0.20), Norwegian (r = 0.28), and Swedish (r = 0.28) language samples. The internal consistency was satisfactory to excellent (0.74 to 0.92) at the cluster level. All split-half reliability coefficients demonstrated at least satisfactory reliability (i.e., ≥0.70). At the total score level, the values of Cronbach's alpha ranged from 0.91 to 0.94 in all languages. The values of Cronbach's alpha if item omitted did not exceed the initial Cronbach's alpha, except for the item "Trouble remembering important parts of the stressful experience" in all but the English and German language samples. The split-half reliability was excellent (0.92 to 0.96) across all languages. Cronbach's alpha coefficients at the total score and cluster level were comparable in all translations to the original English validation results in non-TBI samples (i.e., undergraduate students having experienced a stressful life event [26] and military service members [71]). No publications on the psychometric properties of the PCL-5 in TBI samples were found (see Table 3).
Note. * Instruments translated and linguistically validated for the CENTER-TBI study. 1 Reliability coefficients (CENTER-TBI study sample). 2 Reliability coefficients (original English validation of the PCL-5 in a non-TBI sample on the † total score level [26]) and on the total score and cluster level [71]. 3

RPQ
All but the German and Norwegian versions of the RPQ were translated for the CENTER-TBI study. The item score distributions of the RPQ were skewed (SK: M = 1.31, SD = 0.83; KU: M = 1.37, SD = 4.07) across all languages. At the item level, most items displayed medium to high correlations with the total scores of the RPQ. In the German translation, the item "Double Vision" had a borderline correlation with the total score of the RPQ (r = 0.25). The item "Nausea" of the German and Swedish translations displayed rather low correlations (r = 0.25 and r = 0.24, respectively) with the total score.
At the scale level, however, the values of the Cronbach's alpha and the split-half reliability were above 0.70 across all languages. No comparisons between the original and the translated language versions can be provided for the internal consistency, as no information was available concerning Cronbach's alpha in the English RPQ version investigated in a TBI sample. Moreover, further studies on the RPQ [31,72,73] provided no information on the internal consistency, as they focused on the factorial structure of the questionnaire. The test-retest reliability scores in the original study were comparable to the split-half reliability results of the English and Finnish language samples from the CENTER-TBI study. The split-half reliability of all other translations was slightly above 0.90 except for the Swedish version (α Cronbach = 0.82). For details, see Table 4.
Note. * Instruments translated and linguistically validated for the CENTER-TBI study. 1 Reliability coefficients (CENTER-TBI study sample). 2 Reliability coefficients from the original English validation of the RPQ in a TBI sample [30]. 3 Split-half reliability (CENTER-TBI data), test-retest reliability (original validation study); N = number of cases; values in bold represent at least satisfactory reliability (≥0.70).

QOLIBRI
At the total score level, Cronbach's alpha and the split-half reliability coefficients of all translated QOLIBRI versions were above 0.90. Item-total correlations displayed medium to high correlations with the total score except for the German version. Here, the item "How bothered are you by feeling angry or aggressive" revealed a low correlation with the total score (r = 0.25). Below, item distributions and reliabilities are reported for each subscale. All reliability coefficients were comparable (i.e., within the same or higher range) to those reported in the original publication on a TBI population. As the QOLIBRI was developed for use in the TBI field, no validation studies in non-TBI populations are reported (see Table 5).

QOLIBRI-OS
The items of the QOLIBRI-OS were close to being normally distributed (SK: M = −0.71, SD = 0.23; KU: M = −0.05, SD = 0.53) and were moderately to highly correlated with the total scores of the QOLIBRI-OS (0.59 to 0.83) across all languages. At the total score level, the Cronbach's alpha values were close to or above 0.90 (0.88 to 0.92), and the split-half reliability ranged from 0.90 to 0.94. Moreover, the values of Cronbach's alpha if item omitted were smaller than the total Cronbach's alpha in each language. The reliabilities of the translated versions were generally within the same range as those of the original ones. The split-half coefficients were greater than the test-retest reliability of the original QOLIBRI-OS. For details, see Table 6.

SF-36v2
All SF-36v2 translations were available prior to the CENTER-TBI study. The instrument was investigated at the scale and item levels and with respect to the mental (MCS) and physical (PCS) component scores.
The internal consistency of the translated versions of the SF-36v2 was comparable to that of the original English version, which was validated in a U.S. general population [40,41]. At the scale level, the Cronbach's alpha coefficients were within the same range as or above the original values, and the split-half reliability coefficients were within the same or a higher range. Despite the wide application of the SF-36v2, no studies on the psychometric properties of the English version in the field of TBI were found.
Physical Component Score (PCS). Items were moderately to highly correlated with the PCS (0.35 to 0.87), except for the item "I expect my health to get worse" in the English version (r = 0.23). Cronbach's alpha ranged from 0.32 to 0.95, and the split-half reliability coefficients from 0.93 to 0.95. When an item was omitted, the recalculated Cronbach's alpha did not exceed the initial value in any language sample. The reliability coefficients were within the same or a higher range compared with the psychometric properties of the original SF-36v2.
Mental Component Score (MCS). The items were moderately to highly correlated with the MCS (0.43 to 0.88). Cronbach's alpha (0.92 to 0.95) and the split-half coefficients (0.95 to 0.98) indicated high reliability. When an item was omitted, the recalculated Cronbach's alpha did not exceed the initial value. Here, again, the reliability of the translations was comparable (i.e., within the same or a higher range) to the results of the original validation study (see Table 7).
Note. ¹ Reliability coefficients (CENTER-TBI study sample). ² Reliability coefficients (original English validation of the SF-36v2 in a non-TBI sample [41]).

SF-12v2
All SF-12v2 translations were available prior to the CENTER-TBI study. Four of the eight SF-12v2 scales consist of two items (PF, RP, RE, MH) and the other four of a single item (BP, VT, SF, GH); therefore, reliability coefficients are reported at the level of the physical (PCS) and mental (MCS) component scores.
Physical Component Score (PCS). The items were close to being normally distributed. The reliability of the translated versions of the SF-12v2 was comparable to that of the original English version, which was validated in a general U.S. population [42]. For both component scores, the split-half reliability coefficients (CENTER-TBI data) were in a higher range than those of the original version. Despite the wide application of the SF-12v2, no studies on the psychometric properties of the English version in the field of TBI were found (see Table 8).
Note. French, Norwegian, and Swedish language samples were excluded from the reliability analyses due to the low number of participants (N < 50). ¹ Reliability coefficients (CENTER-TBI study sample). ² Reliability coefficients (original English validation of the SF-12v2 in a non-TBI sample [42]). ³ Split-half reliability (CENTER-TBI study), test-retest reliability (original validation study); N = number of cases; PCS = Physical Component Score; MCS = Mental Component Score; values in bold represent at least satisfactory reliability (≥0.70).

Convergent and Discriminant Validity
Validity coefficients for all PROMs and for the PCS and MCS of the SF-36v2 and the SF-12v2 are provided at the total score level (see Table 9). For details on the scale-level validity of the PCL-5, the QOLIBRI, and the SF-36v2, see Appendix B, Tables A2-A4. Most instruments indicating a degree of impairment (i.e., GAD-7, PHQ-9, PCL-5, and RPQ) showed medium to high negative correlations with the PCS of the SF-36v2 (−0.30 to −0.82). Exceptions were the English (r_S = −0.15) and Swedish (r_S = −0.12) versions of the GAD-7 and the French version of the PCL-5 (r_S = −0.25), which showed low negative correlations. The instruments measuring disease-specific HRQoL after TBI (i.e., the QOLIBRI and the QOLIBRI-OS) showed medium to high positive correlations with the SF-36v2 PCS (0.49 to 0.65) across all languages.

Table 9. Convergent and discriminant validity of the GAD-7, PHQ-9, PCL-5, RPQ, QOLIBRI, and QOLIBRI-OS with the SF-36v2, the SF-12v2, the GOSE, and the GCS.

Significant medium to high correlations were found between the PROMs and the GOSE total score: greater impairment was associated with lower functional recovery in almost all languages across all instruments (−0.30 to −0.63). Only the German versions of the GAD-7 (r_S = −0.24) and the PCL-5 (r_S = −0.16) showed low associations with the GOSE. Higher TBI-specific HRQoL was associated with better functional recovery across all languages (0.37 to 0.64).
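The r_S values above are Spearman rank correlations, and the verbal labels follow a low/medium/high scheme. The sketch below computes Spearman's coefficient as the Pearson correlation of average ranks, which handles tied scores (common with ordinal measures such as the GOSE), and labels its magnitude. The cut-offs 0.10, 0.30, and 0.50 are Cohen's conventional benchmarks, assumed here because they match the labels used in the text (e.g., −0.30 reported as medium, −0.25 as low); significance testing is not reproduced, and all names and data are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average rank of the tie block
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def pearson(x, y):
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    """Spearman's rho as the Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def magnitude(r):
    """Label |r| with Cohen's benchmarks (assumed, not taken from the study)."""
    a = abs(r)
    if a < 0.10:
        return "negligible"
    if a < 0.30:
        return "low"
    if a < 0.50:
        return "medium"
    return "high"


# Hypothetical data: impairment sum scores vs. GOSE ratings (note the tied 6s).
impairment = [3, 10, 7, 15, 12]
gose = [8, 6, 7, 4, 6]
rho = spearman(impairment, gose)
print(round(rho, 3), magnitude(rho))  # -0.975 high
```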

The associations of the PROMs with the GCS were weak and not significant in most languages. Only the Swedish translations of the GAD-7 (r_S = −0.30), the PHQ-9 (r_S = −0.33), the RPQ (r_S = −0.34), the QOLIBRI (r_S = 0.34), and the QOLIBRI-OS (r_S = 0.40) showed medium correlations with the GCS.

Factorial Validity
Table 10 gives an overview of the goodness-of-fit statistics for the estimated models. Factor loadings are provided in the Online Supplement (OS-3 Factorial validity, Tables S1-S8).
RPQ. All RPQ translations yielded significant χ² statistics, and their RMSEA and SRMR values (except for the Dutch and Norwegian versions) were above the respective cut-offs. Factor loadings ranged from 0.41 to 0.92; the item "Headaches" in the Finnish RPQ and the item "Double Vision" in the Norwegian RPQ had loadings below the cut-off. Overall, the one-factor solution showed a rather poor fit.
QOLIBRI. All QOLIBRI translations except the English and Finnish ones had satisfactory fit indices, apart from the χ² statistic, which was significant for all translations. The English and Finnish models did not converge. Item loadings ranged from 0.59 to 0.97 (Cognition: 0.74 to 0.92; Self: 0.75 to 0.93; Daily Life and Autonomy: 0.76 to 0.96; Social: 0.65 to 0.92; Emotions: 0.63 to 0.97; Physical: 0.59 to 0.92). Overall, the original five-factor structure fitted the data well.
QOLIBRI-OS. For the most part, the CFA results of the QOLIBRI-OS translations showed acceptable fit indices; the χ² statistics were significant, and the RMSEA values of the Dutch, Italian, and Spanish translations were slightly above the cut-off value. All other indices were within acceptable ranges. The factor loadings ranged from 0.73 to 0.92, indicating that TBI-specific HRQoL, as measured by the QOLIBRI-OS, is unidimensional across translations.
SF-36v2. Two of the six models (Finnish and Italian) did not converge. The CFI and TLI of the other translations were satisfactory; nevertheless, the χ² statistics were significant, and the RMSEA and SRMR (except for the Dutch translation) were above the respective cut-off values. All factor loadings at the scale level were above 0.50; one item of the Dutch SF-36v2 ("Walking several hundred yards") was extremely highly correlated with the Physical Functioning scale and therefore also with the PCS (r = 1.0). Overall, the factorial structure of the SF-36v2, with eight scales and two second-order factors, did not show a good fit.
SF-12v2. The models showed satisfactory CFI and TLI values across all languages, and the SRMR of the Spanish translation was also satisfactory. The χ² statistics were significant, and the RMSEA and the remaining SRMR values were above permissible cut-off values. Item loadings ranged from 0.69 to 0.97 for the PCS and from 0.67 to 0.95 for the MCS.
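The fit indices referred to throughout can be computed from the model and baseline χ² statistics. The sketch below uses the standard maximum-likelihood formulas for RMSEA, CFI, and TLI and is an illustration under assumptions, not a re-analysis. The cut-offs (CFI and TLI ≥ 0.95, RMSEA ≤ 0.06) are the widely used Hu and Bentler guidelines, assumed here because the thresholds applied in the study are not restated in this section; SRMR is omitted because it requires the full residual correlation matrix. Estimators for ordinal items (e.g., robust weighted least squares) report scaled χ² values, and some programs use N rather than N − 1 in the RMSEA denominator, so software output can differ slightly. All input numbers are hypothetical.

```python
from math import sqrt


def rmsea(chi2, df, n):
    """Root mean square error of approximation (N - 1 convention)."""
    return sqrt(max(chi2 - df, 0.0) / (df * (n - 1)))


def cfi(chi2_m, df_m, chi2_b, df_b):
    """Comparative fit index relative to the baseline (independence) model."""
    d_m = max(chi2_m - df_m, 0.0)
    d_b = max(chi2_b - df_b, d_m)
    return 1.0 if d_b == 0 else 1.0 - d_m / d_b


def tli(chi2_m, df_m, chi2_b, df_b):
    """Tucker-Lewis index (non-normed fit index); can fall outside [0, 1]."""
    ratio_b = chi2_b / df_b
    return (ratio_b - chi2_m / df_m) / (ratio_b - 1.0)


def check_fit(chi2_m, df_m, chi2_b, df_b, n, cfi_min=0.95, tli_min=0.95, rmsea_max=0.06):
    """Compare indices with illustrative Hu-Bentler-style cut-offs (assumed)."""
    values = {
        "CFI": cfi(chi2_m, df_m, chi2_b, df_b),
        "TLI": tli(chi2_m, df_m, chi2_b, df_b),
        "RMSEA": rmsea(chi2_m, df_m, n),
    }
    ok = {
        "CFI": values["CFI"] >= cfi_min,
        "TLI": values["TLI"] >= tli_min,
        "RMSEA": values["RMSEA"] <= rmsea_max,
    }
    return values, ok


# Hypothetical model: chi2 = 120 on 40 df, baseline chi2 = 1500 on 55 df, N = 401.
values, ok = check_fit(120, 40, 1500, 55, 401)
print({k: round(v, 3) for k, v in values.items()}, ok)
```

Because χ² grows with sample size, a significant χ² alongside acceptable CFI and TLI, as reported for several translations above, is a common pattern in samples of this size.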

Discussion
The present study examined the psychometric properties of the eight PROMs administered to individuals after TBI in the CENTER-TBI study. Many of them were translated and linguistically validated for this study; others had not yet been psychometrically investigated in the field of TBI. Therefore, a classical test theoretical framework was applied.
The results of the reliability and validity analyses indicate that most newly translated and previously existing questionnaires showed satisfactory to excellent psychometric characteristics in the field of TBI and were comparable to each other as well as to the original English versions, which had been investigated predominantly in non-TBI samples, in individuals after TBI, or both. At the scale level, high internal consistency and scale reliability were observed for the newly translated and previously existing instruments across all languages. At the item level, only very few items from a few questionnaires showed irregularities, mostly in no more than one language. However, the factorial validity analyses revealed difficulties in replicating the original factorial structures of some instruments, indicating a need for further investigation.
Some translations showed problems at the item level, with lower correlations with the respective total scale scores: the item "Moving or speaking so slowly that other people could have noticed" in the Swedish PHQ-9, the item "Nausea" in the Swedish and German RPQ, the item "Double Vision" in the German RPQ, and the item "How bothered are you by feeling angry or aggressive" in the German QOLIBRI. Item-total correlations are directly related to the factorial structure of a questionnaire; low correlations may therefore indicate that a questionnaire does not measure a single dimension. Because the QOLIBRI consists of five scales, the low correlation of the item "How bothered are you by feeling angry or aggressive" in the German translation is not problematic, as the characteristics at the scale and total score levels were satisfactory. Likewise, the low item-total correlations of the RPQ translations are not unexpected, as the questionnaire's scoring has been revised several times by different authors [30,72,73], with the items "Nausea" and "Double Vision" assigned to different domains. The low correlation of the item "Moving or speaking so slowly that other people could have noticed" in the Swedish PHQ-9 is more difficult to explain, as the PHQ-9 is a unidimensional measure. Problems with the wording are possible, but the composition of the Swedish sample is a more likely explanation: it contained the most severely impaired patients, with the lowest GCS scores, the lowest level of functional recovery (GOSE), and the highest injury severity (AIS). Thus, the low correlation of this PHQ-9 item may be attributable to the particularities of the Swedish sample. Future research could review the wording of this item and examine the Swedish PHQ-9 across a broader spectrum of TBI severities.
Additionally, one PCL-5 item ("Trouble remembering important parts of the stressful experience") did not discriminate well between low and high levels of PTSD across all languages and showed low correlations with its scale score (i.e., DSM-5 cluster) in the French, Norwegian, and Swedish translations. The factorial structure of the original PCL-5 has been examined on several occasions [74,75], and this item has been assigned to different dimensions. The results of the present study indicate that the PCL-5 translations have inherited this methodological problem of the original version. Thus, further investigation of the factorial structure of the PCL-5 could improve the questionnaire's psychometric characteristics.
As expected, the validity analysis of the PROMs (both those newly translated and those available prior to the CENTER-TBI study) showed medium to strong correlations with the SF-36v2, the SF-12v2, and the GOSE in most languages. The PCS and MCS of the SF-36v2 and SF-12v2 generally showed medium to strong negative correlations with the GAD-7, PHQ-9, PCL-5, and RPQ. One exception was the GAD-7, which correlated only weakly with the PCS of the SF-36v2 in English and Swedish and with the PCS of the SF-12v2 in six of nine languages (Dutch, English, French, Italian, Norwegian, and Swedish). This might be attributable to the specific items that constitute the PCS of the SF-12v2: whereas the SF-36v2 items cover a wide range of physical activities and activity-related problems, the SF-12v2 items focus on a limited number of physical problems that are probably less strongly associated with anxiety. Nevertheless, the results are generally in line with previous findings suggesting that negative emotions (i.e., anxiety, depression, or stress) are highly correlated with generic HRQoL, especially with the MCS [76,77]. Moreover, the assumption that the mental and physical components of the SF-36v2 and the SF-12v2 would correlate strongly and positively with the QOLIBRI and the QOLIBRI-OS was confirmed across all languages, supporting results from previous studies [34,78].
Generally, the GAD-7 (except for the German language sample), the PHQ-9, the PCL-5 (except for the German language sample), and the RPQ showed medium to strong negative correlations with the GOSE. The German participants had a relatively high recovery rate, with 50% fully recovered at six months (i.e., GOSE = 8); they had sustained less severe TBI (50% had a GCS of 15) and were consequently less impaired. This narrower range of impairment may account for the low correlations. These results are in line with previous research showing that functional recovery after TBI is frequently associated with the absence of mental health problems [79] and post-concussion symptoms [80], and vice versa. The GOSE also showed medium to strong positive correlations with the QOLIBRI and QOLIBRI-OS across all languages, indicating that higher disease-specific HRQoL is associated with better functional outcomes, in line with previous research findings [34,38,81].
Furthermore, TBI severity as assessed by the GCS, which rates the level of consciousness, showed a low association with both the psychological and the health-related PROMs in almost all languages, except for the Swedish translations of the GAD-7, the PHQ-9, the RPQ, the QOLIBRI, and the QOLIBRI-OS. Previously published validity results in the field of TBI [59] found no association between the GCS and psychological outcomes or post-concussion symptoms. The populations studied there contained fewer severely injured individuals, as measured by the ISS, which may explain the smaller or absent correlations. In the Swedish sample, the stronger association between the GCS and the outcomes might be explained by the higher injury severity and more severe polytrauma of the participants.
Overall, the original factorial structures suggested by the instrument developers were replicated for the GAD-7, the PHQ-9, and the QOLIBRI-OS. The translations of the PCL-5 displayed an acceptable model fit, indicating that the initial factorial structure describes the data well. Nevertheless, we recommend a further investigation of the item "Trouble remembering important parts of the stressful experience" which displayed irregularities in both reliability analyses across all languages and the factor loadings of the CFA in some translations. The five-factor structure of the original QOLIBRI was replicated in all but two language samples; the English and Finnish models did not converge. This could be due to several reasons: extreme response categories rarely chosen by the participants, relatively large number of parameters that must be estimated in relation to the sample size, or (unconsidered) correlations between latent factors [61].
The original factor solutions could not be replicated for the RPQ translations; this is in line with previous research findings, as several factor solutions have been proposed for the RPQ [30,31,72,73]. Since the RPQ has primarily been developed for TBI populations, further investigation of the factorial structure and thus implementation of an appropriate scoring are strongly recommended.
The SF-36v2 (except for the Dutch version, which showed an acceptable model fit) and the SF-12v2 showed a poor model fit. Neither instrument was developed specifically for populations after TBI, and both use a wide range of response scale formats, which might be confusing and tiring, especially for respondents with cognitive deficits, and may affect their response behavior [82], resulting in a poorer fit of the estimated models [83,84]. For the assessment of generic HRQoL in TBI populations, further investigation of the factorial structure of both PROMs seems appropriate.
Objectivity. The layout and instructions for administering the newly translated PROMs were internationally harmonized and are therefore similar across all language versions. Moreover, instructions for assessment, scoring, and interpretation were provided (see the SOPs of the CENTER-TBI study). For the interpretation of results, general population-based norms or reference values are helpful. For example, population-based reference values for the QOLIBRI have recently been made available for the UK and the Netherlands [35].
Strengths and limitations of the study. The main strength of the present study is that it provides a broad overview, not previously available, of the psychometric properties of the various previously existing and newly translated and linguistically validated PROMs in the TBI field [20].
The psychometric results allow researchers and clinicians to rate the quality of the translated questionnaires before selecting them for national and international studies and clinical practice to evaluate outcomes after TBI.
Because of the small sample sizes in some languages, analyses based on modern test theory could not be reported here. Additional research on measurement invariance (MI) across languages could further increase the quality of the instruments with respect to international administration and the pooling of international data. MI analysis evaluates whether the same construct is understood and measured in the same way across different languages. Some of our recent studies have already shown that the PHQ-9 and GAD-7 [85], the QOLIBRI [35], and the QOLIBRI-OS [39], as applied in the field of TBI, measure the same construct across languages. Follow-up studies will assess the measurement invariance of the constructs measured by the individual PROMs across languages, as well as the sensitivity and responsiveness of the PROMs for different patient groups and risk factors.
The present study also has some limitations. Despite the large number of participants in the CENTER-TBI core study, the psychometric properties of some translations could not be examined because of the limited number of participants. Consequently, the Danish, Hebrew, Hungarian, Latvian, Lithuanian, Romanian, and Serbian translations of the PROMs need further investigation with a larger number of patients. Furthermore, given the range of TBI severity covered (mild to severe), we observed that even six months after TBI, participants with higher TBI severity, with and without extracranial injuries and polytrauma, were not always able to complete the PROMs. To provide robust psychometric analyses in more severely injured patient groups, future assessments should also be conducted at later time points.

Conclusions
This study provides psychometric characteristics of the PROMs administered in the CENTER-TBI study for individuals after TBI. The psychometric properties of these PROMs are satisfactory to excellent at the scale level in nine European languages. These results highlight the value of a rigorous process of translation and linguistic and cultural adaptation of questionnaires that goes far beyond a literal translation and ensures the cultural comparability of the translated versions. Researchers and clinicians can therefore select reliable and valid instruments for clinical use, data collection, and data aggregation when evaluating outcomes after TBI in international studies, thus improving outcome assessment in national and international health care.
Funding: CENTER-TBI was supported by the European Union 7th Framework programme (EC grant 602150). Additional funding was obtained from the Hannelore Kohl Stiftung (Germany), from OneMind (USA), and from Integra LifeSciences Corporation (USA). The funders of the study had no role in study design, data collection, data analysis, data interpretation, or writing of the report.

Institutional Review Board Statement:
The CENTER-TBI study (EC grant 602150) has been conducted in accordance with all relevant laws of the European Union (EU) if directly applicable or of direct effect and all relevant laws of the country where the recruiting sites were located. Informed consent by the patients and/or the legal representative/next of kin was obtained, in accordance with local legislation, for all patients recruited in the Core Dataset of CENTER-TBI and documented in the electronic case report form (e-CRF). For the full list of sites, ethical committees, and ethical approval details, see the official CENTER-TBI website (https://www.center-tbi.eu/project/ethical-approval).
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: All relevant data are available upon request from CENTER-TBI; the authors are not legally allowed to share them publicly. The authors confirm that they received no special access privileges to the data. CENTER-TBI is committed to data sharing and, in particular, to responsible further use of the data. To this end, a data sharing statement is in place: https://www.center-tbi.eu/data/sharing. The CENTER-TBI Management Committee, in collaboration with the General Assembly, established the Data Sharing policy and the Publication and Authorship Guidelines to ensure correct and appropriate use of the data, as the dataset is highly complex and requires the help of experts from the Data Curation Team or Bio-Statistical Team for correct use. We therefore encourage researchers to contact the CENTER-TBI team about any research plans and the Data Curation Team for any help with appropriate use of the data, including sharing of scripts. Requests for data access can be submitted online: https://www.center-tbi.eu/data. The complete manual for data access is also available online: https://www.center-tbi.eu/files/SOP-Manual-DAPR-20181101.pdf.

Acknowledgments:
We gratefully thank all CENTER-TBI participants and investigators. We are immensely grateful to our patients with TBI for helping us in our efforts to improve care and outcomes for TBI.

Conflicts of Interest:
The authors declare no conflict of interest.

GAD-7
Generalized Anxiety Disorder 7 Item Scale HRQoL Health