Traumatic brain injury (TBI) is an alteration in brain function caused by an external force [1], for example, due to falls, road traffic accidents, sports, assaults, or violence. It is a considerable source of disability and death worldwide. The sequelae of TBI not only affect the lives of injured individuals and their relatives on many different levels [2], but can also result in high direct and indirect costs [3].
Concerning the global prevalence of TBI, the vast majority of individuals (70–90%) sustain mild TBI, while approximately 10% to 30% sustain moderate or severe TBI [5]. Regardless of severity, individuals after TBI may suffer from short- or long-term impairments in cognition [7], psychosocial functioning [9], health-related quality of life (HRQoL) [10], mental health [12], and/or functional disability [14]. These impairments can be assessed using domain-specific outcome measures.
The data analyzed in this study were collected in the international Collaborative European NeuroTrauma Effectiveness Research in TBI observational study (CENTER-TBI; clinicaltrials.gov NCT02210221), which has been conducted since 2014 in 18 European countries and Israel; the six-month outcome assessments of the enrolled participants were completed in 2018. This study aimed to capture a contemporary picture of TBI across all severity groups, including its care and outcome, to develop precision medicine approaches, and to apply comparative effectiveness research to identify best practices. It provides insights into the longitudinal course of somatic, functional, behavioral, psychiatric, cognitive, psychological, and psychosocial sequelae after TBI and can serve as a basis for developing a new multidimensional assessment approach [15].
An important criterion when selecting instruments for research and clinical practice is their psychometric quality. For most patient-reported outcome measures (PROMs) administered in the CENTER-TBI study, this had not yet been examined in the field of TBI, nor had the newly translated versions of the instruments been psychometrically investigated. Hence, the present study aims to investigate the classical psychometric properties of the newly and previously translated PROMs administered in the CENTER-TBI study in individuals after TBI.
In research and clinical contexts, instruments offer insights into outcomes after TBI. Establishing the comparability of the translated instruments with their original versions, and validating them in the field of TBI, enables the reliable and valid aggregation of data in multicenter national and international studies on outcomes after TBI.
The study aims are the investigation of:
The reliability (total score, scale, and item level) of the PROMs, comparing them with the values of the original instrument versions to ascertain the quality and comparability of the translations and applicability in the field of TBI;
The convergent and discriminant validity of the PROMs with established measures assessing functional recovery after TBI (GOSE), generic HRQoL (SF-36v2/SF-12v2), and TBI severity (GCS);
The factorial validity using confirmatory factor analyses (CFA) to replicate the original factorial structure of the translated instruments.
2. Materials and Methods
Participants were recruited at 63 centers across 18 countries from 19 December 2014 to 17 December 2017. Ethical approval was obtained for each site, and informed consent was obtained from all patients or their legal representatives. The inclusion criteria for the core study were a clinical diagnosis of TBI, presentation within 24 h after injury, and an indication for a computed tomography (CT) scan. Patients were differentiated into three strata: emergency room (ER; patients primarily evaluated at an ER), admission (ADM; patients primarily admitted to a hospital ward), and intensive care unit (ICU; patients primarily admitted to an ICU). Further details can be found elsewhere [16]. Data were retrieved from core version 2.1 of the CENTER-TBI database using the data access tool Neurobot.
The core study sample included 4509 individuals. In the present study, we focused on participants aged 16 years and above who had completed at least one outcome measure at the six-month assessment after TBI. Data were collected on-site at the hospital by study personnel, through face-to-face or telephone interviews (clinical ratings), or by mail (PROMs), and were entered centrally using a web-based electronic case report form.
2.2. Sample Characteristics
Language, sex, age, education, employment, marital status, and living situation were selected as sociodemographic characteristics. Samples were then aggregated by language. More specifically, individuals from German-speaking communities in Austria, Belgium, and Germany were integrated into the German sample, individuals from French-speaking communities in Belgium and France into the French sample, and individuals from Dutch-speaking communities in Belgium or the Netherlands into the Dutch sample. Only a few participants (N = 20) received the outcome questionnaires in a language other than the local language of the participating site. These individuals were assigned to their respective language group: Dutch (7), English (8), German (1), Romanian (3), and Swedish (1).
The following variables were used to characterize extracranial and brain injuries: the individuals’ mental health status before the injury, clinical care pathways, cause of injury, loss of consciousness (LOC), post-traumatic amnesia (PTA), TBI severity (GCS), abnormalities on computed tomography (CT) scans, total injury severity score (ISS), and brain injury severity score from the Abbreviated Injury Scale (AIS) [17].
2.3. Patient-Reported Outcome Measures (PROMs)
Since most instruments applied in the CENTER-TBI study only existed in English, they had to be translated into the languages of the participating countries following a formalized approach (i.e., linguistic validation) to ensure their linguistic, cultural, and conceptual comparability across languages [18]. For more details, see von Steinbuechel et al. [20].
The selection of the outcome measures was informed by the Common Data Elements (CDE) recommendations [21]. For six out of eight PROMs (marked with an asterisk * in the instrument descriptions below), at least one translation had to be performed. In this study, we report psychometric properties for all eight PROMs, that is, for the newly translated instruments and for the previously translated instruments that had not yet been validated in the field of TBI.
The Generalized Anxiety Disorder 7-Item Scale (GAD-7)* [23] measures the severity of generalized anxiety disorder symptoms using seven items rated on a four-point Likert scale (from 0 “not at all” to 3 “nearly every day”). The total score ranges from 0 to 21, with values of 10 and above indicating impairment and cut-offs of 5, 10, and 15 representing mild, moderate, and moderately severe to severe anxiety, respectively [23].
The Patient Health Questionnaire (PHQ-9)* [24] assesses self-reported symptoms of major depression using nine items rated on a four-point Likert scale (from 0 “not at all” to 3 “nearly every day”). The PHQ-9 total score ranges from 0 to 27, with a score of 10 and above indicating clinically relevant impairment and cut-offs of 5, 10, 15, and 20 indicating mild, moderate, moderately severe, and severe depression, respectively [24].
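The cut-off logic for the GAD-7 and PHQ-9 can be sketched in Python as follows. This is a minimal illustration, not the CENTER-TBI scoring code: the function names are hypothetical, and the label "minimal" for totals below 5 is our own assumption, since the text does not name that band.

```python
from bisect import bisect_right

# Lower bound of each severity band (cut-offs quoted above) and its label.
# The "minimal" label for totals below 5 is an assumption, not from the text.
GAD7_BANDS = ([0, 5, 10, 15], ["minimal", "mild", "moderate", "moderately severe to severe"])
PHQ9_BANDS = ([0, 5, 10, 15, 20], ["minimal", "mild", "moderate", "moderately severe", "severe"])
IMPAIRMENT_CUTOFF = 10  # "10 and above" indicates impairment for both instruments


def total_score(items, n_items, max_item=3):
    """Sum 0..max_item responses after checking the number and range of items."""
    if len(items) != n_items:
        raise ValueError(f"expected {n_items} items, got {len(items)}")
    if any(not 0 <= x <= max_item for x in items):
        raise ValueError(f"responses must lie between 0 and {max_item}")
    return sum(items)


def severity(total, bands):
    """Return the label of the highest band whose lower bound the total reaches."""
    bounds, labels = bands
    return labels[bisect_right(bounds, total) - 1]


# Hypothetical respondents
gad7 = total_score([1, 2, 1, 2, 1, 2, 2], n_items=7)
phq9 = total_score([0, 1, 0, 1, 0, 1, 0, 1, 0], n_items=9)
print("GAD-7:", gad7, severity(gad7, GAD7_BANDS), gad7 >= IMPAIRMENT_CUTOFF)
print("PHQ-9:", phq9, severity(phq9, PHQ9_BANDS), phq9 >= IMPAIRMENT_CUTOFF)
```

Storing only the lower bound of each band keeps the cut-offs in one place and makes the boundary rule explicit: a total equal to a cut-off (e.g., 10) falls into the higher band.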
Both the GAD-7 and PHQ-9 were available in almost all languages except for Latvian (GAD-7 and PHQ-9) and Serbian (GAD-7 only). Nevertheless, we conducted analyses on both instruments to examine their psychometric properties in individuals after TBI.
The Posttraumatic Stress Disorder Checklist-5 (PCL-5)* [26] comprises 20 symptoms of post-traumatic stress disorder (PTSD) based on the Diagnostic and Statistical Manual of Mental Disorders, 5th edition (DSM-5) [27], rated on a five-point Likert scale (from 0 “not at all” to 4 “extremely”). The total score ranges from 0 to 80, with higher values indicating greater impairment. For clinical screening, either a cut-off score of 31 [28] or 33 is applied [29].
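A minimal sketch of PCL-5 scoring in Python follows. It assumes the standard DSM-5 assignment of items 1–5, 6–7, 8–14, and 15–20 to clusters B, C, D, and E (this mapping is not listed in the text above) and treats totals at or above the chosen cut-off as screen-positive. The function name and the example responses are hypothetical.

```python
# Assumed DSM-5 cluster membership of the 20 PCL-5 items (1-based item numbers).
PCL5_CLUSTERS = {
    "B (intrusion)": range(1, 6),
    "C (avoidance)": range(6, 8),
    "D (negative cognitions and mood)": range(8, 15),
    "E (arousal and reactivity)": range(15, 21),
}


def score_pcl5(items, cutoff=33):
    """Return the total score, the four cluster sums, and the screening result."""
    if len(items) != 20 or any(not 0 <= x <= 4 for x in items):
        raise ValueError("the PCL-5 needs 20 responses coded 0-4")
    total = sum(items)
    clusters = {name: sum(items[i - 1] for i in numbers)
                for name, numbers in PCL5_CLUSTERS.items()}
    return total, clusters, total >= cutoff


# Hypothetical respondent: items 1-16 rated 2, items 17-20 rated 0.
responses = [2] * 16 + [0] * 4
for cutoff in (31, 33):
    total, clusters, positive = score_pcl5(responses, cutoff)
    print(f"cut-off {cutoff}: total={total}, screen positive={positive}")
```

The example illustrates why the choice between the two published cut-offs matters for respondents near the threshold: the same total of 32 is screen-positive at 31 but not at 33.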
The Rivermead Post-Concussion Symptoms Questionnaire (RPQ)* uses a five-point Likert scale (from 0 “not experienced at all” to 4 “a severe problem”) to evaluate the following 16 post-concussion symptoms: headaches, dizziness, nausea and/or vomiting, noise sensitivity, sleep disturbance, fatigue, irritability, depression, frustration, forgetfulness and poor memory, poor concentration, slow thinking, blurred vision, light sensitivity, double vision, and restlessness. Participants rate how much they have suffered from these symptoms during the past 24 h compared with their condition before the accident. The RPQ total score ranges from 0 to 64, with cut-offs of 13, 25, and 33 indicating mild, moderate, and severe symptoms, respectively [31].
The Quality of Life after Brain Injury Scale (QOLIBRI)* [32] measures TBI-specific HRQoL in individuals after TBI. It consists of 37 items in six domains, rated on a five-point Likert scale (from 0 “not at all” to 4 “very”). The six domains are cognition, self, daily life and autonomy, social relationships, emotions, and physical conditions. The total score is linearly transformed to range from 0 to 100, with higher values indicating better TBI-specific HRQoL [34]. Patients after TBI with a score below 60 may be assumed to have impaired HRQoL [34]; country-specific reference values can be found elsewhere [35]. For the QOLIBRI, the psychometric properties of almost all target language versions involved in the present study (except for Swedish) had already been published [32]. The Spanish translation was published after CENTER-TBI had started [37]. To be consistent with the analyses of the other PROMs, we replicated the psychometric analyses for the nine language versions of the QOLIBRI.
The Quality of Life after Brain Injury—Overall Scale (QOLIBRI-OS)* [38] is the short version of the QOLIBRI, measuring physical condition, cognition, emotions, daily life and autonomy, social relationships, and current and future prospects with six items. The items are answered on a five-point Likert scale (from 0 “not at all” to 4 “very”). Patients after TBI with a score below 52 may be assumed to have impaired HRQoL [34]; country-specific reference values can be found elsewhere [39]. For the QOLIBRI-OS, too, psychometric properties had already been examined in almost all languages, except for Spanish and Swedish [38]. Here, again, the psychometric analyses were replicated in all languages to be consistent with the other PROMs.
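Both QOLIBRI scores rest on a linear rescaling of 0–4 item responses to a 0–100 range. The Python sketch below assumes that the score is the rescaled mean of item responses already coded so that higher values mean better HRQoL (any reverse-keyed items are assumed to have been recoded beforehand) and that no item is missing; the official scoring rules for missing items are not reproduced. Function names and example responses are hypothetical.

```python
def rescale_0_100(item_scores, min_item=0, max_item=4):
    """Linearly map the mean item response from [min_item, max_item] onto [0, 100]."""
    if not item_scores or any(x is None for x in item_scores):
        raise ValueError("this sketch requires complete item responses")
    mean = sum(item_scores) / len(item_scores)
    return (mean - min_item) / (max_item - min_item) * 100


# Impairment thresholds quoted above: below 60 (QOLIBRI) and below 52 (QOLIBRI-OS).
THRESHOLDS = {"QOLIBRI": 60, "QOLIBRI-OS": 52}


def possibly_impaired(score, instrument):
    """True if the 0-100 score falls below the instrument's threshold."""
    return score < THRESHOLDS[instrument]


os_score = rescale_0_100([3, 3, 2, 3, 2, 2])  # hypothetical QOLIBRI-OS answers
print(os_score, possibly_impaired(os_score, "QOLIBRI-OS"))
```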
The 36-item Short Form Health Survey—Version 2 (SF-36v2) [40] measures subjective health status using 36 items with various response formats across its eight scales (from dichotomous “yes/no” to polytomous five-point Likert scale responses). The scales can be combined into a physical component score (PCS) and a mental component score (MCS), measuring physical and mental functioning, respectively. Both scores range from 0 to 100, with higher values indicating better HRQoL. The values can be transformed into T-scores (M = 50, SD = 10) based on a normative U.S. sample. A value below 47 on a single health domain scale or component summary score indicates functional impairment in comparison to the U.S. population [40].
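The norm-based T-score transformation is a linear standardization against a reference sample. The Python sketch below shows it for a single scale score; the norm mean and SD are made-up placeholders rather than the U.S. norms, and the official SF-36v2 scoring (including how the PCS and MCS are derived from the eight scales) involves more steps than this one formula. Function names are hypothetical.

```python
def t_score(raw, norm_mean, norm_sd):
    """Standardize a raw scale score against a reference sample (T: M = 50, SD = 10)."""
    if norm_sd <= 0:
        raise ValueError("norm_sd must be positive")
    return 50 + 10 * (raw - norm_mean) / norm_sd


def below_norm_threshold(t, threshold=47):
    """Flag T-scores below 47, the impairment threshold quoted above."""
    return t < threshold


# Placeholder reference values (NOT the published U.S. norms).
NORM_MEAN, NORM_SD = 72.0, 20.0
for raw in (60.0, 80.0):
    t = t_score(raw, NORM_MEAN, NORM_SD)
    print(raw, t, below_norm_threshold(t))
```

A raw score equal to the reference mean always maps to T = 50, and each reference SD corresponds to 10 T-points, which is what makes T-scores comparable across scales.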
The 12-Item Short Form Survey—Version 2 (SF-12v2) [42] is a short, 12-item version of the SF-36v2. The scores range from 0 to 100, with higher values indicating better HRQoL. The raw values can be transformed into T-scores (M = 50, SD = 10) based on a normative U.S. sample. However, the authors recommend using country- and group-specific cut-off values, as not every country or group has a mean health score of 50 [42]. In the CENTER-TBI study, the SF-12v2 had more missing data than the SF-36v2. Therefore, to increase power for calculating the PCS and MCS of the SF-12v2, missing values were replaced by values derived from the corresponding SF-36v2 items and combined with the reported data. For the item-level analyses, only reported data were used.
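The gap-filling step for the SF-12v2 can be sketched as follows. Because the SF-12v2 is a short version of the SF-36v2, a missing SF-12v2 response can be taken from the respondent's answer to the corresponding SF-36v2 item. The item identifiers and the mapping below are hypothetical placeholders; the real correspondence table is not reproduced here. The input is left untouched, mirroring the use of reported data only for the item-level analyses.

```python
# Hypothetical mapping: SF-12v2 item id -> corresponding SF-36v2 item id.
SF12_TO_SF36 = {
    "sf12_item1": "sf36_item1",
    "sf12_item2": "sf36_item3",
    "sf12_item3": "sf36_item4",
}


def fill_from_sf36(sf12, sf36, mapping=SF12_TO_SF36):
    """Return a new dict in which missing (None) SF-12v2 answers are taken from the SF-36v2.

    Reported SF-12v2 answers always take precedence; the input dict is not modified.
    """
    filled = dict(sf12)
    for sf12_item, sf36_item in mapping.items():
        if filled.get(sf12_item) is None:
            filled[sf12_item] = sf36.get(sf36_item)  # stays None if both are missing
    return filled


reported = {"sf12_item1": 3, "sf12_item2": None, "sf12_item3": None}
sf36_answers = {"sf36_item1": 1, "sf36_item3": 4, "sf36_item4": None}
print(fill_from_sf36(reported, sf36_answers))
```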
The SF-36v2 and SF-12v2 translations were already available in the target languages and had to be purchased from Optum for one-time use [44]. However, since most translated versions of both the SF-36v2 and the SF-12v2 had not been subjected to psychometric analyses in the field of TBI, they were included in the analyses of the present study. Both instruments were also used for the validity analyses.
2.4. Clinician-Reported Outcome (ClinRo) and a Clinical Scale
The instruments listed below were used to analyze convergent and discriminant validity.
The Glasgow Outcome Scale Extended (GOSE) [45] is a clinician-reported outcome (ClinRo) of functional recovery after TBI using an eight-point scale (1 = dead, 2 = vegetative state, 3/4 = lower/upper severe disability, 5/6 = lower/upper moderate disability, 7/8 = lower/upper good recovery), based on structured interviews (GOSE) or self-ratings by individuals after TBI or their proxies (the questionnaire version; GOSE-Q [46]). Missing GOSE values were centrally replaced by values derived from the GOSE-Q. Since the GOSE-Q cannot differentiate between vegetative state and lower severe disability, GOSE levels 2 and 3 were collapsed into one category. Remaining missing values at the six-month outcome assessment were imputed using a multi-state model; the imputation procedure is described elsewhere [47]. The GOSE was not subjected to reliability analyses, as this would require data from independent raters to assess interrater reliability, which were not available in the CENTER-TBI database.
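The two GOSE preprocessing rules described above (falling back on the GOSE-Q when the interview rating is missing and merging levels 2 and 3) can be expressed as a small Python function. Coding the merged category as 3 is our own assumption, the function name is hypothetical, and the multi-state imputation of still-missing values is deliberately not sketched.

```python
def gose_for_analysis(gose_interview, gose_q=None):
    """Return the GOSE level used for analysis, or None if both sources are missing."""
    value = gose_interview if gose_interview is not None else gose_q
    if value is None:
        return None  # left for the separate model-based imputation step
    if value not in range(1, 9):
        raise ValueError("GOSE levels run from 1 to 8")
    # Vegetative state (2) and lower severe disability (3) form one merged category,
    # coded here as 3 (an assumption).
    return 3 if value == 2 else value


print(gose_for_analysis(2), gose_for_analysis(None, 6), gose_for_analysis(None))
```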
The Glasgow Coma Scale (GCS) [48] allows healthcare professionals to consistently evaluate the level of consciousness of individuals after TBI and is also used to classify TBI severity. GCS scores range from 3 (no response) to 15 (normal level), with higher values indicating less impaired consciousness and lower TBI severity. Scores of 13 to 15 indicate mild TBI, 9 to 12 moderate TBI, and 3 to 8 severe TBI.
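The GCS severity bands translate directly into a lookup; a minimal Python sketch (the function name is hypothetical):

```python
def gcs_severity(gcs_total):
    """Classify TBI severity from a GCS total score (3-15)."""
    if not 3 <= gcs_total <= 15:
        raise ValueError("GCS totals range from 3 to 15")
    if gcs_total >= 13:
        return "mild"
    if gcs_total >= 9:
        return "moderate"
    return "severe"


print([gcs_severity(score) for score in (15, 13, 12, 9, 8, 3)])
```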
2.5. Statistical Analyses
The present study focuses on the analyses of reliability and of convergent and discriminant validity of eight PROMs in nine TBI language samples with sufficient participants (i.e., at least 50 participants in the Dutch, English, Finnish, French, German, Italian, Norwegian, Spanish, and Swedish samples), as well as on factorial validity in six samples (i.e., at least 150 participants in the Dutch, English, Finnish, Italian, Norwegian, and Spanish samples). Figure 1 provides an overview of our psychometric analyses according to classical test theory (CTT) criteria with the respective cut-off values [49].
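The sample-size rules that decide which analyses are run for a language sample (at least 50 participants for reliability and validity, at least 150 for CFA) can be sketched as a simple filter. The language labels and counts below are invented for illustration and are not CENTER-TBI figures.

```python
MIN_N_RELIABILITY_VALIDITY = 50
MIN_N_CFA = 150


def planned_analyses(n_participants):
    """List the analyses a language sample of the given size qualifies for."""
    analyses = []
    if n_participants >= MIN_N_RELIABILITY_VALIDITY:
        analyses += ["reliability", "convergent/discriminant validity"]
    if n_participants >= MIN_N_CFA:
        analyses.append("CFA")
    return analyses


hypothetical_samples = {"Language A": 40, "Language B": 120, "Language C": 310}
for language, n in hypothetical_samples.items():
    print(language, n, planned_analyses(n) or "excluded from psychometric analyses")
```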
2.6. Descriptive Statistics
Descriptive statistics include the sample sizes, the percentage of missing data, the mean (M), standard deviation (SD), skewness (SK), and kurtosis (KU) for each item per language version of an instrument, as well as the average of the item characteristics across all languages. For skewness, values less than −1 or greater than 1 indicate a highly skewed distribution; values between ±0.5 and ±1 indicate a moderately skewed distribution; and values from −0.5 to +0.5 denote an approximately symmetrical distribution. For asymmetry and kurtosis, values between −2 and +2 are considered acceptable [50].
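The skewness and kurtosis rules can be applied as in the Python sketch below. It uses simple moment-based estimators (skewness g1 and excess kurtosis g2); statistical packages, including the R package psych, may apply slightly different small-sample formulas, so values will not match them exactly. Assigning values of exactly ±0.5 to the "moderately skewed" band is our own choice, as the text leaves the boundaries open.

```python
def skew_kurtosis(values):
    """Moment-based skewness (g1) and excess kurtosis (g2) of a list of numbers."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    if m2 == 0:
        raise ValueError("values have zero variance")
    m3 = sum((v - mean) ** 3 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m3 / m2 ** 1.5, m4 / m2 ** 2 - 3


def describe_skew(sk):
    if abs(sk) > 1:
        return "highly skewed"
    if abs(sk) >= 0.5:
        return "moderately skewed"
    return "approximately symmetric"


def within_acceptable_range(sk, ku, limit=2.0):
    """True if both skewness and (excess) kurtosis lie between -2 and +2."""
    return -limit <= sk <= limit and -limit <= ku <= limit


item = [0, 0, 0, 0, 0, 0, 0, 0, 0, 4]  # hypothetical responses to one 0-4 item
sk, ku = skew_kurtosis(item)
print(round(sk, 2), round(ku, 2), describe_skew(sk), within_acceptable_range(sk, ku))
```

The example item, on which almost everyone chose the lowest category, shows the kind of floor effect these criteria are meant to flag.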
For reliability analyses, data from 30 participants are often considered sufficient to test reliability coefficients against the commonly required minimum value of 0.70 [51]. However, some researchers argue that larger sample sizes are required to avoid bias [51]. In the present study, reliability coefficients were therefore only calculated if a language sample comprised at least 50 individuals, to provide more robust results.
To examine the reliability of each instrument, Cronbach’s alpha, split-half reliability with the Spearman–Brown correction (odd vs. even items), and Cronbach’s alpha if item deleted were reported. Both the split-half reliability and Cronbach’s alpha if item deleted were calculated only for scales with at least three items. Although recommendations regarding cut-off points for Cronbach’s alpha differ, there is agreement that for group comparisons Cronbach’s alpha should reach at least 0.70, implying acceptable internal consistency [53]; an alpha above 0.90 indicates excellent internal consistency [54]. Cronbach’s alpha after omitting an item should not exceed the total Cronbach’s alpha of the scale. A higher value indicates that the omitted item decreases the reliability of the instrument and requires further revision [55].
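The three reliability statistics can be sketched in plain Python (rows = respondents, columns = items). This is an illustrative re-implementation, not the psych package code used in the study; it assumes complete data and sample variances, and the example data are invented so that the fourth item runs against the other three.

```python
from statistics import variance


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def cronbach_alpha(rows):
    """k/(k-1) * (1 - sum of item variances / variance of the total score)."""
    k = len(rows[0])
    item_vars = sum(variance(col) for col in zip(*rows))
    total_var = variance([sum(row) for row in rows])
    return k / (k - 1) * (1 - item_vars / total_var)


def split_half_spearman_brown(rows):
    odd = [sum(row[0::2]) for row in rows]   # items 1, 3, 5, ...
    even = [sum(row[1::2]) for row in rows]  # items 2, 4, 6, ...
    r = pearson(odd, even)
    return 2 * r / (1 + r)                   # step up to full test length


def alpha_if_item_deleted(rows):
    return [cronbach_alpha([row[:j] + row[j + 1:] for row in rows])
            for j in range(len(rows[0]))]


data = [[0, 0, 0, 3], [1, 1, 1, 0], [2, 2, 2, 2], [3, 3, 3, 1], [4, 4, 4, 0]]
alpha = cronbach_alpha(data)
flagged = [j + 1 for j, a in enumerate(alpha_if_item_deleted(data)) if a > alpha]
print(round(alpha, 3), round(split_half_spearman_brown(data), 3), flagged)
```

Item 4 is flagged because deleting it raises alpha, which is the pattern the criterion above treats as a sign that an item needs revision.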
To evaluate the discriminating ability of the items, item–total correlations were calculated at the scale level, the total score level, or both. A correlation coefficient of 0.30, corresponding to a medium effect size according to the guidelines proposed by Cohen [56], was chosen as the cut-off criterion. An item–total correlation below 0.30 implies that the item cannot discriminate well between high-scoring and low-scoring individuals. Furthermore, low item–total correlations, especially at the scale level, may indicate irregularities in the factorial structure of an instrument.
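A minimal sketch of the item–total criterion in Python, assuming the corrected (item–rest) form in which each item is correlated with the sum of the remaining items, so that the item does not inflate its own correlation; the text does not state which variant was used. The 0–4 responses are invented, with item 4 running against the others.

```python
def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        raise ValueError("correlation undefined for a constant variable")
    return sxy / (sxx * syy) ** 0.5


def corrected_item_total(rows):
    """Correlate each item with the sum of all other items (rows = respondents)."""
    result = []
    for j in range(len(rows[0])):
        item = [row[j] for row in rows]
        rest = [sum(row) - row[j] for row in rows]
        result.append(pearson(item, rest))
    return result


data = [[0, 0, 0, 3], [1, 1, 1, 0], [2, 2, 2, 2], [3, 3, 3, 1], [4, 4, 4, 0]]
weak_items = [j + 1 for j, r in enumerate(corrected_item_total(data)) if r < 0.30]
print(weak_items)
```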
2.8.1. Convergent and Discriminant Validity
All language samples analyzed in this study included at least 50 observations, as recommended for validity analyses [58].
Convergent validity was examined using Spearman correlation coefficients between the GOSE and the physical (PCS) and mental component scores (MCS) of the SF-36v2 and SF-12v2, on the one hand, and the total and domain-specific scores of all other measures, on the other.
Discriminant validity was investigated by calculating Spearman correlation coefficients between the GCS and the total and scale scores of all instruments, in line with analyses previously performed in the field of TBI [59]. To evaluate the strength of the correlations, Cohen’s criteria [56] were applied to identify small (0.10), medium (0.30), and large (0.50) effect sizes.
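Spearman's coefficient is the Pearson correlation of the ranks, with tied values sharing their average rank. The Python sketch below implements that definition and labels the result with Cohen's thresholds; calling correlations below 0.10 "negligible" is our own assumption, and the example GOSE and QOLIBRI values are invented.

```python
def average_ranks(values):
    """1-based ranks; tied values receive the mean of the ranks they span."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        for pos in range(start, end + 1):
            ranks[order[pos]] = (start + end) / 2 + 1
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5


def spearman(x, y):
    return pearson(average_ranks(x), average_ranks(y))


def cohen_label(r):
    size = abs(r)
    if size >= 0.50:
        return "large"
    if size >= 0.30:
        return "medium"
    if size >= 0.10:
        return "small"
    return "negligible"


gose = [3, 5, 5, 6, 7, 8, 8, 4]                              # hypothetical GOSE levels
qolibri = [35.0, 50.2, 48.9, 61.0, 70.3, 82.4, 79.9, 41.6]   # hypothetical totals
rho = spearman(gose, qolibri)
print(round(rho, 2), cohen_label(rho))
```

Ranking first makes the coefficient suitable for ordinal scores such as the GOSE, where only the order of the levels is meaningful.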
2.8.2. Factorial Validity
Factorial validity was examined by means of confirmatory factor analyses (CFA) using a robust weighted least squares estimator (WLSMV) for ordinal data; only the original factor structure of each instrument was tested. Accordingly, one-factor solutions were estimated for the GAD-7 [23], the PHQ-9 [24], the RPQ [30], and the QOLIBRI-OS [38]. For the other instruments, the respective multiple-scale models were inspected: a four-factor model for the PCL-5 [26], a five-factor model for the QOLIBRI [32], an eight-factor model with two second-order factors for the SF-36v2 [40], and, finally, a two-factor model for the SF-12v2 [42]. For CFA, samples should comprise at least 150 observations to provide stable results [60]. Therefore, only language samples fulfilling this criterion were analyzed.
The model fit was evaluated based on the following fit indices, with the respective cut-off values in parentheses: the χ² statistic with its p-value (p > 0.01) [61], the comparative fit index (CFI > 0.95) [62], the Tucker–Lewis index (TLI > 0.95) [63], the root mean square error of approximation (RMSEA < 0.06) [64] with its 90% confidence interval (CI), and the standardized root mean square residual (SRMR < 0.08) [63]. As some fit indices may be biased (e.g., the χ² test is sensitive to large sample sizes [61]), all indices were considered simultaneously to evaluate model fit. Furthermore, item loadings above 0.50 were considered acceptable and above 0.70 desirable [53].
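Checking a fitted model against the cut-offs listed above is mechanical; a minimal Python sketch follows. It assumes the fit indices have already been obtained from the CFA software (for example, from lavaan), uses invented example values, and deliberately reports each criterion separately, since the text weighs all indices together rather than applying a single pass/fail rule.

```python
FIT_RULES = {
    "chi2_p": lambda v: v > 0.01,
    "CFI": lambda v: v > 0.95,
    "TLI": lambda v: v > 0.95,
    "RMSEA": lambda v: v < 0.06,
    "SRMR": lambda v: v < 0.08,
}


def check_fit(indices):
    """Return, for each index, whether it meets the cut-off quoted in the text."""
    return {name: rule(indices[name]) for name, rule in FIT_RULES.items()}


def rate_loading(loading):
    """Classify a standardized loading: above 0.70 desirable, above 0.50 acceptable."""
    if loading > 0.70:
        return "desirable"
    if loading > 0.50:
        return "acceptable"
    return "below 0.50"


example_fit = {"chi2_p": 0.001, "CFI": 0.97, "TLI": 0.96, "RMSEA": 0.07, "SRMR": 0.05}
print(check_fit(example_fit))
print([rate_loading(x) for x in (0.82, 0.61, 0.43)])
```

In this invented example, the χ² test and RMSEA miss their cut-offs while the other indices pass, which is the kind of mixed picture that calls for the simultaneous judgment described above.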
Analyses were performed in R version 4.0.2 [67], using the package psych [65] for the psychometric characteristics and lavaan [66] for the factorial validity analyses.
2.9. Comparability of the Translated Versions
To evaluate the quality of the translated versions of the eight PROMs, psychometric criteria obtained from the CENTER-TBI language samples were compared with those reported for the original English instrument versions. For this purpose, a systematic literature search was carried out. Psychometric characteristics were compared with those obtained from the original validation studies in the original populations, for which the respective instrument was developed. If available, they were also compared with the validation studies in the field of TBI. If the original articles did not provide information on all coefficients, these were retrieved from more recent studies.
These comparisons were confined to the reliability coefficients (i.e., Cronbach’s alpha, split-half, or test–retest reliability), as validity testing in the original studies was performed using instruments not applied in the CENTER-TBI study. Reliability coefficients were classified into ranges (<0.70: acceptable; 0.70–0.89: good; ≥0.90: excellent), and a PROM applied in the CENTER-TBI study was considered comparable to the original if its reliability fell within the same range as, or a higher range than, that of the original version.
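The comparability rule amounts to comparing range indices; a minimal Python sketch follows, with hypothetical coefficient values and function names. The boundaries follow the ranges given above (below 0.70, 0.70 to 0.89, 0.90 and above).

```python
def reliability_range(coefficient):
    """0: below 0.70, 1: 0.70 to below 0.90, 2: 0.90 and above."""
    if coefficient >= 0.90:
        return 2
    if coefficient >= 0.70:
        return 1
    return 0


def comparable(original, translated):
    """Comparable if the translation's range is the same as or higher than the original's."""
    return reliability_range(translated) >= reliability_range(original)


print(comparable(original=0.92, translated=0.91))  # same range
print(comparable(original=0.85, translated=0.93))  # higher range
print(comparable(original=0.92, translated=0.85))  # lower range
```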
The present study examined the psychometric properties of the eight PROMs administered in the CENTER-TBI study in individuals after TBI. Many of them were translated and linguistically validated for this study; others had not yet been psychometrically investigated in the field of TBI. For this purpose, a classical test theory framework was applied.
The results of the reliability and validity analyses indicate that most newly translated and already existing questionnaires displayed satisfactory to excellent psychometric characteristics in the field of TBI and were comparable to each other as well as to the original English versions, which had been investigated predominantly in non-TBI samples, in individuals after TBI, or both. At the scale level, high internal consistency and scale reliability of the newly translated and already existing instruments were observed across all languages. At the item level, only very few items from a few questionnaires demonstrated irregularities, mostly in no more than one language. However, the factorial validity analyses revealed some difficulties in replicating the original factorial structures, indicating a need for further investigation.
Some translations displayed problems at the item level, with low correlations with the respective total scale scores: the item “Moving or speaking so slowly that other people could have noticed” from the Swedish PHQ-9, the item “Nausea” in the Swedish and German RPQ and the item “Double Vision” in the German RPQ, and the item “How bothered are you by feeling angry or aggressive” from the German QOLIBRI. Item–total correlations are directly related to the factorial structure of a questionnaire; therefore, low correlations may indicate that the questionnaire does not measure a single dimension. Because the QOLIBRI consists of multiple scales, the low correlation of the item “How bothered are you by feeling angry or aggressive” in the German translation is not problematic, as the scale-level and total-score characteristics were satisfactory. Moreover, the low item–total correlations of the RPQ translations are not unexpected, as the scoring of the questionnaire has been revised several times by different authors [30], whereby the items “Nausea” and “Double Vision” were assigned to different domains. The low correlation of the item “Moving or speaking so slowly that other people could have noticed” in the Swedish PHQ-9, however, is more difficult to explain, as the PHQ-9 is a unidimensional measure. Problems with the wording are a possible explanation, but the composition of the Swedish sample seems more likely. The Swedish sample contained the most severely impaired patients (GCS), with the lowest level of functional recovery (GOSE) and the highest injury severity scores (AIS). Thus, individuals in the Swedish sample appear to have been more severely injured than those in the other language samples, and the low correlation of this PHQ-9 item may be attributable to these particularities. Future research could review the wording of this item and examine the Swedish PHQ-9 across a broader spectrum of TBI severities.
Additionally, one item from the PCL-5 (“Trouble remembering important parts of the stressful experience”) did not distinguish well between low and high levels of PTSD across all languages and displayed low correlations with its scale score (i.e., DSM-5 cluster) in the French, Norwegian, and Swedish translations. The factorial structure of the original PCL-5 has been examined on several occasions [74], and this item has been assigned to different dimensions. The results of the present study indicate that the PCL-5 translations have inherited this methodological problem of the original questionnaire. Thus, further investigation of the factorial structure of the PCL-5 could improve the psychometric characteristics of the questionnaire.
As expected, the validity analyses of the PROMs (both those newly translated and those available prior to the CENTER-TBI study) indicated medium to strong correlations with the SF-36v2, the SF-12v2, and the GOSE in most languages. The PCS and MCS of the SF-36v2 and SF-12v2 generally demonstrated medium to strong negative correlations with the GAD-7, PHQ-9, PCL-5, and RPQ. One exception was the GAD-7, which showed a low correlation with the PCS of the SF-36v2 in English and Swedish and with the PCS of the SF-12v2 in six out of nine languages (Dutch, English, French, Italian, Norwegian, and Swedish). For the SF-12v2, this might be attributable to the items constituting the PCS in the original version: while the items of the SF-36v2 cover a wider range of physical activities and activity-related problems, the items of the SF-12v2 focus on a limited number of physical problems that are probably less associated with anxiety. Nevertheless, the results are generally in line with previous findings suggesting that negative emotions (i.e., anxiety, depression, or stress) are highly correlated with generic HRQoL, especially with the MCS [76]. Moreover, the assumption that the mental and physical components of the SF-36v2 and the SF-12v2 would show strong positive correlations with the QOLIBRI and the QOLIBRI-OS was confirmed across all languages, supporting results from previous studies [34].
Generally, the GAD-7 (except for the German language sample), the PHQ-9, the PCL-5 (except for the German language sample), and the RPQ exhibited medium to strong negative correlations with the GOSE. The German individuals after TBI had a relatively high recovery rate, with 50% having fully recovered after six months (i.e., GOSE = 8); they had sustained less severe TBI (50% had a GCS of 15) and were, consequently, less impaired, as reflected by the low correlations. These results are in line with previous research showing that functional recovery after TBI is frequently associated with the absence of mental health problems [79] and post-concussion symptoms [80], and vice versa. The GOSE also showed medium to strong positive correlations with the QOLIBRI and QOLIBRI-OS across all languages, indicating that higher disease-specific HRQoL is associated with better functional outcomes, in line with previous research findings [34].
Further, TBI severity as assessed by the GCS, which rates the level of consciousness, displayed a low association with both the psychological and the health-related PROMs in almost all languages, except for the Swedish translations of the PHQ-9, the RPQ, the QOLIBRI, and the QOLIBRI-OS. Previously published validity results in the field of TBI [59] found no association between the GCS and psychological outcomes or post-concussion symptoms. Those populations contained fewer severely injured individuals, as measured by the ISS, and therefore smaller or no correlations were found. The stronger association between the GCS and the outcomes in the Swedish sample might be explained by the higher injury severity and more severe polytrauma of its participants.
Overall, the original factorial structures suggested by the instrument developers were replicated for the GAD-7, the PHQ-9, and the QOLIBRI-OS. The translations of the PCL-5 displayed an acceptable model fit, indicating that the original factorial structure describes the data well. Nevertheless, we recommend further investigation of the item “Trouble remembering important parts of the stressful experience”, which displayed irregularities both in the reliability analyses across all languages and in the CFA factor loadings of some translations. The five-factor structure of the original QOLIBRI was replicated in all but two language samples; the English and Finnish models did not converge. This could be due to several reasons: extreme response categories rarely chosen by the participants, a relatively large number of parameters to be estimated relative to the sample size, or (unconsidered) correlations between latent factors [61].
The original factor solution could not be replicated for the RPQ translations; this is in line with previous research findings, as several factor solutions have been proposed for the RPQ [30]. Since the RPQ was primarily developed for TBI populations, further investigation of its factorial structure, and thus the implementation of an appropriate scoring, is strongly recommended.
The SF-36v2 (except for the Dutch version, which showed an acceptable model fit) and the SF-12v2 showed poor model fit. Neither instrument was specifically developed for populations after TBI, and both use a wide range of response formats, which might be confusing and tiring, especially for respondents with cognitive deficits, and may affect their response behavior [82], resulting in poorer fit of the estimated models [83]. For the assessment of generic HRQoL in TBI populations, further investigation of the factorial structure of both PROMs seems appropriate.
The layout and instructions for administering the newly translated PROMs were internationally harmonized and are therefore similar across all language versions. Moreover, instructions for assessment, scoring, and interpretation were provided (see the SOPs of the CENTER-TBI study). For the interpretation of results, general population-based norms or reference values are helpful. For example, for the QOLIBRI, population-based reference values for the UK and the Netherlands have recently been made available [35].
Strengths and limitations of the study
The main strength of the present study is its broad overview of the psychometric properties of the various previously and newly translated and linguistically validated PROMs in the TBI field [20], an overview that had not been available before.
The psychometric results allow researchers and clinicians to rate the quality of the translated questionnaires before selecting them for national and international studies and clinical practice to evaluate outcomes after TBI.
Because of the small sample sizes in some languages, analyses based on modern test theory could not be reported here. Additional research on measurement invariance (MI) across languages could further increase the quality of the instruments with respect to their international administration and the pooling of international data. MI analysis evaluates whether the same construct is understood and measured across different languages. Some of our recent studies have already shown that the PHQ-9 and GAD-7 [85], the QOLIBRI [35], and the QOLIBRI-OS [39] measure the same construct across languages when applied in the field of TBI. Furthermore, follow-up studies will assess the measurement invariance of the constructs captured by the individual PROMs across languages, as well as the sensitivity and responsiveness of the PROMs for different patient groups and risk factors.
The present study also has some limitations. Despite the large number of participants in the CENTER-TBI core study, the psychometric properties of some translations could not be examined because of the limited number of participants. Consequently, the Danish, Hebrew, Hungarian, Latvian, Lithuanian, Romanian, and Serbian translations of the PROMs need further investigation with larger numbers of patients. Furthermore, given the range of TBI severity covered (mild to severe), we observed that even six months after TBI, participants with higher TBI severity, with or without extracranial injuries and polytrauma, were not always able to complete the PROMs. To provide robust psychometric analyses in more severely injured patient groups, future assessments should also be conducted at later time points.