1. Introduction
In the current Romanian health technology assessment (HTA) process, decisions on reimbursement for the use of new technologies have not been conditional on an effectiveness threshold or a budget impact analysis [1]. More precisely, the process has so far used a scorecard system called “de facto” or “rapid” HTA [2,3]. This system is based, among other criteria, on the number of countries in which a new technology is already reimbursed, with reimbursement decisions in the UK, Scotland, Germany, and France playing a key role in deciding which new technologies receive funding [4]. The Romanian authorities have expressed their intention to make the transition to a complete HTA process based, inter alia, on cost-utility studies using real-world data, which require country-specific costs and utilities [4,5].
The determination of costs depends on the particularities and specific structure of the Romanian health system. Utilities (index values), on the other hand, reflect the preferences of the general population for different health states and are obtained using various methods, such as time trade-off (TTO), standard gamble (SG), and visual analogue scale (VAS), derived from national general population samples [6]. The collection of utilities for different health states, known as a value set, allows comparisons between different types of interventions and treatments for different diseases. These comparisons are essential for deciding how to distribute healthcare resources and thus support the HTA process.
The best-known tool for measuring health is the EQ-5D-3L, introduced by EuroQol in the 1990s. It is a generic, easy-to-administer tool that allows various health conditions to be measured over the course of a patient’s disease and the results to be compared with other disease areas [7,8]. The EQ-5D-3L consists of a descriptive system and a visual analogue scale (EQ-VAS). The descriptive system captures five dimensions of health-related quality of life (HRQoL), namely mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. EuroQol’s first instrument, the EQ-5D-3L, uses only three levels of discrimination for each dimension (no problems, some problems, extreme problems) and therefore describes 3^5 = 243 health states, the result of combining the three response levels across the five dimensions. The EQ-VAS is a visual scale, divided into 100 units, that ranges from the best imaginable health state to the worst imaginable health state. The VAS is used as a quantitative measure of a person’s perception of their own health [6]. Later, in 2009, EuroQol developed a tool based on five levels of discrimination and a tool dedicated to children and adolescents [9].
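The results below refer to health states by five-digit codes such as 21111 or 33333. A minimal sketch of that coding, assuming the conventional EQ-5D digit order (mobility, self-care, usual activities, pain/discomfort, anxiety/depression) and levels 1 to 3. The names `DIMENSIONS`, `LEVELS`, `allStates`, and `describe` are illustrative and not part of any EuroQol software:

```typescript
// The five EQ-5D dimensions, in the order of the digits in a state code.
const DIMENSIONS = [
  "mobility",
  "self-care",
  "usual activities",
  "pain/discomfort",
  "anxiety/depression",
];

// Level labels for the three-level version (index 0 is level 1).
const LEVELS = ["no problems", "some problems", "extreme problems"];

// Enumerate every state code from "11111" to "33333".
function allStates(): string[] {
  let states: string[] = [""];
  for (let i = 0; i < DIMENSIONS.length; i++) {
    const next: string[] = [];
    for (const prefix of states) {
      for (const level of ["1", "2", "3"]) next.push(prefix + level);
    }
    states = next;
  }
  return states;
}

// Turn a code such as "21111" into one readable line per dimension.
function describe(state: string): string[] {
  if (!/^[1-3]{5}$/.test(state)) {
    throw new Error(`Not an EQ-5D-3L state: ${state}`);
  }
  return state
    .split("")
    .map((digit, i) => `${DIMENSIONS[i]}: ${LEVELS[Number(digit) - 1]}`);
}

console.log(allStates().length); // 243
console.log(describe("21111")[0]); // mobility: some problems
```

State 11111 therefore denotes no problems on any dimension, and 33333 extreme problems on all five.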
To measure health outcomes, country-specific index values (utilities) have been developed in many other countries: Belgium [10], Denmark [11], France [12], Germany [13,14], Greece [15], the Netherlands [16], Italy [17], Poland [18,19], Portugal [20], Spain [21], Slovenia [22], Sweden [23], and the United Kingdom [24,25]. The transition to a complete HTA process also requires a set of index values for the EQ-5D-3L based on the social preferences of the general Romanian population [20]. The main objective of our study was to determine the values of the different EQ-5D-3L health states using a TTO method.
Evidence shows that there are minor differences between the value sets of countries with a comparable economic level [26,27]. These differences seem to stem from the socio-economic context, the health status of the population, and socio-demographic characteristics rather than from the estimation technique or methodology. Hence, comparing the results of cost-utility studies from countries whose value sets are very different can be misleading and can subsequently lead to misuse of healthcare resources [28,29,30,31,32,33,34,35]. The premise that what is cost-effective in the UK, Germany, Scotland, or France is just as cost-effective in Romania is questionable, because Romania allocates the lowest amount for health in the EU (only 814 euros per capita, three times less than the average for European countries) [32]. One of the main barriers to the development of HTA in Romania is precisely the absence of a standardized set of preference values for the EQ-5D or another HRQoL instrument for measuring health outcomes. Prior to the present study, in the absence of a national value set for any of the EQ-5D instruments, Romanian researchers used the value set of another jurisdiction, specifically that of the UK [33,34,35,36].
3. Results
A total of 1674 people were interviewed. Refusal rates varied from 0% to 73%, being higher in urban areas. The study was stopped when the minimum valid number of interviews was reached.
Of the respondents interviewed, 25 were excluded because they had been interviewed by interviewers who were later removed from the team for noncompliance and poor interviewing performance (dataset V1: 1649 respondents). Another 81 people were excluded based on exclusion criteria b, c, and d: their interviewer had more than 40% of interviews flagged, had conducted fewer than 20 interviews, or had not shown the worse-than-dead element of the training part of the survey, with no negative values elicited for any of the health states presented. This left 1568 respondents. Finally, 12 people were excluded because they were marked as illogical or as nontraders, or because they gave the same value (different from 1) to all evaluated states, leaving 1556 respondents in the final dataset (V3).
Only nine respondents had inconsistencies in their health state valuations in the V3 dataset, and 15 in the V1 dataset.
Sociodemographic characteristics for the final dataset (V3, n = 1556), weighted and unweighted, are presented in Table 2. As shown in Table 2, women and urban areas were overrepresented in our sample. Sociodemographic characteristics for dataset V1 used for the sensitivity analysis are presented in Table S1. The mean age was 48.50 years (SD = 16.21) for the final dataset and 48.43 (SD = 16.35) for the dataset used for sensitivity analysis (V1). The mean VAS3L was 83.45 (SD = 14.38) and the mean utility for observed health states was 0.50 (SD = 0.46) for the final dataset. Similar values were obtained for the dataset used for sensitivity analysis (V1: mean VAS3L = 83.50 (SD = 14.49); mean utility = 0.51 (SD = 0.46)).
We computed the mean, standard deviation (SD), median, and quartiles of the observed cTTO values (Table 3). The mean values ranged from 0.942 for state 21111 to −0.510 for state 33333, with similar median values (from 0.95 for states 11112, 11121, 12111, 11211, and 21111 to −0.60 for state 33333). The standard deviations tended to increase as profiles described worse health states, which was an early indication of heteroskedasticity. This finding is similar to those in other countries [12,16,45]. One-third of the states (10 of 30) received no negative valuations (22222, 22121, 12212, 11122, 21211, 11112, 11121, 12111, 11211, and 21111); among the remaining states, the percentage of negative valuations ranged from 0.64% for state 12222 to 78.57% for state 33333. Each of the 30 states was evaluated by at least 149 respondents (Table 3).
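The per-state summaries of this kind are plain descriptive statistics. A minimal sketch of how they can be computed from the raw valuations of one state, assuming the sample standard deviation (n − 1 denominator) and a median taken as the midpoint of the two central values when n is even. Quartiles are left out because their definition varies between statistical packages; the function name and the input values are hypothetical:

```typescript
interface StateSummary {
  n: number;
  mean: number;
  sd: number;
  median: number;
  percentNegative: number; // share of valuations below 0 (worse than dead)
}

function summarise(values: number[]): StateSummary {
  const n = values.length;
  if (n < 2) throw new Error("need at least two valuations");
  const mean = values.reduce((sum, v) => sum + v, 0) / n;
  const sumSq = values.reduce((sum, v) => sum + (v - mean) ** 2, 0);
  const sorted = values.slice().sort((a, b) => a - b);
  const mid = Math.floor(n / 2);
  const median = n % 2 === 1 ? sorted[mid] : (sorted[mid - 1] + sorted[mid]) / 2;
  const negatives = values.filter((v) => v < 0).length;
  return {
    n,
    mean,
    sd: Math.sqrt(sumSq / (n - 1)),
    median,
    percentNegative: (100 * negatives) / n,
  };
}

// Hypothetical cTTO valuations for a single severe state.
const example = summarise([-1, -0.5, 0, 0.5]);
console.log(example.mean, example.median, example.percentNegative); // -0.25 -0.25 50
```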
We began model testing with the simplest model, ordinary least squares (OLS). A list of all models tested can be found in Table S2. After estimating the OLS model, we found an indication of strong heteroskedasticity in the data, which we confirmed using the Breusch–Pagan test (p < 0.0001). Hence, we decided that all candidate models had to account for heteroskedasticity in addition to having significant and logically consistent parameters.
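For readers unfamiliar with the test, a sketch of its logic, assuming the studentized (Koenker) form; which variant was run is not stated here. The squared OLS residuals are regressed on the k explanatory variables of the model, and the statistic is compared with a chi-squared distribution with k degrees of freedom:

```latex
\hat{\varepsilon}_i^{2} = \gamma_0 + \gamma_1 x_{i1} + \dots + \gamma_k x_{ik} + v_i ,
\qquad
\mathrm{LM} = n \, R^{2}_{\mathrm{aux}} \;\overset{a}{\sim}\; \chi^{2}_{k}
\quad \text{under } H_0 : \gamma_1 = \dots = \gamma_k = 0
```

Here n is the number of observations and R²_aux is the coefficient of determination of the auxiliary regression. A small p-value rejects the hypothesis of constant error variance.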
Table 4 lists the candidate models with the highest number of consistent and significant parameters, corrected for heteroskedasticity and/or accounting for the censored nature of the data. As seen in Table 4, our candidate models for the final value set were the robust ordinary least squares (ROLS) model, the interval regression model (IRM), and the interval regression model censored at −1 (IRMC). We tested all models for goodness of fit, focusing on those with the smallest AIC/BIC (Table 4). The IRM and IRMC models had the lowest AIC/BIC. Given that the prediction accuracy, range of values, and ranking of dimensions were very similar for IRM and IRMC, we chose IRM as our final model because it had the lowest AIC/BIC. The full model can be found in Table S3.
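The Akaike and Bayesian information criteria trade goodness of fit against model size. A sketch of the standard definitions, assuming the usual likelihood-based forms (statistical packages may report rescaled variants), where L-hat is the maximized likelihood, k the number of estimated parameters, and n the number of observations:

```latex
\mathrm{AIC} = 2k - 2\ln\hat{L},
\qquad
\mathrm{BIC} = k\ln n - 2\ln\hat{L}
```

For both criteria, the model with the lower value is preferred.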
The form of the final chosen model (IRM from Table 4) is presented below:
ROLS, robust ordinary least-squares; IRM, interval regression model; IRMC, interval regression model censored at −1.
MO, mobility; SC, self-care; UA, usual activities; PD, pain/discomfort; AD, anxiety/depression.
All coefficients of the dummy variables were significant at the 0.05 level, meaning that having any type of problem with mobility, self-care, usual activities, pain/discomfort, or anxiety/depression significantly decreased utility (Table 4). Predicted values for the EQ-5D-3L are shown in Table S4.
The largest utility decrements were estimated for severe problems, with a cumulative impact of 1.37 utility units. This led to a utility of −0.4 for health state 33333, the worst possible state, with severe problems on all dimensions. Problems with mobility and pain/discomfort had the biggest impact, causing a drop in utility of 0.39 and 0.37 units, respectively. The cumulative effect of the other three dimensions (about 0.61) was smaller than the effect of these two taken together (0.76), which suggests that severe problems with mobility and pain reduce a person’s quality of life more than severe problems with anxiety/depression, self-care, and usual activities.
Moderate problems on the five dimensions had a total impact of 0.25 utility units, leading to a utility of 0.72 for health state 22222. About half of the impact came from moderate pain/discomfort (0.07) and anxiety/depression (0.05). In contrast to severe problems, in this category mobility had the smallest impact.
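The two utilities quoted above can be checked by hand. A sketch of that arithmetic, assuming the value set is additive, so that a state's utility equals 1 minus a constant c minus the sum of its dimension decrements. The constant is not quoted in the text; it is inferred here from the rounded figures, and the exact coefficients are those in Table 4:

```latex
\begin{aligned}
u(33333) &= 1 - c - 1.37 = -0.4 &&\Rightarrow\; c \approx 0.03 \\
u(22222) &= 1 - c - 0.25 = 0.72 &&\Rightarrow\; c \approx 0.03
\end{aligned}
```

Both reported utilities point to the same constant of about 0.03, which is why 1 − 0.25 does not give 0.72 directly.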
Based on the results for the two categories, severe and moderate problems, we concluded that pain/discomfort is an important factor in perceived utility regardless of severity. Among the other dimensions, mobility is perceived as a major impediment only if the problems are severe, while anxiety/depression matters more when the problems are moderate. Problems with self-care and usual activities, while statistically significant, are not seen as major contributors to final utility.
To test the robustness of our model, we estimated and tested the IRM model (RO model) using all available responses (dataset V1) and a weighted version of dataset V3 (Table 5). When run on all available data (V1), the RO model performed worst in all categories except prediction accuracy. The model performed best in terms of AIC/BIC and prediction accuracy when it was estimated on V3, and performed similarly on the V3 and weighted V3 datasets in terms of ranking of dimensions and number of worse-than-dead (WTD) health states. The full models run on V1 and weighted V3 can be found in Tables S5 and S6.
Finally, we tested the prediction accuracy of the RO model in dataset V1 and the weighted V3 dataset by comparing the predicted values with the observed mean TTO values for each evaluated health state. As shown in Figure 1, the model estimated the observed mean values well in all cases.
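One common way to summarize such a comparison is the mean absolute error (MAE) across health states. A minimal sketch, assuming MAE as a stand-in because the accuracy statistic is not named here; the function name, state codes, and all numbers are hypothetical and only show the calculation:

```typescript
// Mean absolute error between predicted and observed mean values, state by state.
function meanAbsoluteError(
  observed: Record<string, number>,
  predicted: Record<string, number>,
): number {
  const states = Object.keys(observed);
  if (states.length === 0) throw new Error("no health states supplied");
  let total = 0;
  for (const state of states) {
    if (!(state in predicted)) throw new Error(`no prediction for state ${state}`);
    total += Math.abs(predicted[state] - observed[state]);
  }
  return total / states.length;
}

// Hypothetical observed and predicted means for three states.
const observedMeans = { "11113": 0.5, "13311": 0.2, "32313": -0.3 };
const predictedMeans = { "11113": 0.55, "13311": 0.2, "32313": -0.15 };
console.log(meanAbsoluteError(observedMeans, predictedMeans).toFixed(3)); // 0.067
```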
We compared the observed cTTO values from our study with the observed TTO values from the UK MVH study [43] for the 14 health states that were common to both studies.
Table 6 shows the observed means (Observed) and standard deviations (SD) for both countries, the number of respondents (n) who evaluated the health states in each study, and the predicted values (Predicted). Additionally, we tested the statistical significance of the differences between the observed means for Romania and those for the UK. The differences were significant at the 0.05 level for all compared states except 33333. Values for Romania were generally higher than those recorded in the UK; the exception was state 33232, for which the Romanian value was significantly lower (Table 6). Differences between the two countries ranged from −0.42 (for state 21133) to −0.06 (for state 11121) (see Figure 2).
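Because a mean, SD, and n are available for each state in both studies, a difference between two means can be tested from those summary statistics alone. A minimal sketch using Welch's unequal-variance t statistic, which is one reasonable choice and not necessarily the test the authors ran; the function name and the input numbers are hypothetical:

```typescript
interface Summary {
  mean: number;
  sd: number;
  n: number;
}

// Welch's t statistic and its approximate degrees of freedom
// (Welch-Satterthwaite), computed from summary statistics only.
function welchFromSummary(a: Summary, b: Summary): { t: number; df: number } {
  const va = (a.sd * a.sd) / a.n; // squared standard error of mean a
  const vb = (b.sd * b.sd) / b.n; // squared standard error of mean b
  const t = (a.mean - b.mean) / Math.sqrt(va + vb);
  const df = (va + vb) ** 2 / ((va * va) / (a.n - 1) + (vb * vb) / (b.n - 1));
  return { t, df };
}

// Hypothetical summaries for one health state in two studies.
const result = welchFromSummary(
  { mean: 0.5, sd: 0.4, n: 150 },
  { mean: 0.4, sd: 0.3, n: 100 },
);
console.log(result.t.toFixed(2), result.df.toFixed(1)); // 2.25 244.5
```

The absolute value of t is then compared with the critical value of the t distribution for the computed degrees of freedom at the chosen significance level.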
When the estimated values for all health states were compared, the values of the Romanian EQ-5D-3L value set were higher than those of the UK value set but fairly similar to those of the Polish EQ-5D-3L value set, although the estimates for individual health states differed (Figure 3).
4. Discussion
Our study estimated for the first time in Romania a value set for the EQ-5D-3L questionnaire. This constitutes a stepping stone to further development of HTA in Romania, as it will potentially lead to more transparent and consistent decision-making in healthcare and more efficient use of relatively scarce local resources.
To develop our EQ-5D-3L value set, we tested several regression models. We chose the interval regression model as our final model because, of all candidate models, it performed best in terms of AIC/BIC and performed similarly to the second-best model in terms of prediction accuracy, range of values, and number of WTD health states. Our final model accounted for heteroskedasticity, and all coefficients were significant at the 0.05 level. Finally, the model provided utility estimates with a range similar to that of the observed values.
We compared our value set with those of the UK and Poland. We chose the UK because HTA results from the UK are often used as a guide for the Romanian HTA and because local researchers have used the UK value set in the absence of a local one. The differences found between the two value sets might also arise because the EQ-5D-3L valuation methodology has changed in the meantime, with the use of cTTO and computer-assisted interviews. These changes are likely to reduce interviewer bias and processing errors and to make randomization of the question order easier [46]. We also compared our value set with that of Poland because of its greater similarity to Romania in economic and historical background. Nevertheless, intercountry differences were still observed, which stresses the importance of using country-specific value sets for instruments such as the EQ-5D and calls for an urgent refinement of current HTA practices in Romania. This is supported by a growing body of literature showing that using multinational value sets or other countries’ value sets might misrepresent the value sets of individual countries [47,48].
Our sensitivity analyses using dataset V1 were conducted with more relaxed criteria than the primary analysis and showed that modeling can be severely undermined by poor-quality data. This is in line with other studies showing that data that do not meet the minimum quality criteria set by the EQ-VT software can lead to low face validity, difficulties in data modeling, and measurement errors, with a final value set that does not discriminate well between more severe health states [14,49]. In our sensitivity analysis, we did not explore the effect of excluding inconsistent respondents from our model. We based this decision on the results of a systematic review of exclusion criteria in national health state valuation studies, which showed that the effect of excluding inconsistent respondents on national tariffs was not consistent [50].