Estimating an EQ-5D-3L Value Set for Romania Using Time Trade-Off

Objective: To provide health-related quality of life (HRQoL) data to support health technology assessment (HTA) and reimbursement decisions in Romania by developing a country-specific value set for the EQ-5D-3L questionnaire. Methods: We used the composite time trade-off (cTTO) method to elicit health state values through computer-assisted personal interviews. Interviews were standardized following the most recent version of the EQ-VT protocol developed by the EuroQoL Foundation. Thirty EQ-5D-3L health states were randomly assigned to respondents in blocks of three. Econometric modeling was used to estimate values for all 243 states described by the EQ-5D-3L. Results: Data from 1556 non-institutionalized adults aged 18 years and older, selected from a nationally representative sample, were used to build the value set. All tested models were logically consistent; the final model chosen to generate the value set was an interval regression model. The predicted EQ-5D-3L values ranged from 0.969 to −0.399, and the relative importance of EQ-5D-3L dimensions was in the following order: mobility, pain/discomfort, self-care, anxiety/depression, and usual activities. Conclusions: These results can support reimbursement decisions and allow regional cross-country comparisons between health technologies. This study lays a stepping stone in the development of a health technology assessment process more driven by locally relevant data in Romania.


Introduction
In the current Romanian health technology assessment (HTA) process, decisions on reimbursement for the use of new technologies have not been conditioned by a threshold of effectiveness or an analysis of the budgetary impact [1]. More exactly, the process has so far used a scorecard system called "de facto" or "rapid" HTA [2,3]. This system is based, among other things, on determining the number of countries where reimbursement for the use of new technologies has already been implemented; reimbursement decisions in the UK, Scotland, Germany, and France play a key role in deciding which new technologies will receive funding [4]. The Romanian authorities have expressed their intention to make the transition to a complete HTA process based, inter alia, on cost-utility studies, using real-world data that require country-specific costs and utilities [4,5].
The determination of the costs depends on the particularities and the specific structure of the Romanian health system. On the other hand, utilities (index values) reflect the preference of the general population for different health states and are obtained using various methods, such as time trade-off (TTO), standard gamble (SG), and visual analogue scale (VAS), derived from the national general population samples [6]. The collection of utilities for different health states, also known as value sets, allows comparisons between different types of interventions and treatments for different diseases. These comparisons are essential for making decisions on how to distribute healthcare resources, thus supporting the HTA process.
The best known tool for measuring health is the EQ-5D-3L, introduced by EuroQoL in the 1990s. This is an easy-to-administer generic tool that allows the measurement of various health conditions during the evolution of a patient's disease as well as the comparison of results with other disease areas [7,8]. The EQ-5D-3L consists of the descriptive system and the visual analogue scale (EQ-VAS). The descriptive system captures five dimensions of health-related quality of life (HRQoL), namely, mobility, self-care, usual activities, pain/discomfort, and anxiety/depression. EuroQoL's first instrument, the EQ-5D-3L, uses only three levels of discrimination for each dimension (no problems, some problems, extreme problems). The EQ-5D-3L describes 243 health states that result from combining the three response possibilities on five dimensions. The EQ-VAS consists of a visual scale, divided into 100 units, that ranges from the best imaginable health state to the worst imaginable health state. The EQ-VAS is used as a quantitative measure of the perception of one's health [6]. Later, in 2009, EuroQoL developed a tool based on five levels of discrimination and a tool dedicated to children and adolescents [9].
To measure health outcomes, country-specific index values (utilities) have been developed in many other countries: in Belgium [10], Denmark [11], France [12], Germany [13,14], Greece [15], Holland [16], Italy [17], Poland [18,19], Portugal [20], Spain [21], Slovenia [22], Sweden [23], and the United Kingdom [24,25]. The transition to the complete HTA also implies the existence of a set of index values for EQ-5D-3L based on the social preferences of the general Romanian population [20]. We aimed, as the main objective of our study, to determine the values for the different health states of EQ-5D-3L using a TTO method.
Evidence shows that there are minor differences in the value sets of countries with a comparable economic level [26,27]. These differences seem to be due to the socio-economic context, the characteristics of the health status of the population, and the socio-demographic characteristics, and less to the technique of estimation or methodology. Hence, comparing the results of cost-utility studies from countries whose value sets are very different can give misleading results and subsequently lead to misuse of healthcare resources [28-35].
The premise that what is cost-effective in the UK, Germany, Scotland, or France is just as cost-effective in Romania is questionable, because Romania allocates the lowest amount for health in the EU (only 814 euros per capita, about one-third of the average for European countries) [32]. One of the main barriers to the development of HTA in Romania is precisely the absence of standardized preference values for the EQ-5D or another HRQoL instrument for measuring health outcomes. Prior to the present study, in the absence of a national set of values for any of the EQ-5D instruments, Romanian researchers have used the value set from another jurisdiction, more specifically the value set from the UK [33-36].

Survey Design and Sampling Framework
We conducted a cross-sectional survey in non-institutionalized persons older than 18 years, selected from the Romanian population. The data collection occurred from November 2018 to November 2019. We selected study participants from both rural and urban areas, out of a total number of 3181 settlements. Settlements were randomized and stratified to ensure representativeness, using the Romanian electoral register as the sampling frame. A total of 32 settlements (primary sampling units) were selected. Within each settlement, households and individuals were selected using a random route sampling method and the next birthday rule (the person whose birthday was closest to the interview date was interviewed), respectively [37]. Respondents signed a written consent to take part in the research and were offered no incentives for their participation. Ten percent of respondents were contacted by phone to verify that all data collection procedures were implemented as intended.
The sample size needed for national representativeness was estimated at 1794 with a maximum error of ±3% for a confidence level of 95% and including a 10% nonresponse rate [38]. The sample size needed for an EQ-5D-3L valuation study is 300 people [16]. To meet the representativeness criteria and to allow implementation of a parallel EQ-5D-5L valuation study, the final sample size was set at 1794. This article only targets the results of the EQ-5D-3L.

Interview Procedure
Interviews were face-to-face, computer-assisted, and took place in respondents' homes. Interviewers were selected from members of patient associations or independent expert evaluators of the quality of medical services. All interviewers were trained in two 2-day training sessions conducted by the principal investigators and team members who were experts in statistics and data collection (October 2018, June 2019). One interviewer was excluded for protocol non-compliance and another five dropped out of the interviewers' team after having performed fewer than 20 interviews.
The Romanian version of the EQ-5D instrument and the EQ-VT interviewing software were based on the English version of the EQ-VT software developed by the EuroQoL Foundation. The software was translated by a professional translation company.
Respondents valued three EQ-5D-3L states that were inserted at the end of the EQ-5D-5L sections (valuation tasks and EQ-5D-5L questionnaire). They also filled in the EQ-5D-3L questionnaire and self-reported their health on the EQ-VAS. Finally, the interview ended with an array of socio-demographic questions.
More details on the study's protocol can be found elsewhere [38].

Valuation Protocol and Procedure
We elicited the population's preferences using composite TTO (cTTO) techniques. The cTTO task was implemented following the most up-to-date version of the EQ-VT protocol (version 2.1) [39]. The smallest trading unit was 6 months (which by transformation means 0.05).
Values for states better than dead were elicited using the conventional TTO approach (10-year time frame). In this case, the value of a health state was defined as the number of years in full health considered equivalent to 10 years in the impaired health state, divided by 10. Values for health states worse than dead were elicited using the lead-time TTO (10 years in full health followed by 10 years in the state to be valued) and were calculated as the number of years in full health at the respondent's indifference point minus 10, with the difference divided by 10.
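The two valuation rules above can be illustrated with a minimal sketch. The function name `ctto_value` and its arguments are hypothetical and are not part of the EQ-VT software; the sketch only assumes the 10-year frame, the 10-year lead time, and the 6-month trading unit described in the text.

```python
def ctto_value(years_full_health: float, worse_than_dead: bool) -> float:
    """Convert a cTTO indifference point into a health state value.

    years_full_health: years in full health at the indifference point,
        between 0 and 10, in steps of 0.5 (the 6-month trading unit).
    worse_than_dead: True if the lead-time task was used.
    """
    if not 0 <= years_full_health <= 10:
        raise ValueError("indifference point must lie between 0 and 10 years")
    if (years_full_health * 2) != int(years_full_health * 2):
        raise ValueError("indifference point must be a multiple of 6 months")
    if worse_than_dead:
        # Lead-time TTO: (t - 10) / 10, giving values from -1 to 0.
        return (years_full_health - 10) / 10
    # Conventional TTO: t / 10, giving values from 0 to 1.
    return years_full_health / 10


# A 6-month step (0.5 years) changes the value by 0.05.
print(ctto_value(9.5, worse_than_dead=False))
print(ctto_value(4.0, worse_than_dead=True))
```

Under these assumptions the value scale runs from −1 to 1 in steps of 0.05.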
A quality control check developed by the EuroQoL Foundation and set up for the EQ-5D-5L valuation criteria [40] was performed weekly by the principal investigators, with the bi-monthly support of EuroQoL experts. This quality control check identified interviews of suspect quality. Suspect quality was defined based on the criteria set for the EQ-5D-5L valuation study: overly short explanations in the training part of the survey, not showing the worse than dead example in the training part of the survey, an overly short duration of the cTTO task, and not assigning the lowest value to the worst health state [40]. Based on the results of the quality control check, interviewers were sent feedback about their performance by either email or telephone.

Selection of EQ-5D-3L Health States
The cTTO values were collected for 30 out of the 243 EQ-5D-3L health states. The set of health states included 18 states that were selected using an orthogonal design [41], all 5 mild states (11112, 11121, 11211, 12111, 21111), and 7 other health states that had been handpicked. Ten blocks of 3 states were created and randomly allocated to participants, thus meeting the minimum number of valuations needed to develop a value set for EQ-5D-3L [16]; the same protocol has been used and published recently by other authors [42,43].
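As a quick check of the counts quoted above, the following sketch enumerates the EQ-5D-3L descriptive system. It assumes only that a health state is written as five digits, one level (1 to 3) per dimension; the composition of the ten blocks is not given in the text and is therefore not reproduced.

```python
from itertools import product

# Every combination of 3 levels on 5 dimensions, written as a 5-digit string.
all_states = ["".join(levels) for levels in product("123", repeat=5)]

# Mild states: exactly one dimension at level 2, all others at level 1.
mild_states = [s for s in all_states if sorted(s) == list("11112")]

# 18 orthogonal-design states + 5 mild states + 7 handpicked states,
# split into blocks of 3.
n_blocks = (18 + 5 + 7) // 3

print(len(all_states), sorted(mild_states), n_blocks)
```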

Exclusion Criteria
The following interviews were excluded:
1. Interviews of suspect quality performed by interviewers who were excluded from the team of interviewers due to protocol noncompliance.
2. Interviews performed by interviewers who had more than 40% of their interviews flagged as being of suspect quality (as defined in the EQ-5D-5L valuation study).
3. Interviews performed by interviewers who did not perform enough interviews to achieve a harmonized learning effect between interviewers [44]. The minimum number of interviews was set at 20.
4. Interviews for which the interviewer had not shown the worse than dead example in the training part of the survey.
5. Interviews with participants who had a positive slope on the regression between their values and the misery index of the health states assessed, who gave the same value to all health states, or who did not trade time (non-traders).
Based on these criteria, 2 datasets were created, from the most inclusive to the least inclusive, for sensitivity analysis purposes:
• Set 1 (denoted V1) contained all valid responses and corresponded to exclusion criterion 1.
• Set 2 (denoted V3) corresponded to exclusion criteria 1, 2, 3, 4, and 5.
Logical inconsistencies were identified, but respondents were not excluded from the sample based on this criterion.
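The respondent-level checks in criterion 5 can be sketched as follows. This is an illustrative sketch only: the misery index is assumed here to be the sum of the five level digits of a state (a common definition that the text does not spell out), non-traders are assumed to be respondents who valued every state at 1, and the function names and example values are hypothetical.

```python
def misery_index(state: str) -> int:
    """Sum of the five level digits, e.g. '21323' -> 11 (assumed definition)."""
    return sum(int(digit) for digit in state)


def ols_slope(xs, ys):
    """Slope of the least-squares line of ys on xs (None if xs do not vary)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        return None
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sxx


def flag_respondent(valuations: dict) -> list:
    """Return the criterion-5 flags raised by one respondent's valuations."""
    values = list(valuations.values())
    if all(v == 1.0 for v in values):
        return ["non-trader"]
    if len(set(values)) == 1:
        return ["same value for all states"]
    slope = ols_slope([misery_index(s) for s in valuations], values)
    if slope is not None and slope > 0:
        # Values rise as states get worse: an illogical pattern.
        return ["positive slope"]
    return []


# Hypothetical respondents, each valuing one block of three states.
print(flag_respondent({"11121": 0.95, "21323": 0.40, "33333": -0.50}))
print(flag_respondent({"11121": 0.20, "21323": 0.50, "33333": 0.90}))
```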

Modeling
Based on standard practice and following EuroQoL guidance, we tested the following models:
1. A multiple linear regression model (ordinary least squares)
2. A robust ordinary least squares model that corrects for heteroskedasticity
3. Models that account for the panel structure of the data (random intercept models with respondent and interviewer mixed effects; random coefficient models)
4. Models that account for the censored nature of the data (tobit and interval regression models)
To estimate the value set for EQ-5D-3L, various types of linear regression models were tested. We initially considered a multiple linear regression model. Given that we expected to find patterns of heteroskedasticity in the data, we also considered a robust ordinary least squares model. This model corrects for heteroskedasticity but does not account for data clustering. Because the analysis was performed at the health state level but each respondent evaluated more than one health state, the data collected for various health states from the same respondent were most likely correlated. Furthermore, when collecting data with the help of interviewers, it was impossible to eliminate the interviewer effect completely. To consider these aspects in the analysis, we also developed 3 more models that account for the panel structure of the data: a model with a random intercept for the respondent effect, one with a random effect for the interviewer effect, and finally, a random coefficient model with random effects at the respondent level. Finally, given the censored nature of the data (values below −1 could not be assigned to health states considered worse than death), we also tested tobit and interval regression models.
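To make the idea of censoring concrete, the sketch below writes out a textbook tobit log-likelihood with a lower limit at −1. It is not the specification estimated in this study: it works on the utility scale rather than on 1 minus cTTO, assumes a single constant error standard deviation, and uses hypothetical function names.

```python
import math


def normal_cdf(z: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def tobit_loglik(observed, predicted, sigma: float, lower: float = -1.0) -> float:
    """Log-likelihood of values censored from below at `lower`.

    observed: recorded values (cannot go below `lower`).
    predicted: model predictions of the latent, uncensored values.
    sigma: standard deviation of the normal error term.
    """
    loglik = 0.0
    for y, mu in zip(observed, predicted):
        if y <= lower:
            # Censored: we only know the latent value is at or below the limit.
            loglik += math.log(normal_cdf((lower - mu) / sigma))
        else:
            # Uncensored: ordinary normal log-density.
            z = (y - mu) / sigma
            loglik += -0.5 * z * z - math.log(sigma * math.sqrt(2.0 * math.pi))
    return loglik
```

An interval regression generalizes this by letting each observation contribute the probability of falling between two bounds instead of a point density.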
The dependent variable was computed as 1 minus the cTTO value (representing the loss of utility associated with the health state), and the independent variables were dummy variables created based on individual responses. A set of 10 dummy variables (MO2, MO3, SC2, SC3, UA2, UA3, PD2, PD3, AD2, AD3) was created to be used as main effects for every dimension of the EQ-5D-3L [45]. Each variable represented the impact on the dependent variable generated by transitioning from level 1 (no problems) to level 2 (moderate problems) or level 3 (severe problems), respectively. The definitions of the variables used, as well as the model specification and methods employed, are presented in Table 1.
The final model, referred to as the Romanian model, was chosen based on several selection criteria:
1. Logical consistency and significance of parameters: All regression coefficients obtained need to be logically consistent between health states and, if possible, significant at the level of 0.05. This means that health states with severe problems on a dimension must have a lower predicted utility than health states with moderate or no problems. This holds true only for intensity levels of the same dimension. More precisely, we expected health state 11233 to be considered better than the state 11333, but we could not compare it with 21333, for example. This criterion was verified through the parameter estimates of the model, where the coefficients for level 3 need to be higher than those for level 2, which, in turn, must be greater than zero.
2. Theoretical considerations: As we expected the standard deviation of the observed values to increase with worsening severity of health states, models that accounted for heteroscedasticity were favored. If the percentage of observed values at −1 exceeded the normal range expected for valuation studies (2-10%), preference was given to censored models. For models meeting multiple criteria, preference was given to the model with the lowest number of independent variables (principle of parsimony).
3. Goodness of fit: For all logically consistent models, we calculated the Akaike information criterion (AIC) and Bayesian information criterion (BIC). The smaller the values for AIC and BIC, the better the goodness of fit of the model. In case of different conclusions for the two indicators, BIC was preferred to also account for model parsimony.
4. Prediction accuracy (Spearman's correlation between predicted and observed utilities), value range, and the ranking of dimensions based on the size of the coefficient for the worst level on each dimension were also taken into account.
Therefore, we considered as candidate models for our EQ-5D-3L value set only those models whose parameters were all logically consistent and significant at the level of 0.05, which accounted for the heteroscedasticity and/or for the censored nature of the data. From the candidate models, the final model was chosen based on the AIC/BIC ranking, higher predictive accuracy, higher value range, and ranking of dimensions.
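The logical consistency rule and the information criteria used for model selection can be written down compactly. The sketch below uses the standard AIC and BIC definitions; the function names and the coefficient values are hypothetical placeholders and are not the estimates reported in Table 4.

```python
import math

DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def is_logically_consistent(coefs: dict) -> bool:
    """True if, on every dimension, 0 < level-2 decrement < level-3 decrement."""
    return all(0 < coefs[d + "2"] < coefs[d + "3"] for d in DIMENSIONS)


def aic(loglik: float, n_params: int) -> float:
    """Akaike information criterion: 2k - 2 ln L."""
    return 2 * n_params - 2 * loglik


def bic(loglik: float, n_params: int, n_obs: int) -> float:
    """Bayesian information criterion: k ln(n) - 2 ln L."""
    return n_params * math.log(n_obs) - 2 * loglik


# Placeholder decrements, for illustration only.
example = {"MO2": 0.05, "MO3": 0.30, "SC2": 0.05, "SC3": 0.20,
           "UA2": 0.05, "UA3": 0.20, "PD2": 0.05, "PD3": 0.30,
           "AD2": 0.05, "AD3": 0.20}
print(is_logically_consistent(example))
```

Because the BIC penalty per parameter, ln(n), exceeds the AIC penalty of 2 whenever n is larger than about 7, BIC favors the more parsimonious model when the two criteria disagree.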
For the final value set, the intercept was constrained to be equal to 1 (full health) if it were insignificant at the level of 0.05.
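The main-effects structure described in this section implies a simple prediction rule: the predicted utility is 1 minus the model constant (on the loss scale) minus the decrements of the dummies that are switched on. The sketch below illustrates this with hypothetical placeholder coefficients, not the Romanian estimates; `constant_loss=0.0` corresponds to constraining the intercept to full health.

```python
DIMENSIONS = ("MO", "SC", "UA", "PD", "AD")


def state_dummies(state: str) -> dict:
    """Code a 5-digit state into the 10 main-effect dummies (MO2 ... AD3)."""
    dummies = {}
    for dim, level in zip(DIMENSIONS, state):
        dummies[dim + "2"] = int(level == "2")
        dummies[dim + "3"] = int(level == "3")
    return dummies


def predict_value(state: str, coefs: dict, constant_loss: float = 0.0) -> float:
    """Predicted utility = 1 - (constant + sum of active decrements)."""
    loss = constant_loss
    for name, active in state_dummies(state).items():
        loss += coefs[name] * active
    return 1.0 - loss


# Placeholder decrements, for illustration only (not the Table 4 estimates).
placeholder = {"MO2": 0.05, "MO3": 0.30, "SC2": 0.05, "SC3": 0.20,
               "UA2": 0.05, "UA3": 0.20, "PD2": 0.05, "PD3": 0.30,
               "AD2": 0.05, "AD3": 0.20}

for state in ("11111", "22222", "33333"):
    print(state, round(predict_value(state, placeholder), 3))
```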
All analyses were performed on the most restrictive dataset (V3).
To determine the impact of different exclusion criteria on our final model ("the Romanian model"), we estimated and tested the model using dataset V1, which included all available interviews. Finally, if our sample was found to be very different from the general population in terms of age, sex, and place of residence, we also tested the impact of these variables on the estimated values by adjusting the model with these variables and by including survey weights in the model. Survey weights were calculated as the product of design weights (the inverse of the respondents' probability of selection for each of the stages of the survey), nonresponse weights (percentage of people responding to the survey in each settlement), and poststratification weights. Post-stratification weights were computed using age, gender, and place of residence as variables to create poststrata. Place of residence (urban/rural) rather than the type of settlement was used to create poststrata so that each poststratum included at least 5 observations to ensure efficiency in poststratification. Population control totals for each poststratum were taken from the 2011 Romanian census.
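The composition of the survey weights can be sketched as a product of three factors. This is a minimal illustration under two explicit assumptions that the text does not confirm: the nonresponse adjustment is taken as the inverse of the settlement's response rate (the usual convention), and the poststratification factor is taken as the ratio of the census total to the weighted sample total in a poststratum. All names and numbers are hypothetical.

```python
def design_weight(selection_probs) -> float:
    """Inverse of the selection probability at each sampling stage, multiplied."""
    weight = 1.0
    for prob in selection_probs:
        weight *= 1.0 / prob
    return weight


def poststratification_factor(census_total: float, weighted_sample_total: float) -> float:
    """Scale a poststratum so its weighted size matches the census total."""
    return census_total / weighted_sample_total


def survey_weight(selection_probs, response_rate: float, poststrat_factor: float) -> float:
    """Final weight = design weight x nonresponse adjustment x poststratification."""
    nonresponse_adjustment = 1.0 / response_rate  # assumed convention
    return design_weight(selection_probs) * nonresponse_adjustment * poststrat_factor


# Hypothetical respondent: three sampling stages, 50% response rate in the
# settlement, and a poststratum that is under-represented by 20%.
print(survey_weight([0.01, 0.1, 0.5], response_rate=0.5, poststrat_factor=1.2))
```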

Comparison with Other Countries' Value Sets
We compared our observed cTTO values with the UK values for the 14 health states that were common to both studies. We determined the observed means (Mean) and standard deviations (SD) for both countries and the predicted values (Predicted). We also tested the statistical significance of the differences between the observed means for Romania and the UK.
Finally, we compared our value set (predicted values) with the UK value set and Polish value set using a kernel distribution plot to observe the range of values, modality, or skewness.
For the entire data analysis, the significance level used was α = 0.05. The results were generated using STATA version 16 and IBM SPSS Statistics 25.

Results
A total of 1674 people were interviewed. Refusal rates varied from 0% to 73%, being higher in urban areas. The study was stopped when the minimum valid number of interviews was reached.
Of the total interviews performed, 25 respondents were excluded because they had been interviewed by interviewers who were later excluded from the team due to noncompliance and poor interviewing performance (dataset V1: 1649 respondents). Another 81 people were excluded based on exclusion criteria 2, 3, and 4: they had been interviewed by interviewers who had more than 40% of their interviews flagged or who conducted fewer than 20 interviews, or the interviewer did not show the worse than dead example in the training part of the survey and no negative values were elicited for any of the health states presented. Finally, 12 people were excluded because they were marked either as illogical or nontraders, or had the same value (different from 1) for all evaluated states.
Only nine respondents had inconsistencies in their health state valuations in the V3 dataset, and 15 in the V1 dataset.
Sociodemographic characteristics for the final dataset (V3 = 1556), weighted and unweighted, are presented in Table 2. As shown in Table 2, women and urban areas were overrepresented in our sample. Sociodemographic characteristics for dataset V1 used for the sensitivity analysis are presented in Table S1. The mean age was 48.
We computed the mean, standard deviation (SD), median, and quartiles for the observed cTTO values (Table 3). The mean values ranged from 0.942 for state 21111 to −0.510 for state 33333, with similar median values (from 0.95 for states 11112, 11121, 12111, 11211, and 21111 to −0.60 for state 33333). The standard deviations seemed to increase as profiles indicated worse health states, which was an early indication of heteroskedasticity. This finding is similar to those in other countries [12,16,45]. One-third of the states had no negative valuations (22222, 22121, 12212, 11122, 21211, 11112, 11121, 12111, 11211, and 21111), and among the remaining states, the percentage of negative valuations ranged from 0.64% for state 12222 to 78.57% for state 33333. Each of the 30 states was evaluated by at least 149 respondents (Table 3).
We began our model testing process with the simplest one, the ordinary least squares model (OLS). A list of all models that were tested can be found in Table S2. After having estimated the OLS model, we found an indication of strong heteroskedasticity in the data, which we confirmed using the Breusch-Pagan test (p < 0.0001). Hence, we decided that all our candidate models had to account for heteroskedasticity in addition to meeting the criteria of significance and logical consistency of parameters. Table 4 presents a list of the candidate models with the highest number of consistent and significant parameters, corrected for heteroskedasticity and/or accounting for the censored nature of the data.
As seen in Table 4, our candidate models for the final value set were the robust ordinary least squares model (ROLS), interval regression model (IRM), and interval regression model censored at −1 (IRMC). We tested all models for goodness-of-fit, focusing on the ones with the smallest AIC/BIC (Table 4). The IRM and IRMC models had the lowest AIC/BIC. Given that the prediction accuracy, the range of values, and ranking of dimensions were very similar for both IRM and IRMC, we chose IRM as our final model because it had the lowest AIC/BIC. The full model can be found in Table S3.
Abbreviations in Table 4: MO, mobility; SC, self-care; UA, usual activities; PD, pain/discomfort; AD, anxiety/depression.
All coefficients of the dummy variables were significant at 0.05 level, meaning that having any type of problem with mobility, self-care, usual activities, pain/discomfort, or anxiety/depression significantly decreased the utility (Table 4). Predicted values for EQ-5D-3L are shown in Table S4.
The largest utility decrements were estimated for severe problems, with a cumulative impact of 1.37 utility units. This led to a negative utility of 0.4 for the health state 33333, which was the worst possible state, with severe problems on all dimensions. Issues with mobility and pain/discomfort had the biggest impact, causing a drop in utility of 0.39 and 0.37 units, respectively. The cumulative effect of the other three dimensions was smaller than the effect of the first two taken together, which suggests that for severe problems with mobility and pain, the quality of life of a person is worse than for severe problems with anxiety/depression, being unable to take care of oneself, and carrying on with usual activities.
Moderate problems on the five dimensions had a total impact of 0.25 utility units, leading to a utility of 0.72 for the 22222 health state. About half of the impact came from moderate pain/discomfort (0.07) and anxiety/depression (0.05). In contrast to severe problems, in this category mobility had the smallest impact.
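The figures quoted for states 22222 and 33333 can be checked with simple arithmetic. The sketch below assumes that the highest predicted value reported in the abstract (0.969) is the value of state 11111, which follows from a logically consistent main-effects model, and that the cumulative decrements (0.25 and 1.37) are subtracted from it.

```python
# Assumption: 0.969, the top of the predicted range, is the value of 11111.
VALUE_11111 = 0.969


def predicted_value(total_decrement: float) -> float:
    """Predicted utility after subtracting the cumulative decrement."""
    return VALUE_11111 - total_decrement


value_22222 = predicted_value(0.25)  # all dimensions at level 2
value_33333 = predicted_value(1.37)  # all dimensions at level 3

print(round(value_22222, 2), round(value_33333, 2))  # prints: 0.72 -0.4
```

Both results agree with the rounded figures quoted in the text; small differences in the third decimal would be expected because the decrements are themselves rounded.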
Based on the results obtained for the two categories, namely severe and moderate problems, the conclusion was that pain and discomfort is an important factor in perceived utility, regardless of its severity. For the other dimensions, mobility is perceived as a major impediment only if the problems are severe, while depression and anxiety matter more for moderate problems. Being able to perform self-care tasks and usual activities, while having a statistically significant impact, are not seen as major contributors to final utility.
To test the robustness of our model, we estimated and tested the IRM model (RO model) using all available responses (dataset V1) and a weighted version of dataset V3 (Table 5). When run on all available data (V1), the RO model performed worst in all categories except prediction accuracy. The model performed best in terms of AIC/BIC and prediction accuracy when it was estimated on V3, and had similar performance in both V3 and weighted V3 datasets in terms of ranking of dimensions and number of worse than dead (WTD) health states. The full model runs on both V1 and weighted V3 can be found in Tables S5 and S6.
Finally, we tested the prediction accuracy of the RO model in dataset V1 and the weighted V3 dataset by comparing the predicted values with the observed mean TTO values for each evaluated health state. As shown in Figure 1, the model estimated the mean observed values well in all cases.
[Figure 1: observed mean values and IRM predictions (V1, V3, and weighted V3) for the 30 valued health states, ordered from 11211 to 33333.]
We compared the observed cTTO values from our study with the observed TTO values from the UK MVH study [43] for the 14 health states that were common to both studies. Table 6 shows the observed means (Observed) and standard deviations (SD) for both countries, the number of respondents (n) who evaluated the health states in each study, and the predicted values (Predicted). Additionally, we tested the statistical significance of the differences between the observed means for Romania and the observed means in the UK. We found significant results at the 0.05 level for all compared states, except 33333. Values for Romania were generally higher than those recorded in the UK for all states, except 33232, for which Romanian values were significantly smaller (Table 6). Differences between health states ranged from −0.42 (for 21133) to −0.06 (corresponding to 11121) (see Figure 2).
When comparing the estimated values for all health states, the values for the Romanian EQ-5D-3L value set were higher than the values for the UK value set, but fairly similar to the values for the Polish EQ-5D-3L value set, although the estimations of individual health states differed (Figure 3).

Discussion
Our study estimated for the first time in Romania a value set for the EQ-5D-3L questionnaire. This constitutes a stepping stone to further development of HTA in Romania, as it will potentially lead to more transparent and consistent decision-making in healthcare and more efficient use of relatively scarce local resources.
To develop our EQ-5D-3L value set, we tested several regression models. We chose the interval regression model as our final model because, of all candidate models, it performed best in terms of AIC/BIC and performed similarly to the second-best model in terms of prediction accuracy, range of values, and number of WTD health states. Our final model accounted for heteroskedasticity and all coefficients were significant at the level of 0.05. Finally, the model provided utility estimates with a range similar to the observed ones.

We compared our value set with those of the UK and Poland. We chose the UK because HTA results from the UK are often used as a guide for the Romanian HTA and because local researchers have used this value set in the absence of a local one. Even though differences were found between the two value sets, these might also have arisen because the EQ-5D-3L valuation methodology has changed in the meantime with the use of cTTO and computer-assisted interviews. These changes are likely to reduce interviewer bias and processing errors and to make randomization of the question order easier [46]. We also compared our value set with that of Poland because of its greater similarity to Romania in economic and historical background. Nevertheless, intercountry differences were still observed, thus stressing the importance of using country-specific value sets for instruments such as the EQ-5D and calling for an urgent refinement of current HTA practices in Romania. This is supported by an increasing body of literature showing that using multinational value sets or other countries' value sets might misrepresent the value sets of individual countries [47,48].
Our sensitivity analyses performed using dataset V1 were conducted on more relaxed criteria than the primary analysis and showed that modeling can be severely undermined by data of poor quality. This is in line with other studies' results that show that data not meeting the minimum quality criteria as set by the EQ-VT software can lead to low face validity, difficulties in data modeling, and measurement errors with a final value set not discriminating very well between more severe health states [14,49]. In our sensitivity analysis, we did not explore the effect of excluding inconsistent respondents from our model. We based our decision on the results of a systematic review of exclusion criteria in national health state valuation studies that showed that the effect of excluding inconsistent respondents on national tariffs was not consistent [50].

Limitations
Our study has a certain number of limitations. First, our study sample differed from the Romanian general population in terms of age, gender, and rural/urban distribution. We corrected this imbalance by using survey weights, assessed the impact of their use on our final model, and found no significant differences between weighted and unweighted analyses. Nevertheless, our survey weights did not account for the effect of other observed differences between our sample and the general population, such as education or income [51], on the values obtained. Second, our survey weights were based on the 2011 census data, the most recent census data available. Since 2011, migration rates have been increasing in Romania, with the country currently having the highest growth in the size of its diaspora population after Syria. Hence, our survey weights might not have fully corrected the representativeness of our sample.
Third, the quality of our data would have been better had we been able to collect them using fewer, well-trained interviewers. Optimally, the research should have been completed by an estimated 15-16 interviewers so that each would perform at least 100 interviews, thus having the time to acquire and maintain interviewing skills. Fieldwork difficulties such as the start of data collection in winter, problems accessing certain rural areas, respondents' reluctance to participate in the study, especially in urban areas, and the large number of interviewees assigned to each interviewer led to interviewer fatigue and demotivation. Hence, the performance of our interviewers varied greatly during the study period, with several of the initial interviewers being replaced during data collection or dropping out of the interviewers' team.
Finally, the EQ-5D-3L valuation tasks were performed on the same individuals after the 10 EQ-5D-5L valuation tasks. Hence, we cannot exclude the possibility that the quality of our EQ-5D-3L data was affected by respondent fatigue, as respondents' attention and motivation might have dropped toward the end of the valuation task. Nevertheless, only 1.5% of the sample found the cTTO task hard to understand and only 15.9% admitted having problems deciding on the point at which the two lives presented were the same.

Conclusions
This is the first study conducted in Romania that estimates the index values for different health conditions in the EQ-5D-3L questionnaire, using a national population sample. These results can support reimbursement decisions and allow regional cross-country comparisons between health technologies. This study lays a stepping stone in the development of a health technology assessment process more driven by locally relevant data in Romania.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/10.3390/ijerph18147415/s1. Table S1: Respondents' characteristics (categorical variables), dataset V1; Table S2: All models that were tested; Table S3: Full value set model for the Romanian version of EQ-5D-3L; Table S4: Predicted values for the EQ-5D-3L Romanian model (RO model).
Funding: The estimation of the EQ-5D-3L value set received partial funding as part of a project run by the Romanian Academic Society through the Romanian Operational Programme "Administrative Capacity" (VALUEMED-"Development of public policies in the field of health through the use of medical technology evaluation studies", SIPOCA Code 195/MySMIS 111603). By pure coincidence, two teams of independent researchers simultaneously requested the support and approval of EuroQol for the same country, Romania: one for valuing the EQ-5D-3L instrument (a team led by MS Paveliu, Romania) and the second for valuing the EQ-5D-5L instrument (a team led by E. Olariu, Newcastle University, UK). EuroQol trained the two groups simultaneously, and they later collaborated methodologically without intersecting funding. Both teams used the same software provided by EuroQol, recently used by Law and collaborators for the parallel valuation of the 3L and 5L variants of the EQ-5D tool in the US [42].

Institutional Review Board Statement:
The study was conducted according to the guidelines of the Declaration of Helsinki and approved by the National Bioethics Committee of Medicines and Medical Devices of Romania (22 SNI/21.02.2019). Newcastle University's Research Ethics Committee also gave ethics approval.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request from the corresponding author.