Economic Evaluations Informed Exclusively by Real World Data: A Systematic Review

The use of Real World Data (RWD) in economic evaluations has been increasing in recent years; however, this source of information has both advantages and limitations. The aim of this review was to assess the quality of full economic evaluations (EE) developed using RWD. A systematic review was carried out of articles in the following databases: PubMed, Embase, Web of Science and the Centre for Reviews and Dissemination. Studies that employed RWD for both costs and effectiveness were included. The methodological quality of the studies was assessed using the Consolidated Health Economic Evaluation Reporting Standards (CHEERS) checklist. Of the 14,011 studies identified, 93 were included. Roughly half of the studies were carried out in a hospital setting. The illnesses most frequently assessed were neoplasms, while the interventions most frequently evaluated were pharmacological. The main source of RWD on costs and effects was information systems. The most frequent clinical outcome was survival. Some 47% of studies met at least 80% of CHEERS criteria. Studies conducted with samples of 1000 patients or more, randomized studies and those that reported bias controls were those that fulfilled most CHEERS criteria. In conclusion, fewer than half the studies met 80% of the CHEERS checklist criteria.


Introduction
Given limited health resources and the growth of health technologies, it is vital to use these resources efficiently. Economic evaluation (EE) studies are a useful tool to facilitate decision-making regarding which health technologies to use and/or finance. EE studies can be performed directly, based on individual patient data collected during randomized controlled trials ("piggyback" RCTs), or by employing data from pragmatic studies [1]. EE analyses can also be carried out using decision models. Studies that employ models can obtain data and parameters from RCTs, pragmatic studies, expert opinion or a combination of these sources. Another source of data for EE is observational studies using Real World Data (RWD). This type of information can be used to measure both effects and costs. As such, RWD can serve as the sole source of information for a full EE.

Materials and Methods
This was a systematic review of published studies. A protocol was drafted following the recommendations of the Preferred Reporting Items for Systematic Reviews and Meta-Analyses for systematic review protocols (PRISMA-P) [13] (http://www.prisma-statement.org/Extensions/Protocols.aspx).
Eligibility criteria. These were defined in PICO format in accordance with the aims of the present study (Population: any; Intervention: any; Comparator: any; Outcome: full economic evaluation). Studies were considered complete economic evaluations if they compared two or more alternatives and assessed both costs and outcomes (effectiveness), and thus could be classified as cost-effectiveness, cost-utility or cost-benefit analyses [1]. There were no restrictions regarding date or publication status. Only studies in English or Spanish were included.

Sources of information.
A search was performed of the main clinical databases: PubMed, Embase, Web of Science (WOS) and the Centre for Reviews and Dissemination (CRD). This search was carried out between the 3rd of June and the 21st of August 2018.
Search strategy. The search strategy was adapted to each of the databases. An example of the PubMed Search is provided in Supplementary Material 1.
Inclusion criteria. Complete economic evaluations that used RWD (data not collected in conventional RCT [5]) for both costs and effectiveness.
Exclusion criteria. Decision-Analytic Models, cost-minimization analysis, cost-analysis, reviews, meta-analysis, comments, protocols, use of ad-hoc questionnaires in costs or effects, poster or presentations at congress or workshops, grey literature, abstract only, not health-related, book or letter.
Results
Of the 14,011 studies identified, 593 were potentially eligible following review of title and abstract. Figure 1 shows article selection and the reasons for exclusion, mainly the use of economic evaluation models, not using RWD, or not performing a complete economic evaluation. A total of 93 articles were finally included. Table 1 shows the characteristics of the included studies. Most were published between 2011 and 2018. The country with the most included studies was the USA (21.5%), while Europe provided 22 (notably France, Spain, Germany and the United Kingdom). Sixteen studies did not report the country where they were carried out. Approximately 50% of studies performed the economic evaluation in a hospital setting, and 29% did not specify the setting. Public funding was the type most frequently reported.
Regarding the study populations, the diseases most frequently assessed were neoplasms (14 studies; Chapter II), especially lung cancer and colorectal cancer (each representing 17% of the neoplasm studies), followed by breast cancer and hepatocellular cancer. Some 58% of studies did not report a medication or have one as an inclusion criterion for the study population. In studies where one was included, the most frequent therapeutic group (23%) was antineoplastic and immunomodulatory agents. Pharmacological interventions were the most commonly evaluated (41%), followed by surgical interventions (21%). Three out of four controls were usual care or pharmacological treatment.
Table 2 shows methodological information on the RWD. The main source of RWD on costs and outcomes was information systems, such as administrative databases. The most frequent sample size was between 100 and 1000. Some 83% of studies were not randomized. The most frequently mentioned potential biases were confounding and selection bias, and the most common forms of control described were sensitivity analysis and propensity scores.
Table 3 shows methodological aspects related to the economic evaluation. All studies employed direct costs, while 15% combined direct and indirect costs. Survival, measured as life-years, life-years gained or mortality, was the most frequently used clinical outcome (46% of studies), while only 17% of studies can be considered cost-utility evaluations. Around 50% of studies used a time horizon of between 1 and 5 years. A total of 41% of studies carried out the evaluation from the payer's perspective. Supplementary Materials 2 and 3 show the general characteristics, RWD aspects and economic evaluation information for each study.
The methodological quality of the economic evaluation in each included study is shown in Figure 2. The CHEERS checklist items most frequently fulfilled were the title, study population/subgroups, comparators, selection of clinical outcomes and discussion. The least frequently fulfilled were setting and location, discount rate and characterization of heterogeneity. Of the items evaluated, two (items 15 and 16) were not applicable in any of the reviewed studies, as they concern decision models, and item 12 did not apply to studies that did not use QALYs as a measure of effectiveness. Forty-four studies (47.3%) met at least 80% of the CHEERS criteria (17 or more items, not counting items 12, 15 and 16), and 5 (5.4%) met fewer than 50% of checklist items.
The CHEERS checklist focuses on the methodological aspects of EE rather than of RWD. However, we examined whether there was any relationship between meeting CHEERS criteria and RWD aspects such as sample size, study randomization and whether the authors applied bias control methodology. This information is shown in Table 4: most studies with a sample size of 1000 or more (60%) met at least 80% of CHEERS criteria, while a smaller proportion of those with fewer than 1000 patients reached this threshold. Randomized studies and those that reported bias control met the highest percentage of CHEERS criteria (80% or more).
Please see Supplementary Material 4 for details on study quality assessments.

Discussion
An assessment was conducted of the methodological quality of the economic evaluations in the 93 studies using RWD. Pharmacological interventions were the most frequently evaluated. The most commonly used clinical outcome was survival. Approximately half of the studies used a time horizon of 1–5 years and carried out the evaluation from the funding entity's perspective. Roughly half of the studies met 80% of CHEERS criteria. The most frequently fulfilled items were discussion and measurement of clinical effectiveness, and the least were setting and location and characterizing heterogeneity.
Some studies did not specify in which country or setting the EE was performed, an important piece of information for decision-makers or researchers who would like to use the study as a basis. Since the results of an EE study may inform inclusion and/or reimbursement decisions, their applicability depends on the context in which the study was carried out.
A systematic review of the use of RWD for EE in Germany was identified [111], and some comparisons with it are presented below. However, its inclusion criteria differ somewhat from those of the present study: Gansen [111] included studies that used routine data as a source of information for costs, effects or both, whereas our review included only studies that used RWD for both costs and effects. Moreover, Gansen [111] only included studies based on data from Germany and placed no restrictions on publication language; in contrast, we placed no restrictions on country but did restrict the publication languages.
Gansen [111] found that the principal illnesses assessed were cardiovascular diseases followed by type II diabetes mellitus, whereas neoplasms were the main focus of evaluation in our review. This may be because Gansen [111] only reviewed studies performed in Germany. The higher frequency of evaluations of neoplasm interventions may reflect the fact that some neoplasms, such as lung cancer, are among the most frequently diagnosed and cause the most mortality [112,113]. Approximately 50% of the EE studies on neoplasm interventions were conducted in Canada (where 89% of the included studies addressed this illness group) and the United States (40%).
QALYs were used as a measure of effectiveness in only 17% of studies, despite being the health outcome recommended by EE methodological guidelines [114,115], especially for illnesses that compromise quality of life, such as cancer, musculoskeletal diseases or mental disorders. Furthermore, QALYs are a useful measure of effectiveness as they allow comparison between interventions for distinct illnesses. Nevertheless, use of QALYs in the EE reviewed here was higher than in the Gansen review [111], which identified only two studies using this outcome (5.7%). One possible reason why QALYs were not included in EE using RWD is that quality of life is not systematically recorded in health systems, or that utilities cannot be estimated from the recorded clinical outcomes.
To ensure quality in an EE study based on RWD, for the analysis of both cost and outcome variables, it is important to have a good source of information. Quality will depend on the reliability of the data and on how they are used to answer specific questions [5]. A crucial aspect is sample size, which in the reviewed studies ranged from fewer than 100 to more than 5000 patients. Sample size affects results, so it is vital to determine what range of sample sizes ensures that an economic evaluation has sufficient validity.
Study randomization is another key issue in the use of RWD, where the potential risk of bias is the most significant limitation of this type of data [5]. In this review, non-randomized studies predominated, as the sources of RWD were mainly information systems, and randomization is not possible in retrospective studies. However, some studies reported using methods to control bias or the risk of bias. Nevertheless, even with statistical approaches to adjust for selection bias, observational studies do not have the methodological rigor of RCTs [5], which favor internal validity; in exchange, observational designs generalize more readily to heterogeneous populations [4,116]. More critical are the studies that did not report biases or methods to control them which, in this review, totaled approximately 20.
The relationship between the quality of RWD and EE methodological quality is reflected in the results of this review, where studies with sample sizes of 1000 or more, randomized studies and those with bias control met more CHEERS criteria.
Only one study [75] was performed in two countries, evaluating the effectiveness and costs of an intervention. This may be due to the complexity of harmonizing data from differing health and registration systems.
The time horizon is a fundamental aspect of EE. However, we observed that some studies did not report it or, if they did, did not justify its use. Very few studies employed the societal perspective, which is the broadest [1] and, depending on the EE, the most appropriate, since costs borne by families can represent 30% of total costs [115]. Adopting the societal perspective requires cost information that is not normally recorded given that, in most cases, the sources of information were administrative databases and/or medical records. In common with the Gansen review, most studies employed the payer's perspective. Furthermore, some studies did not specify the study perspective at all.
Regarding the results of the methodological quality assessment, study strengths (items marked "yes" for 80% or more of the studies reviewed) were: an appropriate title; defined aims, study population and comparators; description of health-outcome selection; measurement of effectiveness; and an adequate discussion section. Areas for improvement were also identified (items marked "no" for 50% or more of the studies reviewed). For instance, half of the studies did not specify aspects relevant to decision-making, such as the EE setting and location, which can greatly affect results. Approximately half of the studies reported a discount rate, and very few characterized heterogeneity; that is, they did not explain whether differences in costs, effects or cost-effectiveness were due to variations between patient subgroups or to other observed variability. Heterogeneity is of great relevance in studies that use RWD, because the patients included do not usually meet strict inclusion criteria or, at least, these are not defined prior to their participation. Consequently, discussion of possible differences due to distinct subject characteristics is crucial.
Approximately half of the studies fulfilled 80% of checklist items. This is in line with results reported by Gansen [111], who also identified characterization of heterogeneity as a weakness; although, in contrast with this review, the German review identified other weaknesses, such as not reporting the discount rate (item no. 9), currency, price date and conversion, and characterization of uncertainty.
As reflected in this review and in the one conducted in Germany, the use of RWD in EE is increasing [111]. RWD are an essential source for coverage, funding and health-technology reimbursement decisions [5]. However, as mentioned previously, these studies have methodological limitations, such as confounding and selection bias, identified by the authors of the studies in this review as the main potential risk. In another review [2], the main biases were confounding and missing data. A further risk with this type of information, frequent but rarely detailed, is the quality of data recording. The accuracy of the records and the heterogeneity of each variable considered, such as health service use, should be reported.
These RWD limitations represent an important methodological challenge, as do the potential benefits of RWD relative to RCTs. The two designs should be complementary: RCTs remain the standard for demonstrating the clinical efficacy of interventions, but their effectiveness must be determined through long-term observational studies, pragmatic studies and the use of administrative databases [5].
Methodological rigor was applied throughout the review, including assessment by independent pairs of researchers. In cases of discrepancy, consensus was sought and, if agreement was not reached, a third researcher made the decision. We hope that the results obtained contribute information on the methodological quality of the evaluated studies and are useful to decision-makers in recognizing good-quality studies on health technology. Nevertheless, this review has some limitations.
One limitation identified was ambiguity in interpreting the checklist used to evaluate EE methodological quality, especially on some items such as CHEERS item 18 which refers to study parameters. Conducting the review using independent pairs facilitated identification of the ambiguities and increased the likelihood of the results being transparent and comparable over time [117]. In addition, the researchers came from different disciplines (pharmacy, medicine and economics) which allowed a comprehensive view and was a strength of the study.
Another limitation was that only studies in English or Spanish were included, a restriction which led to the omission of published studies that met other inclusion criteria.
The methodological quality of the conduct of the RWD studies themselves was not assessed, although the quality reported by the authors of the reviewed studies was taken as valid. Various checklists are available to evaluate the quality of RWD studies. One is that proposed by the ISPOR Task Force [6], which consists of 7 items related to Hypothesis Evaluating Treatment Effectiveness (HETE). As noted in the systematic review protocol registered at Figshare, the intention was to apply this checklist to the included studies. However, although we largely adhered to the protocol, RWD quality was not assessed with this checklist because its items seemed ambiguous and, as mentioned above, some overlap with the CHEERS list, so it was decided to use CHEERS only. A possible reason the reviewed studies did not follow the ISPOR Task Force checklist is its publication date, as most of the articles were published before it appeared.
Another checklist is that proposed by Kreif, which consists of 5 questions dealing with certain statistical aspects [118]. Nevertheless, we did not use it because, although complementary, it does not substitute for an EE checklist and has limitations, such as not covering all aspects of statistical analysis. Moreover, this checklist does not indicate which specific statistical method should be used to address selection bias in cost-effectiveness analysis [2].
For future studies, it would be beneficial to have available a validated, adapted tool to assess the methodological quality of EE studies using RWD. Health Technology Assessment agencies in Europe could collaborate on policies for RWD use and on recommendations covering practical aspects of RWD collection and analysis [7]. These policies could act as a basis for the use of RWD in EE. This standardization is essential because, as the results of this review show, the use of RWD to create Real World Evidence is increasing. Despite its limitations, the use of this evidence is becoming more and more necessary. In the future, its limitations should be considered and compensated for. This evidence will be crucial in enabling improvements in the quality, safety and value of health care.

Conclusions
This review demonstrates that the use of RWD to carry out EE with individual patient data is an increasingly common practice. A total of 93 studies meeting these conditions were identified, and their methodological quality was assessed using the CHEERS checklist. Fewer than half of the studies fulfilled 80% of checklist criteria, reflecting the low reporting quality of the EE included in this review. More attention should be paid to the reporting of methodologies and results in EE.
Meeting CHEERS checklist criteria was associated with methodological aspects of the RWD: studies with sample sizes of 1000 or more, randomized studies and those that reported some method of controlling bias were the most likely to fulfill 80% of CHEERS criteria.
Some important methodological differences were noted in RWD use and in the presentation of results, which represents a considerable challenge regarding standardization of methodology in EE using RWD. Meeting this challenge would facilitate decision-making among policy makers.
Use of the CHEERS checklist showed that important aspects of RWD are not considered and that using more than one checklist is neither practical nor efficient; as such, it would be valuable to have available an EE checklist that incorporates RWD.