Mapping Canadian Data Assets to Generate Real-World Evidence: Lessons Learned from Canadian Real-World Evidence for Value of Cancer Drugs (CanREValue) Collaboration’s RWE Data Working Group

Canadian provinces routinely collect patient-level data for administrative purposes. These real-world data (RWD) can be used to generate real-world evidence (RWE) to inform clinical care and healthcare policy. The CanREValue Collaboration is developing a framework for the use of RWE in cancer drug funding decisions. A Data Working Group (WG) was established to identify data assets across Canada for generating RWE of oncology drugs. The mapping exercise was conducted using an iterative scan with informant surveys and teleconferences. Data experts from ten provinces convened for a total of three teleconferences and two in-person meetings from March 2018 to September 2019. Following each meeting, surveys were developed and shared with the data experts; these focused on identifying databases and data elements, as well as on assessing the feasibility of conducting RWE studies using existing data elements and resources. Survey responses were compiled into an interim data report, which was used for public stakeholder consultation. The feedback from the public consultation was used to update the interim data report. We found that databases required to conduct real-world studies are often held by multiple data custodians. Ninety-seven databases were identified across Canada. Provinces held on average 9 distinct databases (range: 8–11). An Essential RWD Table was compiled that contains the data elements necessary, at a minimum, to conduct an RWE study. An Expanded RWD Table that contains a more comprehensive list of potentially relevant data elements was also compiled, and the availability of these data elements was mapped. While most provinces have data on patient demographics (e.g., age, sex) and cancer-related variables (e.g., morphology, topography), the availability and linkability of data on cancer treatment, clinical characteristics, and drug costs vary among provinces.
Based on current resources, data availability, and access processes, data experts in most provinces noted that more than 12 months would be required to complete an RWE study. The CanREValue Collaboration’s Data WG identified key data holdings, access considerations, as well as gaps in oncology treatment-specific data. This data catalogue can be used to facilitate future oncology-specific RWE analyses across Canada.


Introduction
In recent years, real-world evidence (RWE) has gained increasing interest from decision makers for its potential to inform and support regulatory reviews, health technology assessments (HTAs), reimbursement decisions, and price negotiations for novel therapies [1][2][3][4]. Traditionally, HTA reviews have relied on evidence from randomized clinical trials (RCTs) to assess a drug's clinical benefit [5,6]. With an increasing number of real-world studies examining post-market outcomes of drugs in clinical practice, there is growing evidence that effectiveness in the real world may differ from the efficacy observed in RCTs [7][8][9][10][11]. While RCTs are the gold standard for establishing a treatment's efficacy, clinical trials may not be representative of all patients in the general population who will receive the drug in clinical practice, owing to highly selective trial eligibility criteria [12]. This efficacy-effectiveness gap can be particularly troubling for decision makers evaluating novel anticancer therapies because of the rapidly evolving therapeutic space and high drug prices. In particular, previous studies have demonstrated that cost-effectiveness estimates derived from economic models using clinical trial data often underestimated the incremental cost-effectiveness ratios generated using real-world data [13][14][15]. As such, RWE, generated by the analysis of real-world data (RWD), can provide useful information to decision makers when reassessing drug funding decisions as part of life-cycle health technology management.
RWD have been defined as data collected outside of a clinical trial setting, including data from electronic health records, disease registries, personal health devices, and administrative databases [12,16]. RWD have also been defined as data collected after RCTs, regulatory approvals, HTAs, reimbursement decisions, or price negotiations [12]. Since the majority of RWD are collected routinely through clinical practice or as part of the administrative claims process, they can be more accessible than other data sources and less expensive than standard clinical trials, especially for jurisdictions with existing data infrastructure [12,17,18]. Consistent with patient-centered health care, RWD can be used to generate many different types of information, including the prevalence and incidence of disease, the effectiveness and safety of treatments, and the quality of life and patient-reported outcomes associated with treatments [16,18–31]. Stakeholders, including clinicians, researchers, and decision makers, have suggested that these types of information can be useful for post-funding reassessment of cancer drugs [16,32,33]. The insights gained from analysis of RWD can inform routine clinical practice by clinicians, recommendations by HTA agencies, and price negotiations and reimbursement decisions by decision makers.
In Canada, the majority of health care is publicly funded by provincial/territorial governments [34]. Within this publicly funded healthcare system, there are geographical variations in cancer incidence across provinces, as shown by the Canadian Cancer Society, suggesting differences in risk factors, diagnostic practices, and data collection [35]. Publicly funded cancer treatments are routinely administered and reimbursed by the provinces, either through the Ministry/Department of Health or through the provincial cancer agencies/programs [36]. Data collection aligns with this funding structure: different governments across Canada collect real-world, population-based administrative data on health system resource utilization for their jurisdictions, including claims data on funded cancer drugs. In addition to federal and provincial/territorial governments, the Canadian Institute for Health Information (CIHI), a federally chartered, independent, not-for-profit organization, also collects and holds pan-Canadian databases of health care data provided by each province [37]. In 2018, CIHI developed the pan-Canadian Minimal Oncology Dataset (pCMOD) report, a set of data standards and guidelines that aims to harmonize the collection of oncology drug data in alignment with national and provincial/territorial interests [38]. Despite significant efforts by government entities and third-party organizations to harmonize data collection, a recent qualitative study exploring the perspectives of key stakeholders across Canada on RWD noted significant concerns regarding the siloed nature of data assets in the current system [33]. Another study also noted that differences in data access, data governance, and data availability across provinces are barriers to the use of RWD for drug funding studies [39].
Compounding these challenges, there has been little effort to map and catalogue the data elements that currently exist in each province and that can be used for real-world studies in oncology.
The Canadian Real-World Evidence for Value of Cancer Drugs (CanREValue) Collaboration was established in 2017 with the aim of developing a framework for incorporating RWE into cancer drug funding decisions [40][41][42]. As part of the CanREValue Collaboration, five working groups (WGs) were established, including the CanREValue Data WG [40]. The CanREValue Data WG was established to explore and map the existing population-based administrative healthcare databases across Canadian provinces. The CanREValue Data WG also identified a list of data elements necessary for conducting real-world studies in oncology and explored the availability of these data elements within the existing databases. This paper outlines the main findings from the CanREValue Data WG's efforts to map existing administrative databases and data elements for conducting real-world analysis in oncology.

CanREValue Data Working Group
The Data WG was formed as a part of the CanREValue Collaboration and consists of 20 data experts and researchers across all 10 Canadian provinces. The objective of the Data WG was to map the databases and data elements available in each province that could be used to conduct cancer-specific RWE studies. From March 2018 to September 2019, the Data WG members convened for three teleconferences and two in-person meetings to iteratively identify and map the potential types of databases and data elements needed for conducting real-world retrospective administrative database studies in cancer. Following the meetings, the CanREValue Collaboration core research team developed surveys that were shared with the provincial experts for completion. Since data elements to conduct real-world studies were contained in both cancer-specific and non-cancer-specific databases, both types of databases were considered in the mapping exercise. The surveys specifically aimed to explore population-based administrative databases that collect and maintain data on publicly funded health care services, as the current focus of the CanREValue Collaboration centers on population-based RWE studies to inform funding decisions for publicly funded cancer drugs.

Surveys on Provincial Data Assets
Surveys on the data elements and databases required for conducting real-world studies were created by the CanREValue Collaboration core research team based on a previous real-world study conducted in Ontario, Saskatchewan, and British Columbia [13,43]. The first section of the survey focused on identifying databases containing relevant types of information (e.g., cancer registry data, hospitalization data), with questions including the database name and its custodian. The second section focused on identifying the data elements required for conducting cancer-specific real-world studies. The data elements chosen for this mapping exercise were selected during the teleconference discussions, based on the data experts' experience with how feasible it had been to identify these data elements in previous RWE studies. The data experts were also asked to identify the database that contains each data element, assess the availability and linkability of the data elements, and identify any limitations in coverage and/or completeness of each data element over time. The availability and linkability of each data element were categorized as (i) data available and linkable, (ii) data available and linkable with caveats, (iii) data availability and linkability to be determined after conducting RWE analysis, and (iv) data not available or linkable. The final section of the survey asked each provincial data expert to assess the feasibility of conducting an RWE study for intravenous and oral drugs based on the availability and linkability of each variable of interest. Data experts were asked to estimate, based on their previous experience, the time it would take for cohort creation and evaluation of each type of outcome as (i) 3–6 months, (ii) 6–12 months, or (iii) more than 12 months.
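To make the survey's response categories concrete, the sketch below encodes the four availability/linkability categories and the three time-to-completion categories as Python enumerations, together with a record for a single expert response. The class names, the `DataElementResponse` fields, and the `usable_elements` helper are hypothetical illustrations rather than part of the CanREValue surveys (Supplementary Tables S4 and S5), and treating "available and linkable with caveats" as usable is likewise an assumption.

```python
from dataclasses import dataclass
from enum import Enum


class Availability(Enum):
    """Survey categories (i)-(iv) for availability and linkability of a data element."""
    AVAILABLE_LINKABLE = "available and linkable"
    AVAILABLE_WITH_CAVEATS = "available and linkable with caveats"
    TO_BE_DETERMINED = "to be determined after conducting RWE analysis"
    NOT_AVAILABLE = "not available or linkable"


class TimeToComplete(Enum):
    """Survey categories for the estimated time to create a cohort and evaluate an outcome."""
    MONTHS_3_TO_6 = "3-6 months"
    MONTHS_6_TO_12 = "6-12 months"
    MORE_THAN_12_MONTHS = "more than 12 months"


@dataclass(frozen=True)
class DataElementResponse:
    """Hypothetical record of one provincial expert's answer for one data element."""
    province: str
    data_element: str
    database: str
    availability: Availability
    limitations: str = ""  # free-text notes on coverage or completeness over time


def usable_elements(responses: list[DataElementResponse]) -> set[str]:
    """Return data elements reported as available and linkable (with or without caveats)."""
    usable = {Availability.AVAILABLE_LINKABLE, Availability.AVAILABLE_WITH_CAVEATS}
    return {r.data_element for r in responses if r.availability in usable}


if __name__ == "__main__":
    # Placeholder values for illustration only; these are not survey results.
    example = [
        DataElementResponse("Province A", "date of diagnosis", "cancer registry",
                            Availability.AVAILABLE_LINKABLE),
        DataElementResponse("Province A", "line of therapy", "drug database",
                            Availability.NOT_AVAILABLE),
    ]
    print(sorted(usable_elements(example)))  # ['date of diagnosis']
```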

Stakeholder Consultation
After the survey responses were collected from the provincial experts, an interim data report was developed that summarized the data assets identified in the mapping exercise. A public stakeholder consultation on the interim data report ran from 13 November 2019 to 13 December 2019. The interim data report was posted publicly on the CanREValue Collaboration website (https://cc-arcc.ca/canrevalue/ (accessed on 13 November 2019)), sent electronically to the CanREValue Collaboration mailing list, and shared on the Collaboration's social media account. Public feedback on the interim report was consolidated into a document, and the relevant changes were incorporated into the updated interim data report. The revised data report and the response document were published online on the CanREValue website on 21 April 2020.

Databases for RWE Studies
Across Canada, 97 databases were identified in this exercise. The data experts identified an average of 9 databases (range: 8–11) in each province that contained data elements relevant for cancer-specific RWE analysis (Table 1). In all provinces, the Ministry or Department of Health (MoH/DoH) maintains databases on publicly funded health services that are administered through provincial health insurance plans or health authorities within its jurisdiction. Most provincial MoH/DoH work with CIHI to capture standardized hospitalization data through the Discharge Abstract Database (DAD) and ambulatory care services (including emergency department visits) through the National Ambulatory Care Reporting System (NACRS). Québec is the only province that does not fully report to the DAD, while Ontario and Alberta are the only provinces that fully report to the NACRS. In some provinces, the services administered by the MoH/DoH include both cancer and non-cancer treatments, while in others, specific care is delegated to specialized agencies. For example, in some provinces, such as Ontario, Saskatchewan, British Columbia, Manitoba, Nova Scotia, and Newfoundland and Labrador, cancer treatments/funding are administered through provincial agencies/programs and, thus, detailed treatment data may be collected by the agencies/programs on behalf of the MoH/DoH. In such circumstances, data may already be shared between the two organizations, or additional data sharing/linkage may be required for the purpose of health system planning and administration. Because the databases required to conduct RWE studies may be held by multiple data custodians, this fragmentation can create barriers to timely data access and linkage.
In some provinces/territories, there are third-party organizations (e.g., ICES (formerly known as the Institute for Clinical Evaluative Sciences) in Ontario and Health Data Nova Scotia (HDNS)) that are authorized to access and link provincial demographic and health-related databases for research and evaluation.
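The mechanics of linking records held by different custodians can be illustrated with a deterministic join on a shared patient identifier. This is a minimal sketch under stated assumptions: the field names (`patient_id`, `dx_date`, `drug`) and the function `link_on_id` are hypothetical, and in practice linkage is carried out by custodians or authorized organizations such as ICES or HDNS under their own governance agreements, typically with encoded identifiers and sometimes probabilistic matching, neither of which is shown here.

```python
from collections import defaultdict


def link_on_id(registry: list[dict], drug_claims: list[dict],
               key: str = "patient_id") -> list[dict]:
    """Deterministically link registry records to drug-claim records sharing `key`.

    Each registry row is combined with every matching claim (one-to-many);
    registry rows with no matching claim are dropped, as in an inner join.
    """
    # Index claims by identifier so each registry row is matched in constant time.
    claims_by_id: dict[str, list[dict]] = defaultdict(list)
    for claim in drug_claims:
        claims_by_id[claim[key]].append(claim)

    linked = []
    for patient in registry:
        # .get() on a defaultdict does not create missing keys.
        for claim in claims_by_id.get(patient[key], []):
            # Merge both records; claim fields overwrite registry fields with the same name.
            linked.append({**patient, **claim})
    return linked
```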

Variables Required for Conducting RWE
Variables that are necessary to conduct real-world comparative analysis were categorized into three essential components: (1) variables for cohort creation; (2) variables on baseline demographic and clinical characteristics; and (3) variables on outcomes of interest.
The first component of a real-world study is to build an appropriate study cohort that can answer the research question. Variables to define the disease of interest, such as cancer diagnosis codes (ICD-O-3 morphology, topography, behavior code), stage, and date of diagnosis, were considered necessary for cohort selection. Variables on receipt of treatment, including a drug identifier, date of treatment, and dose administered, were also considered relevant for identifying the eligible patient cohort and conducting analysis. Further, given that specific drugs may be used in more than one setting, data elements defining treatment indication, line of therapy, and/or intent of treatment were also considered relevant.
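A rough sketch of how the cohort-creation variables might be combined is shown below. It assumes hypothetical record layouts (`Patient`, `Treatment`), a hypothetical drug identifier, and a hypothetical `build_cohort` function: patients are selected by ICD-O-3 topography prefix and diagnosis window, and the first administration of the study drug on or after diagnosis is taken as the index date. Indication, line of therapy, and treatment intent, which the text notes are also relevant, are omitted for brevity.

```python
from dataclasses import dataclass
from datetime import date


@dataclass(frozen=True)
class Patient:
    patient_id: str
    topography: str       # ICD-O-3 topography code, e.g. "C50.9"
    diagnosis_date: date


@dataclass(frozen=True)
class Treatment:
    patient_id: str
    drug_id: str          # hypothetical drug identifier
    treatment_date: date


def build_cohort(patients: list[Patient], treatments: list[Treatment], site_prefix: str,
                 drug_id: str, start: date, end: date) -> dict[str, date]:
    """Map patient_id -> index date (first study-drug administration on or after
    diagnosis) for patients diagnosed at the given site between start and end inclusive."""
    eligible = {
        p.patient_id: p.diagnosis_date
        for p in patients
        if p.topography.startswith(site_prefix) and start <= p.diagnosis_date <= end
    }
    index_dates: dict[str, date] = {}
    for t in treatments:
        dx_date = eligible.get(t.patient_id)
        if dx_date is None or t.drug_id != drug_id or t.treatment_date < dx_date:
            continue  # outside the diagnosis cohort, a different drug, or pre-diagnosis
        current = index_dates.get(t.patient_id)
        if current is None or t.treatment_date < current:
            index_dates[t.patient_id] = t.treatment_date  # keep the earliest administration
    return index_dates
```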
The second component of an RWE study includes demographic and clinical characteristics for describing the cohort and balancing differences between treatment groups to ensure comparability. These variables included age, sex, neighborhood income quintile, region/rurality, comorbidity, performance status, and prior treatment exposures (systemic therapy, radiotherapy, and cancer-directed surgery). Concurrent or subsequent treatments (systemic therapy, radiotherapy, and cancer-directed surgery) were also included as relevant clinical characteristics to consider.
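The text does not prescribe a method for assessing comparability between treatment groups. One widely used diagnostic, shown here as an illustrative assumption rather than the Data WG's approach, is the standardized mean difference (SMD): the difference in group means (or proportions) divided by a pooled standard deviation. Absolute values below about 0.1 are often taken to indicate adequate balance. The function names are hypothetical.

```python
import math
from statistics import mean, variance


def smd_continuous(treated: list[float], control: list[float]) -> float:
    """SMD for a continuous covariate such as age, using the pooled sample SD."""
    pooled_sd = math.sqrt((variance(treated) + variance(control)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both groups have zero variance")
    return (mean(treated) - mean(control)) / pooled_sd


def smd_binary(p_treated: float, p_control: float) -> float:
    """SMD for a binary covariate, e.g. the proportion with prior radiotherapy."""
    pooled_sd = math.sqrt((p_treated * (1 - p_treated) + p_control * (1 - p_control)) / 2)
    if pooled_sd == 0:
        raise ValueError("SMD is undefined when both proportions are 0 or 1")
    return (p_treated - p_control) / pooled_sd
```

In a comparative study, these values would typically be computed for every baseline covariate before and after adjustment (for example, propensity score matching or weighting) and reported side by side.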
The third component of an RWE study includes the outcomes. Five key types of outcomes were identified: clinical effectiveness, safety, cost-effectiveness, budget impact, and patient-reported outcomes (Table 2). Within each type of outcome, there are distinct endpoints that can be studied. For example, endpoints within the clinical effectiveness outcome category include overall survival and other time-to-event endpoints (time to treatment discontinuation or progression-free survival). An initial assessment of some specific endpoints for each outcome type, together with the data elements required to generate them, is listed in Table 2.
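For the overall survival endpoint, the text does not specify an estimator. A common choice, offered here as an illustrative assumption, is the Kaplan-Meier product-limit estimator computed from each patient's follow-up time (e.g., days from the index date to death, or to censoring at last follow-up) and an event indicator. The minimal sketch below uses only the standard library; the function name is hypothetical, and patients censored at time t are counted as still at risk at t, the usual convention.

```python
def kaplan_meier(times: list[float], events: list[int]) -> list[tuple[float, float]]:
    """Kaplan-Meier survival estimates at each distinct time with at least one event.

    times:  follow-up time from the index date (e.g. days to death or censoring)
    events: 1 if death was observed, 0 if the patient was censored
    """
    if len(times) != len(events):
        raise ValueError("times and events must have the same length")
    data = sorted(zip(times, events))
    at_risk = len(data)
    survival = 1.0
    curve: list[tuple[float, float]] = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = removed = 0
        # Group all patients whose follow-up ends at time t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            removed += 1
            i += 1
        if deaths:
            survival *= 1 - deaths / at_risk
            curve.append((t, survival))
        at_risk -= removed  # deaths and censorings at t leave the risk set after t
    return curve
```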

Mapping Real-World Data Elements in Provinces
Building upon the three essential components of a real-world study, the WG created the Essential Cancer RWD Table (Table 3), a list of data elements that are minimally necessary for conducting real-world studies in oncology. Each data element is also labeled according to whether it is used for cohort creation, baseline/clinical characteristics, or outcomes. For the outcome component, we designated each data element as relevant for real-world survival, real-world safety, real-world cost, or real-world budget impact. Some data elements, such as participant ID, may be needed for all three components of an RWE study, while others may be required for only one component; for example, the cost of the drug is required only for real-world comparative cost-effectiveness. Since some of the variables listed in Table 2 are composite variables, such as comorbidity, multiple data elements in the Essential Cancer RWD Table are required to generate them. The Expanded Cancer RWD Table presented in Table 4 includes a more comprehensive list of data elements, including those identified in the Essential Cancer RWD Table (Table 3). These additional data elements would enhance real-world analyses but may not be routinely collected in every province. Variables that are relevant only to a specific disease or drug, or that are not routinely reported to population-based databases, are not included in this list. While most provinces have data on patient demographics (e.g., age, sex) and cancer diagnosis-related variables (e.g., morphology, topography), the availability and linkability of data on cancer treatment, clinical characteristics, and drug costs vary among provinces.
Note: While some variables listed in the table can be captured by one data element (e.g., sex), other variables are derived from multiple data elements (e.g., age at first treatment requires both birth date and date of first treatment).
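As a concrete instance of a derived variable, the sketch below computes age at first treatment from a birth date and a set of treatment dates. The function name is hypothetical, and the age convention (completed years on the date of first treatment) is an assumption; a study protocol may define age differently, for example at diagnosis.

```python
from datetime import date


def age_at_first_treatment(birth_date: date, treatment_dates: list[date]) -> int:
    """Age in completed years on the earliest treatment date."""
    if not treatment_dates:
        raise ValueError("at least one treatment date is required")
    first = min(treatment_dates)  # date of first treatment
    if first < birth_date:
        raise ValueError("first treatment precedes birth date")
    # Subtract one year if the birthday has not yet occurred in the year of first treatment.
    had_birthday = (first.month, first.day) >= (birth_date.month, birth_date.day)
    return first.year - birth_date.year - (0 if had_birthday else 1)


if __name__ == "__main__":
    # Hypothetical dates for illustration.
    print(age_at_first_treatment(date(1950, 6, 15), [date(2018, 9, 1), date(2018, 6, 14)]))  # 67
```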

Resource and Capacity Assessment
The capabilities of each province to perform RWE analysis were assessed, considering currently available data holdings and resources such as dedicated personnel and funding (Table 5). Analysis capabilities were assessed separately according to the outcomes to be analyzed (based on those outlined in Table 2) as well as the route of administration of the study drug(s) (IV vs. oral). As shown in Table 5, a province's capability to perform RWE analysis differs according to the outcomes of interest being measured, the province's current data holdings and infrastructure, and the route of administration of the study drugs being evaluated. Many provinces estimated that they could not complete an RWE study for cancer drugs within 12 months with their current resources.

Stakeholder Consultation
In the public consultation on the interim data report, responses were received from stakeholders representing pharmaceutical companies, industry consultancies, non-profit organizations, and patient groups. The call for feedback prompted respondents to identify additional relevant data elements that had not been listed in the report. Data elements noted by stakeholders included race/ethnicity, physical activity, smoking, and alcohol use, which are important risk factors for cancer and are useful to collect at a population level to implement preventive health policies. Other data elements, such as progression, biomarker status, and overall response rate, are important endpoints for understanding cancer treatment and disease trajectory. While these data elements may be relevant for real-world analysis, many are not currently collected systematically within publicly owned population-based databases. Notably, some of these data elements may be documented in patient charts, which could be harnessed using advanced methods such as artificial intelligence or machine learning. A full list of these stakeholder-identified data elements can be found in Supplementary Table S2. While the focus of the CanREValue Data WG was on population-based administrative databases, respondents were also prompted to identify privately/academically held databases that could be used for RWE. Many respondents suggested additional Canadian or international databases, such as disease site-specific databases (e.g., the Canadian Melanoma Research Network), pediatric oncology databases (e.g., Pediatric Oncology Group of Ontario Networked Information System, POGONIS), and private databases (e.g., IQVIA and RxDynamics). These databases were compiled and shared publicly for researchers interested in conducting RWE research using privately/academically held databases (Supplementary Table S3).
In the updated interim data report, the Data WG members also compared the identified data elements with those of the pan-Canadian Minimal Oncology Dataset (pCMOD), as suggested by respondents, to understand the concordance between the two sets of necessary data elements [38] (Supplementary Table S1).

Discussion
The CanREValue Collaboration's Data WG conducted a descriptive study to map the existing real-world population-level administrative data assets across Canadian provinces. An inventory of key data custodians and databases maintaining RWD in each province was compiled. Two data asset inventories were also compiled: one containing a list of minimally necessary data elements, and another containing an expanded list of relevant data elements for conducting cancer-specific RWE studies. In addition to differences in the availability of data elements for conducting real-world studies, the current capacity and capability of each province to perform real-world analysis also vary significantly. The majority of provinces do not have the capacity to conduct RWE analyses within 12 months based on current resourcing, but most could complete an RWE analysis within 3 to 12 months if dedicated funding and personnel were available.
In Canada, there is growing interest in RWD. In 2018, CIHI published the pCMOD report, which compiled a list of standard data elements that should be collected across the provinces for RWE generation [38]. Many of the data elements listed in the pCMOD were explored by the CanREValue Data WG, with some notable exceptions, including data elements on the health care facility where the drug was received and on prescriber information. These data elements can be explored in future iterations of the CanREValue data report. Health Canada has also started several projects focused on the generation of RWE and its integration into drug regulatory decisions [2,46,47]. In a recent report published by Health Canada, several principles regarding the generation of decision-grade RWE were outlined, including protocols for retrospective and prospective data collection [2]. The findings from this mapping exercise conducted by the CanREValue Data WG build on this previous work by CIHI and Health Canada. By exploring the current availability of these data elements in each province, this exercise also identifies gaps within the data infrastructure that may benefit from future dedicated investment.
Our study aligns with international interest in developing RWE. The minimal dataset developed by the Minimal Common Oncology Data Elements (mCODE) initiative in the United States was created to support interoperability between electronic health record systems. mCODE includes data elements such as genomic markers and laboratory results [48] that were not included in our report because they are not routinely collected in provincial administrative datasets. There have also been efforts to evaluate RWD holdings throughout Europe. The RWD holdings for most of the 160 cancer registries across EU countries have not been mapped; however, major differences in data quality are believed to exist between countries [49,50]. The minimal dataset recommended by the European Medicines Agency aligns with the minimal dataset presented in this report and includes many of the same data elements [49]. The European Network of Cancer Registries (ENCR) has also recommended essential and optional datasets specifically for tumor-based cancers. The Essential and Expanded datasets in this report are generalizable to most types of cancer but are still aligned with many of the data elements recommended by the ENCR. The ENCR's optional dataset contains the patient's occupation and risk factors for developing cancer, which were identified in our stakeholder consultation as additional data elements to be explored (Supplementary Table S2).
This work was a first step towards understanding pan-Canadian data assets across all ten provinces, but it has several limitations. First, our work did not include databases from the territories or federal drug plans. Future work will be needed to explore the data assets held in these jurisdictions. Second, the reported assessment of completeness and quality of the data elements is based on a high-level review by the data experts in the WG. We anticipate that our knowledge of the data elements will be enhanced as we conduct a pilot real-world demonstration project that is currently underway. Based on our learnings, we may iteratively update the data report in the future. Finally, some provinces are likely to have limited access to databases and data elements that are not routinely used for research purposes. As identified in a prior qualitative study, silos remain within the data access process [33]. Forsea et al. proposed that increased stakeholder participation, increased political support from patient advocacy groups and health professionals, and the harmonization of datasets could improve RWD holdings across Europe [51].
Notwithstanding the limitations, our study is the first initiative to catalogue existing population-based databases and real-world data elements that can be used to conduct studies in oncology for the purpose of informing drug funding in Canada. Building upon insights and recommendations from previous studies, we partnered with provincial data experts to map out the existing assets and gaps of the current Canadian data infrastructure. This catalogue of existing data assets is an essential and practical first step towards the vision of a pan-Canadian interprovincial data platform that can generate RWE to inform cancer drug funding decisions. Future work can be carried out to explore the differences in population-level data elements between provinces and to address these gaps. Lastly, our work highlights the importance and success of collaboration between different jurisdictions and stakeholders and may serve as an example to promote future efforts to advance data infrastructure and access.

Conclusions
In conclusion, the CanREValue Collaboration's Data WG conducted a mapping exercise that produced an inventory of the databases and data elements required to perform real-world analysis. The Data WG also estimated the capacity and capability required to complete real-world analyses, both under current circumstances and under an ideal future state. Using findings from this process, the CanREValue Collaboration has initiated a pan-Canadian multi-provincial real-world study. Following the real-world study, the Data WG will update the tables of data elements based on first-hand experience accessing and analyzing the data. With continued efforts from the CanREValue Collaboration, RWE could be used to better assess and refine cancer drug funding across Canada, thus supporting the sustainability of cancer drug funding and value for money.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/curroncol29030165/s1, Table S1: Comparison between pan-Canadian Minimal Oncology Dataset (pCMOD) and CanREValue Interim Data; Table S2: Additional real-world data elements requiring future exploration; Table S3: Potential private/academic databases for RWE analysis; Table S4: Survey on databases and data elements; Table S5: Survey on capacity assessment; Table S6: Glossary.