Validation in Alberta of an administrative data algorithm to identify cancer recurrence.

Background
Readily available population-based data about cancer recurrence would improve surveillance and research for women of reproductive age.


Methods
We randomly selected 200 women from the Alberta Cancer Registry who had received a cancer diagnosis and who ever had a pregnancy between 2003 and 2012. Administrative data were obtained and linked. Several definitions of recurrence were assessed using various minimum lengths of time between the initial diagnosis date and subsequent diagnoses or treatments, or both. Chart review was used as a "gold standard" definition of recurrence.


Results
Chart review identified recurrences in 26 women. The definition that best captured "recurrence" was 2 or more cancer diagnosis codes 10 or more months from the diagnosis date [sensitivity: 80.8%; 95% confidence interval (ci): 60.7% to 93.5%; specificity: 81.0%; 95% ci: 74.4% to 86.6%; positive predictive value: 38.9%; 95% ci: 25.9% to 53.1%; negative predictive value: 96.6%; 95% ci: 92.2% to 98.9%; kappa = 0.42; 95% ci: 0.28 to 0.57].


Conclusions
Recurrence in reproductive-aged women can be captured with moderate validity using administrative data, but results should be interpreted with caution.


INTRODUCTION
Cancer recurrence, risk of death, and disease-free survival are common endpoints used in cancer studies to evaluate the effect that cancer might have on an individual's long-term health 1 . The use of population-based data, including cancer registries and administrative databases, has become common in cancer research. By investigating all applicable cases over a long study period or large geographic area (that is, in population-based retrospective cohorts), researchers can obtain a wealth of information at relatively low cost 2 . However, cancer registries typically do not record recurrence, for reasons that include cost, volume of cases, and movement of patients 3,4 . Researchers are therefore left with three options:
• Review personal health records, which offer higher validity and reliability but can be costly and inefficient for large samples 2 .
• Apply algorithms to administrative data to identify evidence of recurrence, a method that has shown variable levels of validity [5][6][7][8] .
• Forgo the investigation of recurrence, a choice with critical consequences for patients and providers.
In the case of cancers in women of reproductive age, the risk of recurrence is an important consideration in how an affected woman might plan future pregnancies after a cancer diagnosis 9 ; that is, within what timeframe would she be most likely to have a cancer-free pregnancy? Because cancer is the second leading cause of death in women of reproductive age, and because these women remain at risk of recurrence for many years, a better understanding of recurrence is increasingly important 10 .
In the present study, we assessed the ability of administrative data, compared with personal health records, to accurately ascertain cancer recurrence in women of reproductive age.

METHODS
Study Population
Individual-level data were obtained from the Alberta Cancer Registry (acr). The study population consisted of women 18 to 50 years of age from Calgary, Alberta, who had a cancer diagnosis (excluding non-melanoma skin cancer and cervical carcinoma in situ) and who had ever had a pregnancy reaching at least 20 weeks' gestation between 1 January 2003 and 31 December 2012. The study sample consisted of 200 women randomly selected from this eligible cohort. All women included in the analysis had a minimum of 5 years of follow-up after their initial cancer diagnosis. Ethics approval for the study was granted by the Health Research Ethics Board of the Alberta Cancer Committee.
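The eligibility criteria above can be expressed as a filter followed by a reproducible random draw. The sketch below is illustrative only: the record fields (`Candidate`, `pregnancies`, `follow_up_years`), the site labels, the seed, the treatment of 5-year follow-up as a filter, and the choice to apply the 2003 to 2012 window to the pregnancy date rather than the diagnosis date are all assumptions, not the study's extraction code.

```python
import random
from dataclasses import dataclass, field
from datetime import date
from typing import List, Tuple

# Hypothetical site labels for the two excluded groups.
EXCLUDED_SITES = {"non-melanoma skin", "cervix, carcinoma in situ"}
WINDOW_START = date(2003, 1, 1)
WINDOW_END = date(2012, 12, 31)


@dataclass
class Candidate:
    """Hypothetical registry record; field names are illustrative."""
    patient_id: str
    age_at_diagnosis: int
    city: str
    cancer_site: str
    follow_up_years: float
    # Each entry: (pregnancy end date, gestational age in completed weeks).
    pregnancies: List[Tuple[date, int]] = field(default_factory=list)


def is_eligible(c: Candidate) -> bool:
    """Apply the stated criteria (window assumed to apply to the pregnancy)."""
    has_qualifying_pregnancy = any(
        weeks >= 20 and WINDOW_START <= end <= WINDOW_END
        for end, weeks in c.pregnancies
    )
    return (
        18 <= c.age_at_diagnosis <= 50
        and c.city == "Calgary"
        and c.cancer_site not in EXCLUDED_SITES
        and has_qualifying_pregnancy
        and c.follow_up_years >= 5  # assumption: applied as an eligibility filter
    )


def draw_sample(candidates: List[Candidate], k: int = 200, seed: int = 2012) -> List[Candidate]:
    """Simple random sample without replacement from the eligible cohort."""
    eligible = [c for c in candidates if is_eligible(c)]
    rng = random.Random(seed)  # fixed seed so the draw can be reproduced
    return rng.sample(eligible, k)  # raises ValueError if fewer than k are eligible
```

Fixing the seed makes the draw auditable; a stratified draw by cancer site would be an alternative design, but the text describes a simple random sample.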

Data Sources
The acr is a population-based registry that collects detailed information for all new cancer cases in the province of Alberta; it has received gold star certification from the North American Association of Central Cancer Registries, with more than 95% case ascertainment 11 . New cases are reported by physicians and laboratories in the province and are also identified from visits to cancer facilities and from electronic linkage to Alberta Vital Statistics for information about deaths 12 . Variables collected include cancer site, morphology, cancer stage, patient demographics, and diagnosis date.
The National Ambulatory Care Reporting System (nacrs) contains data on all emergency department visits, day surgeries, and some hospital- and community-based outpatient clinics. The nacrs is available in 8 of the 13 Canadian provinces and territories 13 . The data are abstracted from hospital charts by trained medical records personnel and include admission and discharge dates, and up to 25 diagnoses and 15 procedures for each episode of care 13 . Diagnoses are coded using the Canadian modification to the International Statistical Classification of Diseases and Related Health Problems, 10th Revision 14 . Procedures are coded using the Canadian Classification of Health Interventions 15 .
Personal health records are collected and maintained by Alberta Health Services and contain information about an individual's interactions with the health care system, such as reports from health care providers, treatment information, and test results.

Defining Cancer Recurrence
Algorithms for identifying cancer recurrence were based on previous literature that used cancer registries and administrative data holdings to code recurrence from various combinations of treatment dates and diagnosis or procedure codes 4,7 . The definitions were then adapted to the Alberta data holdings by the study analyst in consultation with a gynecologic oncologist. Recurrence was identified in nacrs by measuring the time from the diagnosis date (as recorded in the acr) either to subsequent cancer diagnosis codes alone or to subsequent cancer diagnosis codes plus a procedure code for a physical or physiologic therapeutic intervention (Section 1 of the Canadian Classification of Health Interventions). We tested minimum intervals ranging from 6 to 18 months between the diagnosis date and any subsequent cancer diagnosis codes and selected three definitions: one with high sensitivity, one with high specificity, and one with both high sensitivity and high specificity. Presence or absence of cancer recurrence was assigned to each patient once for each case definition.
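The two definition families described above (diagnosis codes alone, or diagnosis codes plus a Section 1 procedure code, each at or beyond a minimum number of months) can be sketched as functions of the acr diagnosis date and a patient's linked nacrs episodes. Several details are assumptions not stated in the text: a "cancer diagnosis code" is taken to be an ICD-10-CA code beginning with C (malignant neoplasms), a treatment procedure is any CCI code beginning with 1, months are counted as completed calendar months, codes are counted individually rather than by visit, and the diagnosis and procedure in the second definition may come from different episodes. The names `Episode`, `recurrence_by_dx_count`, and `recurrence_by_dx_and_treatment` are hypothetical; the default thresholds mirror those reported elsewhere in the paper.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import List


@dataclass
class Episode:
    """One nacrs episode of care (up to 25 diagnoses and 15 procedures)."""
    visit_date: date
    diagnoses: List[str] = field(default_factory=list)   # ICD-10-CA codes
    procedures: List[str] = field(default_factory=list)  # CCI codes


def completed_months(start: date, end: date) -> int:
    """Whole calendar months from start to end (negative if end precedes start)."""
    months = (end.year - start.year) * 12 + (end.month - start.month)
    if end.day < start.day:
        months -= 1
    return months


def is_cancer_dx(code: str) -> bool:
    # Assumption: malignant neoplasm codes (C00 to C97) all start with "C".
    return code.strip().upper().startswith("C")


def is_treatment(code: str) -> bool:
    # Assumption: CCI Section 1 (therapeutic interventions) codes start with "1".
    return code.strip().startswith("1")


def window(diagnosis_date: date, episodes: List[Episode], min_months: int) -> List[Episode]:
    """Episodes occurring at least min_months after the registry diagnosis date."""
    return [e for e in episodes if completed_months(diagnosis_date, e.visit_date) >= min_months]


def recurrence_by_dx_count(diagnosis_date: date, episodes: List[Episode],
                           min_months: int = 10, min_codes: int = 2) -> bool:
    """Flag recurrence if at least min_codes cancer diagnosis codes fall in the window."""
    later = window(diagnosis_date, episodes, min_months)
    n_codes = sum(is_cancer_dx(c) for e in later for c in e.diagnoses)
    return n_codes >= min_codes


def recurrence_by_dx_and_treatment(diagnosis_date: date, episodes: List[Episode],
                                   min_months: int = 9) -> bool:
    """Flag recurrence if the window holds a cancer code and a Section 1 procedure."""
    later = window(diagnosis_date, episodes, min_months)
    has_dx = any(is_cancer_dx(c) for e in later for c in e.diagnoses)
    has_tx = any(is_treatment(p) for e in later for p in e.procedures)
    return has_dx and has_tx


dx = date(2005, 3, 15)
eps = [
    Episode(date(2005, 6, 1), ["C50.9"]),                    # 2 months: too early to count
    Episode(date(2006, 2, 20), ["C50.9"], ["1.XX.00.XX"]),  # 11 months; placeholder CCI code
    Episode(date(2006, 5, 2), ["C79.5"]),                    # 13 months
]
print(recurrence_by_dx_count(dx, eps))          # → True
print(recurrence_by_dx_and_treatment(dx, eps))  # → True
# Thresholds in the tested 6 to 18 month range at which this patient is flagged:
print([m for m in range(6, 19) if recurrence_by_dx_count(dx, eps, m)])  # → [6, 7, 8, 9, 10, 11]
```

Scanning the threshold for one patient shows how a longer minimum interval trades sensitivity for specificity; choosing among thresholds requires comparing the flags for all patients against chart review.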

Chart Review
Patient charts were considered the reference standard for all analyses. To determine whether a woman experienced a cancer recurrence, a gynecologic oncologist reviewed her chart from the diagnosis date onward, blinded to the recurrence status assigned by the administrative data definitions. Cancer recurrence was documented if a progress note or treatment plan in the chart clearly indicated a recurrence. A standardized data extraction sheet was used to record whether the cancer had recurred and, if it had, the first date of recurrence.

Statistical Analyses
Proportions and 95% confidence intervals (cis) were used to describe relevant characteristics of the study population, including age, cancer site, and cancer stage at diagnosis. To assess the validity of nacrs compared with patient charts, we calculated sensitivity, specificity, positive and negative predictive values, and kappa coefficients for each definition across all cancer types. A stratified analysis was also conducted for breast cancer, the most prevalent cancer type. In the primary analysis, patients without available charts were assumed not to have experienced recurrence. A sensitivity analysis was restricted to the subset of patients who had a record in the acr and at least one chart available for review. Additional sensitivity analyses counted subsequent cancer diagnoses only when cancer was the most responsible diagnosis, that is, the condition responsible for most of the services used.
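The validity measures named above all derive from a 2 × 2 table of algorithm result against chart review. In the minimal sketch below, `None` marks a patient without an available chart, so the primary analysis (missing chart counted as no recurrence) and the restricted sensitivity analysis (such patients dropped) differ by a single flag. The function names are hypothetical, and Cohen's kappa is assumed because the text does not name a kappa variant. The usage line uses 21 true positives, 33 false positives, 5 false negatives, and 141 true negatives; these counts are back-calculated from the abstract's reported percentages (26 chart-confirmed recurrences among 200 women) and are not reported directly in this text.

```python
from dataclasses import dataclass
from typing import Optional, Sequence, Tuple


@dataclass
class Validity:
    sensitivity: float
    specificity: float
    ppv: float
    npv: float
    kappa: float


def two_by_two(flags: Sequence[bool], chart: Sequence[Optional[bool]],
               missing_as_negative: bool = True) -> Tuple[int, int, int, int]:
    """Return (tp, fp, fn, tn); a chart entry of None means no chart was available."""
    if len(flags) != len(chart):
        raise ValueError("flags and chart must be the same length")
    tp = fp = fn = tn = 0
    for flagged, recurred in zip(flags, chart):
        if recurred is None:
            if not missing_as_negative:
                continue  # restricted analysis: drop patients without a chart
            recurred = False  # primary analysis: assume no recurrence
        if flagged and recurred:
            tp += 1
        elif flagged:
            fp += 1
        elif recurred:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def validity(tp: int, fp: int, fn: int, tn: int) -> Validity:
    """Sensitivity, specificity, PPV, NPV, and Cohen's kappa from a 2 x 2 table."""
    n = tp + fp + fn + tn
    observed = (tp + tn) / n
    # Chance agreement: products of the marginal totals for each category, over n squared.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return Validity(
        sensitivity=tp / (tp + fn),
        specificity=tn / (tn + fp),
        ppv=tp / (tp + fp),
        npv=tn / (tn + fn),
        kappa=(observed - expected) / (1 - expected),
    )


v = validity(tp=21, fp=33, fn=5, tn=141)
print(f"{v.sensitivity:.1%} {v.specificity:.1%} {v.ppv:.1%} {v.npv:.1%} {v.kappa:.2f}")
# → 80.8% 81.0% 38.9% 96.6% 0.42
```

The low positive predictive value alongside a high negative predictive value follows directly from the table: with only 26 recurrences among 200 women, the 33 false positives outnumber the 21 true positives.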

RESULTS
Table i presents descriptive characteristics for the sample. Mean age of the patients was 33.8 ± 6.6 years. The most prevalent cancers in the sample were breast (32.5%, n = 65), thyroid (24.0%, n = 48), and melanoma of skin (11.0%, n = 22).
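Reporting a 95% confidence interval for a proportion such as the breast cancer share (65 of 200) requires choosing an interval method, which the text does not state. One common choice for small samples is the exact Clopper-Pearson interval; the sketch below assumes that method and computes it with only the Python standard library by inverting the binomial tail probabilities with bisection. `clopper_pearson` and the tail helpers are hypothetical names.

```python
from math import comb


def binom_tail_ge(x: int, n: int, p: float) -> float:
    """P(X >= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(x, n + 1))


def binom_tail_le(x: int, n: int, p: float) -> float:
    """P(X <= x) for X ~ Binomial(n, p)."""
    return sum(comb(n, k) * p ** k * (1 - p) ** (n - k) for k in range(0, x + 1))


def _bisect(f, target: float, increasing: bool, tol: float = 1e-10) -> float:
    """Find p in [0, 1] with f(p) == target for a monotone f."""
    lo, hi = 0.0, 1.0
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Move toward the side where f crosses the target.
        if (f(mid) < target) == increasing:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


def clopper_pearson(x: int, n: int, alpha: float = 0.05):
    """Exact two-sided (1 - alpha) interval for a binomial proportion x / n."""
    # Lower bound: P(X >= x) rises with p; solve P(X >= x) = alpha / 2.
    lower = 0.0 if x == 0 else _bisect(lambda p: binom_tail_ge(x, n, p), alpha / 2, True)
    # Upper bound: P(X <= x) falls with p; solve P(X <= x) = alpha / 2.
    upper = 1.0 if x == n else _bisect(lambda p: binom_tail_le(x, n, p), alpha / 2, False)
    return lower, upper


lo, hi = clopper_pearson(65, 200)  # breast cancer share of the sample
print(f"32.5% (95% CI {lo:.1%} to {hi:.1%})")
```

The same function applies to sensitivity, specificity, and predictive values by passing the relevant numerator and denominator (for example, 21 of 26 for sensitivity).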

DISCUSSION AND CONCLUSIONS
We randomly selected and reviewed the charts of 200 women of reproductive age included in the acr to determine the validity of multiple case definitions for identifying cancer recurrences. The results of the study are encouraging for future population-based research into cancers in women of reproductive age. Of the case definitions assessed, the definition using a minimum of 10 months between the diagnosis date and 2 or more subsequent cancer diagnosis codes had the best overall validity. That definition can be used in Alberta for women of reproductive age, but results obtained with it should be interpreted with caution; if it is used in studies outside Alberta, the definition should first be re-validated.
Results of this study are unsurprising, given the mixed results in the existing literature where similar methods were used 4 . A review of the measurement of cancer recurrence based on administrative data in the United States found that algorithms often perform well in small single-site studies 5,6 but can show high rates of misclassification in larger population-based studies 7,8 . Because health data are coded differently in different jurisdictions, algorithms identifying recurrence in administrative data must be validated in each health system before use. The objective of the present study was to test whether an algorithm could be created to identify any recurrence in women of reproductive age; creating multiple definitions specific to various cancer sites was therefore outside the scope and resources of the study. However, existing literature shows that the validity of definitions can often be improved if they are based on site-specific diagnoses, procedures, treatments, and recurrence patterns 5,7 .
Our study has several limitations. In Alberta, cancer data are recorded largely in patient charts rather than in administrative data. Although we were able to use nacrs to develop moderately accurate definitions, the data lack detail. For example, a cancer diagnosis code recorded 1 year after the initial diagnosis date might reflect recurrence, progression, or refractory cancer, or might simply indicate that the chart noted a history of cancer. To account for the latter situation, we conducted a sensitivity analysis using only most-responsible diagnosis codes, but the result was a decrease in validity. The study was limited to women of reproductive age, and so the results are not generalizable to the larger population, given its different treatment and recurrence patterns. Nevertheless, the available literature about algorithms to detect recurrence uses many of the same overarching methods, which can and should be explored to expand research about recurrence across age groups. Because of the small sample size, we were unable to test definitions in cancers other than breast cancer.
Risk of recurrence is a critical outcome in cancer epidemiology, a field that increasingly uses population-based methods. Lack of information about recurrence precludes evidence-based discussions between physicians and affected patients, especially patients with rare cancers, for whom population-based surveillance systems are the primary mechanism for evaluating the natural history of the disease and the effect of various treatment strategies on clinically relevant outcomes such as survival and recurrence.
Table footnotes: b One or more cancer diagnosis codes and a treatment procedure code 9 or more months from diagnosis. c Two or more cancer diagnosis codes 10 or more months from diagnosis. CI = confidence interval.