At the core of personalized medicine is a belief that genome-based medicine will lead to greater efficiencies in healthcare via informed predictions about individuals’ susceptibility to disease, risk of progression, and treatment outcomes. An underlying assumption is that the molecular diagnostic tests used to analyze cellular biomarkers or genetic alterations are clinically validated, precise, and provide reliable information to healthcare providers, enabling them to correctly assess risk and make better-informed treatment decisions. The current global regulatory framework for molecular diagnostic tests, including companion diagnostics, is fragmented and inconsistent. Challenges remain in ensuring the quality, safety, and effectiveness of molecular diagnostic tests, owing to a lack of uniform evidence requirements across the various regulatory entities that oversee the development and provision of diagnostic tests, and the clinical laboratories in which the tests are performed [1
]. Additionally, there is no standard health technology assessment (HTA) process for evaluating the value of molecular diagnostics, and there is a lack of guidance on how to measure the benefits of molecular diagnostic tests, appropriate study design, or test performance requirements [3
]. While it is well understood that molecular diagnostics are a critical component of personalized medicine, the test performance and value of many of the tests routinely used to inform patient care are uncertain.
In the US, one pathway through which in vitro diagnostic tests (IVDs), including molecular diagnostics, may be commercialized for clinical use is approval or clearance by the US Food and Drug Administration (FDA) under the Medical Device Amendments to the Federal Food, Drug, and Cosmetic Act (FD&C Act) [4
]. As part of a premarket approval application, manufacturers are required to conduct rigorous technical performance validation studies (e.g., accuracy, reproducibility, reliability, sensitivity, specificity, limit of detection, inhibition, inclusivity, stability, etc.) to robustly demonstrate a test’s analytical validity (how well a test detects the presence of the intended analyte) and clinical validity (how well the presence or absence of the intended analyte predicts a clinical condition or predisposition in a patient) [2
]. Separately, hospitals, universities, and commercial laboratories may use their own components and procedures to develop diagnostic tests for commercial use within a single laboratory facility, irrespective of whether an FDA-approved IVD is available for the same purpose; these are referred to as laboratory-developed tests (LDTs) [4
]. Laboratories that develop their own tests used for clinical testing of patient specimens are regulated by the Clinical Laboratory Improvement Amendments (CLIA) program, primarily overseen by the Centers for Medicare and Medicaid Services (CMS). The CLIA program seeks to ensure the quality of laboratory facilities through focusing on quality control of testing procedures and appropriate training of laboratory personnel. Unlike FDA requirements, the CLIA program does not necessarily require demonstration of a test’s analytical and clinical validity, which often involves complex and multi-site trial designs [2
]. Compliance with CLIA regulations may attest to quality standards of the laboratory facility and personnel, but does not ensure that LDTs are accurate and reliable in aiding clinical decision-making. There is no systematic assessment process in the US for LDT accuracy and test performance (sensitivity and specificity). As such, there is limited evidence available in the public domain regarding the performance of most LDTs routinely used to diagnose disease or aid in clinical decision-making [2].
Along with the proliferation of new targeted cancer therapies, there has been a corresponding increase in the number of highly complex molecular diagnostic tests that detect clinically relevant tumor biomarkers and aid in the identification of patients for targeted therapy [5
]. For example, activating mutations in the tyrosine kinase domain of the epidermal growth factor receptor (EGFR) have been identified as an oncogenic driver in non-small cell lung cancer (NSCLC) cases [6
]. First- and second-generation anti-EGFR tyrosine kinase inhibitors (TKIs) (e.g., erlotinib, gefitinib, afatinib) are first-line therapies for patients with EGFR mutation-positive NSCLC, while conventional chemotherapy is recommended for patients who are EGFR wild type [7
]. International treatment guidelines call for molecular diagnostic testing for the detection of EGFR-sensitizing mutations, as an aid to treatment selection for NSCLC patients with non-squamous histology [7
]. LDTs for EGFR mutation testing are common and may be developed using polymerase chain reaction (PCR) or sequencing techniques. Very little information is available regarding the test performance of these LDTs, and there are no clinical guidelines about which testing platform or method offers optimal results [10
]. Given the importance of EGFR mutation testing for therapy selection and the differential safety and effectiveness of TKI therapies compared to conventional chemotherapy for the treatment of metastatic NSCLC, there are significant clinical and economic consequences for incorrect (false positive (FP) and false negative (FN)) molecular diagnostic test results. In the case of EGFR mutation status misclassification, the consequence of FN results is greatest when patients with EGFR mutations are incorrectly classified as wild type and treated with chemotherapy, denying them the survival benefits associated with TKI therapy. In addition to erroneous results, invalid or delayed results due to technical errors and/or the presence of inhibitors also pose a challenge for the laboratory (need to re-run samples), and to patients (delay in initiation of appropriate therapy, or an additional biopsy if no residual sample is available).
Although diagnostic errors are common across healthcare settings, the topic has only recently received more attention due to a series of notable public health incidents where inaccurate diagnostic test results have caused harm to patients [1
]. In 2014, the FDA stated its intent to issue a new regulatory oversight framework for higher-risk LDTs, including companion diagnostics. However, the final guidance has not yet been released, and it is uncertain if and when the FDA will release it. As a result, inconsistencies in regulatory oversight and uncertain molecular diagnostic test performance remain open policy issues. The objective of this study was to use available data from the published literature in a case study to assess the potential clinical and economic consequences of inaccurate EGFR mutation test results with LDTs compared to an FDA-approved IVD among a hypothetical cohort of newly diagnosed metastatic NSCLC patients in the US.
Using the referenced data inputs, it was estimated that 2.4% (n
= 1051 FP, 371 FN) of 60,502 patients in the US with newly diagnosed metastatic NSCLC and tested for EGFR mutation would be misclassified if all patients were tested with LDTs compared to 1% (n
= 353 FP, 224 FN) of patients if the same cohort was tested using the FDA-approved cobas EGFR Mutation Test. Figure 2
shows the individual patient probability of FP or FN test results from LDTs relative to the cobas test.
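These counts can be approximated with a simple expected-value calculation over the tested cohort. The following Python sketch uses the base-case test performance estimates reported below (cobas: sensitivity 98.1%, specificity 99.3%; LDTs: sensitivity 96.8%, specificity 97.8%) and an assumed 19% EGFR mutation prevalence [16]; the published figures differ slightly because the full model also accounts for invalid and repeat testing.

# Approximate expected false positive (FP) and false negative (FN) counts in a tested
# cohort, given test sensitivity, specificity, and mutation prevalence. This is a
# simplified illustration, not the full decision analytic model.
def expected_misclassification(n_tested, prevalence, sensitivity, specificity):
    fn = n_tested * prevalence * (1.0 - sensitivity)          # mutations missed -> chemotherapy
    fp = n_tested * (1.0 - prevalence) * (1.0 - specificity)  # wild type called positive -> TKI therapy
    return fp, fn

N, PREV = 60502, 0.19  # 2015 US metastatic NSCLC analytic cohort; assumed EGFR mutation prevalence [16]
for label, sens, spec in [("LDTs", 0.968, 0.978), ("cobas", 0.981, 0.993)]:
    fp, fn = expected_misclassification(N, PREV, sens, spec)
    print(f"{label}: ~{fp:.0f} FP, ~{fn:.0f} FN")
# LDTs:  ~1078 FP, ~368 FN  (published estimates: 1051 FP, 371 FN)
# cobas: ~343 FP, ~218 FN   (published estimates: 353 FP, 224 FN)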
Additionally, it was estimated that 0.6% (n = 378) of the patient cohort tested with LDTs would have unresolved invalid tests and were assumed to be treated with chemotherapy by default. Among these patients, it was projected that 72 would actually have an EGFR mutation and would therefore be incorrectly treated.
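The number of patients with unresolved invalid results who nonetheless harbor an EGFR mutation follows directly from applying the assumed prevalence to that subgroup; a minimal check, assuming the 19% prevalence used elsewhere in the analysis [16]:

# Patients with unresolved invalid LDT results are assumed to default to chemotherapy;
# applying the assumed 19% EGFR mutation prevalence approximates how many are incorrectly treated.
unresolved_invalid = 378
print(round(unresolved_invalid * 0.19))  # -> 72 patients with an EGFR mutation treated with chemotherapy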
As a result of inaccurate and invalid diagnostic test results, and of patients subsequently receiving an “incorrect” treatment regimen, it was estimated that if the entire patient cohort were tested with LDTs, the cohort would lose an average of at least 477 progression-free life years (PFLYs), compared to 194 PFLYs if the cohort were tested with the FDA-approved test. This translated into approximately four months of lost progression-free survival (PFS) per misclassified patient. When survival was quality-adjusted to account for the impact of treatment-related severe adverse events, it was projected that the cohort tested with LDTs would lose an average of at least 319 quality-adjusted progression-free life years (QAPFLYs) (approximately five months of quality-adjusted PFS per misclassified patient), compared to 131 QAPFLYs (approximately three months of quality-adjusted PFS per misclassified patient) with the FDA-approved cobas test.
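The per-patient figure follows from dividing the aggregate lost progression-free survival by the number of misclassified patients; a rough check is shown below (the quality-adjusted per-patient figures additionally reflect adjustments for treatment-related adverse events and are not reproduced here).

# Average lost PFS per misclassified patient, approximated as aggregate PFLYs lost
# divided by the number of misclassified (FP + FN) patients in each testing strategy.
ldt_months = 477 / (1051 + 371) * 12     # LDT-tested cohort: ~4.0 months per misclassified patient
cobas_months = 194 / (353 + 224) * 12    # cobas-tested cohort: ~4.0 months per misclassified patient
print(round(ldt_months, 1), round(cobas_months, 1))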
If the national analytic cohort of 60,502 patients was tested for an EGFR mutation with LDTs, the total aggregate treatment cost (drugs, drug administration, adverse events) to Medicare was estimated at $2,599,931,837 compared to $2,592,625,528 if the cohort was tested with the FDA-approved test. The difference of approximately $7.3 million in aggregate treatment costs between testing with LDTs and the FDA-approved cobas test was driven by higher drug costs among patients who tested FP and were incorrectly treated with EGFR TKI therapy, as well as higher costs to treat adverse events among patients who tested FN and were incorrectly treated with chemotherapy. Approximately 3% and 1% of the total aggregate treatment cost associated with LDTs and the FDA-approved cobas test, respectively, was attributed to misclassified patients. Figure 3
shows the difference in treatment costs per tested patient with LDTs compared with the cobas test in the base-case and scenario analyses.
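The headline difference between the two testing strategies is simply the difference of the two aggregate cost estimates:

# Difference in aggregate Medicare treatment costs between the two testing strategies (base-case).
print(2_599_931_837 - 2_592_625_528)  # -> 7306309, i.e., ~$7.3 million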
The scenario analyses show that if the average test performance of EGFR mutation LDTs were approximately 61% sensitive and 84% specific, an estimated 20% (n
= 12,247) of the 2015 US patient cohort tested for EGFR mutations with LDTs were projected to be misclassified, 12.9% FP (n
= 7792) and 7.4% FN (n
= 4455). As a consequence of this misclassification and incorrect treatment, an average of 4104 PFLYs or 2758 QAPFLYs would be lost among this patient cohort relative to a scenario in which all patients were correctly classified; 23% (~$607 million) of total aggregate costs would be attributed to misclassified patients. Alternatively, if LDTs were 84% sensitive and 61% specific [32
], an estimated 34.4% (n
= 18,993 FP, 1828 FN) of the patient cohort would be incorrectly treated due to inaccurate test results, with a projected loss on average of 5848 PFLYs or 3839 QAPFLYs. It was estimated that 39% (~$1 billion) of aggregate treatment costs would be attributed to misclassified patients with a significant proportion attributed to higher drug costs for patients incorrectly treated with EGFR TKI therapy and higher costs to treat adverse events among patients incorrectly treated with chemotherapy. If LDTs had a higher invalid rate of up to 20% (sensitivity 98.1%, specificity 99.3% assumed in the base-case), it was estimated that 0.8% (n
= 491) of the national analytic patient cohort would have an unresolved test and be treated with chemotherapy by default. If EGFR mutation prevalence is 19% [16
], then it was estimated that 93 of these patients would be incorrectly treated.
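To a first approximation, the scenario misclassification counts can be reproduced by re-running the same expected-value calculation shown earlier with the alternative test performance assumptions; a sketch, assuming the same cohort size and 19% prevalence [16] (published figures again differ slightly because the full model handles invalid and repeat tests):

# Scenario analyses: same cohort and assumed prevalence, alternative LDT performance from [32].
N, PREV = 60502, 0.19
for label, sens, spec in [("61% sens / 84% spec", 0.61, 0.84),
                          ("84% sens / 61% spec", 0.84, 0.61)]:
    fp = N * (1 - PREV) * (1 - spec)
    fn = N * PREV * (1 - sens)
    print(f"{label}: ~{fp:.0f} FP, ~{fn:.0f} FN, ~{(fp + fn) / N:.1%} misclassified")
# -> ~7841 FP, ~4483 FN (~20.4%); published: 7792 FP, 4455 FN (20%)
# -> ~19113 FP, ~1839 FN (~34.6%); published: 18,993 FP, 1828 FN (34.4%)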
We developed a decision analytic model to evaluate the probability of diagnostic error with LDTs for EGFR mutation testing compared to an FDA-approved test (cobas EGFR Mutation Test). We applied the decision analytic model to estimate the clinical and economic consequences of inaccurate test results on a cohort of patients with newly diagnosed metastatic NSCLC in the US. The primary limitation of the analysis was the lack of published data regarding test performance and the accuracy of the numerous EGFR mutation LDTs available across various hospitals, laboratories, and medical centers. For the base-case analysis, we used the best available data from a clinical validation study of the cobas test in which the study design compared the cobas test results retrospectively to results from LDTs used in the EURTAC clinical trial. This validation study provided a unique dataset from a direct comparison of the cobas test and LDTs for EGFR mutation testing. We noted that the LDTs used in the EURTAC clinical trial had similar sensitivity and specificity to the cobas test with only a slightly higher invalid test rate (cobas test: sensitivity 98.1%, specificity 99.3%, invalid rate 8.9%; LDTs: sensitivity 96.8%, specificity 97.8%, invalid rate 15.6%); we used these estimates in the base-case analysis to understand the clinical and economic impact of even small differences in test performance.
With sparse evidence describing the overall test performance of LDTs, it is uncertain how well the LDTs used in the EURTAC clinical study reflect the quality and real-world test performance of the various LDTs used across different laboratories for EGFR mutation testing. In Europe, many countries have external quality assessment (EQA) programs that utilize an independent external agency to objectively check laboratory results and testing methods [33
]. In one study that evaluated EGFR mutation testing across 117 laboratories in 30 European countries, only 72% of the participating laboratories passed the quality assessment, with false negative and false positive results being the main sources of error [33
]. In another EQA conducted in the United Kingdom, 24% of laboratories had genotyping errors in the first run, 6.7% in the second run, and 6.4% in the third run. The assessment also observed a range of testing methodologies applied across different laboratories and wide variation in the degree of interpretation provided in the test reports [34
]. Given that the US does not have similar systematic quality assessment programs, we had very limited information about the robustness of laboratory methodologies and the quality of laboratory-developed “home-brew” tests. Because of this uncertainty, we conducted scenario analyses to evaluate the impact of varying LDT performance. For the scenario analyses, we assumed LDT performance estimates derived from a Diagnostic Assessment conducted in the UK, which identified only six studies in the published literature that provided data on the accuracy of EGFR mutation testing for predicting response to TKI therapy [32].
The base-case analysis showed that even very low individual patient probabilities of inaccurate test results (FP or FN) led to clinical and economic consequences at the population level, in terms of the aggregate impact of incorrect treatment, negative clinical outcomes, morbidity, and premature mortality. Invalid test results were also impactful, because the resulting uncertainty increased the probability of patient misclassification and incorrect therapy.
The magnitude of the impact of inaccurate testing estimated in this analysis is likely not generalizable across all molecular diagnostic tests and tumor types, as the clinical and cost consequences of patient misclassification largely depend on the differential safety and efficacy of the indicated treatment regimens and the size of the affected population. For certain assays, high sensitivity (minimizing FN) will be more important than specificity for ensuring appropriate treatment of patients, whereas in other cases, high specificity (minimizing FP) is the priority, in order to minimize patient harm and achieve optimal treatment outcomes.
A limitation associated with using the PFS endpoint is that it fails to capture survival time after disease progression; this study therefore likely underestimated the “true” burden of inaccurate EGFR mutation tests on society, as the analysis also did not capture indirect or opportunity costs, or other quality-of-life impacts associated with diagnostic error, incorrect treatment, or treatment uncertainty. Given the available data, this study provides a baseline estimate of the impact of inaccurate EGFR mutation testing and highlights the importance of a holistic, total-cost-of-care perspective. When laboratories make decisions about product adoption, a primary focus on utilizing test platforms with lower adoption costs favors LDTs but fails to take into consideration the potential downstream costs to patients and the broader healthcare system if LDT performance is uncertain relative to clinically validated FDA-approved products. From a total-cost-of-care perspective, cost savings in the laboratory budget may translate into unnecessary spending (medical waste) elsewhere in the system (e.g., the pharmacy or hospital budget). To reduce the societal burden of inaccurate testing, priority should be placed on adopting diagnostic tests with robust evidence of clinical validity and demonstrated analytical accuracy and reliability. This study is not intended to suggest that LDTs are “bad” and should be avoided; we recognize that LDTs have a significant role in diagnostics and are important for many applications, such as rare disease testing or public health crises when regulatory-approved commercial test kits are not available. The intent of this analysis was to highlight the differences in evidence requirements and test performance data between regulatory-approved tests and LDTs, and to use the case example of EGFR mutation testing to demonstrate the potential clinical and economic consequences of incorrect treatment decisions based on diagnostic tests with uncertain performance.
Given the sparse evidence for many LDTs routinely used to guide clinical decision-making, not all molecular diagnostic tests should be perceived as having equal value. FDA-approved IVDs with robust supporting evidence are differentiated by greater certainty in their ability to provide correct results and thereby improve patient outcomes and healthcare efficiency. Vyberg and colleagues analyzed the socioeconomic consequences of inaccurate HER2 test results from regulatory-approved tests versus LDTs in the treatment of breast cancer, and suggested that using regulatory-approved HER2 tests rather than LDTs could result in annual savings of $46 million, largely due to correct treatment with trastuzumab and avoidance of treatment costs associated with disease recurrence and progression. Vyberg et al. also suggested that for every $1 saved by laboratories using cheaper LDT reagents, the healthcare system is potentially burdened with approximately $6 in additional costs due to inaccurate testing and incorrect treatment [35
]. Garrison and colleagues also examined the clinical and economic consequences of inaccurate HER2 testing on US patients with early-stage breast cancer and found that incorrect HER2 testing may contribute to total societal loss of up to $1 billion among a cohort of 12,025 misclassified patients [36
]. In line with our findings, Garrison et al. demonstrated that the consequences of FP and FN test results differ: FP results led to the use of HER2-targeted therapy in patients with little chance of benefit, yielding an increased risk of adverse events and higher treatment costs, whereas FN results denied patients the potential quality-of-life and survival benefits associated with targeted therapy and led to an increased risk of disease recurrence and progression to metastatic breast cancer [36].