Multicenter External Validation of the Liverpool Uveal Melanoma Prognosticator Online: An OOG Collaborative Study

Uveal melanoma (UM) is fatal in ~50% of patients as a result of disseminated disease. This study aims to externally validate the Liverpool Uveal Melanoma Prognosticator Online V3 (LUMPO3) to determine its reliability in predicting survival after treatment for choroidal melanoma when utilizing external data from other ocular oncology centers. Anonymized data of 1836 UM patients from seven international ocular oncology centers were analyzed with LUMPO3 to predict the 10-year survival for each patient in each external dataset. The analysts were masked to the patient outcomes. Model predictions were sent to an independent statistician to evaluate LUMPO3’s performance using discrimination and calibration methods. LUMPO3’s ability to discriminate between UM patients who died of metastatic UM and those who were still alive was fair-to-good, with C-statistics ranging from 0.64 to 0.85 at year 1. The pooled estimate for all external centers was 0.72 (95% confidence interval: 0.68 to 0.75). Agreement between observed and predicted survival probabilities was generally good given differences in case mix and survival rates between different centers. Despite the differences between the international cohorts of patients with primary UM, LUMPO3 is a valuable tool for predicting all-cause mortality in this disease when using data from external centers.


Introduction
Uveal melanoma (UM) is a rare eye cancer occurring in adults, causing liver metastasis in approximately 50% of cases [1]. Patients' survival is directly related to the presence of hepatic metastases. After detection of metastatic disease, most patients die within a year, with only a few responding to current therapies [2].
There is some evidence that prognostication in UM improves the quality of life of some patients, even when the probability of survival is poor [3][4][5]. Prognostication is an important aspect of patient care, identifying high-risk UM patients requiring special care (e.g., increased frequency of liver surveillance using high-resolution imaging, enrollment in clinical trials of systemic adjuvant therapy including immunotherapies [6]), while allowing low-risk UM patients to be reassured and to have less intensive surveillance. Many predictive factors of metastasis from UM have been identified [3]. Several of these have been incorporated into our prognostic algorithm, the Liverpool Uveal Melanoma Prognosticator Online (LUMPO) (www.lumpo.net) [7].
LUMPO was developed to estimate survival probability in patients treated for UM, combining (a) anatomical predictors, such as largest basal diameter of the tumor, tumor thickness, ciliary body involvement and extra-ocular extension; (b) histological predictors, including epithelioid cell type, presence of closed loops and tumor mitotic count; and (c) genetic predictors, including chromosome-3 deletion and polysomy 8q [8,9]. The tool was validated in 2012 [7] with data from a cohort of patients with UM, with a follow-up of more than 20 years at the Liverpool Ocular Oncology Clinic (LOOC).
The first externally available version of LUMPO was validated in 2015, at the Department of Ophthalmology, University of Medical Sciences in Poznan, Poland [10]. This validation study concluded that LUMPO is a useful tool for calculating survival probabilities in an individual patient with UM; however, the authors emphasized that the use of cytogenetic data, which were lacking in their analysis, would potentially improve the accuracy of the prognosis. In 2016, LUMPO was externally validated further by examining data from the USA, in a cohort of UM patients treated at the University of California, San Francisco (UCSF) [11]. Evaluation of these data revealed that there were differences between the two cohorts of patients with respect to anatomical and clinical characteristics, probably because these were not defined and measured in the same standardized fashion. There were also differences in the type of treatment provided to UM patients in the two centers, and, furthermore, genetic data were unavailable within the UCSF dataset at that time [11]. Despite these differences, the external validation showed that LUMPO accurately estimated all-cause mortality for UM patients treated at UCSF.
Cancers 2020, 12, 477
A revised version of LUMPO (called LUMPO3) was created, incorporating not only chromosome 3 but also 8q data, and calculating mortality using competing-risk methodology [12]. This aspect is particularly relevant to prognostication in UM subjects, since in frail populations, such as elderly subjects, death from other causes may occur before the event of interest, thus preventing its realization. In that study, estimates of crude cumulative incidence from the raw data showed that metastatic death has a different pattern from death from other causes, thereby necessitating a competing-risks model. Such a model facilitates prediction of metastatic death as an event distinct from other causes of death. LUMPO3 was internally validated using bootstrap resampling [13], a nonparametric method that allows estimation of optimal model performance measures by random sampling with replacement of the data used to fit the model.
The aim of this study was to perform an external validation of LUMPO3 as a tool for estimating all-cause mortality. All-cause mortality was selected as the primary outcome because it is readily available, being obtainable from national records where relevant. All-cause mortality was estimated from LUMPO3 by aggregating the probability of metastatic death and the probability of death from other causes. To this end, the Liverpool Ocular Oncology Research Group (LOORG; www.loorg.org) facilitated collection of relevant independent data from members of the European Ophthalmic Oncology Group (OOG; www.oogeu.com) and ocular oncology centers located in the USA.
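The aggregation step can be illustrated with a minimal sketch. It assumes that "aggregating" means summing the two cause-specific cumulative incidences at a given time point, which is valid because metastatic death and death from other causes are mutually exclusive in a competing-risks setting. The function name and the example probabilities are hypothetical and are not taken from LUMPO3.

```python
def all_cause_survival(p_metastatic_death: float, p_other_death: float) -> float:
    """Survival probability implied by two competing cumulative incidences.

    Assumption: the two inputs are cumulative incidences at the same time
    point (e.g., 10 years), so they can be summed to give all-cause mortality.
    """
    for p in (p_metastatic_death, p_other_death):
        if not 0.0 <= p <= 1.0:
            raise ValueError("probabilities must lie in [0, 1]")
    all_cause_mortality = p_metastatic_death + p_other_death
    if all_cause_mortality > 1.0:
        raise ValueError("competing cumulative incidences cannot sum to more than 1")
    return 1.0 - all_cause_mortality


# Hypothetical patient: 30% risk of metastatic death, 15% risk of other death.
survival = all_cause_survival(0.30, 0.15)  # approximately 0.55
```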

Patient Characteristics
The cohort comprised 1836 patients diagnosed with UM (ciliary body and choroidal). These included 1086 patients from Leiden (LUMC), 218 from Rotterdam (EMCH), 138 from San Francisco (UCSF), 138 from Rostock (UHSH), 134 from Moscow (HIED), 73 from Genoa (SCOO), and 49 from Essen (UHE). These data are shown in Table 1 together with characteristics of the original Liverpool dataset that was used for the development of the model for comparison purposes. Pooled estimates across the different cohorts are also provided. For the medians, the method described in [14] has been applied.
As seen in Table 1, compared to patients treated in Liverpool, those treated in Moscow were more frequently female (Binomial Test: z = 3.421 (p = 0.001)), were relatively young, and had tumors with a greater basal diameter (T Test: t = 6.819 (p < 0.001) and t = 9.017 (p < 0.001), respectively). The latter was also true of patients from Genoa (T Test: t = 6.885 (p < 0.001)). A higher percentage of patients from Leiden (21%) had extraocular melanoma compared to those treated in other centers (Binomial Test: z = 52.75 (p < 0.001)). The prevalence of UM containing epithelioid cells also differed between the eight groups in which this feature was documented: it was significantly lower in tumors from San Francisco than in those in the Liverpool dataset (Binomial Test: z = 2.147 (p = 0.032)), and much lower than in those from Rostock (Fisher's Exact Test (p < 0.001)). All UM from Genoa had epithelioid cells present, a much higher proportion than in the Liverpool dataset (Fisher's Exact Test (p < 0.001)). Genetic data for the UM chromosome 3 status were available from all ocular oncology centers with the exception of Rostock (Table 1). Similarly, most centers also provided information concerning the status of chromosome 8q, with the exceptions of Rostock and Essen (Table 1). Of the cohorts with available genetic data, patients from Genoa had a higher percentage of alterations in both chromosome 3 and chromosome 8q than was seen in Liverpool (Binomial Test: z = 2.718 (p < 0.001) and z = 3.45 (p = 0.001), respectively). There was a moderate difference between the Liverpool and Rotterdam datasets in the percentage of alterations in chromosome 3 (Binomial Test: z = 2.341 (p = 0.02)) and a significant difference for chromosome 8q (Binomial Test: z = 4.46 (p < 0.001)). The median follow-up period varied between the external cohorts (range, 0.7-5.2 years), with the shortest median follow-up time being from San Francisco (8 months).
Kaplan-Meier curves for all-cause mortality based on the Liverpool dataset and the external datasets are shown in Figure 1. The datasets from Essen and San Francisco matched the Liverpool (development) dataset most closely.
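For readers unfamiliar with the Kaplan-Meier estimator, the following is a minimal sketch of the product-limit calculation for all-cause mortality. It is an illustration only and not the code used in this study; the function name and the toy data are hypothetical.

```python
def kaplan_meier(times, events):
    """Return [(time, survival)] at each distinct time with at least one death.

    times  : follow-up time for each patient
    events : 1 if the patient died (any cause), 0 if censored
    """
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    survival = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = 0
        leaving = 0
        # Gather everyone whose follow-up ends at time t; patients censored
        # at t are still counted as at risk at t.
        while i < len(data) and data[i][0] == t:
            deaths += data[i][1]
            leaving += 1
            i += 1
        if deaths > 0:
            survival *= 1.0 - deaths / n_at_risk
            curve.append((t, survival))
        n_at_risk -= leaving
    return curve


# Toy data: five patients, follow-up in years, three deaths and two censored.
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 0, 1])
```

With these toy data the estimated survival drops at years 1, 3 and 5, the censored patients at years 2 and 4 reducing only the number at risk.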

Discrimination
The C-statistic, which measures the discriminative capacity of the model, was evaluated for all participating centers yearly up to 4 years (Table 2).
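As an illustration of how a concordance statistic of this kind is obtained, the sketch below computes Harrell's C from follow-up times, event indicators and predicted survival probabilities. It is a simplified version under stated assumptions: pairs are compared over the whole follow-up rather than truncated at a given year, pairs with tied follow-up times are ignored, and tied predictions count as half-concordant. The function name and toy data are hypothetical.

```python
def harrell_c(times, events, predicted_survival):
    """Harrell's C for predictions where higher values mean better survival.

    A pair (i, j) is usable when patient i died and had a strictly shorter
    follow-up than patient j. The pair is concordant when the model gave
    patient i the lower predicted survival probability.
    """
    concordant = 0.0
    usable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # a censored patient cannot be the earlier death in a pair
        for j in range(n):
            if times[i] < times[j]:
                usable += 1
                if predicted_survival[i] < predicted_survival[j]:
                    concordant += 1.0
                elif predicted_survival[i] == predicted_survival[j]:
                    concordant += 0.5
    if usable == 0:
        raise ValueError("no usable pairs: C-statistic is undefined")
    return concordant / usable


# Toy data: four patients, one censored at 6 years.
c_stat = harrell_c([2, 4, 6, 8], [1, 1, 0, 1], [0.2, 0.5, 0.4, 0.9])
```

In this toy example four of the five usable pairs are ranked correctly, giving C = 0.8.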

Calibration
Calibration plots showing predicted probabilities of the outcome against actuarial survival estimates are shown in Figure 2. The plots show good agreement between observed and predicted probabilities. Limited event data in the Essen and Genoa datasets account for the wide confidence bands. Data from Leiden suggest that LUMPO3 over-predicted the survival probability, while data from Moscow suggest that LUMPO3 under-predicted mortality, although the event rate was relatively low in the Moscow dataset.
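The idea behind a calibration plot can be sketched as follows: patients are grouped by predicted survival, and the mean prediction in each group is compared with the observed proportion surviving. This simplified illustration assumes that every patient has complete follow-up to the time horizon of interest, so that a simple proportion can stand in for the actuarial estimate used in the study. The function name, grouping scheme and toy data are hypothetical.

```python
def calibration_table(predicted_survival, survived, n_groups=2):
    """Return [(mean predicted survival, observed survival)] per risk group.

    predicted_survival : model prediction for each patient at the horizon
    survived           : 1 if the patient was alive at the horizon, else 0
    Assumption: no censoring before the horizon.
    """
    pairs = sorted(zip(predicted_survival, survived))
    n = len(pairs)
    rows = []
    for g in range(n_groups):
        # Split the sorted patients into groups of near-equal size.
        chunk = pairs[g * n // n_groups:(g + 1) * n // n_groups]
        if not chunk:
            continue
        mean_predicted = sum(p for p, _ in chunk) / len(chunk)
        observed = sum(s for _, s in chunk) / len(chunk)
        rows.append((mean_predicted, observed))
    return rows


# Toy data: four patients split into a lower and a higher predicted-survival group.
rows = calibration_table([0.2, 0.4, 0.6, 0.8], [0, 1, 1, 1], n_groups=2)
```

Points lying above the line of identity (observed greater than predicted) indicate that the model under-predicts survival in that group, and points below it indicate over-prediction.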

Discussion
This is the first multicenter, international, collaborative study to validate and demonstrate the value of a multiparameter prognostic tool in UM (i.e., LUMPO3, developed using large, well-phenotyped datasets and robust statistical modelling) for the individualized stratification of patients with respect to metastatic risk and all-cause mortality. To our knowledge, there are currently no other validated, multifaceted tools that take into account clinical, histopathologic, and genetic data to predict patient prognosis. Such tools are crucial for reliable decision-making and for identifying patients who may be harmed (physically or psychologically) by inappropriate disease management. Although this is not a major concern in cancers that have a relatively good prognosis and multiple treatment options with proven clinical benefit, it is an important determinant of clinical care in UM.
Numerous prognostic factors have been identified for primary UM. These have been analyzed alone and in combination to predict the risk of metastasis. These factors can be divided into three main categories: clinical, histologic and genetic [16]. The resulting prognostic tools have led to personalized surveillance regimens [3,17,18] and targeted recruitment to clinical trials for adjuvant therapies.
Prognostic tools that combine multiple factors include the American Joint Committee on Cancer (AJCC) Tumor Node Metastasis (TNM) staging system for UM, which is based on only tumor size, location and extraocular spread. Genetic characteristics of UM are not included in this system as yet [19]. It is possible to improve the accuracy of prognostic tools by multivariable analysis. This is evidenced by the enhanced prognostic accuracy of the AJCC/TNM staging system when chromosome 3 and 8q status are included [20]. A prognostic nomogram combining AJCC/TNM staging, monosomy 3 and 8q gain has been developed but requires further validation using a larger study group [21]. Similarly, the largest basal tumor diameter was shown to provide additional prognostic information independently of the DecisionDx-UM gene expression profile (GEP) tool classification [22].
The National Comprehensive Cancer Network (NCCN) Clinical Practice Guidelines in Oncology for UM (National Comprehensive Cancer Network. NCCN Clinical Practice Guidelines in Oncology (NCCN Guidelines®) Uveal Melanoma Version 1.2018. Natl Compr. Cancer Network, Inc. 2018) stratify patients as having a low, medium or high risk of metastasis based on a combination of anatomic, histologic and genetic features of the primary tumor. However, it would appear that this prognostic method has not yet been validated. The Predicting Risk of Metastasis in Uveal Melanoma (PRiMeUM) tool employs a multivariate approach to predict the risk of metastasis developing within 48 months of treatment for the primary tumor. An accuracy of ~85% (derived from Area Under the curve of the Receiver Operating Characteristic [AUROC] analysis) was achieved with a logistic regression model using a combination of clinical and genetic factors. However, the PRiMeUM tool also has yet to be externally validated [23]. Further, an artificial neural network has been created to predict survivorship 5 years from brachytherapy. The network incorporates demographic and clinical data only and, again, used only data collected at a single center. An accuracy of 84% was achieved (c-index 0.81) when 16 neurons were used in the artificial neural network [24].
GEP of 12 discriminating genes has been commercialized as DecisionDx-UM (Castle Biosciences) and classifies patients as at low, medium or high risk of metastasis. The GEP tool was validated in prospective multicenter studies [25,26]. The study by Onken et al. examined the correlation between number of events and GEP classification in UM patients with a short follow-up time of 17.4 months (median) [25]. Plasseraud et al., on the other hand, looked at correlations between pathologic characteristics and molecular class in UM patients with a median follow-up of 27.3 months [26]. However, neither of the GEP studies examined calibration, that is, whether the predicted probabilities of survival were accurate. Despite these limitations, these studies did demonstrate early promise for the role of GEP in decision making in UM.
These previous experiences attest to the difficulty faced in studying the strongest prognostic factors for UM. The rarity of this disease makes it hard to collect a wide and comprehensive series of the prognostic factors, with great variations being seen in the modalities of diagnosis, histologic and genetic assessments, as well as in treatments during the observational period. However, with the availability of new regional therapies and targeted drugs, a simple and validated model for risk stratification of the patient such as LUMPO3 is urgently needed.
In this multicenter collaborative study, sufficient data were collected to perform a reliable validation of the prognostic accuracy of the LUMPO3 model. Limitations of this study are its retrospective nature and the relatively short follow-up time in some centers, which reflects the rarity of the disease. Despite the differences between cohorts, the model's ability to discriminate between UM survivors and patients who died either from the disease or other causes was fair to good, as was the agreement between observed and predicted survival probabilities in most centers. Therefore, the LUMPO3 model is able to stratify the prognosis for UM patients and appears to be a valuable tool for predicting all-cause mortality in patients with UM. This model may therefore inform physicians' management when caring for UM patients, allowing for a better allocation of resources with respect to systemic surveillance.

Ethics
This study conformed to the principles of the Declaration of Helsinki. Approval for this study was obtained from the Health Research Authority (NRES REC ref 18/NW/0748) and anonymized data from consented patients were transferred from external centers according to local approvals.

Data Collection
In November 2017, a call for participation in this external validation of LUMPO3 was made to 14 centers involved in OOG and collaborative studies (Figure 3). After an initial expression of interest by 11 centers, seven centers ultimately submitted their data for analysis. The study protocol was shared with the participating centers. The participating centers (Leiden University Medical Centre (LUMC), Leiden, and Erasmus Medical Centre Hospital (EMCH), Rotterdam, in the Netherlands; University of California San Francisco (UCSF), U.S.A.; University Hospital Schleswig-Holstein (UHSH) in Rostock, Germany; the Helmholz Institute of Eye Diseases (HIED) in Moscow, Russia; S.C. Oculistica Oncologica (SCOO) in Genoa, Italy; and University Hospital of Essen (UHE), Germany) were asked to provide the following data: (1) demographic data: sex and age; (2) anatomical data: ultrasound or histopathological measurements of largest basal tumor diameter, tumor thickness, presence or absence of ciliary body involvement and presence or absence of extraocular extension; (3) histological data: presence or absence of extravascular matrix loops, presence or absence of epithelioid cells, and mitotic cell count (MITOC) per 40 high power fields (HPF); and (4) genetic data: chromosome 3 and 8q status. The MITOC was categorized as follows: 0-1/40 HPF = 1; 2-3/40 HPF = 2; 4-7/40 HPF = 3; >7/40 HPF = 4. Histological analysis was undertaken by all centers using standard protocols, as previously described [9]. Full descriptions of how genetic data were obtained and classified (e.g., Fluorescence in situ hybridization (FISH) methods, Multiplex Ligation Probe Amplification (MLPA) [27] or other methods) were also requested.
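The MITOC coding described above can be expressed as a short function. This is a minimal sketch of the stated cut-offs only; the function name is hypothetical, and the count is assumed to be a non-negative whole number per 40 HPF.

```python
def mitotic_category(count_per_40_hpf: int) -> int:
    """Map a mitotic count per 40 high power fields to the four-level code.

    0-1 -> 1, 2-3 -> 2, 4-7 -> 3, more than 7 -> 4
    """
    if count_per_40_hpf < 0:
        raise ValueError("mitotic count cannot be negative")
    if count_per_40_hpf <= 1:
        return 1
    if count_per_40_hpf <= 3:
        return 2
    if count_per_40_hpf <= 7:
        return 3
    return 4


# Example: a tumor with 5 mitoses per 40 HPF falls in category 3.
category = mitotic_category(5)
```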
Cases were pseudo-anonymized in accordance with local institutional policies and guidelines for exporting patient data. Cases were excluded if age, sex or basal tumor diameter was missing, as these variables have been established to be highly predictive of outcome. If any other variables are missing, they can be imputed using a model-projection framework as detailed in Eleuteri et al. 2018 [12]. The result of this imputation process is reflected in the confidence interval: the more missing variables, the wider the interval. The data were transferred to the data manager (co-author MT) at the Liverpool Bio-Innovation Hub (LBIH) Biobank at the University of Liverpool (UoL), where patient identification and outcome were masked before passing the datasets to co-authors ACR and AT for LUMPO3 analysis. Using LUMPO3, ACR and AT predicted outcomes, which were then compared with the actual outcomes by a neutral biostatistician mediator, LJB, to determine the performance of the LUMPO3 tool (Figure 1). LJB analyzed the comparative results using the statistical methods described below.
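The exclusion rule can be sketched as a simple filter over patient records. The field names below (age, sex, basal_diameter and the other keys) are hypothetical placeholders and not the variable names used in the study datasets; missing values are assumed to be coded as None. The imputation itself (the model-projection framework of [12]) is not reproduced here; the sketch only lists which variables would need it.

```python
# Hypothetical field names for the three variables that must be present.
REQUIRED_FIELDS = ("age", "sex", "basal_diameter")


def split_cases(cases):
    """Separate cases into those kept for analysis and those excluded."""
    included, excluded = [], []
    for case in cases:
        if any(case.get(field) is None for field in REQUIRED_FIELDS):
            excluded.append(case)
        else:
            included.append(case)
    return included, excluded


def variables_to_impute(case):
    """List the non-required variables that are missing for one case."""
    return sorted(
        name for name, value in case.items()
        if value is None and name not in REQUIRED_FIELDS
    )


# Toy records: the second case lacks basal diameter and is therefore excluded.
cases = [
    {"age": 61, "sex": "F", "basal_diameter": 12.5, "chromosome_3": None},
    {"age": 48, "sex": "M", "basal_diameter": None, "chromosome_3": "loss"},
]
included, excluded = split_cases(cases)
missing = variables_to_impute(included[0])  # ["chromosome_3"]
```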

Statistical Analyses
Characteristics of the Liverpool (development) and external (validation) datasets were visually assessed for agreement. A Kaplan-Meier curve of all-cause mortality was also produced to evaluate event rates across the datasets.
The LUMPO3 model was designed by co-authors AE and AT to predict the probability of survival at yearly intervals for each UM patient [12]. The survival predictions were sent to the independent statistician (LJB) to undertake external validation using discrimination and calibration methods [28]. Discrimination refers to the ability of the prognostic model to differentiate between patients who died during this study and those who did not. The discriminative capacity of the model was measured using Harrell's C-statistic [15,29]. It is measured on a scale ranging from 0.5 (no better than chance) to 1 (perfect prognosis). A pooled estimate of discrimination was calculated using a random effects meta-analysis, which accounted for the correlation between studies [29]. Calibration refers to how closely the probability of the event predicted by the model agrees with the observed probability [28]. Calibration was assessed graphically [28]; if predicted and observed probabilities agree over the whole range of probabilities, the plots show a 45° line. Statistical analyses were conducted using R statistical software version 3.5.0.
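To illustrate how center-level C-statistics can be combined into a pooled estimate, the sketch below implements a random effects meta-analysis with the DerSimonian-Laird estimator of between-center variance. The choice of estimator, the pooling on the untransformed C-statistic scale and the normal-approximation 95% interval are all assumptions made for illustration; the source states only that a random effects meta-analysis was used [29]. The function name and the input values are hypothetical and are not results from Table 2.

```python
import math


def pool_random_effects(estimates, std_errors):
    """Pool per-center estimates with DerSimonian-Laird random effects.

    Returns (pooled estimate, lower 95% limit, upper 95% limit, tau squared).
    """
    k = len(estimates)
    weights = [1.0 / se ** 2 for se in std_errors]
    total_w = sum(weights)
    fixed = sum(w * y for w, y in zip(weights, estimates)) / total_w
    # Cochran's Q measures heterogeneity around the fixed-effect estimate.
    q = sum(w * (y - fixed) ** 2 for w, y in zip(weights, estimates))
    c = total_w - sum(w ** 2 for w in weights) / total_w
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add the between-center variance to each variance.
    re_weights = [1.0 / (se ** 2 + tau2) for se in std_errors]
    pooled = sum(w * y for w, y in zip(re_weights, estimates)) / sum(re_weights)
    se_pooled = math.sqrt(1.0 / sum(re_weights))
    return pooled, pooled - 1.96 * se_pooled, pooled + 1.96 * se_pooled, tau2


# Hypothetical C-statistics and standard errors for two centers.
pooled, lower, upper, tau2 = pool_random_effects([0.6, 0.8], [0.1, 0.1])
```

When the center estimates agree closely, the between-center variance is estimated as zero and the result coincides with a fixed-effect (inverse-variance) pooled estimate.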

Conclusions
Despite the differences between cohorts, LUMPO3 appears to be a reasonably accurate and valuable tool for predicting all-cause mortality in patients with UM. It should be noted that prognostic tools evolve as new information regarding tumor biology accrues. Whilst the genetic information incorporated into LUMPO3 is limited to the copy number variations of chromosomes 3 and 8q, future versions of our tool are likely to incorporate key mutations as described in primary UM [30]. However, such revisions require sufficient data (and therefore time) for the revised algorithm to be made robust. We are also currently exploring the possibility of recalibrating the model, so that its predictions can be adapted to external data with different baseline hazard rates.