Treatment of Advanced Gastro-Entero-Pancreatic Neuro-Endocrine Tumors: A Systematic Review and Network Meta-Analysis of Phase III Randomized Controlled Trials

Simple Summary
The most effective and safest approach for the treatment of advanced gastro-entero-pancreatic neuroendocrine neoplasms (GEP–NENs) remains unknown. A systematic review was performed to clarify this point, and a network meta-analysis was used to overcome the multiarm problem. Our study confirmed that somatostatin analogs (SSAs) alone remain the best choice for well-differentiated GEP–NENs. 177Lu-Dotatate plus SSA is a valid alternative for midgut NENs: it has been shown to be slightly more efficacious than SSAs, although at a higher risk of toxicity.

Abstract
Several new therapies have been approved to treat advanced GEP–NENs in the last twenty years. In this systematic review and network meta-analysis, we searched MEDLINE, ISI Web of Science, and Scopus for phase III randomized controlled trials (RCTs) comparing two or more therapies for unresectable GEP–NENs. Network meta-analysis was used to overcome the multiarm problem, and for each arm we described the surface under the cumulative ranking (SUCRA) curve. The primary endpoints were progression-free survival (PFS) and grade 3–4 toxicity. We included nine studies involving a total of 2362 patients and 5 intervention arms: SSA alone, two IFN-α plus SSA, two Everolimus alone, one Everolimus plus SSA, one Sunitinib alone, one 177Lu-Dotatate plus SSA, and one Bevacizumab plus SSA. 177Lu-Dotatate plus SSA had the highest probability (99.6%) of being associated with the longest PFS, followed by Sunitinib (64.5%), IFN-α plus SSA (53.0%), SSA alone (46.6%), Bevacizumab plus SSA (45.0%), and Everolimus ± SSA (33.6%). Placebo had the lowest probability of being associated with the longest PFS (7.6%). Placebo and Bevacizumab had the highest probability of being the safest (73.7% and 76.7%, respectively), followed by SSA alone (65.0%), IFN-α plus SSA (52.4%), 177Lu-Dotatate plus SSA (49.4%), and Sunitinib alone (28.8%). The Everolimus-based approach had the lowest probability of being the safest (3.9%). The best approaches were SSA alone or SSA combined with 177Lu-Dotatate.


Data Collection Process and Items
The following data were extracted to describe the characteristics of each study: first author, year of publication, acronym (if present), affiliation and country, population (type of NEN), previous treatment with SSA, chemotherapy (CHT), or other therapy, previous resection of the primary tumor, study design, sample size of each arm, and the outcomes of interest reported. The primary endpoints were: a) PFS as a measure of efficacy; b) grade 3–4 toxicity as a measure of safety [3]. For PFS, we calculated the incidence density rate (number of events per unit of at-risk patient-time) to overcome differences in follow-up duration. This measure approximates the hazard rate of the exposed patients. The rate ratio (RR), obtained as the ratio of two incidence density rates, approximates the HR only under an exponential model (constant hazard functions) and in the absence of large differences in average follow-up duration between the groups [4]. Dedicated software (GetData Graph Digitizer®, version 2.26) was used to extract the crude number of events and the observation period from Kaplan–Meier curves. The secondary efficacy-related endpoints were: a) objective radiological response rate (ORR), defined according to RECIST 1.0 or 1.1 as the sum of partial and complete responses (PR + CR) [4,5]; b) rate of progressive disease (PD) according to RECIST 1.0 or 1.1 [5,6]; c) overall survival (OS). The secondary safety endpoints were: a) adverse events (AEs) and serious adverse events (SAEs) [3]; b) on-treatment deaths (OTD) and drug-related deaths (DDR), defined as deaths from any cause and deaths related to drug administration, respectively; c) drug discontinuation due to AEs (DDAEs).
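As a minimal sketch of the PFS calculation described above, assuming the event counts and at-risk person-time have already been digitized from the Kaplan–Meier curves, the incidence density rate is simply events divided by person-time; under the constant-hazard (exponential) assumption this is also the maximum-likelihood estimate of the hazard. The 95% CI for the rate ratio below uses the common Poisson approximation SE(log RR) = sqrt(1/e1 + 1/e2). The function names and this CI method are illustrative choices, not necessarily those used in the analysis.

```python
import math

def incidence_density_rate(events, person_time):
    """Events per unit of at-risk person-time (e.g. per patient-month)."""
    if person_time <= 0:
        raise ValueError("person_time must be positive")
    return events / person_time

def rate_ratio(events_1, time_1, events_2, time_2, z=1.959964):
    """Rate ratio of arm 1 versus arm 2 with a Wald 95% CI on the log scale.

    Assumes a constant hazard in each arm; SE(log RR) = sqrt(1/e1 + 1/e2)
    is the usual Poisson approximation and needs at least one event per arm.
    """
    if events_1 == 0 or events_2 == 0:
        raise ValueError("each arm needs at least one event")
    rr = incidence_density_rate(events_1, time_1) / incidence_density_rate(events_2, time_2)
    se_log = math.sqrt(1 / events_1 + 1 / events_2)
    return rr, rr * math.exp(-z * se_log), rr * math.exp(z * se_log)

# Hypothetical counts: 20 events over 1000 patient-months vs 40 over 1000.
rr, lower, upper = rate_ratio(20, 1000.0, 40, 1000.0)
print(f"RR = {rr:.2f} (95% CI {lower:.2f}-{upper:.2f})")
```

Because the interval is symmetric on the log scale, the product of its bounds equals RR², which is a quick sanity check on any reported interval of this kind.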

Geometry of Network
The network's geometry was plotted using one node for each arm and an edge connecting two nodes for each trial; the size of each node represents the number of patients included in that arm. The network geometry was preliminarily explored for all outcomes of interest to verify the presence of common nodes. When no common node was present, the network was defined as disconnected, a condition that precludes network analysis. The network was also reported in matrix form to obtain information about the contribution of the included studies.
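The connectivity check described above can be sketched as a graph search: each arm is a node, each trial joins its arms with edges, and the network is analyzable only if a breadth-first search from any node reaches every other node. This is a minimal illustration; the input format (one dict per trial mapping arm to patients randomized) and the arm names and sizes in the example are hypothetical.

```python
from collections import deque
from itertools import combinations

def build_network(trials):
    """trials: list of dicts mapping arm name -> patients randomized to that arm.

    Every pair of arms within a trial is joined by an edge (multi-arm trials
    contribute several edges); node size accumulates patients across trials.
    """
    adjacency = {}
    node_size = {}
    for trial in trials:
        for arm, n in trial.items():
            adjacency.setdefault(arm, set())
            node_size[arm] = node_size.get(arm, 0) + n
        for a, b in combinations(trial, 2):
            adjacency[a].add(b)
            adjacency[b].add(a)
    return adjacency, node_size

def is_connected(adjacency):
    """Breadth-first search from an arbitrary node; connected only if all nodes are reached."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for neighbour in adjacency[node]:
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(neighbour)
    return len(seen) == len(adjacency)

# Hypothetical trials: placebo is the common node linking SSA and everolimus.
trials = [{"SSA": 100, "placebo": 100}, {"everolimus": 200, "placebo": 200}]
adjacency, node_size = build_network(trials)
print(is_connected(adjacency), node_size["placebo"])  # True 300
```

Adding a trial whose arms share no node with the rest (for instance a hypothetical {"drugX": 50, "drugY": 50}) makes `is_connected` return False, which is the disconnected case that precludes network analysis.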

Risk of Bias within Individual Studies
The risk of bias within individual studies was evaluated using a revised tool for assessing risk of bias in randomized trials (RoB 2) [7]. Two review authors (CR and LA) independently assessed the risk of bias of each study using the criteria outlined in the Cochrane Handbook for Systematic Reviews of Interventions [8]. Each study was classified as low risk, some concerns, or high risk.
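RoB 2 derives the overall classification from five domain-level judgements. The sketch below encodes only the mechanical part of that mapping: any high-risk domain gives high risk, all low-risk domains give low risk, and anything else gives some concerns. RoB 2 also lets reviewers rate a study as high risk when some concerns arise in several domains; that step is left to reviewer judgement here rather than automated. The function name and domain labels are illustrative.

```python
# The five RoB 2 domains for randomized trials.
DOMAINS = (
    "randomization process",
    "deviations from intended interventions",
    "missing outcome data",
    "measurement of the outcome",
    "selection of the reported result",
)
LEVELS = {"low", "some concerns", "high"}

def overall_rob2(judgements):
    """judgements: dict mapping each domain to 'low', 'some concerns' or 'high'."""
    missing = set(DOMAINS) - set(judgements)
    if missing:
        raise ValueError(f"missing domains: {sorted(missing)}")
    levels = [judgements[d] for d in DOMAINS]
    if not set(levels) <= LEVELS:
        raise ValueError("unknown judgement level")
    if "high" in levels:
        return "high"
    if all(level == "low" for level in levels):
        return "low"
    # Reviewers may still upgrade this to "high" when several domains raise concerns.
    return "some concerns"
```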

Summary of Measures
All indirect and mixed estimates were reported as hazard ratios (HRs) for survival outcomes and odds ratios (ORs) for dichotomous outcomes, each with 95% confidence intervals (CIs). An HR or OR whose CI included 1 indicated no statistically significant difference between the two compared scenarios. The network estimates (indirect and mixed) were reported in forest plots [9] with CIs and prediction intervals (PrI). The network results were first reported as "relative ranking probabilities," i.e., the probability that each arm would be the best, the second best, the third best, and so on down to the worst, with a certain degree of uncertainty, for each outcome of interest. From these values, the surface under the cumulative ranking (SUCRA) curves and mean ranks were obtained. The SUCRA value, expressed as a percentage, summarizes the whole ranking distribution of an arm in a single number: it equals 100% for an arm certain to be the best and 0% for an arm certain to be the worst [10].
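A minimal sketch of how SUCRA and mean rank follow from a matrix of relative ranking probabilities (rank 1 = best). With a arms, SUCRA is the average of the first a − 1 cumulative ranking probabilities, which implies mean rank = 1 + (a − 1)(1 − SUCRA). The three arms and their probabilities in the example are hypothetical.

```python
def sucra_and_mean_rank(rank_probs):
    """rank_probs: dict arm -> [P(rank 1), P(rank 2), ..., P(rank a)], rank 1 = best.

    Returns dict arm -> (SUCRA in %, mean rank).
    """
    results = {}
    for arm, probs in rank_probs.items():
        a = len(probs)
        if a < 2:
            raise ValueError("need at least two ranks")
        if abs(sum(probs) - 1) > 1e-6:
            raise ValueError(f"rank probabilities for {arm} do not sum to 1")
        cumulative, running = [], 0.0
        for p in probs[:-1]:  # the last cumulative probability is always 1
            running += p
            cumulative.append(running)
        sucra = sum(cumulative) / (a - 1)
        mean_rank = sum((k + 1) * p for k, p in enumerate(probs))
        results[arm] = (100 * sucra, mean_rank)
    return results

# Hypothetical ranking probabilities for three arms.
example = {"A": [0.7, 0.2, 0.1], "B": [0.2, 0.5, 0.3], "C": [0.1, 0.3, 0.6]}
for arm, (sucra, mean_rank) in sucra_and_mean_rank(example).items():
    print(f"{arm}: SUCRA {sucra:.1f}%, mean rank {mean_rank:.1f}")
# A: SUCRA 80.0%, mean rank 1.4
# B: SUCRA 45.0%, mean rank 2.1
# C: SUCRA 25.0%, mean rank 2.5
```

The same identity can be used to cross-check reported values: with the seven arms of the OS network, a SUCRA of 93.6% corresponds to a mean rank of 1 + 6 × 0.064 ≈ 1.4, matching the value reported for Sunitinib.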

Planned Method of Analysis, Inconsistency, Risk of Bias across the Study, and Additional Analyses
The PRISMA extension statement for network meta-analyses of health care interventions was used to plan the analysis. A frequentist network meta-analysis was employed to compare all scenarios, building a network for each outcome of interest [11]. The analysis was performed in two steps: first, all pairwise ("head-to-head") comparisons in each network were calculated to obtain the indirect and mixed estimates; second, relative ranking probabilities were calculated, from which SUCRA values were obtained [12]. The robustness of the networks was assessed by evaluating inconsistency, heterogeneity, and publication bias. Inconsistency was evaluated using the "loop" approach [13]. Heterogeneity was estimated with the restricted maximum likelihood (REML) method. The extent of heterogeneity in each network was evaluated by comparing the magnitude of the common heterogeneity parameter of the network (tau [τ]) with an empirical distribution of heterogeneity variances, considering the range of expected treatment estimates (ORs and MDs). A τ value > 0.6 was considered to indicate a high level of heterogeneity [14]. When τ was > 0.6, a multivariate meta-regression analysis was carried out to identify the source of heterogeneity in the outcome under study. Covariate effects were reported as standardized mean difference (SMD) coefficients with p-values, estimated with a REML-based algorithm. For each covariate, only when significant, we reported the SMD coefficient with its standard error (SE); the SMD coefficient ± SE is related to a change in the covariate value. If the SMD coefficient differed from zero, an increase or reduction in the covariate produced a positive or negative change in the OR. A two-tailed p value < 0.05 was considered statistically significant.
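The heterogeneity and inconsistency checks described above can be illustrated with two small helpers, offered as a sketch under stated assumptions. `reml_tau2` applies the standard REML fixed-point iteration to a single pairwise comparison (a simplification: the analysis above estimates one common τ across the whole network), and `loop_inconsistency` computes a Bucher-style inconsistency factor for one closed triangular loop, one common way of implementing the loop approach. Function names, inputs, and the convergence tolerance are illustrative assumptions.

```python
import math

def reml_tau2(effects, variances, tol=1e-10, max_iter=1000):
    """REML between-study variance tau^2 for one pairwise comparison.

    effects are on the log scale (e.g. log ORs) with within-study variances.
    Fixed-point update: tau2 = sum(w^2 ((y - mu)^2 - v)) / sum(w^2) + 1 / sum(w),
    truncated at zero.
    """
    if len(effects) < 2 or len(effects) != len(variances):
        raise ValueError("need at least two studies with matching variances")
    tau2 = 0.0
    for _ in range(max_iter):
        w = [1 / (v + tau2) for v in variances]
        sum_w = sum(w)
        mu = sum(wi * yi for wi, yi in zip(w, effects)) / sum_w
        num = sum(wi ** 2 * ((yi - mu) ** 2 - vi) for wi, yi, vi in zip(w, effects, variances))
        new_tau2 = max(0.0, num / sum(wi ** 2 for wi in w) + 1 / sum_w)
        if abs(new_tau2 - tau2) < tol:
            return new_tau2
        tau2 = new_tau2
    return tau2

def high_heterogeneity(tau2, threshold=0.6):
    """Flag tau (square root of tau^2) above the 0.6 threshold used in the text."""
    return math.sqrt(tau2) > threshold

def loop_inconsistency(d_ab, var_ab, d_ac, var_ac, d_bc, var_bc):
    """Bucher-style inconsistency factor in the loop A-B-C.

    d_xy is the log-scale effect of y versus x. The indirect B-vs-C estimate is
    d_ac - d_ab; returns (inconsistency factor, its SE, two-sided p value).
    """
    indirect_bc = d_ac - d_ab
    inconsistency = d_bc - indirect_bc
    se = math.sqrt(var_bc + var_ab + var_ac)
    p = math.erfc(abs(inconsistency / se) / math.sqrt(2))
    return inconsistency, se, p
```

With equal within-study variances v, the REML estimate reduces to the sample variance of the effects minus v, which offers a simple check of the iteration.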
Given the low number of included studies, and as in previous meta-analyses, the p-value was recalculated using Monte Carlo permutation [15], with 500 permutations to obtain sufficient precision [16]. Publication/reporting bias was assessed using an adjusted funnel plot. Each funnel plot was tested using Begg's test to identify whether any asymmetry was attributable to a small sample size effect; a two-sided p value < 0.05 indicated a significant small sample size effect [17].
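Begg's test, as mentioned above, is the Begg and Mazumdar rank correlation (Kendall's tau) between standardized effect sizes and their variances. This minimal, standard-library-only sketch uses the normal approximation for Kendall's S without a correction for ties and takes generic log-scale effects and variances; the adjustment step of the adjusted funnel plot used in the analysis is not reproduced here.

```python
import math

def begg_test(effects, variances):
    """Begg-Mazumdar rank correlation test for funnel-plot asymmetry.

    Returns (Kendall's tau, z statistic, two-sided p value); ties are not corrected.
    """
    k = len(effects)
    if k < 3 or k != len(variances):
        raise ValueError("need at least three studies with matching variances")
    weights = [1 / v for v in variances]
    pooled = sum(w * y for w, y in zip(weights, effects)) / sum(weights)
    var_pooled = 1 / sum(weights)
    # Standardized deviates: (y_i - pooled) / sqrt(v_i - var_pooled).
    standardized = [(y - pooled) / math.sqrt(v - var_pooled) for y, v in zip(effects, variances)]
    concordant = discordant = 0
    for i in range(k):
        for j in range(i + 1, k):
            sign = (standardized[i] - standardized[j]) * (variances[i] - variances[j])
            if sign > 0:
                concordant += 1
            elif sign < 0:
                discordant += 1
    s = concordant - discordant
    tau = s / (k * (k - 1) / 2)
    z = s / math.sqrt(k * (k - 1) * (2 * k + 5) / 18)
    p = math.erfc(abs(z) / math.sqrt(2))
    return tau, z, p
```

With few studies the normal approximation is crude, so a non-significant result should not be read as evidence that small sample size effects are absent.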

Secondary Endpoints
The treatment with the highest probability of improving OS was Sunitinib (SUCRA = 93.6; mean rank = 1.4), followed by 177Lu-Dotatate plus SSA (SUCRA = 87.7; mean rank = 1.7). The worst approach was Bevacizumab plus SSA (SUCRA = 11.9; mean rank = 6.3). ORR was evaluable in only 8 studies, but in all clustered arms. The approach with the highest probability of achieving an ORR was Bevacizumab plus SSA (SUCRA = 88.3; mean rank = 1.7), followed by Sunitinib (SUCRA = 74.2; mean rank = 2.5), 177Lu-Dotatate plus SSA (SUCRA = 68.6; mean rank = 2.8), IFN-α plus SSA (SUCRA = 59.0; mean rank = 3.5), the Everolimus-based approach (SUCRA = 32.0; mean rank = 5.1), and SSA alone (SUCRA = 20; mean rank = 5.8). The approach with the lowest chance of achieving an ORR was placebo (SUCRA = 6.9; mean rank = 5.8). The therapy with the highest probability of preventing radiological disease progression was 177Lu-Dotatate plus SSA (SUCRA = 90.6; mean rank = 1.6), followed by Bevacizumab plus SSA (SUCRA = 80.8; mean rank = 2.2), IFN-α plus SSA (SUCRA = 61.3; mean rank = 3.3), the Everolimus-based approach (SUCRA = 56.9; mean rank = 3.6), Sunitinib (SUCRA = 35.0; mean rank = 4.9), and SSA alone (SUCRA = 22.8; mean rank = 5.6). The approach with the lowest chance of preventing PD was placebo (SUCRA = 2.6; mean rank = 6.8).

The treatment with the highest probability of avoiding any AE was SSA alone (SUCRA = 93.6; mean rank = 1.1), followed by placebo (SUCRA = 74.1; mean rank = 2.0), Sunitinib (SUCRA = 3.7; mean rank = 3.7), and 177Lu-Dotatate plus SSA (SUCRA = 18.3; mean rank = 4.3). No data were available for the IFN-α or Bevacizumab arms. For SAEs, the worst approach according to the model was Bevacizumab plus SSA (SUCRA = 0; mean rank = 7.0), while the best was SSA alone (SUCRA = 76.4; mean rank = 2.4), followed by 177Lu-Dotatate plus SSA (SUCRA = 65.6; mean rank = 3.1) and placebo (SUCRA = 60.6; mean rank = 3.4). Both the IFN-α arm (SUCRA = 19.9; mean rank = 5.8) and the Everolimus arm (SUCRA = 31.0; mean rank = 5.1) had less than a 50% chance of being the safest approach. The approach with the lowest probability of being associated with OTD was Sunitinib (SUCRA = 87.3; mean rank = 1.6), followed by placebo (SUCRA = 61.3; mean rank = 2.9), SSA alone (SUCRA = 56.8; mean rank = 3.2), IFN-α plus SSA (SUCRA = 43.8; mean rank = 3.8), 177Lu-Dotatate plus SSA (SUCRA = 34.9; mean rank = 4.3), and the Everolimus-based approach (SUCRA = 15.9; mean rank = 5.2). OTD was not evaluable for the Bevacizumab arm. The probability of being the safest approach in minimizing DDR was over 50% for placebo (SUCRA = 64.3; mean rank = 3.1), SSA alone (SUCRA = 58.5; mean rank = 3.5), Sunitinib (SUCRA = 58.5; mean rank = 3.5), and 177Lu-Dotatate (SUCRA = 56.7; mean rank = 3.6). DDR incidence could be higher when therapy was based on Everolimus.

The results are reported as odds ratios (ORs) and 95% confidence intervals (CIs). The blue line (line of null effect) is equal to 1. The solid black lines represent the CIs, while the diamond summarizes the ORs. Each pairwise comparison in the forest plot should be read as follows: if the diamond and its entire CI do not reach the blue line of null effect, the difference is significant. If the entire CI lies to the left of the null effect line, the mortality rate is significantly higher in the "intervention arm"; when the entire CI lies to the right, the event is significantly more frequent in the "reference arm." When the CI crosses the null effect line, the difference between the two compared procedures is not statistically significant. A red line shows the prediction interval (PrI), i.e., the interval within which the estimate of a future study is expected to fall.
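The reading rules in this legend, and the prediction interval drawn as the red line, can be sketched as follows. The prediction interval uses the commonly cited form exp(μ ± t·sqrt(τ² + SE²)) on the log scale; the caller supplies the critical t value (for example with k − 2 degrees of freedom, which is an assumption here, not a detail stated in the legend). The helper names are illustrative, and "left"/"right" refer to the plot orientation described in the legend.

```python
import math

def prediction_interval(log_estimate, se, tau2, t_crit):
    """Prediction interval on the ratio scale from a log-scale estimate.

    se: standard error of the pooled log estimate; tau2: between-study variance;
    t_crit: critical t value chosen by the caller.
    """
    half_width = t_crit * math.sqrt(tau2 + se ** 2)
    return math.exp(log_estimate - half_width), math.exp(log_estimate + half_width)

def read_interval(lower, upper, null=1.0):
    """Classify a ratio-scale interval relative to the line of null effect."""
    if lower <= null <= upper:
        return "not significant"
    return "left of null" if upper < null else "right of null"
```

When τ² = 0 and t_crit is replaced by the normal quantile, the prediction interval collapses to the ordinary CI; any between-study heterogeneity makes it wider, which is why the red line usually extends beyond the black one.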