CA-125 Early Dynamics to Predict Overall Survival in Women with Newly Diagnosed Advanced Ovarian Cancer Based on Meta-Analysis Data

Simple Summary
Cancer antigen 125 (CA-125) is a protein found at a high concentration in the blood of patients with specific types of cancer, mainly ovarian cancer. In 2004, the Gynecologic Cancer Intergroup (GCIG) proposed criteria defining response to treatment, as well as disease progression, based on the CA-125 concentration. Since then, the CA-125 concentration and/or CT-scans have been used for the follow-up of ovarian cancer patients. This paper aims to compare different summaries of CA-125 evolution in the 3 to 6 months following treatment initiation in newly diagnosed advanced ovarian cancer and to explore their capacity to predict overall survival. Based on individual patient data from the GCIG meta-analysis, we propose the most appropriate timeframe between follow-up and the prediction horizon in order to obtain robust, dynamic, individual predictions.

Abstract
(1) Background: Cancer antigen 125 (CA-125) is a protein produced by ovarian cancer cells that is used for patient monitoring. However, the best ways to analyze its decline and prognostic role are poorly quantified. (2) Methods: We leveraged individual patient data from the Gynecologic Cancer Intergroup (GCIG) meta-analysis (N = 5573) to compare different approaches summarizing the early trajectory of CA-125 before the prediction time (called the landmark time) at 3 or 6 months after treatment initiation in order to predict overall survival. These summaries included observed measures and estimated measures obtained by a linear mixed model (LMM). Their performances were evaluated by 10-fold cross-validation with the Brier score and the area under the ROC curve (AUC). (3) Results: The estimated value and the last observed value at 3 months were the best measures for predicting overall survival, with an AUC of 0.75 (95% CI [0.70; 0.80]) at 24 and 36 months and of 0.74 [0.69; 0.80] and 0.75 [0.69; 0.80] at 48 months, respectively. Considering CA-125 over 6 months did not improve the AUC, which was 0.74 [0.68; 0.78] at 24 months and 0.71 [0.65; 0.76] at 36 and 48 months. (4) Conclusions: A 3-month surveillance provided reliable individual information on overall survival until 48 months for patients receiving first-line chemotherapy.


Introduction
Ovarian cancer (OC) is the seventh most common cause of cancer mortality in women worldwide, with a survival rate of 46% at 5 years after diagnosis [1]. First-line treatment consists of primary or interval debulking surgery and platinum- and taxane-based chemotherapy, which may be combined with maintenance treatments including bevacizumab or poly (ADP-ribose) polymerase (PARP) inhibitors, if applicable. Recently, PARP inhibitors have demonstrated very promising gains in progression-free survival (PFS) and overall survival (OS), especially for patients with BRCA-mutated ovarian cancer [2][3][4][5]. Cancer antigen 125 (CA-125) is a marker mainly related to ovarian cancer, but it may also be elevated in other conditions. According to the Gynecologic Cancer Intergroup (GCIG) criteria, it is used to monitor the patient's response to treatment in cases of recurrent disease or to define progression after first-line therapy [6,7]. Although the routine measurement of CA-125 was discouraged in 2010 [7,8], two strategies for the follow-up of women after primary treatment are now used: (i) clinical follow-up with CA-125 and imaging on clinical indication or (ii) CT-scans throughout follow-up and optional CA-125 testing [9]. However, the contribution of CA-125 is debated, as clinical trials implement systematic CA-125 and CT-scan surveillance, especially in maintenance treatment settings. The best ways to analyze the CA-125 decrease, its timeframe and reproducibility across trials, and the risk of error are insufficiently documented.
Assessing the prognostic value of the early evolution of CA-125 for time-to-event endpoints raises statistical challenges. The CA-125 trajectory can be summarized in multiple ways (e.g., the CA-125 value at baseline, the KELIM at 3 months, or the CA-125 value or decrease at the prediction time, such as 3 or 6 months), and its prognostic role has been studied in many isolated studies [10][11][12][13][14]. Nevertheless, the performance of these methods has never been evaluated in an individual patient data meta-analysis. CA-125 is a biomarker prone to measurement error and biological variation (i.e., some values may appear higher or lower than anticipated from the overall trajectory), so that raw summaries of observed CA-125 may be suboptimal for assessing the prognostic value of CA-125. Furthermore, CA-125 may not be available at a particular timepoint, and an imputation method may be necessary [15]. Therefore, some authors have proposed relying on modeling techniques that handle noisy and sparsely measured biomarkers [16,17] to better assess the evolution of CA-125. Recently, the use of the CA-125 ELIMination Rate Constant K (KELIM) [18], which is based on longitudinal CA-125 measures as well as pharmacokinetic and pharmacodynamic parameters of treatment, has gained considerable interest in multiple settings (in recurrent disease, for neoadjuvant treatments, and for initial treatments). In the adjuvant and neoadjuvant settings, the prognostic role of the KELIM at 100 days (i.e., 3 months) has been tested in several trials [19,20]. Alternatively, statistical techniques such as mixed-effect modeling have been successfully developed in the case of prostate cancer to model the dynamics of the prostate-specific antigen over time [17,21]. A recent study proposed a CA-125 rate estimate that is easy to quantify in order to aid in decision making regarding second-line treatment for patients with recurrent high-grade serous ovarian cancer [22].
These statistical methods for longitudinal data are thus central to the assessment of the prognostic value of the trajectory estimated over various timeframes. Considering timepoints of less than three months could be challenging due to the scarcity of CA-125 measurements in this timeframe. Conversely, longer periods of CA-125 monitoring, such as six months, may improve the model's accuracy and the discriminatory value of the CA-125 decline. However, this may entail a CA-125 collection burden, with a higher number of follow-up visits.
When studying the prediction of a time-to-event endpoint based on an error-prone biomarker, where the timeframe differs between the longitudinal process (a few months) and the survival endpoints (a few years), a landmark approach may be appropriate. Briefly, its principle is to set up a landmark time from which the prediction is to be performed and to restrict the population to event-free subjects at this timepoint. Summaries of the dynamics of the biomarker before the landmark time serve to predict progression or death after the landmark time [23,24].
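The construction of a landmark analysis set can be illustrated with a minimal Python sketch. The `Patient` record, its field names, and the `landmark_dataset` helper are hypothetical and introduced here only for illustration; the sketch also assumes that patients censored before the landmark are dropped together with those who died, since neither group is still at risk at the landmark time.

```python
from dataclasses import dataclass


@dataclass
class Patient:
    # Hypothetical record layout, not taken from the original analysis.
    patient_id: str
    os_months: float  # time from randomization to death or censoring
    died: bool        # True if death was observed, False if censored
    ca125: list       # (time_in_months, value) pairs


def landmark_dataset(patients, s):
    """Keep patients still at risk at landmark time s, restart the clock
    at s, and keep only the CA-125 measures taken at or before s."""
    rows = []
    for p in patients:
        if p.os_months <= s:
            continue  # died or censored before the landmark: not at risk
        history = [(t, v) for (t, v) in p.ca125 if t <= s]
        rows.append({
            "patient_id": p.patient_id,
            "time_from_landmark": p.os_months - s,
            "died": p.died,
            "history": history,
        })
    return rows
```

Measurements taken after the landmark are deliberately discarded, so that the prediction only uses information that would be available at time s.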
When developing prediction tools, the assessment of the predictive performance needs to be carried out carefully. The value of the marker for predicting the risk of death can be measured using the Brier score and time-dependent area under the ROC curve (AUC) with estimators adapted to the time-to-event context [25][26][27]. The predictive performance may be overoptimistic when assessed on the same data used for training the model. To correct this bias, external validation or cross-validation techniques can be used [28].
This work aims to assess the performance of CA-125 early dynamics and determine the most appropriate landmark timepoint (i.e., the timeframe required to enrich the statistical prediction models) so as to predict OS at the best future timepoint in newly diagnosed patients with advanced ovarian cancer treated with taxane- and platinum-based chemotherapy. We used the individual patient data (IPD) from the GCIG meta-analysis group study, which has a large and diverse sample, permitting a cross-validation of the results between patients and studies as well as between subgroups based on the patients' characteristics.

Materials and Methods
This report follows the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-IPD guidelines and the Transparent Reporting of a Multivariable Prediction Model for Individual Prognosis or Diagnosis (TRIPOD) statement for the registration of the protocol, trial identification, data collection and integrity, assessment of bias, and sensitivity analyses [29,30]. This meta-analysis was registered with PROSPERO (CRD42017068135). The Ethics Committee of Gustave Roussy Cancer Center, Villejuif, France, approved this study, and the French Data Protection Authority waived the need for informed consent due to the use of deidentified data.

Study Population
To develop and validate a statistical predictive model of the dynamics of CA-125 in relation to OS, we used the IPD of the GCIG meta-analysis. This included patients with newly diagnosed ovarian cancer of International Federation of Gynecology and Obstetrics (FIGO) stages IC to IV whose data were collected from randomized controlled trials published from January 2001 to September 2016 (GCIG meta-analysis of ovarian cancer [31]). From the initial set of 17 trials, we selected those that investigated initial systemic treatments after surgery (no maintenance treatments).
For the present analysis, only trials that collected serum CA-125 levels at baseline (before the start of the treatment) and repeatedly during follow-up were selected. Data collection followed the original research protocol. To be eligible, every patient had to have at least two CA-125 measures, including one at baseline, and complete information regarding overall survival and progression.
Data checking was performed to ensure the data quality. Levels of the biomarker greater than 15,000 were considered as outliers, and peri-operative CA-125 measures were excluded, as they are affected by the surgical intervention.

Outcomes
The primary endpoint was OS, defined as the time from randomization to death from any cause. Patients alive at the cut-off date were censored at the last follow-up date.
The prognostic value of the levels of CA-125 at 3 and 6 months from randomization and the rate of CA-125 decline were evaluated.
The model was developed based on patients treated with either the 'standard chemotherapy' (Paclitaxel + Carboplatin and Epirubicin/Doxorubicin) or the same chemotherapy with an investigational treatment, as none of the included trials showed significant differences in treatment effect between the two arms. A sensitivity analysis, after excluding the patients from the investigational arms, was carried out.
The following patient characteristics were available at baseline: age, performance status, FIGO stage, histological subtype and grade, and residual disease after surgery (≥1 cm, <1 cm, unknown).

Statistical Methods
In order to study the biomarker-OS association over time, a landmark analysis was performed, and different post-baseline times, denoted by s, were considered [19]. All the available data up to s were modeled to predict the risk of death from time s to time s + t, with t denoting the horizon. The investigated prognostic factors based on the early CA-125 trajectory included (i) the observed CA-125 value at the landmark time or, if this was missing, the closest previous measure (OVLT), and (ii) the observed relative decline (ORD) in the biomarker between baseline and the last CA-125 measure before the landmark. In addition, three CA-125 summaries estimated using a hierarchical linear mixed-effect model were considered: (iii) the estimated value at the landmark time (EVLT) and (iv-v) the estimated slopes at baseline (ESB) and at the landmark time (ESLT).
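The two observed summaries can be sketched in Python as follows. The helper names are hypothetical, and two points are assumptions rather than details given in the text: the baseline is taken to be the earliest available measure, and the relative decline is computed as (baseline value - last value) / baseline value, with measures taken exactly at the landmark counted as "before" it.

```python
def last_measure_before(measures, s):
    """Return the (time, value) pair closest to, and not after, landmark s."""
    prior = [(t, v) for t, v in measures if t <= s]
    return max(prior, key=lambda tv: tv[0]) if prior else None


def ovlt(measures, s):
    """Observed value at the landmark time, carrying the last value forward."""
    last = last_measure_before(measures, s)
    return None if last is None else last[1]


def observed_relative_decline(measures, s):
    """Relative decline between baseline and the last measure before s.

    Assumed definition: (baseline - last) / baseline, so 0.9 means that
    the marker lost 90% of its baseline value.
    """
    last = last_measure_before(measures, s)
    baseline = min(measures, key=lambda tv: tv[0]) if measures else None
    if last is None or baseline is None or last[0] == baseline[0]:
        return None  # no post-baseline measure available before the landmark
    return (baseline[1] - last[1]) / baseline[1]


# Example: measures at 0, 1, 2.5, and 4 months, landmark at 3 months.
measures = [(0.0, 400.0), (1.0, 120.0), (2.5, 40.0), (4.0, 30.0)]
summary = (ovlt(measures, 3.0), observed_relative_decline(measures, 3.0))
```

In this example the measure at 4 months is ignored because it falls after the landmark.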
For these three summaries, repeated measures of the log-transformed CA-125 were analyzed using a hierarchical linear mixed model (HLMM) [32], which handles the hierarchical structure of the data, the inherent measurement error, and missing values according to the missing-at-random mechanism. Three hierarchical levels were distinguished: (i) the between-trial variability, using random effects at the trial level, (ii) the within-patient correlation, using patient-specific random effects, and (iii) the variability due to observation-specific measurement errors.
To account for the nonlinear dynamics of log-CA-125 over time, we considered a basis of natural cubic splines over time with 3 or 4 internal knots for the landmark time at 3 or 6 months, respectively [33]. The knots were placed at the beginning of the follow-up (i.e., 0.5, 1, and 2 or 3 months for 3 and 4 knots, respectively) in order to capture possible rapid declines in the biomarker, and external knots were placed at the 2nd and 98th percentiles of the measurement times. Knot selection was based on the Akaike information criterion (AIC) of the model. For each patient, we included random effects on each natural cubic spline function to capture the patient-specific CA-125 deviation from the mean trajectory. For each study, a random intercept was further used to capture the trial-specific deviation.
The summaries were derived from the HLMM using the Best Linear Unbiased Predictor for the trial- and patient-specific random effects.
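In symbols, the three-level model described above might be written as follows. This is only a sketch of one parameterization consistent with the text: the indices (trial k, patient i, measurement j), the spline basis functions B_m with B_0 = 1, the inclusion of a patient-level random intercept, and the symbols for the variance components are assumptions introduced here, not notation from the original analysis.

```latex
\begin{aligned}
\log \mathrm{CA125}_{kij}
  &= \sum_{m=0}^{M} \left(\beta_m + b_{ki,m}\right) B_m(t_{kij}) + u_k + \varepsilon_{kij}, \\
u_k &\sim \mathcal{N}\!\left(0, \sigma_u^2\right), \qquad
b_{ki} \sim \mathcal{N}\!\left(0, D\right), \qquad
\varepsilon_{kij} \sim \mathcal{N}\!\left(0, \sigma_\varepsilon^2\right), \\[4pt]
\hat{m}_{ki}(t)
  &= \sum_{m=0}^{M} \left(\hat{\beta}_m + \hat{b}_{ki,m}\right) B_m(t) + \hat{u}_k, \\
\mathrm{EVLT}_{ki} &= \hat{m}_{ki}(s), \qquad
\mathrm{ESB}_{ki} = \left.\frac{d\hat{m}_{ki}}{dt}\right|_{t=0}, \qquad
\mathrm{ESLT}_{ki} = \left.\frac{d\hat{m}_{ki}}{dt}\right|_{t=s}.
\end{aligned}
```

Under this notation, the estimated value and slopes are read off the patient-specific fitted curve, so they smooth out the measurement error that affects the raw observed values.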

Evaluation of the Predictive Performance of the CA-125 Summaries
The summaries were included as fixed effects in a Cox proportional hazards model considering study-specific baseline hazards. Heterogeneity across studies of the associations between the summaries and survival was tested with a likelihood ratio test that compared the partial likelihood of the stratified model with the partial likelihood of the stratified model using trial-specific summary effects [34,35].
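As a sketch, the stratified model and the heterogeneity test can be written as below for a single summary X computed at the landmark time s. The notation (K trials, coefficient gamma, log partial likelihood l) is assumed for illustration, and the chi-squared reference distribution with K - 1 degrees of freedom is the usual asymptotic approximation for this nested comparison rather than a detail stated in the text.

```latex
\begin{aligned}
\text{Common effect:} \quad
  & \lambda_{ki}(t \mid X_{ki}) = \lambda_{0k}(t)\, \exp\!\left(\gamma\, X_{ki}(s)\right), \qquad t > s, \\
\text{Trial-specific effects:} \quad
  & \lambda_{ki}(t \mid X_{ki}) = \lambda_{0k}(t)\, \exp\!\left(\gamma_k\, X_{ki}(s)\right), \qquad k = 1, \dots, K, \\
\text{Likelihood ratio statistic:} \quad
  & \mathrm{LR} = 2 \left\{ \ell\!\left(\hat{\gamma}_1, \dots, \hat{\gamma}_K\right) - \ell\!\left(\hat{\gamma}\right) \right\}
    \;\overset{\text{approx.}}{\sim}\; \chi^2_{K-1}.
\end{aligned}
```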
As shown in Figure 1, the probability of death was predicted from the landmark time s at the horizon time s + t according to the CA-125 summary computed at time s, with s = 3 and 6 months and t = 24, 36, 48, and 60 months in a subset of studies with a sufficient follow-up. The predictive performance of these predictions was assessed using the AUC and Brier score in order to determine the most appropriate landmark time and to explore the robustness for various prediction horizon times. To account for censored times-to-event between the time s and the time s + t, estimators of these two quantities were weighted by the inverse probability of censoring (IPCW) [36]. Both measures were used to assess the quality of the prediction tool. The AUC can be seen as a concordance measure between patients with high and low risks of death [37], while the Brier score measures calibration and discrimination. Both measures range between 0 and 1. An AUC below 0.7 suggests a moderate discriminatory performance.
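To make the inverse-probability-of-censoring weighting concrete, the following pure-Python sketch computes an IPCW Brier score at a single horizon. It is not the implementation used for the analyses: the function names are hypothetical, times are assumed to be measured from the landmark, the censoring distribution is estimated by a plain Kaplan-Meier estimator without covariates, and deaths tied with censoring times are assumed to occur first.

```python
def censoring_survival(times, died):
    """Kaplan-Meier estimate of G(t) = P(censoring time > t)."""
    steps = []
    g = 1.0
    for u in sorted(set(times)):
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, d in zip(times, died) if t == u and d)
        censored = sum(1 for t, d in zip(times, died) if t == u and not d)
        denom = at_risk - deaths  # deaths at u are taken to precede censoring
        if censored and denom > 0:
            g *= 1.0 - censored / denom
        steps.append((u, g))

    def G(t, left_limit=False):
        value = 1.0
        for u, g_u in steps:
            if u < t or (u == t and not left_limit):
                value = g_u
            else:
                break
        return value

    return G


def ipcw_brier(times, died, risk, horizon):
    """IPCW Brier score for predicted risks of death by `horizon`."""
    G = censoring_survival(times, died)
    total = 0.0
    for t, d, p in zip(times, died, risk):
        if d and t <= horizon:
            # Death observed before the horizon: outcome is 1.
            total += (1.0 - p) ** 2 / G(t, left_limit=True)
        elif t > horizon:
            # Still followed at the horizon: outcome is 0.
            total += p ** 2 / G(horizon)
        # Patients censored before the horizon get weight 0.
    return total / len(times)
```

Patients censored before the horizon do not contribute directly; their information is redistributed through the weights given to patients whose status at the horizon is known. Without any censoring, all weights equal 1 and the score reduces to the mean squared difference between the predicted risk and the observed status.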
The calibration was assessed graphically: patients were split into groups defined by quantiles of the predicted event probabilities, and the results were plotted against the observed risk of death.
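The grouping step behind such a calibration plot can be sketched as follows. The function names and the default of five groups are assumptions made for illustration, and the observed risk in each group is estimated here with a Kaplan-Meier estimator, which is one common choice and not necessarily the one used in the original analysis.

```python
def km_risk(times, died, horizon):
    """Observed risk of death by `horizon`, as 1 - Kaplan-Meier survival."""
    surv = 1.0
    death_times = sorted(set(t for t, d in zip(times, died) if d and t <= horizon))
    for u in death_times:
        at_risk = sum(1 for t in times if t >= u)
        deaths = sum(1 for t, d in zip(times, died) if t == u and d)
        surv *= 1.0 - deaths / at_risk
    return 1.0 - surv


def calibration_groups(times, died, risk, horizon, n_groups=5):
    """Split patients into quantile groups of predicted risk and return
    (mean predicted risk, observed risk) for each non-empty group."""
    order = sorted(range(len(risk)), key=lambda i: risk[i])
    n = len(order)
    groups = []
    for g in range(n_groups):
        idx = order[g * n // n_groups:(g + 1) * n // n_groups]
        if not idx:
            continue
        mean_pred = sum(risk[i] for i in idx) / len(idx)
        observed = km_risk([times[i] for i in idx], [died[i] for i in idx], horizon)
        groups.append((mean_pred, observed))
    return groups
```

Plotting the pairs returned by `calibration_groups` against the diagonal shows whether the predicted risks are systematically too high or too low in any part of the risk range.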
To correct for the over-optimistic performance obtained on the training datasets, we applied a 10-fold cross-validation technique at the patient level. The original dataset was partitioned into 10 sub-samples of the same size. For each sub-sample, the predictions were computed using the model trained on the 9 remaining sub-samples. The predictions from the 10 sub-samples were then pooled for the performance assessment. The average and standard deviation of the Brier score and the AUC over 50 replicates of the cross-validation technique were finally reported to account for fluctuations in the partitions. The corresponding 95% interval of the bootstrap distribution was calculated.
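The repeated cross-validation scheme can be sketched generically in Python. The `fit`, `predict`, and `score` callables are hypothetical placeholders for the model fitting, the individual prediction, and the performance measure (for example, a Brier score or an AUC); they are not functions from any particular library, and the sketch requires at least two replicates to compute a standard deviation.

```python
import random


def kfold_indices(n, k, rng):
    """Randomly partition indices 0..n-1 into k folds of near-equal size."""
    idx = list(range(n))
    rng.shuffle(idx)
    return [idx[f::k] for f in range(k)]


def repeated_cv(n, fit, predict, score, k=10, repeats=50, seed=0):
    """Repeat k-fold cross-validation and return the mean and standard
    deviation of the score computed on pooled out-of-fold predictions."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        pooled = [None] * n
        for fold in kfold_indices(n, k, rng):
            held_out = set(fold)
            train = [i for i in range(n) if i not in held_out]
            model = fit(train)  # trained without the held-out patients
            for i in fold:
                pooled[i] = predict(model, i)
        scores.append(score(pooled))  # one score per replicate
    mean = sum(scores) / repeats
    sd = (sum((x - mean) ** 2 for x in scores) / (repeats - 1)) ** 0.5
    return mean, sd
```

Each patient receives exactly one prediction per replicate, always from a model that did not see that patient, and the score is computed once on the pooled predictions rather than averaged over folds.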
To further investigate the added value of a summary of CA-125 dynamics, we finally compared the Cox model stratified by study and adjusted for the evaluated summary with (i) the null model, i.e., a survival model stratified by study that served as a reference, and (ii) the same Cox model including the summary of the dynamics, with or without the CA-125 level at baseline.
In a secondary analysis, the prognostic value of CA-125 was evaluated based on subgroups of patients defined by their baseline characteristics, such as the FIGO stage and residual disease.
Finally, for the known cutoff of 35 for the CA-125 value, we evaluated the positive predictive value (PPV) and the negative predictive value (NPV) at the analyzed time horizons.
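For a fixed horizon, the two predictive values can be illustrated with a short Python sketch. It assumes, for simplicity, that the vital status at the horizon is known for every patient (no censoring before the horizon), which is a simplification of the time-to-event setting; the function name and the choice to treat values strictly above the cutoff as positive are also assumptions.

```python
def ppv_npv(values, dead_by_horizon, cutoff=35.0):
    """Predictive values of a CA-125 cutoff for death by a fixed horizon.

    PPV: proportion of deaths among patients with a value above the cutoff.
    NPV: proportion of survivors among patients at or below the cutoff.
    Returns None for a predictive value whose group is empty.
    """
    pos = [d for v, d in zip(values, dead_by_horizon) if v > cutoff]
    neg = [d for v, d in zip(values, dead_by_horizon) if v <= cutoff]
    ppv = sum(1 for d in pos if d) / len(pos) if pos else None
    npv = sum(1 for d in neg if not d) / len(neg) if neg else None
    return ppv, npv


# Toy data: six patients with their CA-125 value at the landmark
# and their vital status at the horizon.
values = [10.0, 20.0, 30.0, 50.0, 80.0, 120.0]
dead = [False, False, True, True, True, False]
ppv, npv = ppv_npv(values, dead)
```

With this convention, 1 - NPV is the probability of death for a patient below the cutoff and PPV is the probability of death for a patient above it.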
The statistical analyses were performed using R (version 4.1.2, R Foundation for Statistical Computing, Vienna, Austria), with the riskRegression package for the assessment of predictive performance and the lme4 package for the HLMM.

Study Selection and Characteristics
Among the 17 trials in the GCIG meta-analysis, repeated CA-125 measures were available for 13 trials. Four trials that investigated maintenance treatments were further excluded. A total of nine trials were then selected, with four and five trials, respectively, investigating new initial treatments and the intensification of an initial treatment, leading to a total sample of 5573 patients.

Endpoint and Landmark Timeframe
We computed survival at the time horizons of 24, 36, 48, and 60 months for two landmark times, s = 3 and 6 months. A total of 5209 and 4946 patients were alive at these two landmark times, respectively, and included in the analysis set.

Performance of CA-125 Summaries
The cross-validated performances of the various CA-125 summaries are reported in Figures 2 and 3, and further results are described in the Supplementary Materials, Tables S2 and S3. Regardless of the summary in question, the AUCs were relatively similar at the 24-, 36-, and 48-month horizon times, with a drop in performance at 60 months. In contrast, the shorter the prediction horizon was, the smaller the error in the prediction was, as measured by the Brier score. The model had good calibration, as shown in Supplementary Materials Figures S1-S5.
The estimated slope showed a modest predictive accuracy. For instance, at 24 months, the estimated slope at baseline, without adding the CA-125 value at baseline, provided an AUC of 0. Of note, the AUC of 0.75 indicates that for two random patients, there is a 75% probability that the patient with the lower CA-125 value at 3 months will have the longer survival when restricting to the window between 3 and 24 months. Results by subgroups are reported in the Supplementary Materials (Tables S4 and S5 and Figures S6 and S7).

Sensitivity Analysis and Predictive Values of the CA-125 Normal Range
The sensitivity analysis restricted to the patients from the standard treatment arm showed equivalent results for both landmark times (Supplementary Materials, Tables S6 and S7 and Figures S8 and S9).
Finally, we examined the predictive values of the CA-125 value at 3 months for the cutoff of 35. A patient with a CA-125 value < 35 had a 20% probability of death at 24 months, and 30%, 38%, and 66% at 36, 48, and 60 months, respectively. In contrast, a patient with a CA-125 value > 35 had a 51% probability of death at 24 months, and 66%, 75%, and 79% at 36, 48, and 60 months, respectively.

Discussion
In patients with newly diagnosed FIGO stage II to IV ovarian cancer, the serum CA-125 level at 3 months after treatment initiation showed the best prediction of the overall survival probability at 24, 36, and 48 months compared to various measures of the CA-125 decrease rate. The last observed CA-125 value before or at 3 months and the estimate based on the repeated-measures model were associated with AUCs at 24 months of 0.749 and 0.745, respectively. A landmark time of 6 months did not improve the predictive performances, suggesting that early assessment provides the best trade-off between clinical application and statistical performance. The predictive capacity of both CA-125 summary measures remained equivalent after adding the CA-125 baseline value. In contrast, the rate measures improved substantially after adding the CA-125 baseline value; for example, the AUC of the observed rate of decline reached 0.738 at 24 months vs. 0.646 without the addition of the CA-125 baseline value. This reflects the importance of considering the CA-125 value at baseline in order to better interpret CA-125 kinetics. Despite the presence of heterogeneity, probably due to different inclusion criteria or healthcare standards between studies, the IPD meta-analytic context and the cross-validation allowed robust results to be drawn.
A value of CA-125 above 35 at 3 months led to a higher risk of death, such as 51% at 24 months or up to 79% at 60 months.
In the case of 3-month surveillance, CA-125 kinetics could help in decision making regarding the intensification of the follow-up or the anticipation of a treatment change. More precisely, a patient with a CA-125 concentration above 35 at 3 months would benefit from closer monitoring, with the measurement of the CA-125 concentration and CT scans in order to anticipate a potential need for treatment change. These patients may be good candidates for clinical trials upon progression.
Predictions at 60 months or more post-landmark should be interpreted carefully, since all the measures performed poorly compared to the earlier prediction times. Limited follow-up could explain this performance. The main limitation is that recently approved drugs such as PARP inhibitors were not included in this meta-analysis, since the trials were published between 2001 and 2016. Although chemotherapy remains the backbone of treatment, maintenance treatment including PARP inhibitors is now the standard of care. This may lead to different associations between CA-125 and OS. An update including trials with newly approved drugs, such as PARP inhibitors, would be of interest to further quantify the prognostic ability of CA-125 in this context. Furthermore, we relied on linear mixed models to overcome the problems of missing and sparse data. Although we considered flexible functions of time and carefully assessed the goodness-of-fit, this approach is still based on parametric assumptions.
Recently, the subgroup of patients with HRD- or BRCA-positive status has gained great interest as a separate subtype, which may lead us to question the robustness of our results for this subgroup [38]. This matter could be investigated through an update of the meta-analysis with the most recent agents. Finally, this meta-analysis is limited to serum CA-125, although other biomarkers with potential prognostic capacities, such as human epididymis protein type 4 (HE4), osteopontin, mesothelin (MSLN), and folate receptor α (FOLR1), can also be measured depending on the circumstances and state of the patients [39]. We concentrated our research on serum CA-125, which is routinely used in practice, and tried to quantify its prognostic capacity. Nevertheless, the same type of work could be extended, combining other markers in specific contexts to assess the risk of progression or death.
The assessment of KELIM in patients treated with maintenance veliparib and those treated with maintenance bevacizumab showed promising results that suggested a similar performance between KELIM and the reported chemotherapy agents. Nevertheless, further assessment of the predictive value of CA-125 at 3 months in the setting of PARP maintenance is required. The development of model-based approaches has been motivated by the impact of the variability between different assays used in routine practice on the interpretation of outcomes of the one- and two-timepoint strategies. Interestingly, the 3-month time window found in the present study is consistent with the first 100 days of treatment used for KELIM calculation.

Conclusions
To conclude, the surveillance of CA-125 at 3 months after the initiation of treatment could help to provide individual information for patients based on the initial CA-125 trajectory. More precisely, this could help to provide an informative tool so as to guide clinicians in decision making regarding overall survival up to 48 months.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/cancers15061823/s1, Table S1: Description of patients' characteristics (N = 5573). Table S2: 10-fold cross-validated AUC (A,B) and Brier score (C,D) for the prediction of overall survival at 24, 36, 48 and 60 months from CA125 history up to 3 months when considering different CA125 summaries. Table S3: 10-fold cross-validated AUC (A,B) and Brier score (C,D) for the prediction of overall survival at 24, 36, 48 and 60 months from CA125 history up to 6 months when considering different CA125 summaries (Landmark time 6 months). Table S4: Predictive performances by subgroups for estimated CA-125 value at 3 months. Table S5: Predictive performances by subgroups for estimated CA-125 value at 6 months. Table S6: 5-fold cross-validated AUC (A,B) and Brier score (C,D) for the prediction of overall survival at 24, 36, 48 and 60 months from CA125 history up to 3 months when considering different CA125 summaries. Half population including only Standard regimen arm. Table S7: 5-fold cross-validated AUC (A,B) and Brier score (C,D) for the prediction of overall survival at 24, 36, 48 and 60 months from CA125 history up to 6 months when considering different CA125 summaries. Half population including only Standard regimen arm. Figure S1: Calibration plot for the observed rate of decline (ORD)-3 months. Figure S2: Calibration plot for the observed value at landmark time (OVLT)-3 months. Figure S3: Calibration plot for the estimated value at landmark time (EVLT)-3 months. Figure S4: Calibration plot for the estimated slope at baseline (ESB) (for landmark 3 months). Figure S5: Calibration plot for the estimated slope at landmark time (ESLT)-3 months. Figure S6: Predictive performances measured by AUC (A) and Brier Score (B) by subgroups for estimated CA-125 value at 3 months.
Figure S7
Institutional Review Board Statement: This study was conducted according to the Preferred Reporting Items for Systematic Reviews and Meta-Analyses (PRISMA)-IPD guidelines [40] for the registration of the protocol, trial identification, data collection and integrity, assessment of bias, and sensitivity analyses. This meta-analysis was registered with PROSPERO (CRD42017068135). The Ethics Committee of Gustave Roussy Cancer Center, Villejuif, France, approved this study, and the French Data Protection Authority waived the need for informed consent for the use of deidentified data.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.
Data Availability Statement: Individual patient data (IPD) were requested for each eligible trial for all randomized patients. All data were checked with a standard procedure which follows the recommendations of the Cochrane working group on meta-analysis using individual patient data. Each trial was analyzed individually, and the resulting survival analyses and data description were sent to the trialists for review.