A Novel Serum Metabolomic Profile for the Differential Diagnosis of Distal Cholangiocarcinoma and Pancreatic Ductal Adenocarcinoma

The diagnosis of adenocarcinomas located in the pancreas head, i.e., distal cholangiocarcinoma (dCCA) and pancreatic ductal adenocarcinoma (PDAC), constitutes a clinical challenge because they share many symptoms, are not easily distinguishable using imaging techniques and accurate biomarkers are not available. Searching for biomarkers with potential usefulness in the differential diagnosis of these tumors, we have determined serum metabolomic profiles in healthy controls and patients with dCCA, PDAC or benign pancreatic diseases (BPD). Ultra-high-performance liquid chromatography coupled to mass spectrometry (UHPLC-MS) analysis was performed in serum samples from dCCA (n = 34), PDAC (n = 38), BPD (n = 42) and control (n = 25) individuals, divided into discovery and validation cohorts. This approach permitted 484 metabolites to be determined, mainly lipids and amino acids. The analysis of the results led to the proposal of a logistic regression model able to discriminate patients with dCCA and PDAC (AUC value of 0.888) based on the combination of serum levels of nine metabolites (acylcarnitine AC(16:0), ceramide Cer(d18:1/24:0), phosphatidylcholines PC(20:0/0:0) and PC(O-16:0/20:3), lysophosphatidylcholines PC(20:0/0:0) and PC(0:0/20:0), lysophosphatidylethanolamine PE(P-18:2/0:0), and sphingomyelins SM(d18:2/22:0) and SM(d18:2/23:0)) and CA 19-9. In conclusion, we propose a novel specific panel of serum metabolites that can help in the differential diagnosis of dCCA and PDAC. Further validation of their clinical usefulness in prospective studies is required.


Introduction
Although distal cholangiocarcinoma (dCCA) and pancreatic ductal adenocarcinoma (PDAC) share a close anatomical location, they are considered distinct entities and require specific management strategies [1]. Whereas dCCA is an aggressive malignancy that arises in the biliary tract below the cystic duct and represents approximately 20% of CCAs, PDAC derives from the epithelium of pancreatic ducts and is the fourth leading cause of cancer-related deaths [2,3]. Although dCCA has a poor clinical outcome [4] due to its late diagnosis and resistance to chemotherapy [5], the prognosis is generally worse in PDAC [3].
Despite improvements in imaging techniques during recent years, the accurate diagnosis of adenocarcinomas located in the pancreas head area represents a clinical challenge in gastrointestinal oncology. Biopsy, either using cytologic brushing or fine-needle aspiration guided by endoscopic ultrasound, is mandatory to confirm the diagnosis. However, this has serious limitations: (i) repeat sampling is often required since the quality of the samples is not always sufficient to carry out the anatomopathological analysis, and (ii) the detection of malignant cells can confirm the diagnosis, but a negative result does not permit ruling it out [6]. Distinguishing PDAC from benign pancreatic diseases (BPD), such as chronic pancreatitis or pancreatic cysts, is also challenging, and the lack of accurate tumor biomarkers explains why ≈ 5-10% of surgical removals of the head of the pancreas due to presumed malignancies are finally identified as benign lesions.
Several non-invasive biomarkers have been evaluated for the diagnosis of PDAC [7] and CCA [8,9], but none of them are being used in the clinical setting. Serum carbohydrate antigen 19-9 (CA 19-9) is the only FDA-approved biomarker for PDAC, both for the follow-up of the therapeutic response [10] and for the detection of recurrence after surgery. Nevertheless, owing to its low sensitivity and specificity, CA 19-9 is far from being considered an optimal biomarker. Serum CA 19-9 is also used clinically to help in diagnosis and to monitor the response to therapy in biliary cancers, usually in combination with another unspecific marker, i.e., carcinoembryonic antigen (CEA). However, its accuracy is low and it is not suitable for early detection. In addition, CA 19-9 can be elevated in patients with obstructive cholestasis, chronic liver and pancreatic diseases, and premalignant pancreatic lesions. Moreover, ≈ 10% of the Caucasian population with Lewis-negative phenotype do not express this biomarker [11].
Therefore, there is an urgent need to identify reliable minimally invasive biomarkers that can help in the differential diagnosis of dCCA and PDAC. An optimal biomarker would also be expected to contribute to the early detection of these cancers. The analysis of a large number of small metabolites in biological samples represents an interesting approach for identifying clinically relevant biomarkers for different diseases. In this context, the aim of the present study was to evaluate the usefulness of differences in serum metabolomic profiles between dCCA and PDAC, as well as between these severe malignancies and BPD and healthy individuals.

Characteristics of the Study Population
The demographic and clinical features of individuals from both cohorts are shown in Table 1. The age was higher in patients with dCCA and PDAC than in patients with BPD and healthy individuals and only the latter group included a lower percentage of males. Most tumors included in the dCCA group were in early stage, while there was a similar distribution of tumors in early and advanced stage in the PDAC group. Regarding liver biochemical parameters (Table 1), a significant increase in ALT, GGT, alkaline phosphatase and total bilirubin was found in patients with dCCA and PDAC. Except for total bilirubin, these parameters were also found to be elevated in BPD, although the magnitude of changes was lower than that observed in patients with tumors. A significant increase in serum levels of CA 19-9 was found in both dCCA and PDAC, with a marked interindividual variability. Moreover, although CA 19-9 levels were also elevated in some patients with BPD, both with pancreatic cysts and with chronic pancreatitis, these were significantly lower than those found in patients with cancer.
Clustering of the different groups of samples according to the serum metabolome was evaluated by multivariate data analysis, using unsupervised principal component analysis (PCA) and supervised orthogonal partial least-squares to latent structures discriminant analysis (OPLS-DA). As shown in Figure 1, serum metabolomic profiles did not cluster by hospital of origin, cohort (discovery or validation), gender, age group or sample group (Figure 1A-E, respectively). Patients with cysts and pancreatitis, both included in the BPD group, were randomly distributed (Figure 1F).
[Figure caption, partially recovered: the two plotted components explain 19.8% and 13.1% of the total variance, respectively. Each dot represents one sample. The ellipse represents the 95% confidence interval according to Hotelling's T² test.]
The supervised OPLS-DA model showed good predictive ability to discriminate patient groups from healthy individuals (Q²X = 0.694; Figure 2A). The main contributors to the differences between patients and control individuals were triglycerides and, to a lesser extent, oxidized fatty acids and bile acids (all of them increased), together with sphingomyelins and glycerophosphatidylcholines (both decreased). However, the supervised OPLS-DA models built to differentiate dCCA vs. BPD, PDAC vs. BPD and dCCA vs. PDAC showed very low predictive ability (Figure 2B-D, respectively), since Q²X values were low, especially in the comparisons of PDAC with BPD (Q²X = 0.163) and with dCCA (Q²X close to 0).

Serum Metabolomic Profiles of Patients with dCCA, PDAC and BPD and Healthy Individuals
During the discovery phase we were able to determine 484 metabolites in serum samples, which was confirmed in the validation cohort. Changes in the levels of molecules belonging to the different families of analyzed metabolites (lipids, amino acids and amino acid derivatives) were found. Figure 3 depicts the heatmaps showing the fold-changes and the p-values generated from the different two-group comparisons carried out in the discovery and validation cohorts, and considering all samples together. Figure 4 shows the volcano plots generated for each two-group comparison, and the number of metabolites significantly changed in each comparison considering the full cohort (Figure 4G). When BPD was compared with control, altered serum concentrations of 268 metabolites (mainly phosphatidylcholines > triglycerides > sphingomyelins ≈ lysophosphatidylcholines) were found. The comparison of dCCA with control revealed altered serum levels of 236 metabolites (mainly triglycerides ≈ phosphatidylcholines > lysophosphatidylcholines > sphingomyelins). The highest number of metabolites affected by changes in their serum levels (n = 280; mainly triglycerides > phosphatidylcholines > lysophosphatidylcholines) was found in the PDAC group.
Different serum levels of 111 metabolites were found when comparing dCCA with BPD (mainly phosphatidylcholines > lysophosphatidylethanolamines = sphingomyelins), whereas this number increased to 178 when comparing PDAC with BPD (mainly phosphatidylcholines > triglycerides). The number of serum metabolites altered when comparing dCCA vs. PDAC was 63 (mainly triglycerides > phosphatidylethanolamines > lysophosphatidylethanolamines), and most of them were higher in PDAC than in dCCA. The number of metabolites with an area under the receiver operating characteristic curve (AUC) ≥ 0.8 was 73 when comparing BPD vs. control, 63 when comparing dCCA vs. control and 72 when comparing PDAC vs. control.
A considerable number of metabolites were found to be altered in the serum of more than one group of patients, although the magnitude of changes was higher in patients with cancer. Table 2 shows the 10 metabolites with the best diagnostic capacity (best values of AUC, sensitivity and specificity) for each disease vs. control. Complete panels are presented in Table S1A-C. Although fewer alterations in the circulating metabolomic profiles were observed when the different diseases were cross-compared, we found changes of diagnostic interest. Among 50 metabolites with significant AUC values in the comparison of dCCA vs. BPD, 6 showed AUC values ≥ 0.8 (Table 3), while 2 among 61 in the comparison of PDAC vs. BPD reached these AUC values. In the comparison of dCCA vs. PDAC, 9 metabolites showed significant AUC values, although all with AUC < 0.8. In this last comparison, serum concentrations of the 9 metabolites were lower in dCCA than in PDAC. Table 3 shows the 9-10 metabolites with the best diagnostic capacity in each two-group comparison, and the complete panels are presented in Table S1D,E.
Table 3. Diagnostic capacity of the top 9-10 metabolites in each two-disease group comparison considering the whole cohort.
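As an illustration of how a per-metabolite AUC of the kind reported above can be obtained, the following minimal sketch computes the AUC as the probability that a randomly chosen sample from one group has a higher level than a randomly chosen sample from the other group (ties counted as one half). The function names and the serum levels are hypothetical and are not taken from the study; the software actually used by the authors may compute the AUC differently.

```python
def auc_from_scores(group_a, group_b):
    """AUC as P(a > b) + 0.5 * P(a == b) over all cross-group pairs.

    This equals the Mann-Whitney U statistic divided by
    len(group_a) * len(group_b).
    """
    wins = 0.0
    for a in group_a:
        for b in group_b:
            if a > b:
                wins += 1.0
            elif a == b:
                wins += 0.5
    return wins / (len(group_a) * len(group_b))


def directional_auc(group_a, group_b):
    """Report an AUC >= 0.5 regardless of which group has the higher levels."""
    auc_value = auc_from_scores(group_a, group_b)
    return max(auc_value, 1.0 - auc_value)


# Hypothetical, made-up serum levels (arbitrary units) for one metabolite.
pdac_levels = [2.1, 2.8, 3.0, 3.4, 3.9]
dcca_levels = [1.2, 1.9, 2.1, 2.5, 3.1]

auc = auc_from_scores(pdac_levels, dcca_levels)
print(f"AUC = {auc:.2f}")  # prints "AUC = 0.82"
```

In this toy example the metabolite would just pass the AUC ≥ 0.8 criterion used above to count metabolites with good individual diagnostic capacity.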

Discrimination between Patients with and without Tumors
In our study, with a cut-off fixed at 37 IU/mL, CA 19-9 showed a good diagnostic capacity to differentiate patients with tumors (dCCA+PDAC) from healthy individuals, with an AUC of 0.93 in both cohorts. However, as shown in Figure 5A, it performed less well in differentiating between patients with tumors (dCCA+PDAC) and individuals without cancer (Control+BPD): AUC was 0.845, 0.820 and 0.828 in the discovery, validation and whole cohorts, respectively (Figure 5B).
Using this model, the probability of diagnosing patients with chronic pancreatitis or healthy subjects as individuals suffering from dCCA or PDAC is low. However, this risk is higher for patients with benign pancreatic cysts (Figure 6A). AUC was 0.93 in the discovery cohort, 0.86 in the validation cohort and 0.89 in the whole cohort. Sensitivity was 73.6% and specificity 83.6% in the whole cohort. We also evaluated the relationship between age and the diagnostic error rate of the model. When patients were stratified into age quantiles, the diagnostic error rate was roughly constant at around 20% (average 21%, ranging from 17% to 29%) and was not associated with the patient's age.
In our study, CA 19-9 showed a sensitivity of 71% and a specificity of 83% to differentiate patients with tumors from individuals without tumors (Controls+BPD).
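The sensitivity and specificity of a single marker at a fixed cut-off can be computed as in the following minimal sketch. It assumes that a sample is called positive when its value is strictly above the cut-off (the source does not state whether the comparison is strict), and the CA 19-9 values are invented for illustration only.

```python
def sensitivity_specificity(values, has_tumor, cutoff=37.0):
    """Return (sensitivity, specificity) for a 'value > cutoff' rule."""
    tp = fn = tn = fp = 0
    for value, tumor in zip(values, has_tumor):
        predicted_positive = value > cutoff  # assumption: strict inequality
        if tumor and predicted_positive:
            tp += 1
        elif tumor:
            fn += 1
        elif predicted_positive:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical CA 19-9 values (IU/mL); these are not data from the study.
ca199 = [12.0, 25.0, 40.0, 30.0, 8.0, 150.0, 20.0, 900.0, 55.0, 35.0]
tumor = [False, False, False, False, False, True, True, True, True, True]

sens, spec = sensitivity_specificity(ca199, tumor)
print(f"sensitivity = {sens:.0%}, specificity = {spec:.0%}")
```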

Discrimination between dCCA and PDAC
Since none of the individual circulating metabolites had sufficient capability to distinguish dCCA from PDAC (Table 3), our next goal was to obtain a predictive model for discriminating between both tumors. A logistic regression model was built with nine metabolites (Figure S1) considering all the patients. The analysis of CA 19-9 alone showed a sensitivity of 77% and a specificity of 48% to differentiate patients with PDAC from those with dCCA (Figure S2). Another logistic regression model was built with the nine metabolites plus CA 19-9 (Figure 7), which improved the sensitivity. However, the specificity slightly decreased in the full cohort and especially in the validation cohort. With this model, AUC was 0.888, sensitivity 71.4% and specificity 89.2% considering the whole cohort.
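To make explicit how a logistic regression model combines several markers into a single probability, the following minimal sketch applies the standard logistic formula to ten predictors (nine metabolites plus CA 19-9). The coefficients, the predictor values, the 0.5 decision threshold and the choice of PDAC as the positive class are all placeholders chosen for illustration; the fitted values of the study are not given in the text.

```python
import math


def tumor_probability(intercept, coefficients, levels):
    """Logistic model: p = 1 / (1 + exp(-(b0 + sum(b_i * x_i))))."""
    linear_predictor = intercept + sum(b * x for b, x in zip(coefficients, levels))
    return 1.0 / (1.0 + math.exp(-linear_predictor))


def classify(probability, threshold=0.5):
    """Assign the positive class (assumed here to be PDAC) at or above threshold."""
    return "PDAC" if probability >= threshold else "dCCA"


# Placeholder coefficients for ten predictors (nine metabolites plus CA 19-9).
# They are invented for illustration and are NOT the fitted values of the study.
intercept = -0.4
coefficients = [0.8, -0.5, 0.3, 0.6, -0.2, 0.4, -0.7, 0.5, 0.1, 0.9]

# Hypothetical transformed and scaled predictor values for one patient.
patient = [0.2, -1.1, 0.5, 1.3, 0.0, -0.4, 0.9, 0.7, -0.3, 1.5]

p = tumor_probability(intercept, coefficients, patient)
print(f"P(PDAC) = {p:.3f} -> {classify(p)}")
```

Moving the threshold away from 0.5 trades sensitivity against specificity, which is what the ROC curve and its AUC summarize.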

Discussion
The lack of non-invasive biomarkers for the early diagnosis of PDAC and dCCA contributes to the poor prognosis of these tumors [12]. The anatomical difficulty in accessing the tumors to obtain good quality biopsies for diagnostic purposes makes it necessary to identify minimally invasive biomarkers that could help, not only in the early detection of these tumors to enable more patients to benefit from surgical treatment, but also in the prognosis and follow-up of these patients during treatment. However, although important efforts have been made during recent years, none of the identified markers have been validated and reached clinical practice. Despite their moderate clinical utility, only CA 19-9 and carcinoembryonic antigen (CEA) are currently used for PDAC and CCA diagnosis [13].
Omics technologies are providing valuable information to understand cancer biology. Metabolic reprogramming is one hallmark of tumor cells [14,15]; thus, the analysis of the metabolome (hundreds of small molecules or metabolites) in body fluids of patients with cancer can give an indirect reflection of the metabolic behavior of the tumors and could be used to identify potential biomarkers.
Several studies have been conducted to identify serum metabolomic profiles for the diagnosis of pancreatic or biliary cancers. Most of them included only patients with pancreatic cancer and healthy controls [16-18] or with biliary cancer and healthy individuals [19]. However, it is important to include clinically relevant controls since the metabolome can be affected by many factors, including gender, age, comorbidities, medication, lifestyle, environment or circadian rhythms; in fact, important intra-day variations have been observed in serum levels of patients with advanced pancreatic cancer, which were further affected by cachexia [20].
The use of metabolomics to discriminate between different types of tumors and between tumors and benign diseases has been less explored. Combinations of metabolites discriminating malignant from benign pancreaticobiliary diseases and from healthy controls have been reported, although the number of cases was low and most of the patients with tumors were in an advanced stage, for which their usefulness in early diagnosis cannot be guaranteed [21]. More recently, a biomarker signature for the differential diagnosis between PDAC and chronic pancreatitis was reported, consisting of nine metabolites, five of them lipids (two sphingomyelins, sphinganine 1-phosphate, one phosphatidylcholine and one ceramide), and proline, histidine, pyruvate and isocitrate plus CA 19-9, with a negative predictive value of 99.9% in patients with chronic pancreatitis [22].
All these studies support the concept that the combination of several metabolite markers allows for a more accurate diagnosis. In this study, we have included patients with biopsy-proven tumors or cysts located in the head of the pancreas divided into two independent cohorts of PDAC, dCCA, BPD and controls. Although serum bile acids levels represented the most marked alteration in patients with cancer, this hypercholanemic condition occurs in different pathologies that are accompanied by cholestasis, in which compensatory mechanisms are developed to limit the accumulation and toxic effects of these compounds [23]. It has been demonstrated that obstructive jaundice impacts the performance of biomarkers for PDAC [24], and in our study, a certain degree of cholestasis was found in some patients with tumors, since serum bilirubin was elevated, and as a consequence, none of the bile acid species measured could be considered as a good biomarker.
In the present study we have identified a multimarker signature for the differential diagnosis of adenocarcinomas located in the pancreas, comprising nine metabolites plus CA 19-9, which performed better than serum CA 19-9 alone. We have also identified a panel of ten metabolites (seven lipids and three amino acids) that performed similarly to serum CA 19-9 in discriminating tumors from BPD and was especially useful for chronic pancreatitis. Since this disease is a risk factor for the development of pancreatic cancer [25], these biomarkers could be useful for early detection of tumor development, for monitoring patients during treatment and for avoiding unnecessary pancreatic surgery and its complications. Interestingly, some of the metabolites included in the signature proposed here belonged to the same families of compounds (amino acids, sphingomyelins and ceramides) as those of a previously described model [22]. Changes in serum levels of certain amino acids have been described in other tumors, such as liver [26,27] and breast [28] cancer. In addition, sphingomyelins and ceramides have been found altered in the serum of patients with liver [27] and ovarian [29] cancer. Alterations in sphingolipid metabolism have been associated with cell proliferation [30]. Our ten-metabolite model was less accurate than CA 19-9 levels in distinguishing pancreatic cysts from tumors in the head of the pancreas, although the low number of cases of cystic lesions in our cohort can be considered a limitation. In recent years, several studies have proposed circulating microRNA (miRNA) signatures for early detection of pancreatic cancer [31] or for the differential diagnosis of PDAC and chronic pancreatitis with good sensitivity and specificity [32], although none of them included a group of patients with pancreatic cysts.
A recent study proposed a two-miRNA panel of downregulated miR-16 and upregulated miR-877 to differentiate patients with dCCA from benign disease (AUC = 0.90) and from PDAC (AUC = 0.88) [33]. Serum proteins have also been investigated. The analysis of cell migration-inducing hyaluronan binding protein (CEMIP) plus CA 19-9 improved the diagnostic value compared to CA 19-9 alone for the diagnosis of pancreatic cancer [34]; that study included a small but very heterogeneous group of patients with BPD in the control cohort, and its results must still be validated.
In sum, in this study, using two independent cohorts of patients, we have identified a model consisting of 9 metabolites in serum with promising capability to differentiate both types of pancreatic head adenocarcinomas, with AUC = 0.854. Because accurate diagnosis of these tumors remains challenging, our results suggest that the analysis of multiple types of biomarkers could help in the early and differential diagnosis and in the follow-up of these aggressive tumors.

Study Population and Eligibility
Fasting serum samples from dCCA (n = 34), PDAC (n = 38), BPD (n = 42), and healthy subjects (n = 25) were obtained from two Spanish hospitals: University Hospital of Salamanca, National DNA Bank Carlos III, and Donostia University Hospital in San Sebastian. Samples were randomly divided into two cohorts, "discovery" and "validation", with equal proportional representation of individuals belonging to each pathology as well as to each origin of samples.
Inclusion criteria for patients with dCCA and PDAC were histopathologic confirmation of diagnosis by expert pathologists and serum obtained before any type of treatment. Exclusion criteria were other types of CCA or synchronous presence of another type of malignancy. The BPD group included 22 samples from patients with cysts and 20 from patients with chronic pancreatitis. Selected healthy individuals had no history of any type of malignancy and no clinical evidence of hepatopancreaticobiliary disease. Clinical and laboratory test values were collected from the patients' records. The research protocol was approved by the Ethics Committee for Clinical Research of Salamanca (July 18, 2018) and San Sebastian (October 16, 2019), and informed written consent for the samples to be used for biomedical research was obtained from each patient.

Metabolomic Analyses
Serum metabolic profiles were analyzed as previously described [35].
Briefly, two ultrahigh-performance liquid chromatography (UHPLC)-time of flight-MS based platforms analyzing methanol and chloroform/methanol serum extracts were combined with the amino acid measurement using a UHPLC-single quadrupole-MS based analysis. Identified ion features in the methanol extract platform included amino acids and their derivatives and lipids.
Metabolite extraction procedures, chromatographic separation conditions and mass spectrometric detection conditions have been previously described [35]. Metabolomics data were pre-processed using the TargetLynx application manager for MassLynx 4.1 (Waters Corp., Milford, MA, USA). Intra- and inter-batch normalization was performed by inclusion of multiple internal standards and pool calibration response correction, following a previously described procedure [36]. Data quality was assessed by the inclusion of quality control samples, including repeated injections of these samples to evaluate the reproducibility of the analysis process [36].

Statistical Analysis
Data are shown as mean ± SD. Differences between groups were determined using the Student's t-test or the Bonferroni method of multiple range test, as appropriate. Calculations were performed using the statistical software package R v.3.4.0 (R Development Core Team, 2017; http://cran.r-project.org).
Multivariate principal component analysis (PCA) [37] and orthogonal partial least squares discriminant analysis (OPLS-DA) [38] modeling were performed with the software SIMCA 14.1 (Umetrics, Malmo, Sweden). Model quality was assessed using R² and Q² values, which indicate the explained fraction of variance and the goodness of prediction, respectively. The Q² parameter was calculated by sevenfold cross validation.
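The relationship between the explained variance and the cross-validated goodness of prediction can be sketched as follows, using the common definitions R² = 1 - RSS/TSS and Q² = 1 - PRESS/TSS, where PRESS is the sum of squared prediction errors on held-out samples. This is only an illustration under stated assumptions: a one-variable least-squares line stands in for the OPLS-DA model, the fold assignment is a simple interleaving, the data are invented, and the exact computation implemented in SIMCA may differ.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def r2(xs, ys):
    """Explained fraction of variance when the model is fitted on all samples."""
    slope, intercept = fit_line(xs, ys)
    mean_y = sum(ys) / len(ys)
    rss = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - rss / tss


def q2(xs, ys, folds=7):
    """Goodness of prediction: 1 - PRESS / TSS, with PRESS from k-fold CV."""
    n = len(xs)
    mean_y = sum(ys) / n
    press = 0.0
    for fold in range(folds):
        train = [i for i in range(n) if i % folds != fold]
        held_out = [i for i in range(n) if i % folds == fold]
        slope, intercept = fit_line([xs[i] for i in train], [ys[i] for i in train])
        press += sum((ys[i] - (slope * xs[i] + intercept)) ** 2 for i in held_out)
    tss = sum((y - mean_y) ** 2 for y in ys)
    return 1.0 - press / tss


# Hypothetical data: one predictive score per sample and a 0/1 class label.
scores = [0.2, 0.5, 0.1, 0.9, 0.4, 0.3, 0.6, 1.8, 2.2, 1.5, 2.6, 1.9, 2.4, 2.0]
labels = [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1]

print(f"R2 = {r2(scores, labels):.3f}, Q2 = {q2(scores, labels):.3f}")
```

Because each held-out sample is predicted by a model that never saw it, Q² cannot exceed R² for this least-squares stand-in, and a Q² close to 0 indicates that the model predicts unseen samples no better than the overall mean.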
To find statistical models to differentiate patients with tumors (dCCA or PDAC) from subjects without tumors (controls or BPD [chronic pancreatitis or pancreatic cysts]), as well as to differentiate each type of tumor (dCCA vs. PDAC), generalized linear models (GLM) were used, and those selected were confirmed by leave-one-out cross validation (LOOCV). Box-Cox transformations were applied to the biomarker metabolite levels to correct non-normally distributed data, and the transformed values were used to calculate the classification algorithm. The diagnostic accuracy of the model to identify patients in each comparison was assessed using the AUC, with p < 0.05 considered statistically significant.
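The workflow described above (Box-Cox transformation, a binomial GLM, and confirmation by LOOCV) can be sketched in a few lines. This is a minimal illustration, not the authors' implementation: it uses a single predictor, a fixed Box-Cox parameter (lam = 0, i.e., the logarithm) rather than an estimated one, a plain gradient-ascent fit instead of the fitting routine of a statistical package, and invented metabolite levels.

```python
import math


def box_cox(values, lam):
    """Box-Cox transform of strictly positive values (lam = 0 gives the log)."""
    if lam == 0:
        return [math.log(v) for v in values]
    return [(v ** lam - 1.0) / lam for v in values]


def sigmoid(z):
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(xs, ys, rate=0.1, epochs=2000):
    """One-predictor logistic regression fitted by plain gradient ascent."""
    b0 = b1 = 0.0
    n = len(xs)
    for _ in range(epochs):
        g0 = g1 = 0.0
        for x, y in zip(xs, ys):
            err = y - sigmoid(b0 + b1 * x)
            g0 += err
            g1 += err * x
        b0 += rate * g0 / n
        b1 += rate * g1 / n
    return b0, b1


def loocv_probabilities(xs, ys):
    """Predict each sample with a model fitted on all the other samples."""
    probabilities = []
    for i in range(len(xs)):
        train_x = xs[:i] + xs[i + 1:]
        train_y = ys[:i] + ys[i + 1:]
        center = sum(train_x) / len(train_x)  # computed on training data only
        b0, b1 = fit_logistic([x - center for x in train_x], train_y)
        probabilities.append(sigmoid(b0 + b1 * (xs[i] - center)))
    return probabilities


# Hypothetical right-skewed metabolite levels; lam = 0 is an arbitrary choice.
levels = [1.0, 1.5, 2.0, 2.5, 3.0, 20.0, 30.0, 45.0, 60.0, 90.0]
is_pdac = [0, 0, 0, 0, 0, 1, 1, 1, 1, 1]

transformed = box_cox(levels, lam=0)
probs = loocv_probabilities(transformed, is_pdac)
accuracy = sum((p >= 0.5) == bool(y) for p, y in zip(probs, is_pdac)) / len(probs)
print(f"LOOCV accuracy = {accuracy:.2f}")
```

The centering value is recomputed inside each fold so that the held-out sample never influences the model used to predict it, which is the point of the leave-one-out scheme.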

Conclusions
Based on the results obtained in the present study, we propose novel specific panels of serum metabolites that can help in the early and differential diagnosis of dCCA and PDAC. Further validation of their clinical usefulness in prospective studies including other relevant controls and in combination with clinical features is required.