Urinary 1H-NMR Metabolic Signature in Subjects Undergoing Colonoscopy for Colon Cancer Diagnosis

Featured Application: Metabolomics can be a useful tool to support the diagnosis and monitoring of colon cancer.

Abstract: Metabolomics represents a promising non-invasive approach that can be applied to identify biochemical changes in colorectal cancer (CRC) patients and is potentially useful for diagnosis and follow-up. Despite the literature on CRC-specific metabolomic profiles, discriminating metabolic changes specifically related to CRC from intra-individual variability is still a problem to be solved. This was a preliminary case-control study in which 1H-NMR spectroscopy combined with multivariate statistical analysis was used to profile urine metabolites in subjects undergoing colonoscopy for colon cancer diagnosis. To reduce intra-individual variability, metabolic profiles were evaluated in participants' urine samples, collected just before the colonoscopy and after the short-term dietary regimen required for the endoscopy procedure. The data obtained highlighted different urinary metabolic profiles between CRC and unaffected subjects (C). The metabolites altered in the CRC urine (acetoacetate, creatine, creatinine, histamine, phenylacetylglycine, and tryptophan) significantly correlated with colon cancer and accurately discriminated CRC patients from C subjects (receiver operator characteristic (ROC) curve with an area under the curve (AUC) of 0.875; 95% CI: 0.667–1). These results confirm that urinary metabolomic analysis can be a valid tool to improve CRC diagnosis, prognosis, and response to therapy, representing a noninvasive approach that could precede more invasive tests.


Introduction
Colorectal cancer (CRC) diagnosis is mainly based on invasive, costly, and time-consuming methods (e.g., endoscopic, histological, and radiographic techniques), as noninvasive methods, such as stool-based tests (e.g., fecal occult blood test, FOBT) or the carcinoembryonic antigen (CEA) test, lack sensitivity and specificity [1][2][3]. Indeed, the diagnostic power of the noninvasive techniques mentioned above, such as FOBT, is higher in the presence of advanced tumors, while survival and

Patients and Sample Collection
The study was conducted on patients admitted to the Colorectal Surgical Unit, University Hospital of Cagliari, with a suspected diagnosis of CRC based on the presence of fecal occult blood or on familial risk. Patients underwent colonoscopy. A few days before the procedure they started a low-fiber diet. The day before the colonoscopy they consumed only liquids and no solid food. The night before the colonoscopy, patients took strong laxatives to clear the digestive tract. On the day of the colonoscopy, they could drink water up to 2 h before the procedure. Based on the outcome of this examination, the population was divided into two groups: a CRC group, comprising 6 cases of rectal, 7 of colon, and 1 of caecum adenocarcinoma, and an unaffected subjects (C) group, including 10 individuals without colon pathologies. Participant characteristics are indicated in Table 1. Patient status was confirmed by subsequent histological analysis. This study was approved by the Ethical Committee of the University Hospital of Cagliari, and all participants gave informed consent.

Urine Sample Preparation
Urine samples were collected from study participants just before colonoscopy. An aliquot of 800 µL of urine was transferred into a tube with 8 µL of a 1% aqueous solution of NaN3, to inhibit bacterial growth, and was stored at −80 °C. Afterwards, to remove solid particles, the sample was centrifuged at 12,000× g for 10 min at 4 °C. The supernatant (630 µL) was mixed with 70 µL of potassium phosphate buffer in D2O (1.5 M, pH 7.4) containing sodium 3-trimethylsilyl-propionate-2,2,3,3-d4 (TSP) as an internal standard (98 atom% D, Sigma-Aldrich, Milan, Italy). An aliquot of 650 µL was transferred to NMR glass tubes for 1H-NMR analysis [13].

1H-NMR Spectroscopic Analysis
1H-NMR measurements of urine samples were carried out at 298 K using a Bruker DRX 500 spectrometer operating at 500 MHz (Bruker Biospin, Rheinstetten, Germany). 1H-NMR spectra were obtained using a 1D Nuclear Overhauser Enhancement Spectroscopy (NOESY) pulse sequence to suppress water signals (relaxation delay of 3 s). For each sample, 128 free induction decays (FIDs) were collected into 64 K data points with a spectral width of 6000 Hz, a 90° pulse, an acquisition time of 2 s, and a mixing time of 150 ms. The FIDs were weighted by an exponential function with a 0.5 Hz line-broadening factor before Fourier transformation.
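The exponential weighting mentioned above can be illustrated with a minimal Python sketch. It assumes the conventional window exp(−π·LB·t) and a dwell time equal to the reciprocal of the spectral width (complex sampling); the function name and the synthetic FID are hypothetical and are not taken from the acquisition software used in the study.

```python
import math

def apodize_exponential(fid, dwell_time_s, lb_hz=0.5):
    """Weight each FID point by exp(-pi * lb_hz * t), with t = n * dwell_time_s.

    fid may hold real or complex values; lb_hz is the line-broadening factor.
    """
    return [
        x * math.exp(-math.pi * lb_hz * n * dwell_time_s)
        for n, x in enumerate(fid)
    ]

# Synthetic, constant "FID" used only to display the decay of the window.
# Assumption: dwell time = 1 / spectral width (6000 Hz), complex sampling.
dwell = 1.0 / 6000.0
weights = apodize_exponential([1.0] * 6001, dwell, lb_hz=0.5)
print(weights[0], weights[6000])  # weight at t = 0 s and at t = 1 s
```

A larger line-broadening factor makes the window decay faster, which improves the signal-to-noise ratio at the cost of wider peaks after Fourier transformation.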

NMR Data Preprocessing and Multivariate Statistical Analysis
The phase and baseline of NMR spectra were corrected using ACDlab Processor Academic Edition (Advanced Chemistry Development, 12.01, 2010, Toronto, ON, Canada). The spectral region comprising the signal of residual water and urea (4.5-6.0 ppm) was removed. The final spectral regions considered were between 0.5 and 4.5 ppm as well as 6.0 and 9.5 ppm. The ACD Labs intelligent bucketing method was used for spectral integration [14]. A 0.01 ppm bucket width was defined with an allowed 50% looseness. The intelligent bucket method finds local minima in spectra and adjusts the buckets accordingly. In this way, a peak is integrated into one bucket. The area of bucketed regions was normalized using Median Fold Change Normalization [15], largely preferred to total sum normalization when studying urine samples, and a matrix was generated. The resultant data sets were then imported into SIMCA software (Version 15.0, Sartorius Stedim Biotech, Umea, Sweden) for multivariate statistical analysis. The data sets were then Pareto scaled. Pareto scaling, where each variable is divided by the square root of the standard deviation, gives greater weight to NMR variables with low intensity, but it is not as extreme as the UV (Unit Variance) scaling method. Principal component analysis (PCA) and Orthogonal Partial Least-Squares Discriminant Analysis (OPLS-DA) were used for multivariate statistical analyses of NMR data. PCA was performed to identify any possible relation (trends, outliers) between the samples. As far as the outliers are concerned, Hotelling's T2 and DModX tests were applied. OPLS-DA analysis was used to reduce model complexity and to better highlight sample discrimination. The goodness of the model was evaluated using a 7-fold cross-validation and a "permutation test" (400 permutations). The permutation test was calculated by randomizing the Y-matrix (classification components) while the X-matrix (peak intensity in NMR spectra) was kept constant. 
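The normalization and scaling steps described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the ACD Labs or SIMCA implementation: the median fold change step uses the per-bucket median across all samples as the reference spectrum, and Pareto scaling is applied after mean-centering, which is the usual convention but is not stated explicitly in the text. All function names are hypothetical.

```python
import math
import statistics

def median_fold_change_normalize(samples):
    """samples: list of rows (one per spectrum), each a list of bucket areas."""
    n_vars = len(samples[0])
    # Reference spectrum: median of each bucket across all samples (assumption).
    reference = [statistics.median(row[j] for row in samples) for j in range(n_vars)]
    normalized = []
    for row in samples:
        # Fold change of every bucket relative to the reference spectrum.
        folds = [row[j] / reference[j] for j in range(n_vars) if reference[j] > 0]
        factor = statistics.median(folds)  # estimated dilution factor
        normalized.append([value / factor for value in row])
    return normalized

def pareto_scale(samples):
    """Mean-center each variable, then divide by the square root of its SD."""
    scaled_columns = []
    for j in range(len(samples[0])):
        column = [row[j] for row in samples]
        mean = statistics.mean(column)
        sd = statistics.stdev(column)
        denom = math.sqrt(sd) if sd > 0 else 1.0  # leave constant columns unscaled
        scaled_columns.append([(value - mean) / denom for value in column])
    # Transpose back to one row per sample.
    return [list(row) for row in zip(*scaled_columns)]

# Toy matrix: three "spectra" that differ only by dilution.
toy = [[1.0, 2.0, 4.0], [2.0, 4.0, 8.0], [4.0, 8.0, 16.0]]
print(median_fold_change_normalize(toy))
```

In the toy matrix the three rows differ only by a constant factor, so the normalization maps them onto the same profile, which is the behavior wanted for urine samples of differing dilution.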
The permutation plot shows the correlation coefficient between the original and the permuted y-variables on the x-axis, versus the cumulative R2 and Q2 on the y-axis, and draws the regression line. Q2Y intercept values are a measure of the overfit, and values <0.05 are indicative of a valid model. To highlight potential metabolites that mainly contributed to group separation, an S-plot for the OPLS-DA model was created. The S-plot reveals the contribution of each variable to the predictive component, matching the covariance p and the correlation p(corr) obtained from the OPLS-DA model. The axes plotted in the S-plot from the predictive component are p1 versus p(corr)1, representing the magnitude (modelled covariance) and reliability (modelled correlation) respectively. The variables characterized by high magnitude and reliability values have an important role in the separation of different groups of samples. In the S-plot both magnitude (intensity) and reliability are plotted. The statistical significance of the difference in metabolite concentrations, quantified using Chenomx NMR suite 7.1 (Chenomx Inc., Edmonton, AB, Canada), was calculated using an unpaired Welch t-test with a 95% confidence interval. Chenomx NMR Suite software is useful for identifying and quantifying the metabolites in NMR spectra [16]. It is equipped with reference libraries containing numerous compound models that are identical to the spectra of pure compounds obtained under similar experimental conditions. Basically, a Lorentzian peak shape model of each reference compound is created from database information and is overlapped with the actual spectrum. The linear combination of modeled metabolites gives rise to the total spectral fit, which can be assessed with a summation line [16]. The metabolites with both VIP > 1 (Variable Importance for the Projection) and a p-value ≤ 0.05 were considered statistically significant. 
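For readers unfamiliar with the unpaired Welch t-test used for the univariate comparison, the statistic and its Welch–Satterthwaite degrees of freedom can be computed as in the sketch below. The function name and the numeric values are hypothetical placeholders, not data from this study, and the p-value itself would additionally require the t-distribution (for example via a statistics package).

```python
import math
import statistics

def welch_t(group_a, group_b):
    """Return (t, df) for an unpaired two-sample test with unequal variances."""
    # Squared standard error contributed by each group.
    se2_a = statistics.variance(group_a) / len(group_a)
    se2_b = statistics.variance(group_b) / len(group_b)
    t = (statistics.mean(group_a) - statistics.mean(group_b)) / math.sqrt(se2_a + se2_b)
    # Welch-Satterthwaite approximation of the degrees of freedom.
    df = (se2_a + se2_b) ** 2 / (
        se2_a ** 2 / (len(group_a) - 1) + se2_b ** 2 / (len(group_b) - 1)
    )
    return t, df

# Placeholder values only, standing in for one metabolite in two groups.
t_stat, dof = welch_t([1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 4.0, 6.0, 8.0, 10.0])
print(round(t_stat, 3), round(dof, 2))
```

Unlike the pooled-variance Student t-test, this form does not assume equal variances in the two groups, which is a sensible default for small groups of unequal size.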
The Metaboanalyst program (https://www.metaboanalyst.ca/) [17] was used to generate receiver operator characteristic curves (ROC), calculate sensitivity, specificity, and the area under the ROC curve (AUC), and Matlab (http://it.mathworks.com/) was used to generate the box-and-whisker plots. The Random Forest algorithm was used to construct the ROC. It identifies important features through repeated random sub-sampling cross-validation (CV). In each CV, two-thirds (2/3) of the samples are used to evaluate the importance of each feature based on decreases in accuracy. The top features are used to build classification/regression models that are validated on the one-third (1/3) of samples that were left out of the original model.
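Two ingredients of the procedure above, the AUC and the repeated two-thirds/one-third random split, can be illustrated with a short Python sketch. It is not the MetaboAnalyst implementation and contains no Random Forest classifier: the AUC is computed from its rank interpretation (the probability that a randomly chosen case scores higher than a randomly chosen control), and the function names, scores, and sample count are hypothetical.

```python
import random

def roc_auc(scores, labels):
    """AUC as P(score of a case > score of a control); ties count as 0.5."""
    cases = [s for s, y in zip(scores, labels) if y == 1]
    controls = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(
        1.0 if c > k else 0.5 if c == k else 0.0
        for c in cases
        for k in controls
    )
    return wins / (len(cases) * len(controls))

def random_subsample_split(n_samples, train_fraction=2 / 3, rng=None):
    """Return (train_indices, test_indices) for one random sub-sampling round."""
    rng = rng or random.Random()
    indices = list(range(n_samples))
    rng.shuffle(indices)
    cut = round(n_samples * train_fraction)
    return indices[:cut], indices[cut:]

# Placeholder scores: two cases (label 1) and two controls (label 0).
print(roc_auc([0.9, 0.3, 0.8, 0.1], [1, 1, 0, 0]))
train_idx, test_idx = random_subsample_split(24, rng=random.Random(0))
print(len(train_idx), len(test_idx))
```

Repeating the split many times and averaging the AUC obtained on the held-out third gives a cross-validated estimate and an empirical confidence interval.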

Serum Carcinoembryonic Antigen Level Determination
Blood samples for CEA analysis were collected just before colonoscopy, centrifuged, and stored at −80 °C before use. The cut-off for CEA was >2.5 ng/mL [12].

1H-NMR Spectra of Urine Samples
NMR spectral analysis of urine samples revealed distinct metabolic signatures in C and CRC subjects (Figure 1). Spectral resonances were assigned to individual metabolites based on data published in the literature [18] and by using the library from the Chenomx NMR suite. Representative spectra of C and CRC are shown in Figure 1A,B. Major peak assignments of urine samples are reported in Figure 1, while the chemical shifts of metabolites are summarized in Table S1.

Multivariate Statistical Analysis of NMR Data
PCA was initially applied to the complete data set to highlight possible metabolic differences among the urine samples of control and CRC subjects and to identify potential outliers ( Figure S1 in Supporting Information). Based on Hotelling's T2 and DModX tests, two samples were considered outliers and were removed from the analysis ( Figure S2 in Supporting Information). To optimize the separation between the CRC and C urine samples, the supervised OPLS-DA model was applied. As displayed in Figure 2, the OPLS-DA scores plot showed clear separation between the two groups of samples. To test the validity of the model, a permutation test on the PLS-DA model was performed. Results showed that the model was statistically valid, with a Q2 intercept value of −0.192. To identify the metabolites that mainly contributed to group separation, an S-plot was constructed ( Figure 3). As shown in Figure 3, the variables selected in the S-plot are indicated with a dotted rectangle and represent the metabolites responsible for differentiation in the OPLS-DA scores plot. Cutoff values for the covariance of |p| ≥ 0.1 and the correlation |p(corr)| ≥ 0.2 were used. In the S-plot, the control samples were characterized, based on the discriminant regions, by high creatine, sn-glycero-3-phosphocholine, phenylacetylglycine, and proline ( Figure 3 square A), whereas an increase in citrate, creatinine, acetoacetate, 3-hydroxybutyrate, 3-aminoisobutyrate, tyrosine, tryptophan, histamine, methylhistidine, and fucose characterized the CRC samples ( Figure 3, square B). The relative concentrations of metabolites highlighted in the S-plot were verified with Chenomx NMR Suite 7.1 and were subjected to the Welch t-test to identify significant variations in concentration in the two groups. Significantly discriminant metabolites were characterized by VIP > 1 and p ≤ 0.05. 
After this analysis, 14 metabolites exhibited VIP > 1, but only acetoacetate, creatine, creatinine, histamine, phenylacetylglycine, and tryptophan showed significant variation (with p ≤ 0.05) ( Table 2). The relative concentrations, calculated by normalization of the molar concentration of each metabolite to the total molar concentration of all 14 metabolites for each sample in the two groups, were compared using box-and-whisker plots. As shown in Figure 4, the data obtained demonstrated that the CRC group showed increased relative levels of acetoacetate, creatinine, and histamine, whereas creatine, phenylacetylglycine, and tryptophan levels were lower compared to the C group. A ROC curve was constructed using only the metabolites with a significant statistical variation, and the area under the curve of the ROC analysis was found to be 0.879 (95% CI: 0.667-1), indicating high predictive accuracy of the model ( Figure 5). Significantly, only 5/12 CRC patients had increased CEA levels (CEA > 2.5 ng/mL), showing 42% sensitivity.
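The two simple calculations in this paragraph, the relative concentration of each metabolite and the sensitivity of the CEA test, can be reproduced as follows. This is a minimal sketch: the function names are hypothetical and the molar concentrations are placeholders, not measured values; only the counts 5 and 12 come from the text.

```python
def relative_concentrations(molar):
    """Divide each metabolite's molar concentration by the sample total."""
    total = sum(molar.values())
    return {name: value / total for name, value in molar.items()}

def sensitivity(true_positives, n_cases):
    """Fraction of confirmed cases that the test flags as positive."""
    return true_positives / n_cases

# Placeholder concentrations for a single sample (arbitrary units).
sample = {"creatine": 1.0, "creatinine": 3.0}
print(relative_concentrations(sample))

# CEA: 5 of 12 CRC patients were above the 2.5 ng/mL cut-off.
print(f"{sensitivity(5, 12):.0%}")
```

In the study the total would be taken over all 14 metabolites with VIP > 1, so that the relative concentrations of each sample sum to one.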
Appl. Sci. 2020, 10, x FOR PEER REVIEW
In the present study, 1H-NMR spectroscopy coupled with pattern recognition was used to profile urine metabolites of subjects with (CRC) or without (C) CRC. Significantly, metabolic perturbations were evaluated between the urine samples of CRC patients and individuals undergoing colonoscopy for positive FOBT or familial risk of CRC, all of whom had followed a similar low-fiber, controlled diet for a few days before the examination. Urine samples were collected just before the colonoscopy. Therefore, both CRC and C groups were evaluated under the same short-term dietary conditions. Since diet is an important factor contributing to inter-individual variability, our sample collection approach reduced the variability related to differences in nutritional status. To date, our study, despite the small number of subjects analyzed, is the first to investigate the metabolomics profiles of potential CRC patients under a similarly controlled short-term diet. Based on the identification of metabolites with statistically significant variations between the CRC and C groups, a ROC curve was constructed to assess the sensitivity and specificity of the biomarker candidates for early detection of CRC. The AUC of the ROC analysis denoted high predictive accuracy of the model. Furthermore, the urine metabolic profile of CRC patients showed a higher sensitivity than serum CEA levels. The sensitivity of CEA is well known to be low, but it is often used as a serum biomarker in CRC to support diagnosis and to monitor follow-up [20]. Our data confirmed the limitations of CEA detection, including its relatively low sensitivity and specificity [2], and suggested the possibility of integrating these data with the metabolomics approach. Our study indicates a better sensitivity and specificity of the urine profile test in CRC samples compared to those analyzed with 1H-NMR spectroscopy by Wang et al.
[21] in adenomatous polyp samples. The improved sensitivity and specificity of urine profile tests in CRC samples may be due to a more advanced stage of colon disease in our samples in the CRC group. In our study, the metabolites identified were the end products of several metabolic pathways relating to lipid and amino acid metabolism, which are known to be perturbed during tumor cell proliferation [10]. We found that the levels of acetoacetate, creatinine, and histamine were increased, while those of creatine, phenylacetylglycine, and tryptophan decreased in the urine of the subjects affected by CRC. The increased concentrations of acetoacetate observed in CRC urine may be related to lipid metabolic changes associated with tumor development. The increased content of acetoacetate in our CRC samples could be related to increased β-oxidation due to the strong energy demands accompanying tumor growth. Indeed, in tumor cells, acetyl-CoA is not converted into citrate by the tricarboxylic acid cycle but is processed through an alternative pathway to form ketone bodies, which represent a very efficient fuel source, preferable even to glucose [11,22].
Cancer cells use some amino acids as an energy source [23]. In our study, we found that creatinine levels increased while those of creatine and tryptophan decreased in the urine of CRC patients. An increase in urinary creatinine is normally connected to muscle catabolism, breakdown of proteins, or a combination of the two processes [24]. The increased creatinine levels observed in our tumor samples may be due to the non-enzymatic and irreversible conversion of creatine [25], which would also explain the significantly decreased creatine levels in CRC urine samples. On the other hand, we found decreased tryptophan levels in CRC urine samples. Altered levels of tryptophan in urine have been observed in other types of tumors, such as breast and bladder cancer [26]. Tryptophan is an essential amino acid metabolized in tumor microenvironments, immune-privileged sites, and sites of inflammation. Degradation of tryptophan by the enzyme indoleamine-2,3-dioxygenase (IDO) is considered to be an immune defense mechanism, which inhibits the growth of intracellular bacteria, viruses, parasites [27], and malignant tumor cells [28]. Based on different pieces of evidence [29,30], the catabolic products of tryptophan in cancer are considered important microenvironmental factors that suppress antitumor immune responses. Another metabolite that was increased in our CRC urine samples was histamine. It has been reported that histamine can regulate the proliferation and angiogenesis of cancer cells [31][32][33]. Finally, another metabolite decreased in our CRC urine samples was phenylacetylglycine, which is considered a minor metabolite of fatty acids. Increased excretion of phenylacetylglycine has been observed in gastric cancer patients, in which its levels correlated with the cancer T stage [6], and in rats treated with alkylating agents that produced precancerous colorectal lesions possibly related to dysbiosis of gut microflora metabolism [34].
Significantly, even though the low number of subjects investigated is a limitation of this work, a recent study by Wang et al. [11] analyzing the metabolomics profiles of CRC versus healthy patients reported similar results for acetoacetate and phenylalanine, a precursor of phenylacetylglycine. In addition, a recent study by Deng et al. [35] identified four metabolites as robust CRC urine biomarkers: proline, diacetylspermine, kynurenine, and glucose. Because kynurenine is a tryptophan metabolite, its increased levels correlate well with the decrease of tryptophan found in our CRC samples.
In conclusion, metabolomics analysis showed a different urinary metabolic profile between CRC patients and healthy controls undergoing colonoscopy. Altered levels of some metabolites correlated significantly with colon cancer and accurately distinguished CRC patients from C subjects. Both our results and data from the literature confirm that urinary metabolic profiling is an effective tool for identifying CRC, and it may be useful in improving diagnosis, prognosis, and response to therapy, representing a noninvasive diagnostic support to be used before other more invasive tests. Although the number of subjects investigated is a limitation of the study, its strength is the similar nutritional status of all the subjects undergoing colonoscopy. Our results suggest that for CRC identification, the metabolomics analysis does not need to be associated with colonoscopy, considering that similar discriminant results can be observed without it. The next important step is the validation of these data in a larger independent group of patients in order to transfer this information into the clinic.