Screening for Early Gastric Cancer Using a Noninvasive Urine Metabolomics Approach

Simple Summary
There are currently no effective specific biomarkers for the screening of early gastric cancer. Recently, metabolomics has been used to profile small endogenous metabolites, demonstrating significant potential in the diagnosis and screening of cancer, owing to its ability to analyze noninvasively obtained samples. Here, we performed a urine metabolomics analysis in the context of an early diagnosis of gastric cancer. This approach showed very high diagnostic sensitivity and specificity and performed significantly better than the analysis of serum tumor markers. An additional genomic data analysis revealed the up-regulation of several genes in gastric cancer. This metabolomics-based early diagnosis approach may have the potential for mass screening of an average-risk population and may facilitate endoscopic examination through risk stratification.

Abstract
The early detection of gastric cancer (GC) could decrease its incidence and mortality. However, there are currently no accurate noninvasive markers for GC screening. Therefore, we developed a noninvasive diagnostic approach, employing urine nuclear magnetic resonance (NMR) metabolomics, to discover putative metabolic markers associated with GC. Changes in urine metabolite levels during oncogenesis were evaluated using samples from 103 patients with GC and 100 age- and sex-matched healthy controls. Approximately 70% of the patients with GC (n = 69) had stage I GC, with the majority (n = 56) having intramucosal cancer. A multivariate statistical analysis of the urine NMR data clearly discriminated between the patient and control groups and revealed nine metabolites (alanine, citrate, creatine, creatinine, glycerol, hippurate, phenylalanine, taurine, and 3-hydroxybutyrate) that contributed to the difference. A diagnostic performance test with a separate validation set exhibited a sensitivity and specificity of more than 90%, even with the intramucosal cancer samples only. In conclusion, the NMR-based urine metabolomics approach may have potential as a convenient screening method for the early detection of GC and may facilitate consequent endoscopic examination through risk stratification.


Introduction
Gastric cancer (GC) is the sixth most common type of cancer and the second leading cause of cancer-related mortality worldwide [1]. The overall survival and prognosis greatly depend on the disease stage, and the mortality from GC is mainly due to late presentation [2,3]. Therefore, early detection is critically important for reducing GC morbidity and mortality [4]. Further, early diagnosis is highly associated with a good prognosis [5][6][7]. Among screening methods for the early detection of GC, endoscopy is the most common modality [8]. However, endoscopy may be accompanied by complications, requires adequately qualified facilities, and is time-consuming [9][10][11][12][13][14].
The development of GC involves multiple genes and other factors, and tumors consist of mixed tissues and display various degrees of differentiation. There are currently no effective specific biomarkers for GC. Serum tumor markers are not satisfactory, mainly because of their poor diagnostic sensitivities. The sensitivities of carcinoembryonic antigen (CEA), CA 19-9, and CA 72-4 are <20% for early-stage cancer and 20-50% for advanced stages [15][16][17][18]. Several studies have attempted to improve the diagnostic value of these markers by combining more than two markers or using adjusted cutoff values [19][20][21][22]. However, these markers remain unsatisfactory for GC diagnosis, especially at early stages. More recently, several biomarkers have been proposed for GC, including GLS/GGCT protein coexpression levels in tissue samples and circulating miR21 in serum samples [23,24]. Although these invasive methods have shown very satisfactory results with high sensitivities and specificities, it is still necessary to develop a reliable screening tool for the early detection of GC that is sensitive, specific, and easy to use; moreover, a mostly noninvasive approach is highly desired.
Metabolomics is used to profile small endogenous metabolites and has, in particular, demonstrated a significant potential in the diagnosis or screening of cancer, owing to its ability to noninvasively analyze samples [25][26][27][28][29]. Urine is arguably the most suitable sample because it is easy to obtain and can reflect the systemic metabolic status. Recently, urine metabolomics has been applied to the diagnosis of GC, with promising results [30][31][32][33][34]. A capillary electrophoresis-mass spectrometry (CE-MS)-based feasibility study by Chen et al. [30] used moving reaction boundary (MRB)-CE-MS for a better sensitivity and stability with a small number of samples. A nuclear magnetic resonance (NMR)-based metabolomics study [31] for early-stage GC diagnosis has also been reported; however, the study had multiple purposes, including the evaluation of curative surgery. Although the data showed great discrimination between groups based on urine sample analysis, the majority of the patient population had later than stage II GC. In other studies, the numbers of urine samples from patients with cancer and healthy controls were small (≤50), which could have limited the reliability of the studies. In addition, these studies included very limited numbers of cases of early GC, which is critical for GC screening [32][33][34]. Therefore, the actual diagnostic sensitivity and specificity for early GC detection by urine metabolomics have not been properly assessed.
In this study, we performed a urine metabolomics analysis for GC diagnosis, with a particular focus on early GC. The diagnostic performance of the metabolomics approach was evaluated with a rigorous cross-validation using a separate validation set. A relationship between metabolic contributors and the gene expression profile in cancer tissues was also studied.

Subject Characteristics
The clinical information of the patients with GC and healthy subjects included in this study is detailed in Table 1. There were no significant differences in the demographic and laboratory findings between the two groups. The patients with GC were classified according to tumor-node-metastasis (TNM) staging as follows: stage I, 69 patients; stage II, 10 patients; stage III, 15 patients; stage IV, 9 patients. Thus, this study population included many more patients with stage I, particularly IA, than later-stage patients. There were 55 cases with differentiated and 48 cases with undifferentiated cancer. Lymph node metastases were found in 22 (21.4%) cases. The proportions of abnormal conventional serum tumor markers are also shown in Table 1.

Study Design
To evaluate the performance of the diagnostic model, validation was carried out with a separate validation set, as previously described [35]. Briefly, we randomly selected one-third of the samples from each group (within the cohorts) as the validation set (34 of 103 for the GC group and 33 of 100 for the healthy control group). The remaining samples (training set) were used to develop the diagnostic model. Similarly, in the analysis of the diagnostic performance of the three subgroups (stage IA cohort, stage I (IA+IB) cohort, and stage I+II cohort), one-third and two-thirds of the samples from each cohort were selected as the validation and training sets, respectively. These datasets from all cohorts were balanced for age, sex, Helicobacter infection, and other clinical characteristics (Table 1).
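The per-group random hold-out described above can be sketched as follows. This is a minimal illustration, not the authors' procedure: the function name `split_group`, the sample identifiers, and the fixed seed are assumptions introduced here, and the sketch does not attempt the balancing for age, sex, and other clinical characteristics mentioned in the text.

```python
import random

def split_group(sample_ids, validation_fraction=1 / 3, seed=0):
    """Randomly hold out a fraction of one group's samples for validation.

    Returns (training_ids, validation_ids). The seed is a hypothetical
    choice that only makes the illustration reproducible.
    """
    rng = random.Random(seed)
    shuffled = list(sample_ids)
    rng.shuffle(shuffled)
    n_validation = round(len(shuffled) * validation_fraction)
    return shuffled[n_validation:], shuffled[:n_validation]

# Hypothetical identifiers for the 103 GC and 100 healthy control samples.
gc_ids = [f"GC{i:03d}" for i in range(1, 104)]
control_ids = [f"HC{i:03d}" for i in range(1, 101)]

gc_train, gc_val = split_group(gc_ids)
hc_train, hc_val = split_group(control_ids)
print(len(gc_val), len(hc_val))  # 34 33
```

Splitting each group separately keeps the case-to-control ratio of the validation set close to that of the full cohort, which matches the 34 and 33 held-out samples reported above.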

Discrimination between Urine Samples from Healthy Subjects and Patients with GC Using NMR Spectra
All urine metabolic profiles, from both patients with GC and healthy controls, were obtained using NMR spectroscopy. Representative spectra from both groups were similar, but there were differences in specific regions, indicating subtle metabolic differences (Figure 1A). For a more holistic analysis of the NMR data for the training set, we performed a multivariate statistical analysis and established an orthogonal projections to latent structures discriminant analysis (OPLS-DA) model. The OPLS-DA approach is a method for the classification of groups in the presence of confounding factors, such as multicollinear and noisy variables [36]. The discrimination model was built with one predictive and two orthogonal components, and it exhibited an R²(Y) (overall goodness of fit) of 86.5% and a Q²(Y) (overall cross-validation coefficient) of 72.8% (Figure 1B). Most of the samples in the prediction set were clustered into the corresponding groups, and only a few samples overlapped across the groups. In addition, a partial least squares-discriminant analysis (PLS-DA) score plot showed that there were no significant differences between the healthy subjects and GC patients at any stage based on Helicobacter pylori infection (Supplementary Materials Figure S1).

Analysis of Contributing Metabolites Using Statistical Total Correlation Spectroscopy (S-TOCSY)
The identities of the metabolites that contributed to the difference between the cancer and healthy control groups were determined by assigning NMR peaks and performing a statistical analysis (Table 2, Figure S2).
All metabolites with statistically meaningful differences were identified by matching their peak characteristics with those in databases and are shown in Figure 2A and Table S1. Of the identified metabolites, alanine, taurine, phenylalanine, and creatine have been previously described as urinary metabolomics markers for GC [31][32][33]37]. We observed low creatine and high creatinine levels in the urine from the normal subjects and, conversely, high creatine and low creatinine levels in the urine from the patients with GC (Figure 2B). The levels of these markers varied rather widely, with those of citrate showing the widest variation between the healthy control and cancer groups (Figure 2B). In addition, the fold increases were not very large, and thus it was difficult to find a single representative marker metabolite. As the p(corr) values of each signal in the S-TOCSY plot were also rather small (maximum value ≤0.7), multiple metabolites, rather than one or two major metabolites, seemed to intricately contribute to the group separation. Additionally, we performed a receiver operating characteristic (ROC) curve analysis, and the results showed a broad range of areas under the curves (AUCs), from 0.632 (citrate) to 0.936 (glycerol), with varying sensitivities and specificities (Figure 2C). The results of the ROC analysis are summarized in Table 2.
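For readers unfamiliar with the per-metabolite AUC values quoted above, the following sketch shows one standard way to compute an AUC from two groups of measurements, using its equivalence to the probability that a randomly chosen case value exceeds a randomly chosen control value. The function name `roc_auc` and the numbers are illustrative assumptions, not study data, and the source does not state which software routine produced the reported AUCs.

```python
def roc_auc(case_values, control_values):
    """Area under the ROC curve via pairwise comparison.

    Counts the fraction of (case, control) pairs in which the case value
    is higher; ties contribute one half.
    """
    wins = 0.0
    for case in case_values:
        for control in control_values:
            if case > control:
                wins += 1.0
            elif case == control:
                wins += 0.5
    return wins / (len(case_values) * len(control_values))

# Made-up relative intensities for one metabolite (illustration only).
cases = [3.0, 4.0, 5.0]
controls = [1.0, 2.0, 3.5]
auc = roc_auc(cases, controls)
print(round(auc, 3))  # 0.889
```

For a metabolite that is lower in cases than in controls, this function returns a value below 0.5; the discriminating power is then 1 minus the returned value.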

Diagnostic Performance: Validation of the Prediction Model
We developed a prediction model that discriminated between cancer and healthy control samples using a training set. Therefore, we tested whether our model could predict the cancer status of unknown samples from the validation set, which were set aside during model building.
NMR data of the validation set were obtained using the same experimental parameters as those used for the training set, and the data were fitted into the prediction model to obtain the cancer status using an a priori set cutoff value of 0.5 for the predicted dependent variable (Figure 3A). This validation test correctly predicted the respective status of 31 of 33 healthy control samples and 32 of 34 cancer samples, thus yielding a specificity of 93.9% and a sensitivity of 94.1% for the diagnosis of GC. Moreover, the sensitivity and specificity of CEA, CA 19-9, and CA 72-4 were determined (Table 3). Compared with the serum tumor markers for the same patients, the metabolomics approach performed significantly better, especially in terms of sensitivity.
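The arithmetic behind the reported sensitivity and specificity can be reproduced with a short sketch. The predicted Y values below are placeholders chosen only to reproduce the reported counts (32 of 34 cancer samples and 31 of 33 control samples classified correctly); they are not the study's model outputs. Treating a value exactly at the cutoff as a cancer call is an assumption made here.

```python
def sensitivity_specificity(is_cancer, predicted_y, cutoff=0.5):
    """Classify each sample by comparing its predicted Y value to the cutoff.

    A sample is called cancer when predicted_y >= cutoff (assumed convention).
    Returns (sensitivity, specificity).
    """
    tp = fn = tn = fp = 0
    for truth, score in zip(is_cancer, predicted_y):
        called_cancer = score >= cutoff
        if truth and called_cancer:
            tp += 1
        elif truth:
            fn += 1
        elif called_cancer:
            fp += 1
        else:
            tn += 1
    return tp / (tp + fn), tn / (tn + fp)

# Placeholder predictions matching the reported counts.
labels = [True] * 34 + [False] * 33
scores = [0.8] * 32 + [0.3] * 2 + [0.2] * 31 + [0.6] * 2

sensitivity, specificity = sensitivity_specificity(labels, scores)
print(f"{sensitivity:.1%} {specificity:.1%}")  # 94.1% 93.9%
```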
For early GC screening, it is desirable to have a diagnostic model that can perform well without the inclusion of later-stage samples. Therefore, we carried out a diagnostic performance test as above after excluding later-stage samples (Figure S3 and Table 3). The metabolomics approach showed a robust performance even with the earliest, stage IA samples (R² > 0.508 and Q² > 0.676) and exhibited a specificity of 97% (32 correct predictions for 33 healthy control samples) and a sensitivity of 94.7% (18 correct predictions for 19 cancer samples) (Figure 3B).

Relationship between Urine Metabolites and Gene Expression in Cancer Tissues
After confirming the utility of the proposed model in predicting early GC, we wanted to determine whether there was a relationship between the metabolic contributors and gene expression profiles in GC. Although one or two significant marker metabolites did not account for a large part of the metabolic alterations between the healthy control and cancer groups, a metabolic-genetic relationship might reveal a weak but significant correlation. To this end, we analyzed microarray data for patients with GC (n = 103) and normal controls (n = 29) and examined the expression levels of genes known to metabolize the identified contributing metabolites [38]. We applied the bioinformatics network visualization tool MetScape to perform a pathway-based network analysis [39]. The software allows the visualization and interpretation of metabolomics and gene expression datasets in a human metabolic network context. Using the compound-reaction-enzyme-gene analysis function of MetScape, we obtained enzymes potentially related to the metabolites identified by our metabolomics analysis. In brief, we conducted a pathway analysis based on our metabolomics results to elucidate potentially related genes (Figure S4A). Subsequently, we analyzed the expression levels of the genes encoding these enzymes using the microarray dataset of Chen et al. and found significant changes in the expression of some genes (Figures S5 and S6; Table S3). We performed a volcano plot analysis to select meaningful genes based on a fold change of 1.5 and a false discovery rate (FDR) of 0.005. As a result, we selected five genes: ACLY, ACO2, BAAT, CKMT1B, and GGTL4 (Figure S4B). A joint pathway analysis using the identified metabolites and correlated genes was also performed; overall, the result supports the abovementioned findings (Figure S4C).
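The volcano-plot selection rule (fold change of 1.5 and FDR of 0.005) can be sketched as below. Two points are assumptions rather than facts from the text: that the FDR was controlled with the Benjamini-Hochberg procedure, and that the fold-change threshold was applied symmetrically to up- and down-regulated genes. The gene labels and numbers are hypothetical placeholders, not values from the microarray dataset.

```python
import math

def benjamini_hochberg(p_values):
    """Return Benjamini-Hochberg adjusted p-values in the input order."""
    n = len(p_values)
    order = sorted(range(n), key=lambda i: p_values[i])
    adjusted = [0.0] * n
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(n, 0, -1):
        index = order[rank - 1]
        running_min = min(running_min, p_values[index] * n / rank)
        adjusted[index] = running_min
    return adjusted

def select_genes(genes, fold_changes, p_values, min_fold=1.5, max_fdr=0.005):
    """Keep genes passing both the fold-change and the FDR thresholds."""
    q_values = benjamini_hochberg(p_values)
    threshold = math.log2(min_fold)
    return [
        gene
        for gene, fold, q in zip(genes, fold_changes, q_values)
        if abs(math.log2(fold)) >= threshold and q < max_fdr
    ]

# Hypothetical genes with made-up cancer/normal expression ratios and p-values.
genes = ["GENE_A", "GENE_B", "GENE_C", "GENE_D"]
fold_changes = [2.0, 1.2, 0.4, 0.5]
p_values = [0.001, 0.002, 0.5, 0.0001]

selected = select_genes(genes, fold_changes, p_values)
print(selected)  # ['GENE_A', 'GENE_D']
```

Working on the log2 scale makes a 1.5-fold increase and a 1.5-fold decrease equally far from zero, which is why the comparison uses the absolute value.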

Discussion
Metabolomics, which is used to assess the overall metabolic profiles of biological samples, may help establish the missing link between gene/protein expression profiles and final cellular phenotypes in normal and diseased states. Metabolomics urine analysis is also attractive for the routine monitoring of cancer because urine contains high concentrations of many water-soluble metabolites present in plasma and sampling is noninvasive. Several studies have shown that urine metabolomics analysis provides a potential diagnostic tool for the early detection of cancer [27,[40][41][42].
To date, several studies have explored the utility of metabolomics in the diagnosis of GC. Early reports mostly used tissue samples obtained from surgery or endoscopic biopsies [37,43,44], aiming at directly detecting metabolic alterations in cancer tissues. However, more recent studies have obtained comparable results using more accessible samples such as blood or urine [31,34,45]. Important drawbacks of these studies were small numbers of cases and a lack of rigorous cross-validation to assess the actual diagnostic performance of the metabolomics approach. The current study has employed the largest number of cases thus far. In addition, our validation set corresponded to one-third of all enrolled cases and was larger than those used in most of the previous studies. Therefore, the high diagnostic performance of our approach should be considered more reliable than that of previous studies.
Clinically, another equally important aspect of our study was the inclusion of a large number of patients with stage I GC. Among the enrolled cases with cancer, more than half had stage I GC, including 56 cases with intramucosal cancer, which can be treated by endoscopic resection. In addition, we achieved a comparably high diagnostic performance using only intramucosal lesions. In comparison, a previous study that reported the diagnostic performance of metabolomics included no stage IA cases in the training set, although it included 13 stage IA cases in the validation set [31]. It should also be noted that conventional tumor markers performed very poorly, with a low sensitivity. The current study is the first noninvasive metabolomics diagnosis study that focused on early GC.
Despite the high diagnostic performance, our approach could not establish a firm mechanistic link between metabolite changes and GC, nor could it identify a single marker metabolite that explained the metabolic difference between healthy controls and cases with GC. Nevertheless, provided a diagnostic facility is available, this analysis can be performed quickly, at a low cost, and without the need for medically trained personnel. Therefore, the urine metabolomics approach should be suitable for early GC screening, rather than the confirmation of GC, which should be performed by expert pathologists. Combined with the convenience and noninvasive nature of urine analysis, this method should be applicable to the general population.
Although we found several metabolites that contributed to the metabolic difference between patients with cancer and healthy controls, the correlations of these metabolites with differences between the groups were rather small. Interestingly, we found that the creatine and creatinine levels showed opposite trends in the normal controls and in patients with GC. Up to 94% of creatine is found in muscle tissues, and it is nonenzymatically converted to creatinine and excreted into urine through the kidney [46]. A significant correlation has been reported between creatine metabolism and muscle mass [47], which is helpful in interpreting the inverse relationship between the measured concentrations of these two metabolites in our study. Compared with that in normal controls, muscle mass is likely to be relatively deteriorated in patients with GC, which may lead to abnormal creatine metabolism. Based on these effects, the high levels of creatinine in the normal controls were interpreted to be due to the conversion of creatine to creatinine, while relatively low levels of creatinine were measured in the patients with GC. Because the conversion of creatine to creatinine is a nonenzymatic reaction, this may explain why no specific enzymes associated with this reaction were found.
Unfortunately, with insufficient metabolic data, a direct mechanistic investigation of GC was not possible in the current study. However, the diagnostic performance was evaluated using all the metabolite signals as a whole and was found to be much higher than that of conventional markers or a radiological approach. In fact, one advantage of the metabolomics approach is that no identification of a single significant marker is necessary for disease diagnosis, as noted previously [48]. In addition, it has been noted that the measured metabolite profile is a reflection of the convergence of tumor, microenvironment, and global metabolic alterations [49]. Therefore, finding one or two significant metabolites whose levels can account for a large part of metabolic differences between patients with cancer and normal cases seems to be very difficult, if not impossible. Indeed, there are noticeable variations in the levels of contributing metabolites. These variations may reflect the well-known heterogeneity of solid tumors. In a large-scale sequencing analysis of major human cancers, it was estimated that 3000 to 10,000 mutations could be found in one of the cancer genomes [50]. This heterogeneity may account for the difficulties in finding a single significant metabolite marker.
Citrate, a metabolic intermediate of the tricarboxylic acid (TCA) cycle, has been previously identified as a GC biomarker [51], but there remain inconsistencies regarding its levels, with some studies reporting increases and others reporting decreases in citrate levels in GC [51,52]. These differences may indicate that the distribution range of citrate levels is quite large in GC, as was observed in our study. However, further qualified studies, including a controlled trial or a cohort study, should be conducted to validate the screening programs using urine metabolomics profiles.
Although our metabolomics results showed different total metabolic profiles between healthy subjects and GC patients, these results cannot stand alone. Complementary functional evidence from other "omics" datasets, in the context of systems biology, is always required. For this purpose, we conducted a pathway analysis using microarray data to disclose any potential correlation with our metabolomics results. Importantly, we found several metabolite-gene expression correlations. For instance, the expression of ACO2, an enzyme that plays a critical role in the conversion of citrate to isocitrate, was down-regulated in GC patients, supporting the increased citrate levels observed through our metabolomics analysis. Moreover, the expression of GGTL4, a member of the gamma-glutamyltransferase family that plays a key role in the transfer of gamma-glutamyl functional groups to amino acids, was down-regulated in GC patients. Although a direct correlation between the GGTL4 levels and the production of alanine and taurine may be questionable, GGTL4 is located in a metabolic pathway close to those essential for the production of these metabolites. Another gene, BAAT, which is responsible for the conjugation of acyl-CoA thioesters with either glycine or taurine, was up-regulated in GC patients; again, this could be the reason for the elevated taurine levels in GC patients. We also found that mitochondrial creatine kinase (CKMT1B) was differentially expressed in GC patients. CKMT1B is responsible for the transfer of high-energy phosphate groups from the mitochondria to creatine. However, this gene is not directly involved in creatine production, making it difficult to determine the correlation between the expression of CKMT1B and the creatine levels in GC patients. Overall, we confirmed that the suggested metabolite-gene correlation was supported by the results of the additional joint pathway analysis.
In addition, the pepsinogen I/II ratio, which has recently attracted attention for identifying groups at high risk of GC, was not examined in this study. This biomarker is known to be a useful serologic marker for chronic atrophic gastritis, which is a precancerous lesion. Hence, its purpose differs from that of this study, which demonstrated the diagnostic performance of a metabolomics profile in the detection of GC regardless of accompanying atrophic or metaplastic change. Furthermore, as the sensitivity and specificity of the pepsinogen test differ from country to country, further research is required to increase the efficacy of pepsinogen as a GC biomarker [53]. Another issue is the change in the metabolomics profile after treatment. In our study, we only compared active cancer patients and healthy subjects. The evaluation of the urine metabolomics profile of gastric cancer patients after curative resection should also be considered, even though there could be a confounding bias due to dietary changes after gastrectomy. Further studies are therefore warranted to elucidate the metabolomics changes in gastric cancer subjects before and after surgery.
For the NMR spectra normalization, we applied the "total area normalization approach", a numerical normalization method, rather than a physiological method using, e.g., the osmolality or creatinine levels. Normalization against creatinine should be used only when the creatinine clearance is constant under stable metabolic regulation. Such an approach is not suitable for our study, because gastric cancer is one of the many diseases causing metabolic dysregulation. Even though urinary specific gravity and osmolality measurements are currently recommended as normalization methods for the analysis of urine samples, these approaches require an additional measurement, a critical limitation in the context of high-throughput screening research. While various numerical normalization methods have been proposed, each approach has its own advantages and limitations [54]. Therefore, a standardized normalization method for NMR-based urine metabolomics has not yet been established. However, for studies with large sample sizes (over 50 samples), quantile normalization has been suggested to give better results, and we will consider it as an additional option in further research.
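Since quantile normalization is mentioned only as a possible future option, the following is a generic sketch of that technique and not something used in this study. It forces every spectrum to share the same intensity distribution by replacing each value with the mean, across spectra, of the values holding the same rank. The function name and toy numbers are assumptions, and ties are broken by position rather than averaged.

```python
def quantile_normalize(spectra):
    """Quantile-normalize a list of equal-length intensity lists.

    Each value is replaced by the across-spectra mean of the values that
    share its rank within their own spectrum.
    """
    n_bins = len(spectra[0])
    sorted_rows = [sorted(row) for row in spectra]
    # Reference distribution: mean of the k-th smallest values.
    reference = [
        sum(row[k] for row in sorted_rows) / len(spectra) for k in range(n_bins)
    ]
    normalized = []
    for row in spectra:
        order = sorted(range(n_bins), key=lambda i: row[i])
        new_row = [0.0] * n_bins
        for rank, index in enumerate(order):
            new_row[index] = reference[rank]
        normalized.append(new_row)
    return normalized

# Two toy spectra with three bins each (illustration only).
toy = [[5.0, 2.0, 3.0], [4.0, 1.0, 6.0]]
result = quantile_normalize(toy)
print(result)  # [[5.5, 1.5, 3.5], [3.5, 1.5, 5.5]]
```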

Patients
The study design was approved by the Institutional Review Board at the Samsung Medical Center (IRB No. 2012-08-045), and written informed consent was obtained from all subjects enrolled in this study. We collected urine samples from patients with GC (n = 103) and age- and sex-matched healthy subjects (n = 100) at the Samsung Medical Center. Patients with prior treatment, including chemotherapy or surgery, serious complications such as active bleeding or an obstruction, abnormal liver function or renal function test results, severe cardiopulmonary disease, collagen disease, uncontrolled diabetes mellitus, and active carcinoma at other sites were excluded. All patients were diagnosed using biopsy or surgical resection. Healthy controls were age- and sex-matched subjects with no declared history of any gastrointestinal or chronic diseases, and no gastrointestinal symptoms. Furthermore, healthy controls showed normal endoscopic findings, as per the endoscopic examination results. Subjects diagnosed with benign gastric diseases including gastritis, gastroesophageal reflux disease, ulcer, or benign tumors (as per the endoscopic examination results) were not included in this study. Control samples were obtained from healthy subjects who underwent a routine health check-up at the Samsung Medical Center in the same period. The selection criteria required individuals to not meet the above-listed exclusion criteria, including neoplasms. All subjects were instructed to fast for 8 h before the collection of urine samples. Participants provided a midstream urine sample. No dietary restriction or activity modification was required before the urine collection.

Serum Assays of CEA, CA 19-9, and CA 72-4
Blood samples were obtained from all patients in the morning during the week before surgery. The blood sample was centrifuged at 1000× g for 10 min to separate the plasma from blood cells. Serum CEA, CA 19-9, and CA 72-4 were measured using a radioimmunoassay. The normal values of CEA, CA 19-9, and CA 72-4 were set at less than 7 ng/mL, 35 U/mL, and 4 U/mL, respectively.

Urine Sample Collection and Preparation
First morning urine samples (3-5 mL) were collected and centrifuged at 3000 rpm for 15 min at 4 °C. The supernatants were transferred to frozen tubes and stored at −80 °C until processing. The frozen urine samples were gently thawed and centrifuged at 15,000 rpm for 20 min at 4 °C. A 500-µL aliquot of the supernatant was mixed with 50 µL of phosphate buffer (1.5 M K2HPO4, 1.5 M Na2HPO4, pH 7.4), centrifuged at 15,000 rpm for 20 min at 4 °C, and then incubated at room temperature for 10 min. For the internal standard, 50 µL of a 0.25% trimethylsilylpropanoic acid (TSP) solution in D2O was added to 450 µL of the centrifuged supernatant, and the mixture was transferred into a 5-mm NMR tube.

NMR Data Acquisition and Determination of Metabolic Profiles
All ¹H NMR spectra were obtained using a 500-MHz NMR spectrometer (BioSpin Avance 500; Bruker, Billerica, MA, USA), which was operated at a 500.13-MHz proton frequency at 25 °C using the Carr-Purcell-Meiboom-Gill pulse sequence (cpmgpr1d) with 400 ms of the total spin echo delay. The acquisition parameters and data processing steps were as reported previously [35,55]. In brief, a proton spectral FID was collected within a 14-ppm spectral width during 128 scans. A Fourier transformation and phase correction were applied to proton NMR signals using the MNova software (Mestrelab Research S.L., Escondido, CA, USA). The water signal region (4.62-5.15 ppm) was excluded, and the data were normalized and referenced to the total area integration values and the 0.025% TSP value, respectively. All the processed data were compared against the Chenomx NMR database version 8.2 (Spectral Database, Edmonton, AB, Canada) to identify representative metabolites by shifting in appropriate pH ranges. To distinguish between the healthy controls and patients with GC, OPLS-DA was performed using SIMCA-P version 11.0 (Umetrics, Umeå, Sweden) on spectral data binned at a 0.0073-ppm width with an in-house Perl script. To evaluate the performance of the OPLS-DA model, validation was carried out using a separate validation set. Briefly, one-third of the samples from each cohort were randomly selected and used as the validation set using the k-fold cross-validation method. The diagnostic performance of the OPLS-DA model was evaluated based on the Y-variable values of the validation set, obtained using the training set model with an a priori set cutoff value of 0.5. The sensitivity and specificity values were obtained based on the correct prediction of the 33 healthy control and 34 GC patient samples during the validation step.
All statistical analyses were performed using statistical software and an online data analysis tool, including SIMCA-P version 11.0 (Umetrics, Umeå, Sweden), OriginPro 8 (OriginLab Corporation, Northampton, MA, USA), R (The R Foundation for Statistical Computing, Vienna, Austria), and Metaboanalyst (www.metaboanalyst.ca).
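The binning, water-region exclusion, and total area normalization steps described in this section can be sketched as follows. This is not the authors' in-house Perl script, which is not available in the text; the function names, the choice to sum intensities within each bin, and the toy spectrum are assumptions. Only the 0.0073-ppm bin width and the 4.62-5.15 ppm water region come from the description above.

```python
import math

def bin_spectrum(ppm, intensity, bin_width=0.0073, water=(4.62, 5.15)):
    """Sum intensities into fixed-width ppm bins, skipping the water region.

    Returns a dict mapping bin index (floor of ppm / bin_width) to the
    summed intensity of the points falling in that bin.
    """
    bins = {}
    for shift, value in zip(ppm, intensity):
        if water[0] <= shift <= water[1]:
            continue  # excluded water signal region
        index = math.floor(shift / bin_width)
        bins[index] = bins.get(index, 0.0) + value
    return bins

def total_area_normalize(bins):
    """Scale binned intensities so that they sum to 1 for each spectrum."""
    total = sum(bins.values())
    return {index: value / total for index, value in bins.items()}

# Toy spectrum: two points in one bin, one water point, one aromatic point.
ppm = [1.001, 1.003, 4.80, 7.000]
intensity = [2.0, 2.0, 100.0, 4.0]

normalized = total_area_normalize(bin_spectrum(ppm, intensity))
print(sorted(normalized.items()))  # [(137, 0.5), (958, 0.5)]
```

Dividing by the total area removes overall concentration differences between urine samples, which is the stated motivation for choosing a numerical rather than a creatinine-based normalization.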

Microarray Data Analysis
The variation of gene expression in human GC was investigated using the microarray results of Chen et al. [38]. Genes involved in the metabolism of the contributing metabolites identified by S-TOCSY were investigated for their differential expression in cancer and normal samples. In detail, we performed a pathway-based network analysis using MetScape to visualize and interpret metabolomics and gene expression data in the human metabolic network [39]. We used metabolites identified by our metabolomics analysis, and the pathway-based network was built using the compound-reaction-enzyme-gene method. Subsequently, we obtained the names of the genes potentially associated with the input metabolites and analyzed the gene expression levels from the microarray dataset of Chen et al. [38]. From this dataset, we downloaded all individual microarray data from normal controls and patients and generated a new summarized gene list consisting of the "R/G Normalized (Mean)" values to determine the fold changes and adjusted p-values of each gene.

Conclusions
In conclusion, we report here a metabolomics approach for GC screening using urine samples from healthy subjects and from patients who were mostly diagnosed with early GC. This approach showed very high diagnostic sensitivity and specificity using a validation set and performed significantly better than serum tumor markers. An additional genomic data analysis revealed an up-regulation of the expression of several genes in GC. As this was the largest metabolomics study and the first that mostly focused on early GC, the approach developed may have the potential for a mass screening of an average-risk population and may facilitate endoscopic examination through risk stratification.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-6694/12/10/2904/s1, Figure S1: Helicobacter pylori effects on healthy subjects and patients with gastric cancer, Figure S2: Snapshot of metabolite identification using the Chenomx NMR suite, Figure S3: Cancer stage-dependent prediction of healthy subjects and patients with gastric cancer, Figure S4: Pathway-based network analysis using MetScape and Metaboanalyst. The identified metabolites were loaded into MetScape, and a pathway-based network was built using the compound-reaction-enzyme-gene mode, Figure S5: Levels of differentially expressed genes associated with gastric cancer metabolic markers, Figure S6: Validation of changes in differentially expressed gene levels using an independent microarray dataset, Table S1: Comparison of metabolite concentrations, Table S2: List of genes potentially related to metabolites, Table S3: Supplementary dataset.