Detection of Lung Cancer via Blood Plasma and 1H-NMR Metabolomics: Validation by a Semi-Targeted and Quantitative Approach Using a Protein-Binding Competitor

Metabolite profiling of blood plasma, by proton nuclear magnetic resonance (1H-NMR) spectroscopy, offers great potential for early cancer diagnosis and unraveling disruptions in cancer metabolism. Despite essential efforts to standardize pre-analytical and external conditions, such as pH or temperature, the donor-intrinsic plasma protein concentration is often overlooked. However, this is of utmost importance, since several metabolites bind to these proteins, resulting in an underestimation of signal intensities. This paper describes a novel 1H-NMR approach to avoid metabolite binding by adding 4 mM trimethylsilyl-2,2,3,3-tetradeuteropropionic acid (TSP) as a strong binding competitor. In addition, it is demonstrated, for the first time, that maleic acid is a reliable internal standard to quantify the human plasma metabolites without the need for protein precipitation. Metabolite spiking is further used to identify the peaks of 62 plasma metabolites and to divide the 1H-NMR spectrum into 237 well-defined integration regions, representing these 62 metabolites. A supervised multivariate classification model, trained using the intensities of these integration regions (areas under the peaks), was able to differentiate between lung cancer patients and healthy controls in a large patient cohort (n = 160), with a specificity, sensitivity, and area under the curve of 93%, 85%, and 0.95, respectively. The robustness of the classification model is shown by validation in an independent patient cohort (n = 72).


Introduction
Metabolomics, or the study of low-molecular-weight molecules in biofluids, such as blood plasma, serum, and urine, offers great potential to answer critical clinical research questions [1][2][3][4][5][6][7][8][9]. A frequently used technique to analyze these complex biological samples is nuclear magnetic resonance (NMR) spectroscopy [10][11][12][13]. This analytical tool allows quantitative data collection in a robust and highly reproducible way when a standardized procedure is used [14][15][16][17][18]. Although NMR metabolomics is still an emerging field, several research groups have developed analytical protocols, based on different sample preparation conditions, to establish a metabolite profile or fingerprint [10][18][19][20][21]. Alongside mass spectrometry, many metabolite profiling studies rely on proton (1H)-NMR metabolomics, since it allows the quantification of all the metabolites in a single run.
Blood plasma inherently contains macromolecules, such as proteins and lipoproteins, of which the signals may overlap with some metabolite signals in the 1H-NMR spectrum [...]

[Figure caption, partially recovered] [...] (I) was immediately followed by measurement (II) under fully identical measuring conditions. Whereas the integration value for alanine and valine signals remains unchanged after TSP addition, the integration values increase significantly for lactate, phenylalanine, and maleic acid. (C) Integration values of measurement (I) can be overestimated or underestimated when normalized towards MA without addition of TSP (yellow bars) because several metabolites as well as MA itself bind to HSA. Correct MA-normalized integration values for all metabolites are only obtained when 4 mM TSP is present (blue bars). MA: maleic acid; ns: not significant; TSP: trimethylsilyl-2,2,3,3-tetradeuteropropionic acid; (**** p < 0.0001).

Maleic Acid as an Internal Standard for Quantification in NMR Metabolomics of Plasma Containing 4 mM TSP
TSP itself is not a suitable reference for quantification, because of its high binding affinity to HSA. To enable quantification of the plasma metabolites, a known concentration of a suitable internal standard should be added to the buffer stock solution. An essential criterion for the internal standard is that its signal is sharp and does not overlap with other signals, so that it is easy to integrate. In the proposed methodology, MA is used, since it is well established in analytical chemistry. MA can be purchased with excellent purity, is easily dried, and has a distinct solubility in D2O, making it an attractive internal standard for quantitative NMR [44]. The addition of MA to plasma samples, in buffer pH 7.4, gives rise to a sharp singlet around 6 ppm, which is a region where no other metabolite signals appear. Figure 1B shows that the addition of 4 mM TSP to the 12 different plasma samples also results in an increase in the MA intensity. This indicates that MA also binds to HSA, although with a much lower affinity than the competing TSP. A small relative standard deviation (%RSD) value of only 3.54% is found, originating from the sample preparation, NMR measurement, and integration. Figure 2 shows an almost perfectly linear calibration curve that was obtained by means of eight identical reference plasma pool samples, containing 4 mM TSP and a different, but known, amount of MA. The linear behavior, with an R2 value of 0.9998, demonstrates the absence of association between MA and HSA under these conditions. Consequently, MA is an ideal internal standard when used in combination with 4 mM TSP. Figure 1C demonstrates that without TSP, the MA-normalized integration values are overestimated or underestimated (yellow bars), because several metabolites, as well as MA itself, bind to HSA. Only in the presence of 4 mM TSP are correct MA-normalized integration values (and so absolute metabolite concentrations) obtained (blue bars).
Figure 3 provides a visual overview of the influence of adding 4 mM TSP on the metabolite signals, as discussed above. Altogether, these results indicate that MA is an excellent internal standard for quantification in plasma NMR metabolomics, if combined with the addition of 4 mM of the strong HSA-binding competitor TSP.
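The quantification principle described above can be illustrated with a minimal sketch. It assumes the standard quantitative-NMR relation, in which an integral is proportional to concentration times the number of contributing protons, and it treats the MA singlet as arising from two equivalent protons. The function names and the example integrals are hypothetical and are not taken from the study; only the MA concentration of 53.85 µM in the NMR sample is the value stated in the sample preparation section.

```python
def metabolite_concentration(i_met, n_h_met, i_ma, c_ma_um, n_h_ma=2):
    """Estimate a metabolite concentration (µM) from its integral relative to MA.

    i_met, i_ma : integrals of the metabolite signal and of the MA singlet
    n_h_met     : number of protons contributing to the metabolite signal
    n_h_ma      : number of protons in the MA singlet (2 olefinic protons)
    c_ma_um     : known MA concentration in the NMR sample (µM)
    """
    if i_ma <= 0 or n_h_met <= 0 or n_h_ma <= 0:
        raise ValueError("integral of MA and proton counts must be positive")
    return (i_met / n_h_met) / (i_ma / n_h_ma) * c_ma_um


def r_squared(x, y):
    """Coefficient of determination of an ordinary least-squares line y = a*x + b."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    ss_res = sum((yi - (slope * xi + intercept)) ** 2 for xi, yi in zip(x, y))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


# Hypothetical integrals: a 3-proton methyl signal versus the MA singlet.
conc_in_nmr_sample = metabolite_concentration(
    i_met=9.0, n_h_met=3, i_ma=1.0, c_ma_um=53.85
)
# The NMR sample is a 1:1 mixture of plasma and buffer, so the plasma
# concentration is twice the concentration found in the NMR sample.
conc_in_plasma = 2 * conc_in_nmr_sample
```

The `r_squared` helper shows how the linearity of a calibration series (known MA amounts versus measured MA integrals) could be checked; it is a generic least-squares calculation, not the software used for Figure 2.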

Spiking with 62 Known Metabolites Results in 237 Well-Defined Integration Regions
Large metabolomic databases that provide information about human metabolites and their NMR characteristics are powerful, accessible tools that can guide new metabolomics research [45,46]. However, it is well known that 'external' conditions, such as pH, ionic strength, sample concentration, and temperature, influence the 1H-NMR chemical shifts of human blood plasma metabolites [47][48][49]. Additionally, 'intrinsic' conditions, such as the donor-specific HSA concentration, will not only lead to chemical shift changes in the signals of the HSA-bound metabolites, but also to a sample-to-sample dependent underestimation of the amount of such HSA-bound metabolites. In our proposed measuring protocol, the doublet signal of alanine was chosen as chemical shift reference at 1.4938 ppm because it is not influenced by the HSA concentration. A previous study already showed that metabolite identification, using a spiking approach, provides a robust way to divide the 1H-NMR spectrum into well-defined integration regions, for the collection of a spectral dataset that allows metabolic profiling of human plasma [42]. Efforts are hereby made to define integration regions that represent single metabolites as much as possible. Therefore, spiking experiments were performed for 62 metabolites, using the same sample preparation, NMR measurement conditions, and processing as described above and in the experimental section. Based on this spiking information, the plasma 1H-NMR spectrum was divided into 237 well-defined integration regions, each representing one or more metabolites. All 62 metabolites, with their chemical shifts, multiplicity, and J-coupling values, are summarized in Table 1, together with their assigned integration region numbers (VAR 001-VAR 237). Table S1 shows the metabolite composition of each integration region, and its start and end ppm values.
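As an illustration of how a spectrum can be referenced to the alanine doublet and then reduced to integration regions, the following is a minimal sketch. It assumes the spectrum is available as two equally long lists (chemical shift in ppm and intensity) and uses simple trapezoidal integration. The function names, the region names, and the region boundaries are hypothetical; the actual start and end ppm values are those listed in Table S1.

```python
def reference_shift(ppm, observed_ref_ppm, target_ref_ppm=1.4938):
    """Shift the ppm axis so that the reference signal lands on target_ref_ppm."""
    offset = target_ref_ppm - observed_ref_ppm
    return [p + offset for p in ppm]


def integrate_region(ppm, intensity, start_ppm, end_ppm):
    """Trapezoidal area under the spectrum between two ppm boundaries."""
    low, high = sorted((start_ppm, end_ppm))
    # Sort by ppm so the result does not depend on the axis direction.
    points = sorted((p, y) for p, y in zip(ppm, intensity) if low <= p <= high)
    area = 0.0
    for (p0, y0), (p1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (p1 - p0)
    return area


def integrate_all(ppm, intensity, regions):
    """Return {region name: area} for a dict of {name: (start_ppm, end_ppm)}."""
    return {
        name: integrate_region(ppm, intensity, start, end)
        for name, (start, end) in regions.items()
    }


# Toy spectrum and hypothetical region boundaries (not taken from Table S1).
toy_ppm = [1.0, 1.1, 1.2, 1.3, 1.4, 1.5]
toy_intensity = [0.0, 1.0, 1.0, 0.0, 2.0, 0.0]
toy_regions = {"REGION_A": (1.0, 1.3), "REGION_B": (1.3, 1.5)}
toy_areas = integrate_all(toy_ppm, toy_intensity, toy_regions)
```

Dividing each area by the area of the MA region would then give the MA-normalized variables used in the statistical models.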

The Proposed Methodology Shows a High Robustness Level
To evaluate the robustness of the proposed method, plasma from multiple donors was combined to obtain a large reference plasma pool, from which identical plasma aliquots of 400 µL were taken and stored at −80 °C. Twelve identical NMR samples were prepared from 12 of these aliquots, measured, and processed using the above conditions. The integration values of all 237 integration regions were normalized to that of the MA signal. The MA signal showed a %RSD of only 1.98%. Table S2 demonstrates that the intrasample variability is <10% for almost all the regions, with a large majority of the regions even showing a %RSD <5%. Only nine regions showed a higher variability: VAR 001 (formate, singlet), VAR 009 (3-methylhistidine, singlet), VAR 021 (fumarate, singlet), VAR 022 (uridine, doublet), VAR 024 (allantoin, singlet), VAR 027 (mannose, doublet), VAR 029 (carnitine, multiplet), VAR 030 (non-identified metabolite), and VAR 031 (non-identified metabolite). The higher variability for these regions can be explained by (i) the influence of the neighboring water peak (VAR 027, VAR 029, VAR 030, and VAR 031) and (ii) the very low signal intensity, resulting in a poor signal-to-noise ratio (VAR 001, VAR 009, VAR 021, VAR 022, and VAR 024).
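The %RSD screening described above can be sketched as follows. This is an illustration only: it assumes the sample standard deviation (n − 1 denominator), which the text does not specify, and the function names and the replicate values are hypothetical.

```python
import statistics


def percent_rsd(values):
    """Relative standard deviation in percent: 100 * sample SD / mean."""
    mean = statistics.fmean(values)
    if mean == 0:
        raise ValueError("mean is zero; %RSD is undefined")
    return 100.0 * statistics.stdev(values) / mean


def high_variability_regions(replicates_by_region, threshold=10.0):
    """Names of regions whose %RSD across replicates exceeds the threshold."""
    return sorted(
        name
        for name, values in replicates_by_region.items()
        if percent_rsd(values) > threshold
    )


# Hypothetical MA-normalized integrals for three replicate measurements.
toy_replicates = {
    "REGION_STABLE": [1.00, 1.02, 0.98],
    "REGION_NOISY": [0.10, 0.20, 0.30],
}
flagged = high_variability_regions(toy_replicates)
```

In the study, twelve replicate samples and a 10% threshold were used; regions flagged in this way were excluded before model training.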

Method Validation: The Proposed Method Allows Differentiation between Lung Cancer Patients and Healthy Controls in a Large Study Cohort
To validate the proposed methodology and demonstrate its discriminative potential, plasma samples of 141 lung cancer patients and 135 healthy controls were measured and analyzed, as described above. Hemolytic plasma samples (n = 44) were excluded from the analysis. Although our group previously demonstrated that both severe (free hemoglobin concentration ≥ 100 mg/dL) and mild (free hemoglobin concentration ≥ 10 mg/dL) hemolysis did not influence the plasma metabolite profile [50], it was decided not (yet) to include these 44 hemolytic samples in this proof-of-principle study, to ensure correct validation of the proposed methodology (the sample cohorts are still quite large without the hemolytic samples). Table 2 shows the clinical characteristics of the lung cancer patients and healthy controls that were used in the training and validation cohort datasets. The intensities (areas under the peaks) of the well-defined integration regions (Table S1) serve as variables in the multivariate statistical models. These intensities were normalized to the integration value of the MA standard, of which the chemical shift and linewidth were closely monitored for all the measurements (no abnormalities were observed). A first orthogonal partial least squares discriminant analysis (OPLS-DA) training model was constructed using samples of 80 lung cancer patients and 80 healthy controls, excluding the 9 variables showing a high intrasample variability (%RSD >10%; see also Figure S2). Starting from this model (constructed with 221 variables), further data reduction was performed by excluding the variables (143 in total) that showed little or no significant contribution to the model, based on their loading values and their standard errors calculated by cross-validation (jack-knifing). The unsupervised PCA plot, obtained using this reduced dataset of 78 variables (221 minus 143 variables), shows a clear distinctive trend between the two groups (Figure 4A).
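The cohort and variable counts given above can be cross-checked with a few lines of arithmetic. The sketch below only uses numbers stated in the text; the variable names are illustrative.

```python
# Subjects measured, as stated in the text.
lung_cancer_measured = 141
controls_measured = 135
hemolytic_excluded = 44

measured = lung_cancer_measured + controls_measured   # all donors analyzed
included = measured - hemolytic_excluded              # non-hemolytic samples

# Split of the included subjects into training and validation cohorts.
training = 80 + 80        # lung cancer patients + healthy controls
validation = 34 + 38      # lung cancer patients + healthy controls

# Variable reduction: first model minus non-contributing variables.
variables_first_model = 221
variables_removed = 143
variables_retained = variables_first_model - variables_removed

# Every included subject ends up in exactly one of the two cohorts.
assert training + validation == included
```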
The same 78 remaining variables were used for training a supervised OPLS-DA classifier, showing R2X(cum) and R2Y(cum) model parameters of 0.861 and 0.581, respectively, and a Q2(cum) value of 0.364 (Figure 4B). This model allows discrimination between the lung cancer patients and healthy controls, with a specificity of 93% and a sensitivity of 85%. Moreover, the receiver operating characteristic (ROC) curve shows an area under the curve (AUC) of 0.95 (Figure 4C). Together with the PCA analysis, this demonstrates that the group separation in the supervised classification model is not based on over-fitting due to 'noisy' data. Further support is found in a PCA analysis obtained by using the reduced dataset (78 variables) and all 232 subjects (of the training and validation cohorts together), which also shows the distinctive trend between the lung cancer patients and controls (Figure S3). Moreover, permutation testing of the trained classifier resulted in R2 and Q2 values of 0.206 and −0.339, respectively, again supporting the strength of the model (Figure S4). Furthermore, testing the trained model on an independent validation cohort of 34 lung cancer patients and 38 controls confirms its validity. Group separation remains apparent in the validation model, with a specificity and sensitivity of 74% for both (Figure 4D). Table S3 summarizes the 30 variables with a VIP (variable importance in projection) value >0.80, and indicates whether a decreased or increased integration value (metabolite concentration) is observed for the lung cancer patients compared to the healthy controls.
The integration values of 13 variables show a decrease in lung cancer patients, and could be attributed to the lipid signals of fatty acid chains (FAC), phosphatidylcholines (PC), or sphingomyelins (SM). The integration values of the other 17 most discriminative variables are increased in the lung cancer patients. After careful interpretation of the NMR spectra, as described in Section 4, the metabolites that are responsible for this increase are identified as glucose, isoleucine, leucine, glycerol, and isopropanol.
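The performance figures reported for the classifier (sensitivity, specificity, and AUC) are independent of the modeling technique and can be computed from predicted classes and scores. The sketch below is a generic illustration and not the OPLS-DA software used in the study; the function names, counts, and scores are toy values chosen for the example. The AUC is computed as the fraction of patient-control pairs in which the patient receives the higher score, with ties counted as one half.

```python
def sensitivity(true_pos, false_neg):
    """Fraction of patients correctly classified as patients."""
    return true_pos / (true_pos + false_neg)


def specificity(true_neg, false_pos):
    """Fraction of controls correctly classified as controls."""
    return true_neg / (true_neg + false_pos)


def roc_auc(patient_scores, control_scores):
    """Area under the ROC curve via pairwise comparison of scores."""
    wins = 0.0
    for p in patient_scores:
        for c in control_scores:
            if p > c:
                wins += 1.0
            elif p == c:
                wins += 0.5
    return wins / (len(patient_scores) * len(control_scores))


# Toy example: 10 patients and 10 controls (not study data).
toy_sensitivity = sensitivity(true_pos=9, false_neg=1)
toy_specificity = specificity(true_neg=8, false_pos=2)
toy_auc = roc_auc([0.9, 0.8, 0.4], [0.1, 0.3, 0.5])
```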
Metabolites 2021, 11, x FOR PEER REVIEW 15 of 24

Discussion
This study presents the successful development of a robust quantitative 1H-NMR metabolomics method and its validation, by showing differentiation between large cohorts of lung cancer patients and healthy controls, based on their plasma metabolite profile.
In order to quantify plasma metabolites, some studies describe the use of internal standards, such as formic acid [10,51,52] or 4,4-dimethyl-4-silapentane-1-ammonium trifluoroacetate (DSA) [53]. However, when future studies demand an absolute determination of metabolite concentrations, the standard should be very pure and water-free (to avoid a contribution of bound water and impurities to the analytically weighed amount of the standard), which will be difficult for these standards. In contrast, maleic acid (MA) is a commonly used analytical standard, which is easily dried in a vacuum oven and can be purchased with excellent purity, making it an ideal and reliable internal standard for plasma metabolite quantification.
This study demonstrates, for the first time, that combining MA and 4 mM TSP, as a strong human serum albumin (HSA) binding competitor, allows an accurate determination of the plasma metabolite concentrations, because the binding of metabolites and MA to HSA is prevented. Under these conditions, the upfield peak of the methyl doublet of the non-HSA-binding metabolite alanine, having a large J-coupling value of 7.2 Hz, is an ideal candidate to calibrate the chemical shift ppm scale, since the doublet lines are baseline resolved. Moreover, plasma HSA comprises 50% to 60% of the total plasma proteins [54], and varies between individuals, hereby explaining the sample-to-sample differences in the chemical shift and intensity of several plasma metabolites in the 1H-NMR spectrum, when TSP is added in only a small amount or not at all. The use of MA as internal standard was recently also reported by Bliziotis et al. [55], but the effect of TSP on the HSA-MA binding was not taken into account. Gowda et al. recently also verified the use of MA (or fumaric acid) for blood plasma samples, and proposed to use it as an internal standard in combination with protein precipitation [56].
Other reported methods that (partly) release HSA-bound metabolites by using an HSA competitor include the addition of, e.g., fatty acids [24] or SDS [57]. However, the addition of these compounds gives rise to additional overlapping signals in the spectrum, which is not the case for the upfield TSP signal around 0.00 ppm. Furthermore, the present protocol allows the contribution of different lipid types to be included in the metabolite profile, as shown in Table S1, whereas the addition of fatty acids hinders this possibility.
Whereas 1H-NMR metabolomics spectra often suffer from signal assignment uncertainties due to overlapping metabolite signals, the proposed methodology resolves this issue by performing selective metabolite spiking experiments (instead of using only library-based identification). All the signals of 62 metabolites are hereby identified and, consequently, 237 well-understood integration regions are defined that can serve as accurate variables to establish a reliable metabolite profile, or fingerprint, of an individual. Such a profile allows the construction of multivariate statistical models for, e.g., disease diagnosis and therapy follow-up, as well as deeper investigation of the metabolic pathways related to the disease. By validation of the proposed methodology in a large patient cohort (n = 232), it is shown that the method is robust and enables a clear differentiation between lung cancer patients and healthy controls.
In this study, the metabolite profile is further demonstrated to be of high value for targeted profiling, enabling immediate backtracking of the key metabolites that contribute to a disturbed metabolism in lung cancer. The unraveling of cancer metabolism, with its aberrant biochemical pathways, is classically focused on central carbon metabolism and the support of tumor growth via rapid energy production. Although it was not the primary goal of this methodology study, this paper already discusses the role, in the reprogrammed biochemical pathways, of the metabolites that differentiate most strongly between lung cancer patients and healthy controls. A detailed metabolic pathway interpretation will be the content of a follow-up paper.
Proliferating cancer cells are known to be highly glycolytic in order to meet their metabolic requirements. The elevated plasma glucose levels that are found in lung cancer patients, suggest a compensatory upregulated gluconeogenesis in other tissues, primarily the liver, using lactate that is derived from muscle activity, enabling rapid energy production via glycolysis within the tumor [58]. Interestingly, several studies suggest that the upregulated systemic gluconeogenesis can be supported even more by using lactate originating from fermentation (or aerobic glycolysis) within proliferating cancer cells, hereby creating a cyclic metabolic co-operation between tumor and healthy tissue [59][60][61][62].
The increased plasma glycerol levels can be explained in the same context of fueling the highly proliferating tumor cells. Glycerol, from the bloodstream, can enter the tumor cell and serve as a backbone for fuel biosynthesis (triacylglycerols) and phospholipid membrane formation [63]. In addition, glycerol will be used for gluconeogenesis within cancer cells [63]. Indeed, suggestive results, pointing to an adaptive response of gluconeogenesis within cancer cells upon glucose deprivation, were provided by Leithner et al. [64]. The heterogeneous character of tumors should (always) be taken into account, as more and more evidence indicates a combined appearance of glycolysis and gluconeogenesis, with a flexible difference in flux (enhanced metabolic flexibility) rather than a completely separated character within cells [65,66].
This study also reveals a higher plasma concentration of the branched-chain amino acids (BCAAs) leucine and isoleucine, in lung cancer patients. A previous study highlights the fact that BCAAs can also play an anaplerotic role and fuel the tricarboxylic acid (TCA) cycle via conversion to acetyl-CoA [67]. BCAA-derived acetyl-CoA is also reported to be targeted by histone acetyl transferases (HATs), regulating histone acetylation and hereby stimulating gene expression [68,69].
A reduced level of plasma lipids in lung cancer patients is in accordance with the high need of membrane synthesis by cancer cells. In non-small-cell lung cancer (NSCLC) tumor cells, enzymes such as lipoprotein lipase (LPL) are known to use triacylglycerols (TAGs) and phospholipids (PLs) from the bloodstream to acquire energy for membrane synthesis and tumor proliferation via lipolysis [70,71]. Furthermore, the previously mentioned increased plasma glycerol concentrations support the enhanced lipolysis and TAG catabolism [72].
Isopropanol can be oxidized to acetone by alcohol dehydrogenase (ADH)-type enzymes. However, a disturbed high NADH/NAD+ ratio can shift this balance towards reduction, explaining an elevated plasma isopropanol level [73]. While healthy cells can rely on ketone bodies (such as acetone) for energy production via oxidative phosphorylation (OXPHOS), tumor cells show an impaired OXPHOS [74,75]. Thus, a ketotic state might be an unfavorable condition for cancer cells.
Future work will involve the inclusion of plasma samples from other collection sites; if the described sample collection protocol is carefully followed, combining interlaboratory datasets into one large data matrix is not expected to pose problems. Moreover, if other NMR analysis sites come into play (using spectrometers with the same magnetic field strength), it will be important to first validate the analysis protocol by setting up an interlaboratory ring trial using pooled reference plasma samples, in order to evaluate the repeatability of the results.

Subjects
All blood samples were collected from donors who met the following inclusion criteria: (I) fasted and no medication intake for at least 6 h, (II) fasting blood glucose concentration below 200 mg/dL, and (III) no treatment or history of cancer in the past 5 years. Plasma samples of a plasma pool (multiple donors), further referred to as 'plasma reference samples', were used for the maleic acid calibration curve; the metabolite spiking experiments; and the determination of the error on the NMR sample preparation and measurement.
Subjects for the model training and validation. Plasma samples from 276 donors were analyzed, but hemolytic plasma samples (n = 44) were excluded from the dataset. Finally, 160 subjects (80 lung cancer patients and 80 healthy controls) were selected for the training cohort while the remaining 72 subjects (34 lung cancer patients and 38 healthy controls) were used for an independent validation of the trained classification model ( Figure 5).

Preanalytical Sample Preparation
Plasma sample collection and storage. Plasma samples collected during the NCT02024113 study were retrieved from the University Biobank Limburg (UBiLim) for immediate NMR analysis. All plasma samples were obtained as follows: fasting venous blood was collected and stored at 4 °C within 5 to 10 min. Within 8 h after blood collection, blood samples were centrifuged at 1600× g for 15 min (swinging bucket centrifuge), as described by Louis et al. [2]. Finally, plasma aliquots of 400 µL were transferred into sterile cryovials and stored at −80 °C until NMR analysis.
Buffer preparation containing TSP and the internal MA standard for quantification. Stock solutions of 1 M K2HPO4 and 1 M KH2PO4 in D2O were prepared by dissolving, respectively, 174.18 g/L and 136.09 g/L in D2O. The 0.15 M potassium phosphate pH 7.4 buffer used for NMR sample preparation was obtained as follows: 85 mL D2O was added to a solution containing 4 mL of the 1 M KH2PO4 stock and 11 mL of the 1 M K2HPO4 stock. The final buffer to prepare the plasma samples for NMR measurements was obtained by dissolving 137.82 mg TSP and 6.25 mg dried maleic acid (MA), as internal standard for quantification, in the 0.15 M phosphate buffer pH 7.4, in order to obtain a buffer containing 8 mM TSP and 107.70 µM (62.50 µg/mL) MA.
Sample preparation. Before NMR analysis, plasma aliquots were thawed and homogenized using a vortex mixer. After centrifugation at 13,000× g for 4 min at 4 °C (fixed rotor Eppendorf centrifuge), 350 µL plasma was added to 350 µL 0.15 M potassium phosphate buffer pH 7.4 in D2O, giving a 700 µL sample containing 4 mM TSP and 53.85 µM (31.25 µg/mL) MA as an internal standard for quantification. Finally, the samples were transferred into 5 mm NMR tubes and immediately analyzed.
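The buffer recipe and the 1:1 dilution with plasma can be checked with a few lines of arithmetic. The sketch below is illustrative only and rests on two assumptions that the text does not state: that the final buffer volume is 100 mL (85 + 4 + 11 mL), and that TSP is weighed as its sodium salt with a molar mass of about 172.27 g/mol. The function names are hypothetical.

```python
# Illustrative check of the buffer and sample concentrations.
# ASSUMPTIONS (not stated in the text): final buffer volume of 100 mL and
# TSP weighed as its sodium salt (TSP-d4, about 172.27 g/mol).

TSP_MOLAR_MASS = 172.27          # g/mol, assumed
BUFFER_VOLUME_ML = 85 + 4 + 11   # D2O plus both 1 M stock solutions


def phosphate_molarity(kh2po4_ml, k2hpo4_ml, total_ml, stock_molar=1.0):
    """Total phosphate concentration (mol/L) after mixing the two stocks."""
    return (kh2po4_ml + k2hpo4_ml) * stock_molar / total_ml


def millimolar(mass_mg, molar_mass, volume_ml):
    """Concentration in mmol/L for mass_mg of solute dissolved in volume_ml."""
    return mass_mg / molar_mass / volume_ml * 1000.0


def dilute(concentration, sample_ul, buffer_ul):
    """Concentration of a buffer component after mixing buffer with plasma."""
    return concentration * buffer_ul / (sample_ul + buffer_ul)


phosphate = phosphate_molarity(4, 11, BUFFER_VOLUME_ML)
tsp_buffer_mm = millimolar(137.82, TSP_MOLAR_MASS, BUFFER_VOLUME_ML)
ma_buffer_ug_ml = 6.25 * 1000.0 / BUFFER_VOLUME_ML

# Mixing 350 uL plasma with 350 uL buffer halves every buffer component.
tsp_sample_mm = dilute(tsp_buffer_mm, 350, 350)
ma_sample_ug_ml = dilute(ma_buffer_ug_ml, 350, 350)

print(f"phosphate: {phosphate:.2f} M")
print(f"TSP: {tsp_buffer_mm:.2f} mM in buffer, {tsp_sample_mm:.2f} mM in sample")
print(f"MA: {ma_buffer_ug_ml:.2f} ug/mL in buffer, {ma_sample_ug_ml:.2f} ug/mL in sample")
```

Under these assumptions the sketch gives 0.15 M phosphate, 8 mM TSP and 62.50 µg/mL MA in the buffer, and 4 mM TSP and 31.25 µg/mL MA in the final NMR sample.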

Metabolite Spiking
Stock solutions for metabolite spiking were prepared by dissolving 1 mg of a known metabolite in 100 µL reference plasma. In the next step, 10 to 30 µL of this stock solution was added to a standard NMR sample (350 µL reference plasma and 350 µL buffer containing TSP and MA), creating a separate spiked plasma sample for each metabolite. Each sample was subsequently analyzed by 1H-NMR spectroscopy using the experimental parameters described for the 1H-NMR data acquisition. This procedure was repeated for 62 metabolites (see Table 1).
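As a rough guide to the amount of metabolite added in a spiking experiment, the added concentration can be estimated from the stated volumes. This is a minimal sketch that assumes volumes are additive and that dissolving 1 mg of metabolite does not change the 100 µL stock volume; the function name is hypothetical.

```python
def spiked_concentration_ug_per_ml(stock_mg, stock_ul, added_ul, sample_ul=700.0):
    """Added metabolite concentration (ug/mL) in the spiked NMR sample.

    ASSUMPTIONS: additive volumes; the dissolved metabolite does not
    change the volume of the stock solution.
    """
    stock_ug_per_ul = stock_mg * 1000.0 / stock_ul   # 1 mg / 100 uL = 10 ug/uL
    added_ug = stock_ug_per_ul * added_ul
    total_ml = (sample_ul + added_ul) / 1000.0
    return added_ug / total_ml


low_spike = spiked_concentration_ug_per_ml(1.0, 100.0, 10.0)
high_spike = spiked_concentration_ug_per_ml(1.0, 100.0, 30.0)
print(f"10 uL spike: {low_spike:.1f} ug/mL; 30 uL spike: {high_spike:.1f} ug/mL")
```

The corresponding molar concentration depends on the molar mass of each metabolite and is therefore not computed here.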

1H-NMR Analysis
NMR data acquisition. After thermal sample stabilization for 5 min at 25 °C, 1H-NMR spectra were recorded at 25 °C with 96 scans (total measurement time of 9 min) on a 600 MHz JEOL NMR spectrometer having a magnetic field strength of 14.1 Tesla. Slightly T2-weighted spectra were acquired using the CPMG pulse sequence to attenuate signals of remaining plasma proteins, such as albumins with a short T2 relaxation time. The CPMG pulse sequence (total spin-echo time of 64 ms; spin-echo delay of 0.4 ms; 160 loops) was preceded by 16 prescans. Water suppression was accomplished by presaturation for 3 s. Other parameters used were as follows: 16 k data points, a spectral width of 12 ppm, and an acquisition time of 2.27 s.
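The reported acquisition parameters are mutually consistent, as the short check below illustrates. It assumes a nominal 1H frequency of exactly 600 MHz and reads "16 k" as 16,384 points; both are assumptions, since the exact spectrometer frequency and point count are not given in the text.

```python
# Consistency check of the acquisition parameters (illustrative only).
SPECTROMETER_MHZ = 600.0    # nominal 1H frequency, assumed exact
SPECTRAL_WIDTH_PPM = 12.0
DATA_POINTS = 16 * 1024     # "16 k" read as 16384 points (assumption)
CPMG_LOOPS = 160
SPIN_ECHO_DELAY_MS = 0.4
ZERO_FILL_FACTOR = 4

# 1 ppm corresponds to SPECTROMETER_MHZ hertz.
spectral_width_hz = SPECTRAL_WIDTH_PPM * SPECTROMETER_MHZ

# Acquisition time = number of points / spectral width.
acquisition_time_s = DATA_POINTS / spectral_width_hz

# Total spin-echo time = number of loops x spin-echo delay.
cpmg_echo_time_ms = CPMG_LOOPS * SPIN_ECHO_DELAY_MS

# Zero-filling by a factor of four gives the processed spectrum size.
zero_filled_points = DATA_POINTS * ZERO_FILL_FACTOR
digital_resolution_hz = spectral_width_hz / zero_filled_points

print(f"spectral width: {spectral_width_hz:.0f} Hz")
print(f"acquisition time: {acquisition_time_s:.3f} s")
print(f"total spin-echo time: {cpmg_echo_time_ms:.0f} ms")
print(f"points after zero-filling: {zero_filled_points}")
```

With these assumptions the acquisition time comes out at about 2.28 s, close to the reported 2.27 s, and 160 loops of 0.4 ms reproduce the stated total spin-echo time of 64 ms.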
NMR spectra processing. All spectra were processed using JEOL Delta software (version 5.3.1). A line broadening of 0.8 Hz and zero-filling by a factor of four to 64 k data points were applied. After Fourier transformation, the spectra were phased manually and baseline corrected. The upfield peak of the methyl doublet of alanine was used to calibrate the ppm chemical shift scale at 1.4938 ppm. Finally, spectra were divided into 237 fixed integration regions, rationally defined based on the metabolite spiking results. For the TSP-HSA binding experiments described in Figure 1A,B, non-IS-normalized integration values (areas under the peaks) were used, i.e., integration values not normalized against an internal standard (IS, of known and fixed concentration), but against a freely chosen, fixed normalization value. This approach is valid because the spectra without TSP and with 4 mM TSP were acquired immediately after each other under fully identical NMR measuring conditions (10 µL of TSP-containing buffer solution was added to the NMR tube after the first measurement). For the orthogonal partial least squares discriminant analysis (OPLS-DA) model training and validation, all integration values of the 237 integration regions were normalized to the integration value of the MA internal standard.
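The integration and normalization step can be sketched as follows. This is not the JEOL Delta workflow used in the study; it is a minimal illustration on a synthetic spectrum, in which the region boundaries, the region names, and the triangular peak shapes are all hypothetical.

```python
def integrate_region(ppm, intensity, low, high):
    """Trapezoidal area under the spectrum between low and high ppm."""
    points = sorted((x, y) for x, y in zip(ppm, intensity) if low <= x <= high)
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += 0.5 * (y0 + y1) * (x1 - x0)
    return area


def normalize_to_standard(areas, standard_key):
    """Divide every region area by the area of the internal standard."""
    reference = areas[standard_key]
    return {name: area / reference
            for name, area in areas.items() if name != standard_key}


def triangle(x, center, half_width, height):
    """Synthetic triangular peak, used only to build toy data."""
    return max(0.0, height * (1.0 - abs(x - center) / half_width))


# Synthetic spectrum on a 0.01 ppm grid with two hypothetical signals.
ppm = [i / 100 for i in range(1001)]
intensity = [triangle(x, 1.48, 0.02, 1.0) + triangle(x, 6.00, 0.02, 2.0)
             for x in ppm]

# Hypothetical integration regions (ppm boundaries are illustrative).
regions = {"VAR_1": (1.45, 1.52), "MA": (5.95, 6.05)}
areas = {name: integrate_region(ppm, intensity, low, high)
         for name, (low, high) in regions.items()}
normalized = normalize_to_standard(areas, "MA")
print(normalized)
```

In this toy example the metabolite region has half the area of the MA region, so its MA-normalized value is approximately 0.5.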

Statistical Analysis for Model Training and Validation
Multivariate statistical analysis was performed using SIMCA® (version 15.0.2). All variables were normalized to the integration value of MA, mean-centered, and Pareto scaled. Variables showing high variation, i.e., an intrasample %RSD > 10% (9 variables) or an intersample %RSD > 30% (7 variables), were removed from the dataset. The first training model was constructed by means of OPLS-DA, using a large cohort of 80 lung cancer patients and 80 healthy controls and all 221 remaining variables. In the next step, data reduction was performed by removing 'noisy' variables that, based on their loading values and their standard errors calculated by cross-validation (jack-knifing), showed little or no significant contribution to the model (143 variables). More specifically, a variable is considered significant if the absolute value of its loading exceeds its standard error as determined by cross-validation. Comparing the loading of a variable with its standard error from cross-validation (jack-knife interval) is a commonly used method for variable selection, as it takes the error on the predicted scores from all cross-validation rounds into account [76][77][78][79][80][81]. Using this reduced dataset of 78 variables (221 minus 143 variables), an unsupervised principal component analysis (PCA) model was constructed to confirm the separation of the two groups. Moreover, a receiver operating characteristic (ROC) curve was constructed to evaluate the selection of significant variables, where an AUC (area under the curve) value close to 1 indicates a strong model. The same dataset was also used to construct the final supervised OPLS-DA classifier (trained classification model).
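The scaling and filtering rules described above can be illustrated with a few helper functions. This is a minimal sketch rather than the SIMCA implementation: the function names are hypothetical, the sample standard deviation (n − 1) is an assumption, and the loadings and jack-knife standard errors are taken as given inputs because they come from the fitted OPLS-DA model.

```python
import math
from statistics import mean, stdev


def percent_rsd(values):
    """Relative standard deviation (%) of repeated measurements."""
    return 100.0 * stdev(values) / mean(values)


def pareto_scale(column):
    """Mean-center a variable and divide by the square root of its SD."""
    center, spread = mean(column), stdev(column)
    return [(value - center) / math.sqrt(spread) for value in column]


def keep_variable(intrasample_rsd, intersample_rsd,
                  max_intra=10.0, max_inter=30.0):
    """Apply the two %RSD thresholds stated in the text."""
    return intrasample_rsd <= max_intra and intersample_rsd <= max_inter


def is_significant(loading, jackknife_se):
    """A variable is retained if |loading| exceeds its jack-knife SE."""
    return abs(loading) > jackknife_se


# Toy data (hypothetical values, for illustration only).
replicates = [9.8, 10.0, 10.2]
intrasample_rsd = percent_rsd(replicates)
scaled = pareto_scale([1.0, 2.0, 3.0, 4.0, 5.0])

print(f"intrasample %RSD: {intrasample_rsd:.1f}")
print("kept after %RSD filter:", keep_variable(intrasample_rsd, 25.0))
print("significant:", is_significant(-0.12, 0.05), is_significant(0.03, 0.05))
```

Pareto scaling sits between no scaling and unit-variance scaling: dividing by the square root of the standard deviation reduces the dominance of intense signals without inflating the noise of weak ones as strongly as unit-variance scaling does.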
Finally, the classifier was validated by (i) a seven-fold internal cross-validation, (ii) permutation testing, where an R2 value of ±0.2 and a negative Q2 value typically indicate a good model, and (iii) a validation using an independent validation cohort of 34 lung cancer patients and 38 controls.
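The performance figures reported for such a classifier (sensitivity, specificity, and AUC) can be computed from predicted class scores as sketched below. The labels, scores, and the 0.5 decision threshold are toy values chosen for illustration and are not data from this study; the function names are hypothetical.

```python
def sensitivity_specificity(labels, predictions):
    """Sensitivity and specificity for binary labels (1 = patient, 0 = control)."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    return tp / (tp + fn), tn / (tn + fp)


def roc_auc(labels, scores):
    """AUC as the probability that a random patient scores higher than
    a random control (ties count as one half)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0
               for p in positives for n in negatives)
    return wins / (len(positives) * len(negatives))


# Toy example: 4 patients and 4 controls with hypothetical class scores.
labels = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
predictions = [1 if s >= 0.5 else 0 for s in scores]  # assumed threshold

sensitivity, specificity = sensitivity_specificity(labels, predictions)
auc = roc_auc(labels, scores)
print(sensitivity, specificity, auc)
```

For the toy data, three of four patients and three of four controls are classified correctly (sensitivity and specificity of 0.75), and 14 of the 16 patient-control pairs are ranked correctly (AUC of 0.875).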

Metabolite Identification
From the 78 variables that were used to construct the discriminative OPLS-DA model, 30 variables with a VIP (variable importance in projection) value > 0.80 were selected. An S-plot was used to identify which variables/metabolites are increased or decreased in lung cancer patients compared to healthy controls. 1H-NMR spectra of individuals with high and low values for these variables were selected and the corresponding peaks and their J-coupling multiplicities were compared to ensure correct metabolite identification for the variables that show strong discriminative power between the two groups.
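In an S-plot, each variable is positioned by its covariance and its correlation with the predictive score vector of the model. The sketch below computes both coordinates for toy data and applies the VIP threshold; the score vector, the variable values, the VIP values, and the sign convention (patients with positive scores) are all assumptions made for illustration.

```python
from statistics import mean, stdev


def s_plot_coordinates(scores, variable):
    """Covariance and correlation between predictive scores and one variable."""
    n = len(scores)
    mean_t, mean_x = mean(scores), mean(variable)
    covariance = sum((t - mean_t) * (x - mean_x)
                     for t, x in zip(scores, variable)) / (n - 1)
    correlation = covariance / (stdev(scores) * stdev(variable))
    return covariance, correlation


def select_by_vip(vip_values, threshold=0.80):
    """Names of the variables whose VIP value exceeds the threshold."""
    return [name for name, vip in vip_values.items() if vip > threshold]


# Toy data: two controls (negative scores) and two patients (positive scores).
scores = [-1.0, -1.0, 1.0, 1.0]
higher_in_patients = [1.0, 2.0, 3.0, 4.0]
lower_in_patients = [4.0, 3.0, 2.0, 1.0]

cov_up, corr_up = s_plot_coordinates(scores, higher_in_patients)
cov_down, corr_down = s_plot_coordinates(scores, lower_in_patients)
selected = select_by_vip({"VAR_1": 1.35, "VAR_2": 0.62, "VAR_3": 0.81})

print(f"higher in patients: cov={cov_up:.3f}, corr={corr_up:.3f}")
print(f"lower in patients: cov={cov_down:.3f}, corr={corr_down:.3f}")
print("selected:", selected)
```

With this sign convention, variables in the upper right of the S-plot are increased in patients and those in the lower left are decreased; the interpretation flips if the model assigns positive scores to the controls.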

Conclusions
This paper validates the use of maleic acid (MA) as an internal standard to quantify the human plasma metabolite profile with 1H-NMR spectroscopy and to detect metabolic changes occurring in patients with diseases, as demonstrated in this work for lung cancer. It is shown that by adding 4 mM TSP as a strong binding competitor, the metabolite peak intensities become independent of the sample-to-sample variation in human serum albumin (HSA) concentration, thus avoiding the need for (poorly reproducible) protein precipitation. Based on metabolite spiking, and using the methyl signal of alanine to calibrate the ppm chemical shift scale, the plasma 1H-NMR spectrum is divided into 237 fixed integration regions, serving as variables in multivariate statistics. The resulting classification model allows discrimination between 80 lung cancer patients and 80 healthy controls, with a specificity of 93% and a sensitivity of 85%, in combination with an area under the curve of 0.95. Last, but not least, the robustness of the classifier is demonstrated in an independent validation cohort (n = 72).

Supplementary Materials:
The following are available online at https://www.mdpi.com/article/10.3390/metabo11080537/s1, Figure S1: zoom-in of a lactate signal upon the addition of different amounts of TSP, Figure S2: OPLS-DA classifier of the training cohort using 221 variables, Figure S3: principal component analysis (PCA) of all 232 subjects from the training and validation cohort, Figure S4: permutation test of the training model, Table S1: overview of the 237 integration numbers (variables), and their corresponding integration regions and contributing metabolites, Table S2: overview of the intrasample variability of the 237 integration regions, Table S3: overview of the 30 variables with the highest total VIP value based on the OPLS-DA model that was constructed with 78 variables.

Informed Consent Statement:
Informed consent was obtained from all subjects involved in the study.

Data Availability Statement:
The data presented in this study are available on request.