Tomato Juice Consumption Modifies the Urinary Peptide Profile in Sprague-Dawley Rats with Induced Hepatic Steatosis

Non-alcoholic fatty liver disease (NAFLD) is the most common liver disorder in Western countries, with a high prevalence, and has been associated with an increased risk of type 2 diabetes and cardiovascular disease (CVD), among other conditions. Tomato products contain several natural antioxidants, including lycopene, which has shown a preventive effect on the development of steatosis and CVD. Accordingly, the aim of the present work was to evaluate the effect of tomato juice consumption on the urinary peptide profile of rats with NAFLD induced by an atherogenic diet, and to identify potential peptide biomarkers for its diagnosis. Urine samples, collected weekly for four weeks, were analyzed by capillary electrophoresis coupled to mass spectrometry (CE-MS). A sparse partial least squares-discriminant analysis (sPLS-DA) was carried out to explore the association between differential peptides and treatments. Of the 888 peptides initially detected, 55 were selected as potential biomarkers. Rats with steatosis that drank tomato juice showed a profile intermediate between those of healthy rats and rats with induced hepatic steatosis. Tomato products could therefore be considered as a dietary strategy for the amelioration of NAFLD, although further research is needed to develop a specific biomarker panel for NAFLD.


Introduction
Non-alcoholic fatty liver disease (NAFLD) is the most common liver disorder in Western countries and has a high prevalence, affecting between 14% and 24% of the population and reaching a higher prevalence (25% to 75%) among individuals with obesity and type 2 diabetes. Traditionally, NAFLD has affected the adult population; however, it is increasingly affecting children and adolescents, due to the rising prevalence of obesity in these sub-populations [1,2]. Obesity, type 2 diabetes, dyslipidemia and hypertension are the most important risk factors, and NAFLD is considered the hepatic manifestation of metabolic syndrome. The hallmark of NAFLD is hepatic lipid accumulation, mainly of triglycerides, in the absence of significant ethanol consumption or viral hepatitis. It covers a spectrum of pathologies, ranging from hepatic steatosis to steatohepatitis (NASH), fibrosis and even cirrhosis. Although significant progress in understanding the pathogenesis of NAFLD has been achieved in recent years, the mechanisms leading to liver steatosis and its further progression to NASH remain unclear [3,4].
Two steps, or "hits", have been proposed for the pathophysiology of NAFLD and NASH: the first hit is triglyceride accumulation as a consequence of insulin resistance, and the second hit includes oxidative stress, lipid peroxidation, increased cytokine production and inflammation, resulting in NASH [5]. Although the "two-hit hypothesis" is the most widely supported theory, a "multiple parallel hits hypothesis" is currently under consideration; it suggests that the interplay of insulin resistance, hepatic de novo lipogenesis and subsequent hepatocyte injury, together with the effects of certain candidate genes, could contribute to the progression from simple steatosis to NASH [6,7].
In addition, patients with NAFLD have a considerable risk of developing cardiovascular disease (CVD), mainly associated with abnormalities in lipid and lipoprotein metabolism accompanied by chronic inflammation and oxidative stress. In fact, current evidence raises the possibility that NAFLD may be not only a marker but also an early mediator of atherosclerosis [8,9].
Currently, lifestyle modification, including changes in dietary habits, is the most accepted treatment for NAFLD [10][11][12]. In this regard, the Mediterranean diet has been suggested as the most appropriate dietary strategy for this pathology, mainly due to its high content of plant-based foods and low intake of saturated fats and refined sugars [13]. Tomato products are a dietary source of natural antioxidants such as vitamins C and E, polyphenols, β-carotene and, especially, lycopene, the most abundant carotenoid in this fruit [14,15]. Previous studies have shown that the consumption of tomatoes and tomato products strengthens the antioxidant system and inhibits lipid peroxidation in humans [16,17]. Scientific evidence suggests that the role of lycopene as an antioxidant agent in the prevention of CVD is related to the effect of this carotenoid on lipoprotein metabolism, decreasing total cholesterol and the content and oxidation of LDL-cholesterol [17][18][19]. Moreover, lycopene has been found to be a highly effective antioxidant for liver health [20], showing different beneficial effects on liver metabolism in rats with induced NAFLD. In this regard, several authors have described, in animal models, a preventive effect of tomato consumption on steatosis through increased mitochondrial and peroxisomal fatty acid oxidation [21][22][23][24], which could prevent the development of NASH by inhibiting lipid peroxidation in the liver tissue [25,26].
In our research group, we have shown that the consumption of tomato juice by rats with NAFLD induced by a high-fat diet ameliorated the steatosis and improved the metabolic pattern of the animals, which reached a state more similar to that of healthy rats [21,24]. In these investigations, invasive methodologies, based on the analysis of plasma biomarkers and of the amino acid profile and gene expression of the liver, were applied to evaluate the effect of lycopene accumulation on NAFLD. Recently, technologies based on proteome and peptidome analysis of biological fluids have become promising tools for disease diagnosis, since they provide large amounts of information on the physiological state of an organism. In this regard, urine is easy to collect in large quantities and is more stable and less complex than blood, making it a suitable fluid for peptide biomarker detection [27,28]. Capillary electrophoresis coupled to mass spectrometry (CE-MS) has been shown to be an excellent platform for identifying urinary biomarkers for diagnosis [29][30][31].
Currently, liver biopsy remains the gold standard for the diagnosis of NAFLD. However, liver biopsy is an invasive procedure not fit for general screening given its associated costs, risks and sampling errors, as the liver is not necessarily uniformly affected by this pathology [32]. Moreover, the high prevalence of NAFLD, especially in high-risk sub-populations, makes the employment of liver biopsy difficult due to its low-throughput nature; thus, there is a crucial need to discover new biomarkers for the diagnosis and evaluation of the stage of NAFLD through non-invasive methods. Using proteomics, several studies have revealed a number of serum proteins for the diagnosis of NAFLD [33,34]. However, the blood proteome is highly complex, with an extensive mass range, and the collection of blood is an invasive procedure [31].
Taking all factors into consideration, the aim of this investigation was to identify potential biomarkers for the diagnosis of NAFLD, using a peptidomic approach, and to evaluate if the intake of tomato juice modifies the urinary peptide profile in Sprague-Dawley rats with hepatic steatosis induced by a high-fat diet, in comparison with healthy animals. The urinary peptide profile was analyzed using CE-MS, in order to discover specific biomarkers for the diagnosis of this pathology.

Results
In terms of the biochemical parameters, total cholesterol was higher in the groups fed the hypercholesterolemic and high-fat diet (H) than in those fed the standard diet (N) (Table 1). As expected, total cholesterol, LDL-cholesterol and triglycerides were significantly higher in the rats with induced steatosis (HA, hypercholesterolemic and high-fat diet and water; HL, hypercholesterolemic and high-fat diet and tomato juice) than in the rats of groups NA (standard diet and water) and NL (standard diet and tomato juice). All the changes in the lipid profile were related to diet H, but some changes were also associated with the intake of tomato juice in the NL group (Table 1). These changes reflect the dyslipidemia associated with NAFLD, which was also confirmed by the increase in the alanine transaminase (ALT) and aspartate transaminase (AST) enzymes in rats fed the H diet (Table 1). In a previous study [24], we confirmed by histological examination a steatosis grade of 2 or 3 in these animals, according to the classification of Brunt et al. [35].
Table 1. Different letters (a, b) indicate statistically significant differences between groups (one-way ANOVA, p < 0.05). NA: standard diet and water; NL: standard diet and tomato juice; HA: hypercholesterolemic and high-fat diet and water; HL: hypercholesterolemic and high-fat diet and tomato juice; ALT: alanine transaminase; AST: aspartate transaminase.
As explained in the Methods, the best sparse partial least squares-discriminant analysis (sPLS-DA) classification (lowest error rate) was obtained for the data organized in three groups (N, HA, HL). The urinary polypeptide fingerprints of the group with hepatic steatosis but without tomato juice consumption (HA) were very different from those of the other groups, showing high complexity during the first three weeks, mainly due to the high relative abundance of most of the peptides and the presence of high-molecular-weight peptides, as shown in the proteomic fingerprints (Figure 3).
Initially, 888 peptides were detected after processing the urine samples. However, according to the sPLS-DA models and the protocol established in the Methods, only 55 peptides (changing over time) can be considered potential biomarkers (Table 2). These peptides were present at a frequency ≥30% in at least one group (control or cases), a criterion used in other biomarker studies [36].
The protein identity was derived by matching the amino acid sequences against a protein database [36]. Table 2 shows the week, the identified and named proteins, the mass and the migration time of a peptide in the CE-MS analysis, as well as the mean relative abundance, the frequency of occurrence of each peptide and the fold change of groups HA and HL versus group N.
Overall, most of the 55 peptides showed clear differences between the groups fed with the standard diet or high-fat diet, as shown by the fold change values (Table 2). However, some peptides showed greater increases or decreases in the hepatic steatosis group with respect to group N, group HL being closer to the healthy group over the weeks (Figure 4).
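Table 2 reports negative fold changes (e.g., −2.20 for apolipoprotein A-IV in group HA), which points to a signed convention for decreases. The article does not state its formula, so the sketch below assumes one common convention: the ratio of mean relative abundances (case/reference) when it is at least 1, and −(reference/case) when it is below 1. The function name `signed_fold_change` and the example abundances are hypothetical.

```python
import math


def signed_fold_change(mean_case: float, mean_reference: float) -> float:
    """Signed fold change of a case group (e.g. HA or HL) versus a reference (N).

    ASSUMPTION: increases are reported as case/reference and decreases as
    -(reference/case); the article does not give its exact formula.
    """
    if mean_case < 0 or mean_reference < 0:
        raise ValueError("relative abundances must be non-negative")
    if mean_reference == 0:
        # Peptide absent in the reference group (e.g. not detected in group N).
        return math.inf if mean_case > 0 else 0.0
    if mean_case == 0:
        # Peptide absent in the case group: unbounded decrease.
        return -math.inf
    ratio = mean_case / mean_reference
    return ratio if ratio >= 1 else -1.0 / ratio


# Hypothetical mean relative abundances (not values from Table 2).
print(round(signed_fold_change(22.0, 10.0), 2))  # 2.2
print(round(signed_fold_change(10.0, 22.0), 2))  # -2.2
```

Under this convention, values between −1 and 1 never occur and "no change" is 1 rather than 0. A log2 ratio would be an alternative, symmetric scale.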
With regard to the behavior of specific proteins, apolipoprotein A-IV (peptide 13451) decreased in group HA compared with groups N and HL in Weeks 1 and 3. Collagen α-1(I) chain (peptide 4970) and collagen α-1(II) chain (peptide 11153) showed a clear increase in group HA in Week 2, whilst their values in group HL were closer to those of group N. Fibrinogen α chain precursor (peptide 7243), extracellular superoxide dismutase (Cu-Zn) (peptide 9955) and proline-rich protein (peptide 14976) showed the same tendency in Week 2 as the above-mentioned collagen α chains. Uromodulin (peptide 14544) was more abundant in the H groups, especially in group HA, and was not detected in group N at Week 3. In contrast, in Week 3, transketolase (peptide 8116) and L-lactate dehydrogenase B chain (peptide 14576) showed marked decreases in the H groups, with the lowest values in rats that did not drink tomato juice. However, it should be noted that in Week 4 only two peptides, 6975 (collagen α-1(I) chain) and 11644 (collagen α-1(I) chain precursor), were identified as classificatory peptides.

Discussion
The CE-MS approach is rapid, sensitive and automated [29]. In addition, this platform can detect differences between the urinary proteomes of healthy and unhealthy individuals, making it a useful tool for the diagnosis and prevention of disease. For this reason, one of the objectives of this work was to discover potential biomarkers associated with hepatic steatosis, since, to the best of our knowledge, there is currently no specific panel of urinary biomarkers for NAFLD. Given that the prevalence of NAFLD is increasing dramatically and that its diagnosis is mainly based on liver biopsy, proteomics appears to be a promising non-invasive approach, not only for the diagnosis of NAFLD but also for a better understanding of its pathogenesis [37].
The urinary peptide fingerprints of the three groups (N, HA and HL) (Figure 3), as well as the sPLS-DA models (Figure 4), suggest that rats fed the H diet showed changes related to their physiological condition, since rats of groups HA and HL had steatosis with associated clinical signs, such as dyslipidemia and increased ALT and AST activity. However, it is noteworthy that the rats of group HL showed a peptide profile intermediate between those of groups HA and N during the first three weeks. This suggests that tomato juice intake could have modified the urinary peptide profile of rats fed the high-fat diet, leading to a status closer to that of the healthy group. This tendency is also supported by Figure 4, where group HL occupies a similar position from Weeks 1 to 3. This may be because tomato juice consumption exerted a protective effect on the pathophysiology of hepatic steatosis during the first three weeks, an effect that was overcome by the pathogenic mechanisms by the end of the study (Week 4).
The positive effect of tomato juice consumption, and the consequent accumulation of lycopene in the liver, has been described for these rats in previous studies by our research group. Bernal et al. [21] reported a complex effect of lycopene from tomato juice, showing: (a) alleviation of amino acid depletion; (b) recovery of the redox balance in the liver; and (c) an increase in L-carnitine, which could indicate improved transport of fatty acids into the mitochondria. Martín-Pozuelo et al. [24] described a decrease in urinary isoprostanes and an over-expression of genes related to mitochondrial and peroxisomal fatty acid oxidation in rats with steatosis that had drunk tomato juice (HL group). In general, other authors, using different experimental designs, have reported a beneficial effect of tomato or lycopene consumption on NAFLD, through reduced oxidative stress and improved lipid metabolism, exerting a preventive effect on the progression to NASH [22,23,25].
In the present work, most of the 55 discriminant peptides showed a clear difference between animals fed the standard diet and those receiving the high-fat diet (N and H groups), as can be observed in the fold change values in Table 2. Interesting changes were observed in several proteins, identified from peptides analyzed in the urine samples by CE-MS, whose levels were increased or reduced in group HA compared with group N, with group HL showing, in most cases, levels closer to those of the healthy group.
Apolipoprotein A-IV is secreted by the small intestine during the absorption of dietary fat, and it is also involved in glucose homeostasis and in the reverse transport of cholesterol and lipids through chylomicrons and high-density lipoproteins (HDL). In addition, it possesses important antioxidant and anti-inflammatory properties. For these reasons, the circulating levels of this protein have been considered a target in the diagnosis and treatment of CVD, as well as diabetes and obesity [38]. Moreover, different proteomic studies [39] have shown that a deficiency of plasma apolipoproteins is associated with a higher prevalence of NAFLD. This is in agreement with our findings, since the abundance of apolipoprotein A-IV was lower in urine samples from the H groups (fold change −2.20 in HA and −1.72 in HL, Table 2) than in group N, reflecting the fact that this apolipoprotein is mainly present in HDL, whose plasma levels were reduced in these groups. However, the levels of this protein were higher in group HL than in group HA, and increased slightly in group HL over the weeks. This could be related to the consumption of tomato juice, since several studies have described changes in plasma cholesterol (total and its fractions) related to tomato juice consumption and the accumulation of lycopene in the body [17,40,41], mainly due to the inhibitory effect of lycopene on 3-hydroxy-3-methylglutaryl-CoA reductase [42].
The association between the serum concentrations of extracellular matrix components, especially type IV collagen 7S, measured alongside routine laboratory parameters, and the degree of fibrosis in NAFLD has been studied extensively [43][44][45]. Hepatocyte ballooning and the release of type IV collagen are believed to be the main causes of the increased serum levels, although the mechanism of this interaction is still unknown [39]. In our study, other types of collagen were detected in the urine samples, but without a clear pattern.
Uromodulin has been validated as a biomarker of hypertension and renal injury and has been found at higher levels in patients with hypertension than in healthy individuals [46]. This peptide was not detected in healthy rats but was present in the H groups; in particular, a higher relative abundance of uromodulin was observed in group HA, consistent with the association between high-fat diets and disturbances of arterial pressure [47]. Tomato juice consumption appeared to exert a positive effect on health because, although the frequency of uromodulin in the HL group was 100%, its relative abundance was significantly reduced, by more than half, compared with the HA group. This finding could be explained by the ability of lycopene to reduce systolic blood pressure, as suggested by Ried and Fakler [40], or by the general improvement in endothelial function produced by lycopene [48].
Furthermore, the abundance of transketolase was considerably reduced in the H groups, especially in group HA (fold change −5.94). This enzyme participates in numerous cellular metabolic pathways, such as the pentose phosphate pathway, and its activity has been reported to change in several pathologies, such as diabetes. In fact, this enzyme plays an important role in preventing the vascular damage of hyperglycemia, which is caused mainly by impairment of mitochondrial function due to reactive oxygen species [49]. Moreover, Boren et al. [50] showed that this enzyme is also active in the peroxisomes of liver parenchymal cells, so hepatic damage could alter its activity. The proline-rich proteins (PRPs) are a heterogeneous group of proteins with important biological functions, including immunomodulatory and antioxidant activities, participation in the secondary modification of collagen molecules and modulation of protein-protein interactions; they therefore play a crucial role in cellular signal transduction pathways [51]. In rats with steatosis (group HA), the relative abundance of transketolase was significantly reduced compared with the N and HL groups, whereas the relative abundance of PRPs was significantly increased in HA, more than in HL. The transketolase levels were especially low in rats with steatosis, probably due to damage to mitochondria and peroxisomes caused by lipid peroxidation products. The higher levels of PRPs might also be related to NAFLD, their increase possibly reflecting a metabolic mechanism to counteract the oxidative stress and inflammation associated with steatosis. In fact, the rats of group HA showed higher levels of oxidative stress than the rats of group HL, which recovered their redox balance due to the protection of lycopene and showed lower urinary isoprostane levels, a lower NAD/NADH ratio and increased amounts of intermediates of methionine metabolism [21].
No significant differences in the biomarkers of inflammation (TNFα and IL-6) were observed between healthy and steatotic rats [24].
A similar tendency was observed for other proteins related to CVD. Extracellular superoxide dismutase (Cu-Zn) catalyzes the dismutation of the superoxide anion, thereby limiting its reaction with nitric oxide and helping to prevent the endothelial damage and mitochondrial dysfunction that occur in the pathogenesis of CVD [52]. Other proteins, such as collagen α-1(I) (peptide 11153), collagen α-1(II) (peptide 4970) and the fibrinogen α chain precursor (peptide 7243), have been validated as biomarkers of CVD and diabetes [53,54]. These peptides were more abundant in group HA than in groups N and HL, which could be associated with a higher risk of CVD in animals with steatosis [8,9]. These results again suggest a beneficial effect of tomato juice consumption on the amelioration of steatosis, potentially reducing CVD risk.
Concerning the L-lactate dehydrogenase B chain, this enzyme is present in the mitochondria, so damage to these organelles could decrease its excretion in urine. This could explain the low levels observed in group HA [55] and suggests a protective effect of tomato juice consumption, as indicated by the values of group HL being intermediate between those of group HA and the healthy group. Mitochondria are damaged by lipid peroxidation products, so the accumulation of lycopene in the liver could protect them and improve their functionality.
It is interesting that the number of classificatory peptides detected in Week 4 was very low compared with the other weeks. This suggests that the main changes occurred at the beginning of the study, when most of the physiological disturbances caused by the hypercholesterolemic and high-fat diet were triggered, and also that lycopene supplementation improved the metabolism. However, by the fourth week, the effect of the continuous supply of dietary fat could no longer be counteracted by tomato juice supplementation and lycopene accumulation, so there were no significant changes in the urinary peptide profile.
Summing up, this research provides new information about urinary peptides that could be associated with steatosis, describing a relationship between the main proteins and the clinical evolution of this disease. In addition, the amelioration of steatosis associated with tomato juice intake and the accumulation of lycopene in the liver is also reflected in the changes observed in the urinary peptide profile. For this reason, the discovery of biomarkers in urine samples, which are easy to collect and highly stable, should be explored further for the early diagnosis of NAFLD and the determination of its pathological stage. Further investigations in humans are therefore necessary to create a specific urinary biomarker panel for this disease of rising importance. In this respect, CE-MS technology enables the reproducible analysis of the low-molecular-weight proteome, and its data can be used for diagnosis, prognosis and assessment of therapy, owing to its capacity to define and validate biomarker patterns for clinical application [56][57][58][59]. Other techniques, such as Western blot or enzyme-linked immunosorbent assay (ELISA), would be useful for confirming specific proteins as clinical biomarkers. In fact, a combination of both technologies, CE-MS and immunological tests, may be the best way to address still unmet clinical needs [57,60].

Materials and Methods

Tomato Juice
Commercial tomato juice was provided by a local juice producer. It was obtained through a standard industrial process and sold in glass bottles. The juice was analyzed to determine the total content of bioactive compounds, following the methods described previously [14,15]. The contents of these compounds were: total lycopene 108 mg/kg, total phenols 284 mg/kg, free flavonoids 36 mg/kg and total folates 340 µg/kg. In addition, this tomato juice contained 14.4 mg/L of vitamin C and had an energy value of 260 kcal/L.

Animals and Experimental Design
Twenty-four male Sprague-Dawley rats (8 weeks old), weighing approximately 250 g, were obtained from the Animal Facility of the University of Murcia (Murcia, Spain). The sample size was calculated using the method based on the law of diminishing returns, following the procedure described by Charan and Kantharia [61]; this yielded a sample size that was more than adequate. The rats were randomly divided into two groups (n = 12) fed ad libitum with a standard diet (N) (Teklad global 14% protein rodent maintenance diet, Harlan Laboratories, Indianapolis, IN, USA) or a hypercholesterolemic and high-fat diet (H) (Atherogenic rodent diet TD-02028, Harlan Laboratories, Indianapolis, IN, USA), with water, during a 2-week adaptation period. Afterwards, each group was randomly subdivided into two groups (n = 6), and the rats were placed individually in metabolic cages, yielding the following groups: standard diet and water (NA), standard diet and tomato juice (NL), hypercholesterolemic and high-fat diet and water (HA) and hypercholesterolemic and high-fat diet and tomato juice (HL). The rats were maintained under controlled conditions of temperature (22 °C) and air humidity (55%), with a 12-h light-dark cycle, throughout the study. Urine samples were collected weekly for four weeks, each collection covering a 24-h period, giving a total of 96 samples. At the end of the study, the rats were euthanized and blood and liver samples were collected. All samples were stored at −80 °C until the analytical procedures were carried out. The study was carried out at the experimental Animal
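The "law of diminishing returns" approach in Charan and Kantharia [61] is the resource equation method: E = (total number of animals) − (total number of groups), with E between 10 and 20 considered adequate. The article does not show its calculation, so the minimal check below assumes the four treatment groups (NA, NL, HA, HL) are the groups counted. It also derives the sample count from one 24-h collection per rat per week. The function names are hypothetical.

```python
def resource_equation_e(total_animals: int, total_groups: int) -> int:
    """Error degrees of freedom E in the resource equation method."""
    return total_animals - total_groups


def is_adequate(e: int, lower: int = 10, upper: int = 20) -> bool:
    """E between 10 and 20 is conventionally considered adequate."""
    return lower <= e <= upper


def n_urine_samples(n_rats: int, n_weeks: int, collections_per_week: int = 1) -> int:
    """Total urine samples for one 24-h collection per rat per week."""
    return n_rats * n_weeks * collections_per_week


# ASSUMPTION: the four treatment groups are the groups in the equation.
N_RATS, N_GROUPS, N_WEEKS = 24, 4, 4
e = resource_equation_e(N_RATS, N_GROUPS)
print(e, is_adequate(e))                 # 20 True
print(n_urine_samples(N_RATS, N_WEEKS))  # 96
```

E = 20 sits at the upper bound of the range. Above that bound, the method holds that adding more animals no longer increases the chance of obtaining significant results.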

NAFLD Confirmation
In the rats fed with diet H (HA and HL groups), NAFLD was confirmed by analyzing the biochemical parameters (total cholesterol, HDL-cholesterol, LDL-cholesterol, total triglycerides) and hepatic enzymes (ALT and AST) and by histological examination of the liver using hematoxylin and eosin stain. All analyses were carried out in the Veterinary Hospital of the University of Murcia.

Sample Preparation
Rat urine was thawed immediately before use, and a 0.7 mL aliquot was diluted with 0.7 mL of 2 M urea and 10 mM NH4OH containing 0.02% sodium dodecyl sulfate (all from Sigma-Aldrich, Dorset, UK), as described by Albalat et al. [28]. Purified peptides were lyophilized and stored at 4 °C until analysis.
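Mixing 0.7 mL of urine with 0.7 mL of the urea/NH4OH/SDS buffer is a 1:1 dilution, so each buffer component ends up at half its stock concentration. The sketch applies C1·V1 = C2·V2. It assumes additive volumes and that urine contributes none of these components. The helper name is hypothetical.

```python
def final_concentration(stock: float, stock_volume_ml: float, total_volume_ml: float) -> float:
    """C2 = C1 * V1 / V2 for a component supplied only by the stock solution."""
    return stock * stock_volume_ml / total_volume_ml


URINE_ML = 0.7
BUFFER_ML = 0.7
TOTAL_ML = URINE_ML + BUFFER_ML  # assumes volumes are additive

# Stock concentrations in the dilution buffer, as given in the text.
buffer_stock = {
    "urea (M)": 2.0,
    "NH4OH (mM)": 10.0,
    "SDS (%)": 0.02,
}

for component, stock in buffer_stock.items():
    final = final_concentration(stock, BUFFER_ML, TOTAL_ML)
    print(f"{component}: {final:g}")  # prints 1, 5 and 0.01 respectively
```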

Protein Estimation
The protein concentration in the urine samples was quantified using the bicinchoninic acid (BCA) assay Uptima, from Interchim (Montluçon, France). Shortly before CE-MS analysis, freeze-dried aliquots were re-suspended in HPLC-grade water to a concentration of 2 µg/µL, as described by Mullen et al. [36].
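BCA results are read against a standard curve, usually of bovine serum albumin at 562 nm. The volume needed to reach 2 µg/µL then follows directly. The sketch assumes a linear standard curve fitted by ordinary least squares. The standard and sample absorbances are invented illustrative numbers, not data from this study, and the function names are hypothetical.

```python
def fit_standard_curve(conc_ug_per_ml, absorbance):
    """Ordinary least-squares line: absorbance = slope * concentration + intercept."""
    n = len(conc_ug_per_ml)
    mean_x = sum(conc_ug_per_ml) / n
    mean_y = sum(absorbance) / n
    sxx = sum((x - mean_x) ** 2 for x in conc_ug_per_ml)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(conc_ug_per_ml, absorbance))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def concentration_from_absorbance(a, slope, intercept):
    """Invert the standard curve to estimate protein concentration (µg/mL)."""
    return (a - intercept) / slope


def resuspension_volume_ul(protein_ug, target_ug_per_ul=2.0):
    """Water volume that brings a freeze-dried aliquot to the target concentration."""
    return protein_ug / target_ug_per_ul


# Hypothetical BSA standards (µg/mL) and absorbance readings.
standards = [0, 250, 500, 1000, 2000]
readings = [0.10, 0.35, 0.60, 1.10, 2.10]
slope, intercept = fit_standard_curve(standards, readings)
sample_conc = concentration_from_absorbance(0.85, slope, intercept)
print(round(sample_conc))            # 750 (µg/mL)
print(resuspension_volume_ul(40.0))  # 20.0 (µL for a 40 µg aliquot)
```

Real BCA curves often bend at high concentrations. In that case, a restricted linear range or a polynomial fit would replace the straight line assumed here.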
The MS spectra were recorded over an m/z range of 350-3000 and accumulated every 3 s. The accuracy, precision, selectivity, sensitivity, reproducibility and stability of the CE-MS measurements are described in Theodorescu et al. [56].
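Electrospray produces multiply charged ions, so the 350-3000 m/z window can still cover the 0.8-30 kDa mass range reported in the data processing step. The sketch assumes positive-mode protonated ions [M + zH]z+. Under that assumption, the neutral mass is M = z·(m/z − m_p), where m_p ≈ 1.007276 Da is the proton mass. The code computes M and the charge states for which a given mass falls inside the recorded window. Actual deconvolution in MosaiquesVisu is more involved, and the function names are hypothetical.

```python
import math

PROTON_MASS_DA = 1.007276  # mass of a proton, in daltons


def neutral_mass(mz: float, charge: int) -> float:
    """Neutral mass M of an [M + zH]^z+ ion observed at the given m/z."""
    return charge * (mz - PROTON_MASS_DA)


def observable_charges(mass_da: float, mz_min: float = 350.0, mz_max: float = 3000.0):
    """Charge states z for which (M + z*m_p)/z falls inside [mz_min, mz_max]."""
    z_min = max(1, math.ceil(mass_da / (mz_max - PROTON_MASS_DA)))
    z_max = math.floor(mass_da / (mz_min - PROTON_MASS_DA))
    return range(z_min, z_max + 1)  # empty if no charge state fits


print(round(neutral_mass(1001.007276, 2), 3))  # 2000.0
print(list(observable_charges(800.0)))         # [1, 2]
print(min(observable_charges(30000.0)))        # 11
```

For example, a 30 kDa polypeptide only enters the window at charge 11 or higher, whereas an 800 Da peptide appears only as a singly or doubly charged ion.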

CE-Data Processing
The MS ion peaks were processed using MosaiquesVisu software (Mosaiques Diagnostics, Hannover, Germany), which performs peak picking, deconvolution and deisotoping [62]. The CE migration time and peak intensity were subsequently normalized using internal polypeptide standards [56], to allow compilation and comparison of samples. The resulting peak list characterizes each polypeptide by its molecular mass (0.8-30 kDa), normalized CE migration time (min) and normalized signal intensity (ion counts). The normalized signal intensity was used as a measure of relative abundance. All the polypeptides detected were deposited, matched and annotated in a Microsoft SQL database [36], to allow the identification of proteins from the peptide sequences obtained from the urine samples. Polypeptides from different samples were considered identical if the mass deviation was within ±50 ppm and the migration time deviation was lower than 2 min.
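The cross-sample matching rule can be written as a simple predicate. The sketch assumes the ppm deviation is computed relative to the reference mass and that both tolerances are strict, as "lower than" suggests. MosaiquesVisu's internal implementation is not described in the article, and the class and function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Polypeptide:
    mass_da: float             # deconvoluted molecular mass
    migration_time_min: float  # normalized CE migration time


def ppm_deviation(mass: float, reference_mass: float) -> float:
    """Mass deviation in parts per million relative to the reference mass."""
    return (mass - reference_mass) / reference_mass * 1e6


def same_polypeptide(a: Polypeptide, b: Polypeptide,
                     max_ppm: float = 50.0, max_time_min: float = 2.0) -> bool:
    """True if two peaks fall within both the mass and the migration-time windows."""
    return (abs(ppm_deviation(a.mass_da, b.mass_da)) < max_ppm
            and abs(a.migration_time_min - b.migration_time_min) < max_time_min)


ref = Polypeptide(1000.00, 25.0)
print(same_polypeptide(Polypeptide(1000.04, 25.5), ref))  # True  (40 ppm, 0.5 min)
print(same_polypeptide(Polypeptide(1000.06, 25.0), ref))  # False (60 ppm)
print(same_polypeptide(Polypeptide(1000.00, 28.0), ref))  # False (3 min)
```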

Statistical Analyses for Biochemical Parameters
One-way ANOVA and a post hoc Tukey test were carried out to determine the differences among the four experimental groups regarding the different biochemical parameters. The data are expressed as the mean ± SE and the significance level was p < 0.05. The statistical analyses were performed with the IBM Statistical Package for the Social Sciences (SPSS), version 19.0 (IBM, New York, NY, USA).
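The quantities behind the one-way ANOVA and the mean ± SE summaries can be computed directly. The sketch below uses only the standard library and returns the F statistic with its degrees of freedom. The p-value would then come from the F distribution with (k − 1, N − k) degrees of freedom. The Tukey post hoc comparisons would come from statistical software such as SPSS, as used here, or, for instance, `scipy.stats.tukey_hsd` in SciPy 1.8 or later. The function names are hypothetical, and the example values are arbitrary rather than data from Table 1.

```python
from math import sqrt
from statistics import fmean, stdev


def mean_se(values):
    """Mean and standard error of the mean (sample SD / sqrt(n))."""
    return fmean(values), stdev(values) / sqrt(len(values))


def one_way_anova_f(groups):
    """Return (F, df_between, df_within) for a one-way ANOVA on k groups."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = fmean([x for g in groups for x in g])
    group_means = [fmean(g) for g in groups]
    # Between-group and within-group sums of squares.
    ss_between = sum(len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means))
    ss_within = sum((x - m) ** 2 for g, m in zip(groups, group_means) for x in g)
    df_between, df_within = k - 1, n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Arbitrary example values for three groups.
groups = [[1.0, 2.0, 3.0], [4.0, 5.0, 6.0], [7.0, 8.0, 9.0]]
print(one_way_anova_f(groups))  # (27.0, 2, 6)
print(mean_se(groups[0]))
```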

Statistical Analysis for Biomarker Definition
A sparse partial least squares-discriminant analysis (sPLS-DA) was applied to explore the associations between peptides and treatments, using the mixOmics package [63,64] of R [65]. The sPLS-DA method combines the ability of PLS to extract latent variables from matrices with a very high number of variables and a low number of cases (typical of -omics) with the ability of DA to separate groups of different treatments.
Exploratory analyses showed that the peptide profiles of the rats of diet N were very similar, independently of the administration of tomato juice; therefore, we built three different types of model: (i) models comparing four groups (NA, NL, HA, HL) within a week; (ii) models comparing three groups (N, HA, HL) within a week; and (iii) a model comparing all groups and weeks simultaneously (in practice, a model of 16 groups, one per treatment and week).
The original database included 888 peptides. From the point of view of biomarker identification, an analysis including all of these peptides is not useful. The objective of the sPLS-DA was therefore to find a model that maximizes correct classification (minimizes the classification error rate) of treatments based on their peptide profile, while minimizing the number of peptides used. A basic model was defined by its number of components (latent variables) and the number of peptides in the model. The number of components ranged from 1 to n-1, where n is the number of groups to be classified, as n groups can be separated by n-1 latent variables. The maximum number of peptides in a model was arbitrarily set to 50, on the grounds that a biomarker set of more than 50 peptides is unnecessarily complicated for routine clinical application. In total, 1750 models were tested, resulting from all possible combinations of grouping, number of components and number of peptides (Table A1).
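The figure of 1750 models can be reproduced from the three model types, under three assumptions: each weekly model type was fitted separately for each of the four weeks, the number of components ran from 1 to n − 1 for n groups, and every peptide count from 1 to 50 was tried. This is a consistency check under those assumptions, not a reconstruction of Table A1, which is not shown here.

```python
MAX_PEPTIDES = 50  # peptides per model, assumed tried from 1 to 50
N_WEEKS = 4

# (number of groups classified, number of separate datasets the model is fitted to)
MODEL_TYPES = {
    "four groups within a week (NA, NL, HA, HL)": (4, N_WEEKS),
    "three groups within a week (N, HA, HL)": (3, N_WEEKS),
    "16 groups (4 treatments x 4 weeks)": (16, 1),
}


def count_models(model_types, max_peptides):
    """Count (dataset, components, peptides) combinations for each model type."""
    counts = {}
    for name, (n_groups, n_datasets) in model_types.items():
        n_components = n_groups - 1  # 1 .. n-1 latent variables
        counts[name] = n_datasets * n_components * max_peptides
    return counts


counts = count_models(MODEL_TYPES, MAX_PEPTIDES)
for name, n in counts.items():
    print(f"{name}: {n}")  # 600, 400 and 750
print("total:", sum(counts.values()))  # total: 1750
```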
From this pool of 1750 models, the best models for the identification of biomarkers were selected on the basis of the lowest classification error rate, estimated by cross-validation. Briefly, for each basic type of model, the rats in the sample were divided into 10 sub-groups. Nine of these were used to estimate the model, and the rats in the excluded sub-group were classified into the treatments according to this model; the proportion of misclassified rats was the error rate. This was repeated 10 times per basic model, excluding a different sub-group at each step, to calculate a mean error rate per model. Models based on three groups within a week showed the lowest mean error rates (0.17 to 0.25, depending on the week), performing much better than models based on four groups within a week or the 16-group model with all treatments and weeks analyzed simultaneously. We then selected the five "best" models (lowest error rates) in each week to screen them for biomarkers. Peptides with an absolute loading of at least 0.15 in all five models were considered potential biomarkers. The loading represents the correlation between the peptide and the component (latent variable), which optimizes the separation of treatments. Furthermore, to minimize the biomarker set and maximize its usefulness, we set as a threshold that a peptide selected as a biomarker should also be detected in >30% of the samples in at least one group.
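The two-stage selection works as follows. First, a 10-fold cross-validated error rate ranks the models. Second, a filter keeps peptides whose absolute loading is at least 0.15 in every one of the best models and whose detection frequency exceeds 30% in at least one group. The study fitted sPLS-DA models with mixOmics in R, where tuning functions such as `tune.splsda()` perform this kind of M-fold validation. The Python sketch below swaps in a simple nearest-centroid classifier only to stay self-contained. It therefore illustrates the cross-validation bookkeeping and the selection filter, not sPLS-DA itself. All function names and toy values are hypothetical. The error rate is pooled over folds, which equals the mean per-fold rate only when folds have equal size.

```python
import random


def kfold_indices(n_samples, k=10, seed=0):
    """Shuffle sample indices and deal them into k folds (sizes differ by at most 1)."""
    idx = list(range(n_samples))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def nearest_centroid_fit(X, y):
    """Stand-in classifier (NOT sPLS-DA): one mean profile per class."""
    centroids = {}
    for label in set(y):
        rows = [x for x, lab in zip(X, y) if lab == label]
        centroids[label] = [sum(col) / len(rows) for col in zip(*rows)]
    return centroids


def nearest_centroid_predict(centroids, x):
    """Assign x to the class with the closest centroid (squared Euclidean distance)."""
    return min(centroids, key=lambda lab: sum((a - b) ** 2 for a, b in zip(x, centroids[lab])))


def cv_error_rate(X, y, fit, predict, k=10, seed=0):
    """Fraction of samples misclassified when each fold is held out in turn."""
    errors = 0
    for held_out in kfold_indices(len(y), k, seed):
        held = set(held_out)
        train = [i for i in range(len(y)) if i not in held]
        model = fit([X[i] for i in train], [y[i] for i in train])
        errors += sum(predict(model, X[i]) != y[i] for i in held_out)
    return errors / len(y)


def select_biomarkers(best_model_loadings, detection_frequency,
                      min_abs_loading=0.15, min_frequency=0.30):
    """Peptides with |loading| >= min_abs_loading in every best model and
    detected in more than min_frequency of samples in at least one group.

    best_model_loadings: list of {peptide: loading}; peptides a sparse model
    did not select are simply absent from its dict.
    detection_frequency: {peptide: {group: fraction of samples detected}}.
    """
    passing = [{p for p, w in loadings.items() if abs(w) >= min_abs_loading}
               for loadings in best_model_loadings]
    candidates = set.intersection(*passing) if passing else set()
    return sorted(p for p in candidates
                  if any(f > min_frequency for f in detection_frequency.get(p, {}).values()))


# Toy data: two well-separated classes, one feature (hypothetical values).
X = [[float(i)] for i in range(10)] + [[100.0 + i] for i in range(10)]
y = ["N"] * 10 + ["HA"] * 10
print(cv_error_rate(X, y, nearest_centroid_fit, nearest_centroid_predict))  # 0.0

loadings = [{"p1": 0.20, "p2": -0.30, "p3": 0.10}, {"p1": -0.18, "p2": -0.16}]
freqs = {"p1": {"N": 0.1, "HA": 0.5}, "p2": {"N": 0.2, "HA": 0.3}, "p3": {"HA": 0.9}}
print(select_biomarkers(loadings, freqs))  # ['p1']
```

In the toy example, "p2" passes the loading filter in both models but is dropped because its highest detection frequency (0.3) does not exceed the 30% threshold.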