Does Proteomic Mirror Reflect Clinical Characteristics of Obesity?

Obesity is a frightening chronic disease whose prevalence has tripled since 1975. It is not expected to slow down, remaining one of the leading causes of preventable death and imposing an increasing clinical and economic burden. Poor lifestyle choices and excessive intake of “cheap calories” are major contributors to obesity, triggering type 2 diabetes, cardiovascular diseases, and other comorbidities. Understanding the molecular mechanisms responsible for the development of obesity is essential, as it might yield anti-obesity targets and early-stage obesity biomarkers that allow metabolic syndromes to be distinguished. The complex nature of this disease, coupled with the phenomenon of metabolically healthy obesity, inspired us to perform data-centric, hypothesis-generating pilot research aimed at finding correlations between parameters of classic clinical blood tests and proteomic profiles of 104 lean and obese subjects. As a result, we assembled patterns of proteins whose presence or absence allows the weight of the patient to be predicted fairly well. We believe that such proteomic patterns with high prediction power should facilitate the translation of potential candidates into biomarkers of clinical use for early-stage stratification of obesity therapy.


Introduction
Obesity is, in most cases, blatantly visible to the unaided eye. Paradoxically, both clinicians and citizens tend to ignore this pathology, which has acquired the scale of "globesity" [1,2]. Although generally preventable, obesity, which results from an excess of body fat, often entails the development of more than 50 different pathologies, significant disability, and premature death [3].
The pathogenesis of obesity involves the interaction of genetic, environmental, and behavioral factors [4]. Each time they identify characteristic features in the biomedical portrait of obesity, scientists try to resolve the nature vs. nurture debate [5]. The multifactorial nature and high comorbidity of obesity make it difficult to understand the molecular nature of this disease clearly. Moreover, about a third of obese patients are "metabolically healthy" with little or no evidence of metabolic syndrome. There are four

Sample Collection
One hundred and four human plasma samples were obtained from the patients of the Clinic of "Federal Research Centre of Nutrition, Biotechnology and Food Safety" (Moscow, Russia).
All study participants gave informed consent confirming their willingness to participate in the research. All procedures performed in studies involving human participants were in accordance with the ethical standards of the institutional or national research committee and with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. The study was approved by the relevant ethical review committee of the Federal Research Centre of Nutrition, Biotechnology and Food Safety (protocol #4 of 15 June 2018).
The present study included 104 individuals in accordance with the inclusion and exclusion criteria. The inclusion criteria were an age of 18 to 45 years, a BMI of 18.5 or higher, and the absence of diagnosed somatic and mental disorders.
Individuals younger than 18 and older than 45 years were excluded from the study, as well as pregnant/breastfeeding patients, patients with mental disorders, identified cancer, cardiovascular and any gastrointestinal diseases, other somatic disorders, and recent (6 months) weight loss.
The patients were divided into groups according to their body mass indexes. BMI, calculated as the mass of the individual in kilograms divided by his/her height in meters squared, is one of the most popular metrics to characterize body condition [19].
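The BMI formula and the grouping step can be illustrated with a minimal sketch. The cut-offs below are the standard WHO adult classes and are an assumption here: the text names the groups (NORM, OW, OB1, OB2, OB3) but does not list their boundaries. The function names and the "UNDER" label are hypothetical.

```python
def bmi(weight_kg: float, height_m: float) -> float:
    """Body mass index: weight in kilograms divided by height in meters squared."""
    if weight_kg <= 0 or height_m <= 0:
        raise ValueError("weight and height must be positive")
    return weight_kg / height_m ** 2


def bmi_group(value: float) -> str:
    """Map a BMI value (kg/m^2) to a group label.

    Assumption: boundaries follow the WHO adult classification; the source
    does not state the cut-offs it used.
    """
    if value < 18.5:
        return "UNDER"  # hypothetical label; such subjects were not enrolled
    if value < 25.0:
        return "NORM"
    if value < 30.0:
        return "OW"
    if value < 35.0:
        return "OB1"
    if value < 40.0:
        return "OB2"
    return "OB3"
```

For instance, a subject weighing 70 kg with a height of 1.75 m has a BMI of about 22.9 kg/m² and would fall into the NORM group under these assumed boundaries.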

Anthropometric Tests
The BMIs of the patients were evaluated according to the standard formula [19]. The weight distributions were measured using the bioelectrical impedance analysis method.

Biochemical Blood Test and Complete Blood Count
Serum levels of fasting plasma glucose, triglycerides, high-density lipoprotein, low-density lipoprotein, cholesterol, alanine aminotransferase, aspartate aminotransferase, γ-glutamyl transpeptidase, alkaline phosphatase, uric acid, urea, creatinine, albumin, bilirubin, etc. were determined according to standard protocols. Blood levels of hemoglobin, hematocrit, and blood cell indexes were established according to standard protocols [20].
Results of anthropometric and blood tests are provided in Supplementary Table S1.

The Depletion of Blood Plasma
The immunoaffinity depletion of the high abundance plasma proteins (albumin and IgG) was used to enhance the detection of lower abundance but more insightful proteins in further shotgun proteomic analysis. For plasma depletion, we used ProteoPrep Kit (Sigma-Aldrich, St. Louis, MO, USA). The depletion was carried out following the manufacturer's instructions [21].

Trypsinolysis of Depleted Plasma
The depleted blood plasma samples (175 µg of total protein) were digested in solution in accordance with a standard protocol [22]. In brief, proteins were denatured and reduced with a solution containing sodium deoxycholate, tris-2-carboxyethyl-phosphine hydrochloride, and 1,4-dithiothreitol, and further alkylated with vinylpyridine. Trypsin was added to the sample (trypsin/total protein = 1/100), which was then incubated for 2 h at 44 °C. After 2 h, a further aliquot of trypsin was added and the sample was incubated for 2 h at 37 °C. Trypsinolysis was quenched by adding formic acid to each sample to a final concentration of 5%; the peptide mixture was then centrifuged at 10,000 rpm for 15 min. The supernatant was collected and subjected to further chromatography-mass spectrometric analysis.

HPLC-MS/MS Analysis
Separation and identification of the peptides were performed on an Ultimate 3000 nano-flow HPLC (Thermo Fisher Scientific, Cleveland, OH, USA) connected to an Orbitrap Exactive mass spectrometer (Thermo Fisher Scientific, Cleveland, OH, USA) equipped with a Nanospray Flex NG ion source (Thermo Fisher Scientific, Cleveland, OH, USA). Peptide separation was carried out on an RP-HPLC column Zorbax 300SB-C18 (C18 particle size of 3.5 µm, inner diameter of 75 µm and length of 150 mm, Acclaim® PepMap™ RSLC, Thermo Fisher Scientific, Cleveland, OH, USA) using a linear 90-min gradient from 95% solvent A (0.1% formic acid) and 5% solvent B (0.1% formic acid, 80% acetonitrile) to 60% solvent B over 95 min at a flow rate of 0.3 µL/min.
Mass spectra were registered in the positive ion mode. Data were acquired in the Orbitrap Exactive analyzer with a resolution of 70,000 (at m/z 400) for MS and 15,000 (at m/z 400) for MS/MS scans. Higher-energy collisional dissociation (HCD) was used for peptide fragmentation; the signal threshold was set to 17,500 for an isolation window of 1 m/z, and the first mass of HCD spectra was set to 100 m/z. The collision energy was set to 35%. Fragmented precursors were dynamically excluded from targeting for 10 s. Singly charged ions and ions with undefined charge states were excluded from triggering MS/MS scans. Three LC-MS/MS repetitions were performed for each plasma sample.

Interpretation of Experimental Data
Raw files were converted into .mgf files by MSConvert (v. 3.0). Each of the 312 .mgf files containing the feature list for protein identification was processed by SearchGUI software (v. 4.0.4 [23]) using three search engines (X!Tandem, MS-GF+, OMSSA) against the SwissProt library of human canonical and alternatively spliced protein sequences in automatic mode [24]. Trypsin was specified as the proteolytic enzyme; a maximum of 2 missed cleavages was allowed. Pyridylethylation (C) was used as a constant modification, and oxidation of methionine was set as a variable one. Charge states of +2, +3, and +4 were selected as parent ions. Mass tolerance was set to ±15 ppm for precursor ions and ±0.01 Da for fragment ions. The false discovery rate cut-off for peptide-spectrum matches, peptides, and proteins was ≤1%. Results were visualized in PeptideShaker (v. 2.0.5 [25]). The MS data were deposited to the ProteomeXchange Consortium via the PRIDE partner repository [26] with the dataset identifier PXD023526.
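The two mass tolerances quoted above are expressed in different units: the precursor window is relative (ppm) and the fragment window is absolute (Da). The sketch below only illustrates that arithmetic; it is not how the search engines implement matching, and the function names are hypothetical.

```python
def ppm_error(observed_mz: float, theoretical_mz: float) -> float:
    """Signed mass error in parts per million relative to the theoretical m/z."""
    return (observed_mz - theoretical_mz) / theoretical_mz * 1e6


def within_precursor_tolerance(observed_mz: float, theoretical_mz: float,
                               tol_ppm: float = 15.0) -> bool:
    """Relative window: the allowed deviation scales with m/z."""
    return abs(ppm_error(observed_mz, theoretical_mz)) <= tol_ppm


def within_fragment_tolerance(observed_mz: float, theoretical_mz: float,
                              tol_da: float = 0.01) -> bool:
    """Absolute window: the allowed deviation is fixed in daltons."""
    return abs(observed_mz - theoretical_mz) <= tol_da
```

At m/z 500, a ±15 ppm window corresponds to ±0.0075 in absolute terms, so the relative precursor window there is slightly narrower than the fixed ±0.01 Da fragment window.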

Statistical Analysis
Clustering analysis of clinical and anthropometric tests was performed using Ward's method and Euclidean distance for normalized data. Clustering of protein presence/absence patterns was performed using Ward's method and the Jaccard distance metric. All statistical analyses and graphics were performed using R version 4.0 [27].
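For presence/absence data, the Jaccard distance counts only proteins detected in at least one of the two samples, so shared absences do not make samples look similar. A minimal sketch of the metric follows. It is written in Python for illustration, whereas the analysis itself was done in R; the function names and the zero-distance convention for two empty profiles are assumptions.

```python
def jaccard_distance(a, b) -> float:
    """Jaccard distance between two equal-length presence/absence (0/1) vectors."""
    if len(a) != len(b):
        raise ValueError("profiles must have the same length")
    both = sum(1 for x, y in zip(a, b) if x and y)
    either = sum(1 for x, y in zip(a, b) if x or y)
    if either == 0:
        return 0.0  # assumed convention: two empty profiles are identical
    return 1.0 - both / either


def distance_matrix(profiles):
    """Symmetric matrix of pairwise Jaccard distances for a list of profiles."""
    n = len(profiles)
    dist = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            dist[i][j] = dist[j][i] = jaccard_distance(profiles[i], profiles[j])
    return dist
```

Such a matrix could then be passed to any hierarchical clustering routine that accepts precomputed distances.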
Each protein of interest was annotated with its GO-terms from UniProt using ViSEAGO package [28]. We used the "2020-03" GO release and "2020_01" UniProt release.
When comparing the results of proteomic and clinical analysis, we explored publicly available data on the relationship between proteins and parameters of clinical analysis. The automatic analysis of the texts of scientific publications was performed by Scanbious platform [29,30], which visualizes semantic networks between objects of various types (names of proteins, pathological processes, etc.).
We predicted the BMI of the patient based on the pattern of presence/absence of certain proteins in his/her blood plasma using the Least Absolute Shrinkage and Selection Operator (LASSO) regression implemented in the glmnet package [31]. We performed 10 iterations, each time randomly selecting 90% of the samples. For each iteration, we needed to select the optimum value of the LASSO tuning parameter lambda, which penalizes the sum of the absolute values of the coefficients. The optimum value of lambda was also selected by cross-validation (10 runs of a 10-fold cross-validation cycle). The lambda with the minimum average error was selected as the lambda for the current iteration. The final model included only proteins that were selected in every iteration (10 out of 10 times). Model performance was estimated as the median absolute error, defined as the median of the absolute differences between the true BMI and the predicted BMI.
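The selection scheme described above (repeated 90% subsampling, keeping only proteins with a non-zero LASSO coefficient in every iteration, and scoring by median absolute error) can be sketched as follows. This is an illustrative simplification, not the glmnet procedure: the solver is a plain coordinate descent, lambda is passed in as a fixed value rather than chosen by cross-validation in each iteration, and all function names are hypothetical.

```python
import random
import statistics


def soft_threshold(value: float, lam: float) -> float:
    """Shrink value toward zero by lam; values within [-lam, lam] become 0."""
    if value > lam:
        return value - lam
    if value < -lam:
        return value + lam
    return 0.0


def lasso_fit(X, y, lam, n_sweeps=200):
    """Coordinate descent for (1/2n)*sum((y - b0 - X b)^2) + lam*sum(|b|).

    The intercept b0 is not penalized. Returns (intercept, coefficients).
    """
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    intercept = statistics.fmean(y)
    for _ in range(n_sweeps):
        for j in range(p):
            rho = 0.0   # correlation of feature j with the partial residual
            norm = 0.0  # squared norm of feature j
            for i in range(n):
                pred_without_j = intercept + sum(
                    X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - pred_without_j)
                norm += X[i][j] ** 2
            if norm > 0:
                beta[j] = soft_threshold(rho / n, lam) / (norm / n)
            else:
                beta[j] = 0.0  # constant-zero column in this subsample
        residuals = [y[i] - sum(X[i][k] * beta[k] for k in range(p))
                     for i in range(n)]
        intercept = statistics.fmean(residuals)
    return intercept, beta


def stable_features(X, y, lam, n_iter=10, fraction=0.9, seed=0):
    """Indices of features with a non-zero coefficient in every subsample fit."""
    rng = random.Random(seed)
    n = len(X)
    size = max(1, round(fraction * n))
    kept = set(range(len(X[0])))
    for _ in range(n_iter):
        idx = rng.sample(range(n), size)
        _, beta = lasso_fit([X[i] for i in idx], [y[i] for i in idx], lam)
        kept &= {j for j, b in enumerate(beta) if b != 0.0}
    return sorted(kept)


def median_absolute_error(y_true, y_pred) -> float:
    """Median of |true - predicted| over all samples."""
    return statistics.median(abs(t - p) for t, p in zip(y_true, y_pred))
```

Requiring a protein to survive every iteration is a conservative choice: it trades a smaller panel for robustness to the particular subset of samples used in fitting.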

Clinical Component
Much attention has been focused on the phenomenon of metabolically healthy obesity (MHO), characterized by the absence of the metabolic abnormalities that traditionally accompany excess adiposity [32]. Thus, a substantial proportion of obese subjects does not seem, at least temporarily [33], to be at increased risk of mortality and metabolic complications of obesity. MHO is characterized by the absence of dyslipidemia, hypertension, insulin resistance, and chronic inflammation.
Moreover, lean subjects may possess abnormal metabolic parameters (exhibiting metabolically unhealthy non-obesity, MUNO) [34]. A gradient of metabolically healthy and unhealthy obese and lean phenotypes makes revealing abnormalities and preventing the relevant risks more difficult, even for non-obese individuals.
To elucidate whether there is a bias toward any of the selected extremes (four combinations of metabolic status and BMI) in our sample collection, we selected the monitored parameters of blood and anthropometric tests (Supplementary Table S1) that differed significantly between the groups under study (NORM, OW, OB1, OB2, OB3). For these parameters, we performed a principal components analysis (Supplementary File S1) and hierarchical cluster analysis (Figure 1) using Ward's minimum variance method. The results of cluster analysis were evaluated with the Adjusted Rand Index (ARI), which reflects the agreement between two partitions: one given by the clustering process and the other defined by external criteria.
In our case, ARI was equal to 0.051, which indicates a low similarity between resulting and expected clustering. According to the obtained result, it is not possible to explicitly define the boundaries between groups of subjects with different weight conditions under study.
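For reference, the ARI can be computed from the contingency table of the two partitions; it equals 1 for identical partitions and is close to 0 for agreement at chance level, which is why a value of 0.051 is read as low similarity. The sketch below is a stdlib-only illustration with a hypothetical function name, not the code used in the study.

```python
from collections import Counter
from math import comb


def adjusted_rand_index(labels_a, labels_b) -> float:
    """Adjusted Rand Index between two labelings of the same samples."""
    if len(labels_a) != len(labels_b):
        raise ValueError("labelings must have the same length")
    n = len(labels_a)
    if n < 2:
        raise ValueError("at least two samples are required")
    # Pairs of samples placed together in both partitions, in A only, in B only.
    together_both = sum(comb(c, 2) for c in Counter(zip(labels_a, labels_b)).values())
    together_a = sum(comb(c, 2) for c in Counter(labels_a).values())
    together_b = sum(comb(c, 2) for c in Counter(labels_b).values())
    expected = together_a * together_b / comb(n, 2)  # chance-level agreement
    maximum = (together_a + together_b) / 2
    if maximum == expected:
        return 1.0  # degenerate case: both partitions are trivial
    return (together_both - expected) / (maximum - expected)
```

Here one labeling would be the cluster assignment and the other the BMI group of each subject; the index does not depend on how the labels are named.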
The impossibility of unambiguously dividing patients according to their weight conditions based only on the results of clinical tests once again emphasizes the controversial and considerably challenging nature of obesity and indicates the need for orthogonal data.
In our opinion, the most promising approach to this problem is the transition to the proteome level and multiplex assessment of the patient's proteome landscape.

Proteomic Component
In total, 154 proteins were reliably identified in the entire collection of plasma samples. These proteins are predominantly associated with peptidase activity, receptor binding, and lipid transporter activity. Most of the proteins are localized in blood microparticles or plasma lipoprotein particles and vesicles, and therefore we expect stable detection of them under various mass spectrometric protocols. Of those, 36 proteins were consistently found in all plasma samples under study. A total of 138 proteins were identified in the NORM group of lean subjects. A total of 148 proteins were identified in the integrated group of overweight (OW) and obese (OB1, OB2, OB3) samples.
Next, we performed a principal components analysis (Supplementary File S1) and studied possible relationships between the pattern of presence/absence of proteins in blood plasma and the patient's BMI using cluster analysis.

Beforehand, unrepresentative proteins (identified in a single sample in the collection) and non-specific proteins (identified in all samples) were excluded from the calculations. The resulting data matrix consisted of 104 rows (samples) and 101 columns (proteins).
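The filtering step can be sketched as follows: a protein is kept only if it was detected in more than one sample but not in all of them, and the remaining proteins form the columns of a binary matrix. The input structure (a mapping from sample identifier to the set of detected accessions) and the function name are assumptions made for illustration.

```python
from collections import Counter


def build_presence_matrix(detected_by_sample):
    """Drop singleton and ubiquitous proteins; return (columns, binary rows).

    detected_by_sample: dict mapping a sample id to the set of protein
    accessions identified in that sample (hypothetical input format).
    """
    n_samples = len(detected_by_sample)
    counts = Counter(acc for accs in detected_by_sample.values() for acc in accs)
    # Keep proteins seen in more than one sample but not in every sample.
    columns = sorted(acc for acc, c in counts.items() if 1 < c < n_samples)
    rows = {
        sample: [1 if acc in accs else 0 for acc in columns]
        for sample, accs in detected_by_sample.items()
    }
    return columns, rows
```

Proteins found in every sample carry no information for distinguishing groups, and proteins found in a single sample cannot support a shared pattern, which is the rationale for removing both.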
The pattern of 15 proteins (namely, P07225, P00748, P07357, P07358, P09871, P01591, P01861, O43866, P00736, P02654, P13671, P25311, P01619, P01859, and P29622) distinguishes (Figure 2a) a group of 14 samples with increased BMI (mean 39 vs. 33, p-value = 0.002). Moreover, 11 of them belong to the OB2 and OB3 groups, and three samples were obtained from overweight individuals. It should be noted that these three patients from the OW group were diagnosed with a blood lipid disorder (there are seven samples with such a diagnosis in the whole OW group). Half of the samples from the OB2 and OB3 groups were also characterized by this diagnosis. For five of the 15 proteins in the pattern (P07225 [35], O43866 [36,37], P02654 [38], P25311 [39][40][41], P29622 [42]), an association with obesity has been shown. It is noteworthy that five of these 15 proteins are complement components included in two complexes: P07358, P07357, and P13671 organize the membrane attack complex (MAC), which plays a key role in the innate and adaptive immune response, and P09871 and P00736 combine with serine protease to form the first component of the classical and less variable pathway of the complement system, also associated with obesity [43,44].
The collection of samples under study contains blood plasma from overweight patients (OW), whose body mass index is increased but does not exceed the threshold required for the diagnosis of obesity, which could affect the results of cluster analysis. We therefore removed 21 samples from the borderline OW group from consideration. As a result, the total number of identified proteins remained practically unchanged, as did the set of proteins common to the two groups (NORM and OB). The updated matrix consisted of 83 rows (samples) and 98 columns (proteins) and was plotted with the same parameters. Clustering indices improved slightly: for the group with high BMI the mean value was 42, and for the rest it was 32 (p-value = 0.002, Figure 2b).
The composition of the cluster with high BMI practically did not change: three samples from the OW group were removed, and one sample from the OB2 group was added. Accordingly, the pattern of specific proteins did not change significantly. It included 13 proteins, 12 of which coincide with the pattern obtained from the all-samples clustering; the proteins that dropped out are two immunoglobulins (P01619 and P01859) and a serpin (P29622). New in the resulting pattern is a component of the above-mentioned MAC, P07360.

BMI Prediction
To assess the contribution of proteins to obesity, we attempted to predict the BMI of a sample based on proteomic data. For this, using the LASSO method, we built a regression model predicting the BMI of a sample according to the pattern of presence/absence of proteins in blood plasma. The model based on all data consisted of five proteins (P08185, P0DJI8, P10643, P25311, and P35858), and the median absolute error (MAE) was 5.1 kg/m² (Figure 3a). At the same time, the model obtained from the data excluding samples from the OW group showed a higher accuracy, MAE = 3.2 kg/m² (Figure 3b), and the number of proteins required to build the model was 18. It is noteworthy that the pattern of 18 proteins includes the five proteins from the model with lower predictive power, as well as three previously considered proteins of the complement system (P00736, P07358, P07360) included in the MAC complex, which is indirectly associated with obesity [45].
Text-mining [29,30] performed for these proteins and their relations with pathological processes showed that 15 out of 18 proteins (Table 2) are associated to varying degrees with metabolic disorders, including obesity. For example, according to our model, the absence of sex hormone-binding globulin (P04278) correlates with increased BMI, which is confirmed by studies on its expression, where it was shown that inhibition of the corresponding gene leads to the development of obesity [46].
Table 2. Proteins included in the predicting pattern and their association with obesity.
CNDP1: An increased risk for obesity/overweight due to genotypes of CNDP1 was observed only in the group with a low carotene/carbohydrate intake ratio. In the high carotene/carbohydrate intake group, the genotype of CNDP1 was no risk factor for obesity/overweight.
Summarizing the above, we can conclude that among the reliably detected proteins [76] in blood plasma, there is a pattern that has predictive power with respect to obesity. The minimum pattern size is five proteins. Expanding the panel increases the accuracy of BMI prediction, which can be critical in examining borderline states in metabolically healthy obese and unhealthy lean individuals, and can also provide researchers with additional information about body composition status even when exploring protein profiles from patients with non-obesity disorders.
We would like to stress that our intention was not to build the perfect BMI prediction model (the dataset is quite limited for this task) but rather to point to some plasma proteins likely associated with obesity when analyzed together. We suppose that the further studies needed to elaborate on this issue will also allow detection of the transition from a "metabolically healthy" phenotype of the patient with a high BMI to an "unhealthy" one.

Conclusions
To the authors' knowledge, no approved omics pattern has been developed to distinguish individuals at increased risk of obesity and its comorbidities. In the present study, we analyzed clinical and anthropometrical parameters of 104 subjects with different weight conditions. Each individual was also characterized by the profile of core proteins circulating in his/her blood plasma.
Our main conclusions were two-fold:
1. We demonstrated the impossibility of dividing patients according to their weight conditions based only on the results of standard blood tests. Orthogonal data, proteomic in our case, improve the understanding of the controversial nature of obesity.
2. Our overall results indicate that proteins circulating in blood have predictive power for the weight status of the patient under study. We composed two proteomic patterns (including 5 and 18 proteins, respectively), which provide additional information about the patient's phenotype for more personalized treatment.
We strongly believe that such proteomic patterns have great potential as warning labels, signaling about obesity-associated alterations, and, thus, improving early-stage therapy of both metabolically unhealthy obese and lean individuals.
Supplementary Materials: The following are available online at https://www.mdpi.com/2075-4426/11/2/64/s1, Table S1: Anthropometrical and clinical characteristics of individuals under study; File S1: Principal Components Analysis of clinical parameters and proteins identified in the samples of blood plasma.
Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Data Availability Statement: The data presented in this study are available in the PRIDE repository (dataset identifier PXD023526); Supplementary Materials are available via the link https://zenodo.org/record/4432333#.YAbOw1X7QuU.