Proteomic and Metabolomic Analyses of the Blood Samples of Highly Trained Athletes

Abstract: High exercise loading causes intricate and ambiguous proteomic and metabolic changes. This study aims to describe the dataset on protein and metabolite contents in plasma samples collected from highly trained athletes across different sports disciplines. The proteomic and metabolomic analyses of the plasma samples of highly trained athletes engaged in sports disciplines of different intensities were carried out using HPLC-MS/MS. The results are reported as two datasets (proteomic data in a derived mgf-file and metabolomic data in processed format), each containing the findings obtained by analyzing 93 mass spectra. Variations in the protein and metabolite contents of the biological samples are observed, depending on the intensity of training load for different sports disciplines. Mass spectrometric proteomic and metabolomic studies can be used for classifying different athlete phenotypes according to the intensity of sports discipline and for the assessment of the efficiency of the recovery period.


Summary
The molecular profile of an athlete is the result of a comprehensive analysis of proteins, endogenous metabolites, and other biomolecules performed using omics technologies [1]. Annotating the features of the molecular profile of biological samples allows one to detect new predictors and candidate biomarkers in sports medicine and to make personalized recommendations for changes in an athlete's training process [2]. By analyzing the contents of proteins and low-molecular-weight components in biological samples, one can formulate a hypothesis about the effect of physical load on changes in the metabolic processes occurring in an athlete's body [3,4]. Other fields where the results of omics research can be applied include assessing the acute effects of hydration; managing oxidative stress, inflammation, and immune responses [5]; and identifying the long- or short-term effects of training load on the proteome and metabolome [6].
Physical activity is a potent stimulus modulating metabolism in the body. Research is currently ongoing into the features of the metabolomic and proteomic profiles of the biological samples of athletes. Regular prolonged physical activity alters the levels of metabolites, mostly those participating in carbohydrate, lipid, amino acid, and nucleoside metabolism, in bodily fluids (plasma, urine, and capillary blood) (Figure 1) [7-12].

Data Description
Plasma samples were collected from trained athletes engaged in sports disciplines of different intensities and were processed to acquire proteomic and metabolomic data using HPLC-MS/MS. The results are reported as two datasets, each containing the findings obtained by analyzing 93 mass spectra. Variations in the protein and metabolite contents of the biological samples are observed depending on the intensity of the training load for different sports disciplines (Table 1) [24]. The first dataset contains the findings obtained by measuring and analyzing the protein compositions of the biological samples (Table 2). The comparison groups were formed according to the training load intensity for an athlete: "High", "Above Average", "Moderate", and "Low". Data were combined into spectral libraries according to these groups (Table 2, "Intensity"). After a meticulous assessment of the measurement results, we selected 108 proteins shared by all the investigated groups and fractions of group-specific proteins. A total of 38 unique proteins were identified for the "High" group, 82 for the "Above Average" group, and 114 and 57 for the "Moderate" and "Low" groups, respectively (Figure 2). Supplementary Materials Tables S1 and S2 show the data on the levels of differentially expressed serological proteins and the frequency of their occurrence in the biological samples of the comparison groups.
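The shared and group-specific protein counts reported here (and drawn in the UpSet plot of Figure 2) come down to set operations on the per-group protein lists. The following is a minimal Python sketch under stated assumptions: the function name `shared_and_unique` and the protein identifiers are hypothetical placeholders, not values from the dataset, and it is not a description of the authors' processing pipeline.

```python
from typing import Dict, Set, Tuple


def shared_and_unique(
    groups: Dict[str, Set[str]]
) -> Tuple[Set[str], Dict[str, Set[str]]]:
    """Return proteins present in every group and proteins found in one group only.

    `groups` maps a comparison-group name to the set of protein identifiers
    detected in that group (hypothetical input format).
    """
    # Proteins shared by all comparison groups.
    shared = set.intersection(*groups.values())

    # Proteins seen in exactly one group: remove everything seen elsewhere.
    unique: Dict[str, Set[str]] = {}
    for name, proteins in groups.items():
        others = set().union(*(p for n, p in groups.items() if n != name))
        unique[name] = proteins - others
    return shared, unique


# Placeholder identifiers, for illustration only.
example = {
    "High": {"prot_A", "prot_B", "prot_C"},
    "Above Average": {"prot_A", "prot_C", "prot_D"},
    "Moderate": {"prot_A", "prot_E"},
    "Low": {"prot_A", "prot_F"},
}
shared, unique = shared_and_unique(example)
print("shared by all groups:", sorted(shared))
for group, proteins in unique.items():
    print(group, "unique:", sorted(proteins))
```

Proteins present in two or three groups, but not all four, fall into neither result; in an UpSet plot they would appear as the intermediate intersections.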
The second dataset contains the results of the quantitative measurements and analyses of 40 endogenous metabolites (canonical and noncanonical amino acids and carboxylic acids) in the investigated biological samples, organized into spectral libraries in accordance with the identified comparison groups. Reduced levels of ascorbic acid and pyruvate were observed in the "High" group compared to the other groups. Supplementary Materials Table S3 lists the data on the quantitative content of the analyzed metabolites in the comparison groups.
Figure 3 shows the contents of the endogenous metabolites in the comparison groups. Among the analyzed low-molecular-weight metabolites, increased contents of 4-hydroxyproline (p = 0.0121), alanine (p = 0.0138), histidine (p = 0.0001), leucine (p = 0.0001), and phenylalanine (p = 0.0007) were observed in the biological samples of the "High" group compared to the other groups (see Supplementary Materials Table S4).
A sparse partial least-squares discriminant analysis (sPLS-DA) was conducted to determine the variations between the data and classify the samples into comparison groups. The analysis demonstrated that the "High" group was segregated from the "Low" group at a 0.95 confidence level in both the proteome (Figure 4A) and the metabolome (Figure 4B). The first two components explained 7% and 7% of the variability in the total proteome and 45% and 12% in the summarized metabolome. The selected metabolite/proteins of the first component and their weighting factors are presented in Supplementary Materials Table S5.

Ethics Statement
The participants in the study were informed about the possible risks and discomforts that could have emerged during the investigation, and signed consent was obtained from each participant. A local board for Ethical Questions in the A.I. Burnazyan State Research Center of the Federal Medical-Biological Agency of the Russian Federation approved the protocol of the study (Protocol No. 40 dated 18 November 2020) in accordance with the WMA Declaration of Helsinki.

Subjects
In total, 93 athletes were enrolled to participate in the study. The exclusion criteria were the presence of cardiac, muscle, or kidney disease and the ongoing use of anti-inflammatory medications, antibiotics, or nicotine. The participants were selected according to the exclusion criteria after they completed a survey about their medical history and training experience. The sports disciplines included endurance (n = 28), speed-strength (n = 2), strength endurance (n = 12), and technical sports (n = 51). The basic anthropometric characteristics are shown in Table 3.

Preanalytical Stage of Analysis
A chain of custody for the preparation of the plasma samples is described in detail in [25].

HPLC-MS/MS Analysis
The analysis was performed on a Xevo™ G2-XS QTof quadrupole time-of-flight mass spectrometer (Waters, Wilmslow, UK) equipped with an Acquity™ UPLC H Class Plus chromatography system (Waters, Wilmslow, UK).
Both peptides and metabolites were separated on an Acquity™ UPLC BEH C18 column (1.7 µm particle size, geometry 2.1 × 50 mm; Waters, UK) at a flow rate of 0.2-0.3 mL/min for the proteomic analysis and a flow rate of 0.4 mL/min for the metabolomic analysis.
Peptide precursor ions were surveyed in the hybrid data-independent acquisition (DIA) MS^E-SONAR mode, whereas the data-dependent acquisition (DDA) mode was used to survey the metabolite ions.
Proteomic data were processed using PLGS software (Protein Lynx Global Server, version 3.0.3, Waters, UK) and the UniProt KB database (version dated March 2021) with preset parameters for the SONAR/MS^E scanning mode and a preset correction for the calibration mass. In the analysis of the proteomic data, we were guided by the requirements of the Human Proteome Organization-Proteomics Standards Initiative Quality Control Working Group [26]. We used amino acid and keto acid reference standards (Sigma, catalog number A9906, Germany) as external standards to evaluate retention time matching, a generated spectral library to match features in full-MS and tandem-MS scans, and on-line calibration with a warfarin lock mass (m/z = 309) to control the mass tolerance of the instrumental analysis.
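Controlling mass tolerance against a lock mass is commonly expressed as a parts-per-million (ppm) mass error. The sketch below illustrates that arithmetic only. The source gives just the nominal lock mass (m/z = 309), so the reference value, the observed values, and the 15 ppm tolerance used here are placeholders assumed for illustration, not instrument settings from the study; the function names are likewise hypothetical.

```python
def ppm_error(observed_mz: float, reference_mz: float) -> float:
    """Signed mass error of an observed m/z relative to a reference, in ppm."""
    return (observed_mz - reference_mz) / reference_mz * 1e6


def within_tolerance(observed_mz: float, reference_mz: float,
                     tolerance_ppm: float) -> bool:
    """True if the absolute mass error does not exceed the tolerance."""
    return abs(ppm_error(observed_mz, reference_mz)) <= tolerance_ppm


# Placeholder values: nominal lock mass from the text, assumed tolerance.
LOCK_MASS_MZ = 309.0       # nominal value only; the exact m/z is not given
TOLERANCE_PPM = 15.0       # assumed, not stated in the source

for observed in (309.00309, 309.0062):
    err = ppm_error(observed, LOCK_MASS_MZ)
    ok = within_tolerance(observed, LOCK_MASS_MZ, TOLERANCE_PPM)
    print(f"observed {observed}: {err:.2f} ppm, within tolerance: {ok}")
```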
The details of the modes and conditions of the HPLC-MS/MS measurements are presented in [25]. The measurement results (proteomic data in a derived mgf-file and metabolomic data in processed format) are available on Figshare: Petrovskiy D. (2023). Proteomic and Metabolomic Analyses of Blood Samples of Highly Trained Athletes. Figshare. Dataset. https://doi.org/10.6084/m9.figshare.24541366.v6 [27].

Data Analysis
The proteomic and metabolomic datasets were analyzed using the Wilcoxon test in the R statistical package (v4.1.2; R Core Team 2021). To demonstrate the variable selection and classification of the studied cohort, we performed a sparse partial least-squares discriminant analysis (sPLS-DA) with 0.95 confidence ellipses. The acceptance criteria for protein identification were based on the Human Proteome Project Mass Spectrometry Data Interpretation Guidelines 3.0 [28]. A candidate feature in the proteomic data had to meet the unicity criterion, meaning that a protein had to be covered by more than one unique protein-specific peptide without interference from any other proteins.
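A between-group comparison of this kind can be illustrated with a two-sample Wilcoxon rank-sum test. The sketch below is a pure-Python approximation written for illustration only: the authors worked in R, and the source does not state which variant or options of the test were applied. The function names, the normal approximation with tie and continuity corrections, and the sample values are all assumptions.

```python
import math
from typing import List, Sequence, Tuple


def midranks(values: Sequence[float]) -> List[float]:
    """Return 1-based ranks, giving tied values the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # mean of the 1-based positions i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def rank_sum_test(x: Sequence[float], y: Sequence[float]) -> Tuple[float, float]:
    """Two-sided Wilcoxon rank-sum test (normal approximation).

    Returns (U statistic for x, p-value).
    """
    n1, n2 = len(x), len(y)
    combined = list(x) + list(y)
    ranks = midranks(combined)
    u = sum(ranks[:n1]) - n1 * (n1 + 1) / 2
    mean_u = n1 * n2 / 2

    # Variance of U with the standard correction for ties.
    n = n1 + n2
    counts = {}
    for v in combined:
        counts[v] = counts.get(v, 0) + 1
    tie_term = sum(t ** 3 - t for t in counts.values())
    var_u = n1 * n2 / 12 * ((n + 1) - tie_term / (n * (n - 1)))
    if var_u == 0:
        return u, 1.0

    # Continuity correction: shrink the deviation from the mean by 0.5.
    diff = u - mean_u
    z = 0.0 if diff == 0 else (diff - math.copysign(0.5, diff)) / math.sqrt(var_u)
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided tail probability
    return u, min(p, 1.0)


# Hypothetical metabolite concentrations (µg/mL) for two comparison groups.
high_group = [12.1, 13.4, 11.8, 14.0, 12.9]
low_group = [9.7, 10.2, 11.1, 9.9, 10.5]
u_stat, p_value = rank_sum_test(high_group, low_group)
print(f"U = {u_stat}, p = {p_value:.4f}")
```

For small samples without ties, statistical packages typically report an exact p-value instead of the normal approximation, so results from this sketch may differ slightly from theirs.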

User Comments
Mass spectrometric proteomic and metabolomic studies can be used for classifying different athlete phenotypes according to the intensity of sports discipline and for the assessment of the efficiency of the recovery period using machine learning algorithms [29,30]. This phenotype classification can be performed in two variants. The first one is to employ non-correlated data (i.e., the raw data before identification) using the machine learning approaches of a 1D or 3D convolutional neural network [29,31]. The processed data (i.e., the data after biomolecule identification) are used in the second variant. Machine learning approaches, such as logistic regression, support vector classifier, decision tree, multinomial naive Bayes, random forest, and multinomial regression, can be utilized in the second variant, depending on the study objectives [32-35].
We use the presented dataset in two ways: (1) for classifying athlete phenotypes depending on physical load intensity and (2) for identifying phenotype-specific signatures describing the observed metabolomic changes and the presence of post-translational modifying groups in proteins.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/data9010015/s1, Table S1: Differentially expressed proteins in comparison groups; Table S2: Frequency of occurrence of protein identifications for each comparison group; Table S3: Metadata for Study Participants; Table S4: Differences in metabolite content between comparison groups (p-values); Table S5: Metabolite/proteins on the first component along with their weighting factors.

Informed Consent Statement: Informed consent was obtained from all subjects involved in the study.

Figure 1. The metabolic features of strength and endurance sports disciplines: (a) changes in metabolites during peak training load, 2 h after training load, and after the training load is completed (study participants leading a sedentary lifestyle are used as the control); (b) metabolic indicators of strength and endurance training.

Figure 2. The UpSet plots of proteins shared among the studied groups. Set size denotes the number of proteins identified in a particular group. Intersection size denotes either the number of proteins unique to a single comparison group (a single point in the diagram) or the number of proteins shared by at least two comparison groups (dots connected with a line).

Figure 3. The levels of the low-molecular-weight metabolites in the comparison groups (OX, µg/mL).

Figure 4. Sparse partial least-squares discriminant analysis (sPLS-DA) with a 0.95 ellipse confidence level for the most significantly different features in the proteome (a) and metabolome (b). The scatter plots show the between-group variation explained by PC1 = 7% and PC2 = 7% in the proteomic data (a) and by PC1 = 45% and PC2 = 12% in the metabolomic data (b). The selected metabolite/proteins of the first component and their weighting factors are presented in Supplementary Materials Table S5.

Author Contributions: Conceptualization, A.T.K., K.A.M., V.I.P. and A.L.K.; methodology, A.T.K., E.I.B. and K.A.Y.; software, A.A.S.; validation, L.I.K. and V.R.R. All authors have read and agreed to the published version of the manuscript.

Funding: The work was conducted under the Russian Federation Fundamental Research Program for the long-term period of 2021-2030 (no. 122092200056-9).

Institutional Review Board Statement: The study was approved by the Board for Ethical Questions in A. I. Burnazyan State Research Center of the FMBA of Russia (Protocol No. 40 from 18 November 2020) according to the principles expressed in the Declaration of Helsinki.

Table 1. Sports disciplines and type and intensity of training load for the study participants.

Table 2. Protein compositions of the plasma samples.

Table 3. Anthropometric, clinical, and psychometric characteristics of the participants (see Supplementary Materials Table S3).
• Sample information: the unique identifier of the study participant;
• Information about the anthropometric characteristics of the participant: the sex, age at the time of the examination, age at which their career began, and type of sport;
• Information about the clinical characteristics of the participant: allergies, infectious diseases, and sports injuries;
• Information about the training regime: main sports activities, the dynamics of sports results, number of training sessions per day during tapering and competition periods, number of rest days per week during tapering and competition periods, and training status self-assessment.