Profiling of Seed Proteome in Pea (Pisum sativum L.) Lines Characterized with High and Low Responsivity to Combined Inoculation with Nodule Bacteria and Arbuscular Mycorrhizal Fungi

Legume crops represent the major source of food protein and contribute to both human nutrition and animal feeding. Their productivity can be substantially improved by symbiosis with beneficial soil microorganisms: rhizobia (Rh) and arbuscular mycorrhizal (AM) fungi. The efficiency of these interactions depends on plant genotype. Recently, we have shown that, after simultaneous inoculation with Rh and AM, the productivity gain of the pea (Pisum sativum L.) line K-8274, characterized by high efficiency of interaction with beneficial soil microorganisms (EIBSM), was higher than that of the low-EIBSM line K-3358. However, the molecular mechanisms behind this effect are still uncharacterized. Therefore, here we address the alterations in the pea seed proteome underlying the symbiosis-related productivity gain, and identify 111 differentially expressed proteins in the two lines. The high-EIBSM line K-8274 responded to inoculation by prolongation of seed maturation, manifested by up-regulation of proteins involved in cellular respiration and protein biosynthesis, and by down-regulation of late-embryogenesis abundant (LEA) proteins. In contrast, the low-EIBSM line K-3358 demonstrated lower levels of the proteins related to cell metabolism. Thus, we propose that the EIBSM trait is linked to prolongation of seed filling, which needs to be taken into account in pulse crop breeding programs. The raw data have been deposited to the ProteomeXchange with identifier PXD013479.

proteome underlying high responsivity to combined inoculation with rhizobia and AM-fungi. For this, we reproduced the combined inoculation setup in a pot experiment with the high-EIBSM K-8274 plants in parallel with the line K-3358, characterized by low EIBSM [29]. Although shoot and seed biomass accumulation in the low-EIBSM line K-3358 was 23% and 25% higher, respectively, than in the line K-8274, its inoculation with the combination of two BSM did not give any additional gain in productivity (Supplementary Material 2).

Identification of Seed Proteins
Selection of an appropriate sequence database is a prerequisite for successful identification of proteolytic peptides in enzymatic digests and, hence, for reliable annotation of seed proteins. In this context, the use of reviewed databases, containing entries confirmed at the level of transcriptome or proteome, is advantageous. However, such information is not readily available for pea. Therefore, here we decided on a non-redundant combined database relying on the proteomes of several legumes closely related to pea: Medicago truncatula Gaertn., Lotus japonicus (Regel) K. Larsen, and Phaseolus vulgaris L. Earlier, to confirm the applicability of this database, we manually evaluated the MS/MS spectra of confidently identified peptides with the lowest values of the SEQUEST function XCorr [34]. As the spectra were acquired with a mass accuracy within 5 ppm, peptide sequences could be unambiguously assigned by characteristic patterns of N- and C-terminal ion series (b and y ions, respectively) [31].
Analysis of the seed proteome of both lines resulted in confident identification of 3963 peptides in total (3557 and 3726 in the seeds of K-8274 and K-3358, respectively; Figure 2A, Supplementary Material 3). Based on this information, 5832 proteins were annotated (5195 in the seeds of K-8274 and 5593 in K-3358, Figure 2B), which represented 1500 non-redundant proteins (i.e., protein groups: 1346 and 1425 in the seeds of K-8274 and K-3358, respectively; Figure 2C). For the line K-8274, 84 non-redundant proteins could be annotated only in the absence of BSM, whereas 103 features were found specifically in the seeds of inoculated plants. For K-3358, these values were 114 and 69, respectively (Figure 2C). The numbers of non-redundant proteins not dependent on inoculation were 1159 for the line K-8274 and 1242 for the line K-3358, with 1101 being common to both lines. Interestingly, only 12
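The protein-group counts reported above can be cross-checked with simple set arithmetic. The following Python sketch is not part of the original analysis: it only verifies that each per-line total equals the sum of its inoculation-specific and inoculation-independent parts, and derives the overlap between the two lines by inclusion-exclusion. The function names are illustrative, and the overlap relies on the assumption that the 1500 non-redundant proteins are the union of the two per-line sets.

```python
def partition_is_consistent(only_ni, only_bsm, both, total):
    """True if the three disjoint subsets add up to the reported total."""
    return only_ni + only_bsm + both == total


def shared_between_lines(total_a, total_b, union):
    """Size of the overlap of two sets, from their sizes and the union size."""
    return total_a + total_b - union


# Protein-group counts as reported in the text (Figure 2C).
k8274 = dict(only_ni=84, only_bsm=103, both=1159, total=1346)
k3358 = dict(only_ni=114, only_bsm=69, both=1242, total=1425)

print(partition_is_consistent(**k8274))  # True
print(partition_is_consistent(**k3358))  # True

# Assumption: 1500 is the union of the two per-line sets.
print(shared_between_lines(k8274["total"], k3358["total"], 1500))  # 1271
```

Under that assumption, 1271 protein groups would be shared by the two lines, which is compatible with the 1101 inoculation-independent groups reported as common to both.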

Label-Free Quantification
Analysis of the whole dataset with the Progenesis QIP software revealed 79 differentially expressed proteins (ANOVA, p ≤ 0.05). A further 32 proteins were identified with the original redundant database, containing non-reviewed entries. The correctness of these identifications was confirmed by manual interpretation of the corresponding MS/MS spectra (Figure S1-2). Thus, 111 proteins were differentially expressed (as confirmed by verification of peak integration, Figure S1-5), i.e., demonstrated significant abundance differences of at least 1.5-fold in intra- and inter-line comparisons (Table 1, Supplementary Material 4). One of the raw files (corresponding to one of the triplicates of the not inoculated group of line K-8274) could not be satisfactorily aligned to the whole dataset and was therefore excluded from quantitative analysis.
Among the regulated seed proteins, 84 were differentially expressed between the lines in the inoculated (BSM) group and 99 in the not inoculated (NI) group. Remarkably, 36 and 61 proteins were more abundant in the BSM and NI groups of K-3358 plants, respectively, in comparison to the same groups of the K-8274 line. In contrast, the abundance of 48 and 38 proteins in the BSM and NI groups of K-3358 plants, respectively, was lower in comparison to K-8274 plants. In total, 60 proteins in the seeds of K-8274 demonstrated inoculation-related changes in expression profiles (50 and 10 polypeptides were up- and down-regulated upon combined inoculation with BSM, respectively). For the line K-3358, these values were 31 and 29, respectively.
Principal component analysis (PCA) revealed clear differences between the two lines, which could be distinguished by the first component (67.2% of the difference; Figure 3A,B and Figure S1-3A,B; the corresponding loading plots are given in Figure S1-4). For each line, the differences between inoculated and not inoculated plants were much less pronounced, although clearly observable (3.3% and 7.6% of the difference in components 2 and 3, Figure 3A and 3B, respectively). At the next step, hierarchical clustering was applied to classify individual differentially regulated proteins according to their intra- and inter-line differences in expression profiles. Based on the heat map built for the average values of each group (Figure 3C), all differentially expressed non-redundant proteins could be assigned to one of 17 individual groups, organized by similarity of expression profiles (Supplementary Material 4). The original results of data clustering in Perseus are given in Figure S1-3C. Finally, depending on the direction of protein expression changes, these groups (further referred to as sub-clusters) were organized in ten principal clusters. The response of an individual protein to inoculation with combined BSM could be expressed as "up-regulated", "down-regulated", or "not responsive" ("steady") relative to the corresponding NI control. The combination of these regulation states in the two lines yielded nine principal clusters (i.e., clusters 1-3, 4-6, and 7-9), comprising the proteins up-regulated, down-regulated, and not responsive in line K-8274, respectively, each with a different regulation status in the line K-3358 (Table 1). The last cluster (#10) comprised the proteins identified in only one of the lines (Table 1).
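The three-by-three grid of regulation states can be written down as a lookup table. The sketch below is only an illustration of the cluster numbering given in the text (with the individual cluster definitions taken from the paragraph that follows); it is not the authors' Perseus workflow, which also involved manual correction. The state labels and the function name are assumptions introduced here.

```python
STATES = ("up", "down", "steady")

# Keys: (response in K-8274, response in K-3358) to inoculation with BSM.
CLUSTER_BY_STATES = {
    ("up", "up"): 1,
    ("up", "down"): 2,
    ("up", "steady"): 3,
    ("down", "up"): 4,
    ("down", "down"): 5,
    ("down", "steady"): 6,
    ("steady", "up"): 7,
    ("steady", "down"): 8,
    ("steady", "steady"): 9,
}


def principal_cluster(state_k8274, state_k3358):
    """Return the principal cluster (1-10) for a pair of regulation states.

    A state of None means the protein was not found in that line,
    which corresponds to the tenth cluster.
    """
    if state_k8274 is None or state_k3358 is None:
        return 10
    for state in (state_k8274, state_k3358):
        if state not in STATES:
            raise ValueError(f"unknown regulation state: {state!r}")
    return CLUSTER_BY_STATES[(state_k8274, state_k3358)]


print(principal_cluster("up", "down"))    # 2
print(principal_cluster("steady", "up"))  # 7
print(principal_cluster(None, "steady"))  # 10
```

Keeping the grid as an explicit dictionary makes the numbering easy to compare against Table 1 row by row.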
The first principal cluster represented non-redundant proteins whose abundance increased in the seeds of both lines in response to combined inoculation with BSM. Similarly, the fifth cluster represented the proteins with decreased abundances (i.e., down-regulated) in both lines in response to the inoculation, whereas the non-responsive proteins formed the ninth cluster. The proteins comprising the second cluster were up-regulated in the line K-8274, but down-regulated in the line K-3358. This principal cluster consisted of two sub-clusters: the proteins demonstrating the lowest abundance (i) in the NI group of line K-8274 and (ii) in the BSM group of line K-3358. The proteins of the fourth cluster demonstrated the inverse response to inoculation. This principal cluster also consisted of two sub-clusters, demonstrating the lowest abundance (i) in the BSM group of line K-8274 and (ii) in the NI group of line K-3358. The next group of principal clusters was represented by the seed proteins regulated by inoculation with BSM in only one of the lines. Thus, the proteins of clusters 3 and 6 (both including two sub-clusters) were up- and down-regulated in the seeds of K-8274, respectively, but demonstrated a "steady" behavior in the line K-3358. Analogously, the proteins of clusters 7 and 8 (represented by one and three sub-clusters, respectively) were up- and down-regulated in response to inoculation with BSM in the seeds of line K-3358, respectively, with no abundance changes in the seeds of the line K-8274. Finally, β-hexosaminidase was found only in the seeds of line K-3358, and a probable S.7-like L-type lectin-domain containing receptor kinase was identified only in the line K-8274. These two proteins comprised the last, tenth cluster.
Notes to Table 1. Plants were grown under non-controlled light and temperature conditions in a greenhouse, as described in the Materials and Methods section. The plants were harvested at the stage of mature seeds (three months after planting). The total seed protein fraction was isolated by phenol extraction, the proteins were digested with trypsin, and the resulting digests were analyzed by nanoHPLC-Q-Orbitrap-LIT-MS. Abbreviations: Nr., number of protein; UDP, uridine diphosphate; LRR, leucine-rich repeat; GDP, guanosine diphosphate; NB-ARC, nucleotide-binding adaptor shared by APAF-1, R proteins, and CED-4; ADP, adenosine diphosphate. (a) Initial grouping of proteins by expression profiles relied on hierarchical clustering (using Spearman correlation as a distance measure) with subsequent manual correction of individual protein plots in Perseus software (if necessary); individual expression profiles were defined based on the direction of changes in protein abundance in response to inoculation with BSM; (b) the descriptions of individual proteins were taken from the headers of the corresponding fasta files; (c) for the proteins annotated as "Uncharacterized" or "Putative uncharacterized", additional information from UniProtKB was collected; (d) functional annotation relied on the Mercator software; (e) the binary logarithms of fold changes (log2 FCs) within the lines K-8274 and K-3358 were calculated for the abundance ratios BSM(K-8274)/NI(K-8274) and BSM(K-3358)/NI(K-3358), whereas the comparisons of the lines relied on the ratios BSM(K-3358)/BSM(K-8274) and NI(K-3358)/NI(K-8274); (f) p values were obtained by one-way ANOVA using Progenesis QI software; (g) q values were obtained with Progenesis QI software; (h) the tenth profile corresponds to proteins which were not found in one of the lines: A0A072TYG8 was identified and quantified only in line K-3358 and Lj0g3v0098069.1 only in line K-8274; + indicates the proteins identified in the search against a redundant sequence database and manually checked for quality of identification. NS ("non-significant") denotes fold changes <1.5 in absolute scale, i.e., between −0.6 and 0.6 in log2 scale.
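The four abundance ratios and the NS rule described in the notes to Table 1 (footnote e and the NS definition) can be illustrated with a short Python sketch. It is not the Progenesis QI calculation itself: the function names and the example abundances are assumptions introduced here, and the statistical filter (ANOVA p and q values) is deliberately left out.

```python
import math

# A 1.5-fold change is about 0.585 in log2 scale (rounded to 0.6 in the notes).
LOG2_THRESHOLD = math.log2(1.5)


def log2_fold_change(numerator, denominator):
    """Binary logarithm of an abundance ratio; abundances must be positive."""
    if numerator <= 0 or denominator <= 0:
        raise ValueError("abundances must be positive")
    return math.log2(numerator / denominator)


def table_ratios(bsm_8274, ni_8274, bsm_3358, ni_3358):
    """The four log2 ratios reported per protein: two intra-line, two inter-line."""
    return {
        "BSM/NI in K-8274": log2_fold_change(bsm_8274, ni_8274),
        "BSM/NI in K-3358": log2_fold_change(bsm_3358, ni_3358),
        "K-3358/K-8274 in BSM": log2_fold_change(bsm_3358, bsm_8274),
        "K-3358/K-8274 in NI": log2_fold_change(ni_3358, ni_8274),
    }


def label(log2_fc):
    """'NS' for changes below 1.5-fold, otherwise the signed log2 value."""
    return "NS" if abs(log2_fc) < LOG2_THRESHOLD else f"{log2_fc:+.2f}"


# Arbitrary illustrative group abundances, not measured values.
ratios = table_ratios(bsm_8274=400.0, ni_8274=100.0, bsm_3358=120.0, ni_3358=100.0)
for name, value in ratios.items():
    print(name, label(value))
```

With these made-up abundances, the four-fold increase within K-8274 is reported as a log2 value, whereas the 1.2-fold change within K-3358 falls below the 1.5-fold cut-off and is labeled NS.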

Figure 3
Post-processing of the label-free quantification data, acquired in nanoHPLC-ESI-Q-Orbitrap-MS/data-dependent acquisition experiments, performed with seed protein tryptic digests of pea (P. sativum L.) plants, lines K-8274 (high EIBSM) and K-3358 (low EIBSM), grown with (BSM, beneficial soil microorganisms) and without (NI, not inoculated) simultaneous colonization of pea roots with rhizobia and arbuscular mycorrhizae (AM). The K-8274 (orange) and K-3358 (blue) pea lines could be separated by the first component (A,B), whereas BSM (squares) and NI (circles) were separated by the second (A) and third (B) components. Hierarchical clustering was done for average group values, calculated from three biological replicates (C). Post-processing relied on Perseus software (n = 3). For the original Perseus export data (i.e., prior to manual verification of clusters), see Figure S1-3.
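The hierarchical clustering of group averages used Spearman correlation as a distance measure (Table 1, footnote a). The following Python sketch shows one common way to turn a Spearman rank correlation into a distance between two expression profiles. It assumes the convention distance = 1 − rho, which may differ in detail from the Perseus implementation, and the function names and profiles are illustrative only.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1 to n, giving tied values the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_distance(profile_a, profile_b):
    """1 - Spearman rho: 0 for identical rank order, 2 for reversed order."""
    ra, rb = average_ranks(profile_a), average_ranks(profile_b)
    n = len(ra)
    mean_a, mean_b = sum(ra) / n, sum(rb) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(ra, rb))
    var_a = sum((x - mean_a) ** 2 for x in ra)
    var_b = sum((y - mean_b) ** 2 for y in rb)
    if var_a == 0 or var_b == 0:
        raise ValueError("a constant profile has no defined rank correlation")
    return 1.0 - cov / sqrt(var_a * var_b)


# Hypothetical group averages in the order NI K-8274, BSM K-8274, NI K-3358, BSM K-3358.
protein_x = [1.0, 2.0, 3.0, 4.0]
protein_y = [10.0, 25.0, 30.0, 80.0]  # same rank order as protein_x
protein_z = [9.0, 7.0, 4.0, 1.0]      # reversed rank order

print(spearman_distance(protein_x, protein_y))
print(spearman_distance(protein_x, protein_z))
```

Because only ranks enter the calculation, two proteins with very different absolute abundances but the same direction of change across the four groups end up close together, which suits grouping by expression profile.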

Functional Annotation of Differentially Regulated Proteins
Functional annotation of differentially expressed proteins relied on the Mercator tool and revealed clear inter-line differences in the functional profiles of the regulated proteome (Supplementary Material 5). Forty of the 60 proteins changing their abundance in the seeds of the high-EIBSM line K-8274 upon combined inoculation were successfully assigned to specific functional bins. In total, 50 of the 60 proteins (including 34 assigned to functional bins) were up-regulated (Figure 4A). Protein biosynthesis represented the most strongly affected function: only one polypeptide, namely 60S ribosomal protein L26-1, was down-regulated. Such processes as RNA biosynthesis, RNA processing, and protein modification and degradation were affected as well. Accordingly, symbiosis-related up-regulation of energy metabolism was observed: three enzymes of cellular respiration and two enzymes involved in photosynthesis increased their abundance in response to colonization of K-8274 roots. Another strongly up-regulated function was chromatin organization: three different types of core histones increased their abundance, which was in line with the overall up-regulation of RNA and protein biosynthesis.
In agreement with the fact that approximately half of the differentially expressed proteins were up-regulated in K-3358 seeds in response to interaction with BSM (31 of 60), 17 of the 36 successfully annotated species were up-regulated (Figure 4B). Remarkably, in the seeds of the K-3358 plants, the proteins involved in protein biosynthesis showed more prominent differences in expression profiles: besides the seven up-regulated polypeptides, four species, namely UTP-glucose-1-phosphate uridylyltransferase, 60S ribosomal protein L3B, and two poorly characterized probable structural constituents of ribosome, decreased their abundance after inoculation with BSM.
A large number of polypeptides involved in protein biosynthesis, including ribosomal proteins and the EF2 elongation factor, were up-regulated in both lines upon inoculation with BSM. Interestingly, the exact set of ribosomal proteins up-regulated in the presence of symbiosis with BSM was different in the two lines, possibly reflecting the difference in response of the microorganisms to the symbiosis. The same was the case only for the proteins involved in cellular respiration (e.g., ATP synthase subunit alpha and probable triosephosphate isomerase), which were clearly up-regulated. Triosephosphate isomerase was shown to be required for the post-germinative transition to autotrophic growth in seeds [36]. The significant increase in ATP synthase abundance points to a significant increase in metabolism in the seeds of the line K-8274 upon symbiosis.
Remarkably, in the K-3358 line, several functional protein groups were exclusively down-regulated upon inoculation with BSM. This can be exemplified by the proteins involved in redox homeostasis (catalase, probable peroxiredoxin (UniProt ID B7FH22), and superoxide dismutase), protein degradation and modification (proteasome alpha subunit, serine carboxypeptidase-like protein, and glutathione S-transferase), and vesicle trafficking (clathrin heavy chain and GDP-dissociation inhibitor). All these observations indicate a decrease in metabolic activity in the seeds of the low-EIBSM line in comparison to those of the high-EIBSM one. Interestingly, probable peroxiredoxin (B7FH22) and phosphoenolpyruvate carboxylase, involved in redox homeostasis and photosynthesis, respectively, were down-regulated in this line, while they were among the up-regulated species in the seeds of the line K-8274 (Figure 4). Among the proteins without an assigned functional category, a polypeptide annotated as an embryogenesis abundant protein significantly decreased its abundance in the seeds of K-8274 and increased it in K-3358. Proteins of this group were mostly found in seeds at late developmental stages [37] and can thus be related to seed maturity.
Prediction of cellular localization, performed for the differentially expressed proteins, revealed the cytosol as the major cellular fraction responding to inoculation with BSM, although nuclear and plastid proteins were highly represented as well (Figure 5, Supplementary Material 5). Plastid proteins constituted the most symmetrically changing group of proteins: in both lines, these species represented 10-12% of the up- and down-regulated polypeptides. On the other hand, the most variable groups were represented by extracellular (up to 20% of all of the proteins), membrane (up to 10%), and mitochondrial (up to 10%) proteins (Figure 5, Supplementary Material 5). Interestingly, vacuolar proteins represented the only localization group that was exclusively down-regulated in both lines (Figure 5B,D). Some proteins were identified exclusively in specific groups: one protein with shared localization in the Golgi apparatus and mitochondrion was up-regulated in line K-8274, while two peroxisomal proteins were found only among the proteins down-regulated in line K-3358 (Figure 5A,D, respectively), which also indicates a differential response of seed metabolism to inoculation.

Complex Inoculation Affects Seed Productivity only in the High-EIBSM Line K-8274
In recent years, the legume seed proteome has been intensively studied [34,38]. Specifically, Sistani et al. addressed the changes in pea seed protein content upon inoculation with rhizobia and/or arbuscular mycorrhizal fungi in the context of resistance to the pathogenic fungus Didymella pinodes [39].
Here we consider these inoculation-related effects with a specific focus on the susceptibility of plants to inoculation with BSM. For this, we employ two pea lines with contrasting EIBSM as an efficient tool to dissect the metabolic effects underlying the observed increase in seed protein contents. In agreement with this aim, we rely here on the methods of bottom-up proteomics, an efficient functional genomics tool that has become well-established in seed research during the last decade [40]. Recently, we validated our nanoHPLC-ion trap (IT)-Orbitrap-MS-based approach for label-free quantification and confirmed its high reliability, precision, and sensitivity [32]. In our earlier studies, it proved to be well compatible with other functional genomics techniques [24].
At the level of morphology, the beneficial effect of complex inoculation (namely, an increase in seed/shoot biomass and seed number) was observed for the high-EIBSM line K-8274, but not for the low-EIBSM line K-3358, which was in agreement with the previous studies [41]. As inoculation of K-8274 with individual BSM did not result in any significant beneficial effects on plant productivity [30], only the effects of complex inoculation (with rhizobia and AM fungi simultaneously) were addressed here. It is well known that individual pea lines strongly differ in their response to inoculation with individual BSM and their combinations. Indeed, for the pea cultivar Messire, double inoculation was inefficient [42], whereas root colonization with an individual culture of rhizobia resulted in a significant gain in seed weight [39]. This example clearly illustrates the importance of using contrasting genotypes for these kinds of studies.

Differences in Protein Expression Patterns between the Pea Lines with High and Low EIBSM
The overall success of comprehensive proteome characterization critically depends on proper protein identification methods. Therefore, here we applied a representative sequence database of legume species closely related to pea. This approach proved to be efficient in our previous studies [44]. Altogether, 1500 non-redundant proteins were identified here (1346 and 1425 in the seeds of K-8274 and K-3358, respectively). Although this number was slightly lower than in our recent comprehensive profiling of the pea seed proteome [34], the conclusions drawn here are still based on what is, to the best of our knowledge, the most complete pea seed protein map.
In agreement with the results of Turetschek et al. [45], the effect of plant genotype on seed proteome signatures was more pronounced than the impact of inoculation with BSM. Thus, due to the contrasting seed color (green and yellow for K-3358 and K-8274, respectively), several proteins related to photosynthesis were differentially expressed in the two analyzed lines. Indeed, the yellow color of the seeds of the K-8274 plants was due to the lack of the active SGR (STAY GREEN) protein, involved in regulation of chlorophyll degradation and encoded by the gene I [46,47]. Further, the proteins involved in abscisic acid (ABA) signaling were clearly more abundant in the seeds of K-8274 in comparison to the seeds of the low-EIBSM line. The role of these molecules (17.6 kDa class I heat shock protein, translation elongation factor EF-2 subunit, UTP-glucose-1-phosphate uridylyltransferase, and ABA-responsive protein; Table 1, Supplementary Material 4) in ABA signal transduction is well-characterized in non-legume species [48–51]. As ABA is a critical regulator of the late steps of seed development [47], this observation might indicate inter-line differences in seed maturation rates. We also identified proteins differentially expressed in the two lines, which might indicate high polymorphism of pea seeds with respect to their proteome signatures. Thus, the approach relying on relative quantification of individual proteins might have a high value in breeding. This conclusion is supported by the work of Bourgeois et al. [52], where the genetic architecture of seed proteome variability was uncovered and the protein quantity loci responsible for different seed protein composition and protein content were identified.

Response of High- and Low-EIBSM Pea Lines to Inoculation with Rhizobia and Arbuscular Mycorrhiza
In general, the observed responses of the seed proteome to inoculation with BSM could be classified as line-unspecific and line-specific. The non-specific responses manifested as up-regulation of the polypeptides involved in protein biosynthesis and vesicle transport. This fact might indicate an improved availability of soil phosphorus and nitrogen. However, the number of such hits was lower in comparison to the proteins demonstrating inter-line expression differences (as shown on the PCA plots on Figure 4A,B; corresponding loading plots on Figure S1-6). As these proteins could contribute to the observed difference in EIBSM, we addressed this group in more detail.
The analyzed lines showed a differential response to inoculation with BSM. K-8274 demonstrated stronger symbiosis-related differences in expression of seed proteins in comparison to the low-EIBSM line. Moreover, the functional patterns of the expression differences were clearly line-specific. Thus, inoculation of the low-EIBSM line K-3358 with two BSM resulted in down-regulation of the proteins involved in central and energy metabolism, as well as in biosynthesis and post-translational modification of proteins. In agreement with this, the K-3358 plants inoculated with BSM completed seed development earlier than the corresponding not inoculated controls.
In contrast, inoculation of the high-EIBSM line K-8274 resulted in up-regulation of the polypeptides involved in biosynthetic pathways, cellular respiration, detoxification of reactive oxygen species (ROS), and photosynthesis. One of the up-regulated biosynthetic enzymes, namely phosphoenolpyruvate carboxylase, was previously shown to be highly correlated with seed protein and lipid contents in soybean [53]. In addition, a plastid protein with triose phosphate isomerase activity appeared to be up-regulated (Supplementary Material 5, Table S5-1). A protein with this activity was earlier shown to be crucial for the post-germinative switch from heterotrophic to autotrophic growth in Arabidopsis [36]. The observed inoculation-related changes might indicate a high level of cell metabolism, which is essential for seed filling and beneficial for seed development. Differential expression of some other proteins might be related to line-specific differences in the interaction of pea plants with symbiotic bacteria. Thus, β-hexosaminidase, expressed exclusively in K-3358 seeds, was not earlier reported in the context of legume-rhizobial symbiosis. On the other hand, the S.7-like L-type lectin-domain containing receptor kinase, found only in the K-8274 seeds, can potentially be involved in the perception of microorganisms and thus may represent a link between the inoculation and seed formation.
In agreement with this, the high-EIBSM line K-8274 demonstrated an inoculation-related down-regulation of late embryogenesis abundant (LEA) protein A0A072TMR3. This might indicate retardation of seed maturation. Indeed, a similar observation was done by Sistany et al. [39], who reported lower levels of LEA proteins in pea plants, inoculated with rhizobia, in comparison to corresponding non-inoculated controls. Thus, we assume that the high-EIBSM genotype of the K-8274 line might contribute to the prolongation of the immature stage of seed development upon the inoculation with BSM, whereas the low-EIBSM line K-3358 did not respond to complex symbiosis in this way. Recently, we have shown that arbuscular mycorrhiza results in prolongation of the pea life cycle (Shtark et al., under revision). Therefore, we assume the mycorrhizal component of the inoculum to be the main contributor to the inoculation-related seed biomass increase, observed in this study for the high-EIBSM K-8274 line.
Another important marker of the inoculation-related retardation in seed maturation is 1,2-dihydroxy-3-keto-5-methylthiopentene dioxygenase, an enzyme involved in methionine salvage and annotated here by the M. truncatula part of our combined sequence database as MEDTR1G102870.1. According to the M. truncatula gene expression atlas [54], this enzyme is expressed in most tissues. In seeds, it shows a characteristic expression pattern: its abundance decreases from the 10th to the 24th day after pollination (DAP) and then increases until the 36th DAP. Under symbiotic conditions, this increase in abundance was four-fold in the seeds of K-8274, whereas this enzyme was two-fold down-regulated in the seeds of K-3358 plants inoculated with BSM. This observation is in agreement with the prolongation of the immature stage of seed development proposed here for the high-EIBSM line upon inoculation with BSM. As MEDTR1G102870.1 can be a promising marker of this "prolonged seed youth" phenomenon, expression levels of the corresponding gene and kinetics of the enzymatic reaction product deserve to be determined in future studies.

Ecological and Agricultural Aspects of the High-EIBSM Trait
Most probably, the differential response to inoculation with BSM (i.e., low- and high-EIBSM traits) reflects two strategies of nitrogen assimilation upon its supplementation: some genotypes demonstrate prolonged seed filling under optimal nitrogen supply conditions, whereas others complete seed development as fast as possible (reminiscent of the r- and K-strategies characteristic of different higher organisms [55]). The presence of both strategies in a population might increase its overall adaptive flexibility. On the other hand, the difference in response to available nitrogen can be attributed to the breeding history of individual pea varieties and cultivars. In this context, we assume that the plants of the low-EIBSM line prioritize the speed of seed maturation over the maximization of the nutrient content of the seeds. Interestingly, the K-3358 plants develop multiple reproductive nodes (i.e., new seeds can form during the whole ontogenesis). In contrast, the K-8274 plants can produce only a limited number of reproductive nodes and, hence, develop a limited, pre-determined number of pods and seeds, in which the available resources are invested. Thus, the K- and r-strategies of different pea genotypes might reflect the corresponding growth patterns. Most probably, at the metabolic level, these strategies can be due to differences in (i) nitrogen sensing, (ii) efficiency of nitrogen uptake from soil or efficiency of its fixation in nodules, and (iii) assimilation of nitrogen by the seeds.
Seed development is metabolically closely associated with re-mobilization of nitrogen from vegetative tissues to seeds, which triggers leaf senescence and shortens the seed filling period [56]. On the other hand, mycorrhization prolongs the metabolically active stages of leaf ontogenesis (Shtark et al., under revision). Thus, high-EIBSM genotypes, like K-8274, represent well-balanced systems with improved efficiency of seed filling due to a longer immature stage in seed development. Therefore, such genotypes give access to higher biomass gain. Hence, involvement of high-EIBSM lines in breeding programs might increase the overall agricultural efficiency. One needs to keep in mind, however, that environmental stress, like drought, common in most pea-growing countries, might eliminate the favorable effects of symbiosis with BSM [57]. Therefore, additional experiments in adequate drought models [58] are necessary to address inoculation of high-EIBSM pea plants with BSM under conditions of environmental stress.

Reagents
Unless stated otherwise, materials were obtained from the following manufacturers. All other chemicals were purchased from Sigma-Aldrich Chemie GmbH (Taufkirchen, Germany). Water was purified in-house (resistivity 5-15 MΩ·cm) on a water conditioning and purification system «Elix 3 UV» (Millipore, Moscow, Russia). The seeds of pea (Pisum sativum L.) lines with accession numbers K-8274 (cultivar Vendevil, France) and K-3358 (local landrace from Saratov region, Russia), characterized by high and low EIBSM, respectively, were initially obtained from the collection of the Vavilov Institute of Plant Genetic Resources (St. Petersburg, Russia) and were propagated prior to the experiment in ARRIAM (St. Petersburg, Russia).

AM Fungal Inoculum
The AM inoculum relied on a combination of three R. irregularis strains, namely BEG144, BEG53 (both provided by the International Bank for the Glomeromycota, Dijon, France), and ST3 (All-Russia Research Institute for Agricultural Microbiology, St. Petersburg) [59]. All isolates were cultured individually in a sand/soil mixture (1:1 v/v) using Plectranthus australis R. Br. as a host plant. To obtain the inoculum of AM fungi, the seeds of sorghum (Sorghum sp.) were surface sterilized with a 0.15% (w/v) aqueous solution of potassium permanganate for 15 min and transferred to pots filled with a soil-based substrate (pH 7) containing dried P. australis roots colonized with the three above-mentioned R. irregularis strains. After about 120 days of vegetation, the colonized sorghum roots were separated from the substrate, cut into 1 cm pieces, dried, and mixed with the substrate to establish the inoculum.

Plant Experiments and Characterization of Biomass Gain and Seed Productivity
The seeds were surface sterilized with concentrated sulfuric acid, rinsed with sterile water, germinated on wet vermiculite for three days in darkness at 25 °C, planted in 5 L pots filled with sod-podzolic light loamy soil (five plants per pot), and inoculated with 150 mL of an aqueous suspension (10^6 CFU/L) of symbiotic bacteria (Rhizobium leguminosarum bv. viciae RCAM1026) [60] in combination with the AM fungal inoculum (see previous section): the planted seeds (n = 5) were overlaid with 30 g of this inoculum per pot. Before planting, the weight of the pots was adjusted with soil to obtain the same value. The plants were grown under non-controlled light and temperature conditions in a vegetation house of the All-Russia Research Institute for Agricultural Microbiology, St. Petersburg (June-August 2016). Formation of AM was verified on the 28th day after germination by light microscopy, as described by Shtark et al. [61]. The plants were harvested at the stage of mature seeds (3 months after planting), and the dry weight of the aerial part, the weight of seeds, and the total number of seeds per plant were recorded. Data processing and statistical evaluation were done with SigmaPlot 12.0 software (Systat Software, San Jose, CA, USA).

Protein Isolation
Pea seeds (10 per biological replicate) were frozen in liquid nitrogen and ground in a Mixer Mill MM 400 ball mill with a Ø 20 mm stainless steel ball (Retsch, Haan, Germany) at a vibration frequency of 30 Hz for 2 × 1 min, and kept on dry ice prior to protein extraction. The total protein fraction was isolated from the frozen ground material by phenol extraction, as described by Frolov and co-workers [62] with some modifications. Briefly, approximately 50 mg of plant material (placed in 2 mL polypropylene Bremen, Germany). The eluents A and B were 0.1% (v/v) aq. FA and 0.08% (v/v) aq. FA in acetonitrile, respectively. The peptides were eluted with linear gradients ramping from 1% to 35% B over 90 min, followed by 35% to 85% eluent B over 5 min. The column was washed for 5 min, and re-equilibrated at 1% eluent B for 10 min. The nano-LC-Orbitrap-MS analysis relied on data-dependent acquisition (DDA) experiments performed in the positive ion mode, comprising a survey Orbitrap-MS scan and MS/MS scans for the most abundant signals in the following 5 s (at certain tR), with charge states ranging from 2 to 6. The mass spectrometer settings and DDA parameters are summarized in the Supplementary Material (Table S1-1). The mass spectrometry proteomics data have been deposited to the ProteomeXchange Consortium via the PRIDE partner repository [67] with the dataset identifier PXD013479.
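For illustration, the elution program described above can be written as a piecewise-linear function of run time. This is only a sketch under stated assumptions: the text does not give the eluent composition during the 5 min wash, so it is assumed here to be held at 85% B, and the return to 1% B is modelled as an instantaneous step. The names `GRADIENT`, `REEQUILIBRATION_B` and `percent_b` are hypothetical and are not taken from the instrument method.

```python
# Breakpoints (time in min, % eluent B) of the gradient described in the text.
# ASSUMPTION: the 5 min column wash is held at 85% B (not stated in the text).
GRADIENT = [
    (0.0, 1.0),     # start of the run
    (90.0, 35.0),   # end of the 1% -> 35% B ramp
    (95.0, 85.0),   # end of the 35% -> 85% B ramp
    (100.0, 85.0),  # end of the assumed wash plateau
]
REEQUILIBRATION_B = 1.0  # % B during the final 10 min
TOTAL_RUN_MIN = 110.0    # 90 + 5 + 5 + 10 min


def percent_b(t_min):
    """Return the nominal % of eluent B at run time t_min (minutes)."""
    if not 0.0 <= t_min <= TOTAL_RUN_MIN:
        raise ValueError("time outside the run: %r" % (t_min,))
    if t_min > GRADIENT[-1][0]:
        # ASSUMPTION: instantaneous step back to the starting composition.
        return REEQUILIBRATION_B
    # Linear interpolation inside the segment that contains t_min.
    for (t0, b0), (t1, b1) in zip(GRADIENT, GRADIENT[1:]):
        if t0 <= t_min <= t1:
            return b0 + (b1 - b0) * (t_min - t0) / (t1 - t0)
    raise AssertionError("unreachable: all run times are covered above")
```

Such a function can be used, for example, to estimate the nominal eluent composition at which a peptide with a given retention time left the column.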

Data Analysis and Post-Processing
Identification of peptides, annotation, and label-free quantification of proteins relied on the Progenesis QIP software (Waters GmbH, Eschborn, Germany). After peak alignment, spectral and peptide filters were applied (Table S1-2), and the MS/MS spectra selected in this way were searched with the SEQUEST engine against a combined database containing protein sequences of three legume species closely related to pea (Table S1-3), as was recently proposed by Matamoros and co-workers [44]. The database search settings are summarized in Table S1-4. Afterwards, the resulting pepXML file (obtained after the search against a decoy database, FDR < 0.05) was imported into the Progenesis QIP software for relative quantification of the identified proteins based on the Hi-N algorithm, picking the three most abundant peptides for quantification. Finally, the proteins meeting the filter criteria (listed in Table S1-2) were exported for statistical interpretation in Perseus 1.6.0.0 software (Max-Planck Institute of Biochemistry, Martinsried, Germany) [68]. This included logarithmic (log2) transformation of analyte abundances, normalization to unit vectors in individual experiments (columns), and z-scoring based on the median calculated for individual proteins (rows). Hierarchical clustering relied on the Pearson correlation coefficient to cluster individual experiments and on the Spearman correlation coefficient to cluster individual proteins. After subsequent manual adjustment of the heat-map-based clusters, specific profile plots were built for each of them. Annotation of individual proteins relied on the original sequence database. For the proteins annotated as "uncharacterized", further information was derived from the Uniprot database [69].
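The transformation and normalization steps listed above can be sketched in a few lines of plain Python. The sketch is not the Perseus implementation. It assumes a matrix with proteins in rows and experiments in columns, strictly positive abundances, and the sample standard deviation as the scale of the median-based z-score (the text does not specify the scale estimate). The abundance values are invented toy numbers, and all function names are hypothetical.

```python
import math
import statistics

# Toy abundance matrix: rows = proteins, columns = experiments (invented values).
TOY_ABUNDANCES = [
    [1000.0, 2000.0, 4000.0, 8000.0],
    [500.0, 400.0, 300.0, 250.0],
    [50.0, 60.0, 200.0, 220.0],
]


def log2_transform(matrix):
    """Log2-transform every (strictly positive) abundance."""
    return [[math.log2(v) for v in row] for row in matrix]


def unit_vector_columns(matrix):
    """Scale each column (experiment) to unit Euclidean length."""
    n_cols = len(matrix[0])
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(n_cols)]
    return [[row[j] / norms[j] for j in range(n_cols)] for row in matrix]


def median_zscore_rows(matrix):
    """Centre each row (protein) on its median and divide by its sample SD.

    ASSUMPTION: the sample SD is used as the scale; rows must not be constant.
    """
    out = []
    for row in matrix:
        med = statistics.median(row)
        sd = statistics.stdev(row)
        out.append([(v - med) / sd for v in row])
    return out


def pearson(x, y):
    """Pearson correlation of two equally long, non-constant sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def ranks(values):
    """1-based ranks; tied values share their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def spearman(x, y):
    """Spearman correlation = Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


normalized = median_zscore_rows(unit_vector_columns(log2_transform(TOY_ABUNDANCES)))
```

A distance of the form 1 - r, computed with `pearson` between columns or with `spearman` between rows of `normalized`, could then be passed to any hierarchical clustering routine; the linkage method is not stated in the text and is therefore left open here.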
For qualitative characterization of seed proteome, the raw files were directly analyzed in Proteome Discoverer 2.2 using the search parameters described above (Table S1-4). Venn diagrams were built by means of the InteractiVenn tool [70]. Thereby, only the proteins and protein groups (i.e., non-redundant proteins) identified with at least one unique peptide were considered. For building Venn diagrams, all specifically modified peptides were considered as unique species.
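The counts shown in a Venn diagram correspond to the proteins found in exactly one combination of groups. The following sketch illustrates how such regions could be derived after applying the unique-peptide filter described above. It is independent of the InteractiVenn tool; the group names, accession labels, and function names are hypothetical.

```python
from itertools import combinations


def filter_identified(unique_peptide_counts, min_unique=1):
    """Keep proteins (or protein groups) with at least min_unique unique peptides."""
    return {p for p, n in unique_peptide_counts.items() if n >= min_unique}


def venn_regions(named_sets):
    """Map each combination of group names to the proteins found in exactly
    those groups and in no other group. Empty regions are omitted."""
    names = sorted(named_sets)
    regions = {}
    for r in range(1, len(names) + 1):
        for combo in combinations(names, r):
            inside = set.intersection(*(named_sets[n] for n in combo))
            outside = set().union(*(named_sets[n] for n in names if n not in combo))
            exclusive = inside - outside
            if exclusive:
                regions[combo] = exclusive
    return regions


# Hypothetical example: three groups with invented accession labels.
EXAMPLE_GROUPS = {
    "group_a": {"P1", "P2", "P3"},
    "group_b": {"P2", "P3", "P4"},
    "group_c": {"P3", "P5"},
}
EXAMPLE_REGIONS = venn_regions(EXAMPLE_GROUPS)
```

Because each protein falls into exactly one region, the region sizes sum to the number of distinct proteins across all groups.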
Functional annotation of the identified proteins relied on the Mercator 4 (v1.0) web application [35]. The results were interpreted and visualized with custom R scripts (R v3.4.4). The closest Arabidopsis homologues of the analyzed proteins were identified with the reciprocal best-hit method. Prediction of intracellular localization relied on the SUBA4 tool [43].
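The reciprocal best-hit criterion mentioned above can be illustrated as follows: a legume protein and an Arabidopsis protein are paired only if each is the top-scoring hit of the other. The sketch assumes that similarity scores (e.g., bit scores) from searches in both directions are already available as nested dictionaries. The search tool, the score type, the identifiers, and the function names are all assumptions made for illustration and are not taken from the text.

```python
def best_hits(scores):
    """scores: {query: {subject: score}} -> {query: top-scoring subject}."""
    return {q: max(hits, key=hits.get) for q, hits in scores.items() if hits}


def reciprocal_best_hits(forward, reverse):
    """Return {query: subject} pairs that are each other's best hit."""
    fwd = best_hits(forward)
    rev = best_hits(reverse)
    return {q: s for q, s in fwd.items() if rev.get(s) == q}


# Hypothetical scores: legume proteins searched against Arabidopsis ...
FORWARD = {
    "legume_A": {"ath_1": 300.0, "ath_2": 150.0},
    "legume_B": {"ath_1": 280.0, "ath_3": 90.0},
}
# ... and Arabidopsis proteins searched against the legume database.
REVERSE = {
    "ath_1": {"legume_A": 310.0, "legume_B": 270.0},
    "ath_2": {"legume_A": 140.0},
    "ath_3": {"legume_B": 95.0},
}
PAIRS = reciprocal_best_hits(FORWARD, REVERSE)
```

In this toy example, legume_B has ath_1 as its best hit, but ath_1 prefers legume_A, so legume_B receives no reciprocal partner. The criterion is deliberately conservative and may leave some proteins without an assigned homologue.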

Conclusions
Bottom-up shotgun proteomics is a powerful tool in legume seed research. Here we successfully applied it to probe seed metabolic differences related to simultaneous inoculation of low- and high-EIBSM pea plants with rhizobia and AM fungi (i.e., the conditions mimicking the real plant rhizosphere). The high-EIBSM genotype responded to the inoculation with a prolongation of the seed filling period. This effect was accompanied by changes in the expression of proteins involved in central energy metabolism and in protein biosynthesis and folding. The presented data are preliminary, and in the future these proteomics studies might be complemented with other methods of functional genomics, namely metabolomics and transcriptomics (i.e., a multi-omics approach can be employed). In addition, genome-wide association studies (GWAS) might help to discover novel determinants of the beneficial traits. It is important to mention, however, that the efficiency of data interpretation and of the integration of different approaches will dramatically increase when the sequencing of the pea genome is accomplished.