Immature Seed Endosperm and Embryo Proteomics of the Lotus (Nelumbo nucifera Gaertn.) by One-Dimensional Gel-Based Tandem Mass Spectrometry and a Comparison with the Mature Endosperm Proteome

The lotus (Nelumbo nucifera Gaertn.) seed proteome has been the focus of our studies, and we have recently established the first proteome dataset for its mature seed endosperm. The current study unravels the immature endosperm proteome, as well as the embryo proteome, to provide a comprehensive dataset of the lotus seed proteins and a comparison between the mature and immature endosperm tissues across the seed’s development. One-dimensional gel electrophoresis (SDS-PAGE) linked with tandem mass spectrometry provided a protein inventory of the immature endosperm (122 non-redundant proteins) and embryo (141 non-redundant proteins) tissues. Together with the previous mature endosperm dataset (66 non-redundant proteins), a total of 206 non-redundant proteins were identified across all three tissues of the lotus seed. Results revealed some significant differences in proteome composition between the three lotus seed tissues, most notably between the mature endosperm and its immature developmental stage, where the proteins shift from nutrient production to nutrient storage.


Introduction
Nelumbo nucifera (Gaertn.) is an aquatic perennial belonging to the family Nelumbonaceae, most commonly known as the lotus. The lotus typically grows in shallow ponds, with its rhizomes under the mud and its large leaves rising on stalks 1-2 m above the water surface. Flowers are white to rosy, sweet-scented, solitary, hermaphrodite and 10-25 cm in diameter, while its fruits are ovoid with nut-like achenes. Seeds are black, hard and ovoid [1]. In its immature form, the lotus seed is initially yellowish (early stages) and becomes green as it grows and matures. In its late immature stages, the seed is a 1.2-1.5 cm long ovoid covered in a soft green husk containing a moist, soft endosperm and the developing embryo. When the seed reaches maturity, the husk turns dark brown and hardens, and both the endosperm and embryo become considerably dry. The lotus embryo, or germ, is a small, stalk-like tissue at the core of the lotus seed. The embryo is green and yellow in color. In the mature seed, the embryo tissue is dry, and while inside an intact seed, it can remain viable for germination for more than a thousand years, making it the most durable seed known [2][3][4][5]. The immature seed, which is composed largely of the endosperm, has a water content of 77.5%, as opposed to the 13.1% water content of the mature seed. The immature seed also has lower protein and carbohydrate content, 5.9% and 14.9%, respectively, compared to 19.1% and 62.6% for the mature seed [6].
The lotus seeds and rhizome are extensively consumed as food in China and Japan and regarded as a health food [7][8][9], and the plant is also utilized as a source of traditional medicine in India and China [1,10]. Furthermore, extracts from the lotus leaves, rhizomes, and seeds have been shown to possess multiple health benefits and a diverse range of secondary metabolites (more details are given in our review [11] and references therein). The genome of the lotus has only recently been sequenced [12], and a few targeted genome and transcriptome-level works have led to the identification of some functional proteins, as well as their successful cloning and transgenic expression [13][14][15][16]. Considering its documented health benefits and several desirable characteristics for nutritional, agricultural and scientific uses, such as its protein content, ability to be cultivated in flooded areas, growth and germination vigor, and extreme seed durability, the lotus plant would be an excellent candidate as a crop, as a source of recombinant genes, or even as a potential model organism. However, despite these characteristics, proteome analysis of the plant is still at the initial stages of research. Figure 1 depicts the lotus fruit and seed, its importance and proteomic study goals. Aiming to develop a proteome catalogue of the lotus plant, starting with its seed, the nutrient-rich food source, the first study by our research group unraveled the mature endosperm proteome of the lotus seed, which included the establishment of protocols for protein extraction and analyses by one-dimensional gel electrophoresis (1-DGE) and by two-dimensional gel electrophoresis (2-DGE) in conjunction with mass spectrometry [17].
In the present work, we advance our study of the lotus seed by analyzing the endosperm in its immature stage and the embryo, the other prominent component of the mature seed, using a 1-DGE proteomic approach linked with tandem mass spectrometry. The resulting proteome from each tissue (immature endosperm and embryo) is compared with the mature endosperm proteins in the hope of bringing to light any notable differences in protein content between the different tissue locations and developmental stages.

Plant Material and Tissues (Immature Endosperm and Embryo of Lotus Seed) Preparation
Lotus seeds, both mature and immature, were obtained from a small cultivation pond in the Ibaraki University's College of Agriculture campus in Ami town, Ibaraki, Japan [17]. The immature seed endosperm was collected from seeds extracted from the lotus seedpod in their post-pollination late immature stage. At the point of collection, the seeds were approximately 1.3 cm long, and the external husk was still green and soft. The seeds were washed and stored whole at −80 °C until tissue extraction. The seeds were cut open and the soft and white core was removed whole and then cut across its length. The translucent sheet around the core, any discernible embryo tissue, as well as the central portion of the core immediately around the embryo were removed. The remaining soft endosperm fragments were ground under liquid nitrogen and the resulting powder was stored in sterile BD Falcon tubes at −80 °C until extraction of protein. For embryo tissue sample preparation, the mature seeds (stored at room temperature) were cracked open in a clean environment and the endosperm and embryo portions were cleanly separated and stored in sterile BD Falcon tubes at −80 °C. The embryo fragments were ground into a fine powder in liquid nitrogen, with a pre-chilled mortar and pestle. The resulting powder was stored in sterile 2.0 mL microfuge tubes at −80 °C until further analysis.

Figure 1. The lotus fruit and seed, its importance (e.g., an underused source of proteins and nutrients) and proteomic study goals.

Extraction of the Lotus Seed Immature Endosperm and Embryo Proteins
Proteins were extracted from the powdered samples using the Tris-buffered saline (TBS) extraction method described in a previous study [17]. Briefly, a 3:1 mixture of TBS-20 buffer [10 mM Tris-HCl, 150 mM NaCl, pH 7.4, 0.1% (v/v) Tween-20, plus one tablet of EDTA-free protease inhibitor (cOmplete Mini, Roche) per 50 mL] and SDS (sodium dodecyl sulfate) reducing buffer [62 mM Tris (pH 6.8), 10% (v/v) glycerol, 2.5% (w/v) SDS, 5% (v/v) 2-mercaptoethanol] was used to extract the powdered samples at 2 mL/100 mg. The sample/buffer mixtures were also subjected to several 30 s ultrasonic bath cycles and to heating at 95 °C for 5 min to aid extraction. The extract was separated by centrifugation, and its proteins were precipitated and purified using the ProteoExtract kit (Calbiochem). The dry protein pellets obtained were either resolubilized in LB-TT (7 M urea, 2 M thiourea, 4% (w/v) CHAPS, 18 mM Tris-HCl (pH 8.0), 14 mM Trizma base, 0.2% (v/v) Triton X-100 and 50 mM dithiothreitol) for immediate use or stored at −80 °C. Prior to use, the protein content of the resolubilized extracts was measured by Bradford assay [18].
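As a worked illustration of the buffer proportions described above, the following minimal sketch computes the volumes needed for a given amount of powdered sample. It assumes that the 3:1 ratio is by volume and refers to TBS-20 buffer to SDS reducing buffer, in that order; the function name and the 250 mg example mass are hypothetical and are not taken from the protocol.

```python
def extraction_volumes(sample_mg, ml_per_100mg=2.0, ratio=(3, 1)):
    """Return (total_ml, tbs_ml, sds_ml) for a powdered tissue sample.

    Assumes the 3:1 mixture is volume:volume, TBS-20 buffer : SDS reducing
    buffer, added at 2 mL of mixture per 100 mg of powder.
    """
    total_ml = sample_mg / 100.0 * ml_per_100mg
    parts = sum(ratio)
    tbs_ml = total_ml * ratio[0] / parts
    sds_ml = total_ml * ratio[1] / parts
    return total_ml, tbs_ml, sds_ml


# Hypothetical example: 250 mg of powdered tissue
total_ml, tbs_ml, sds_ml = extraction_volumes(250)
print(f"total {total_ml} mL = {tbs_ml} mL TBS-20 + {sds_ml} mL SDS buffer")
```

Under these assumptions, 250 mg of powder would call for 5 mL of mixture, i.e., 3.75 mL of TBS-20 buffer and 1.25 mL of SDS reducing buffer.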

One-Dimensional Gel Electrophoresis (SDS-PAGE) of the Immature Endosperm and Embryo Proteins
Protein samples from both tissues were subjected to 1-DGE (SDS-PAGE, 12.5%), both for visualization of protein profiles (Figure 2) using Coomassie Brilliant Blue [19] staining, and prior to analysis by 1-DGE-MS.

Protein Content of the Immature Endosperm and Embryo Tissues
Protein extracts from the lotus immature seed endosperm presented a very low protein yield (ca. 1.5% in the TBS method), requiring larger amounts of tissue to be extracted in order to obtain a suitable amount of protein. The low protein yield is most likely explained by the high water content of the immature seed compared to its mature form. The lotus seed embryo showed a total protein yield similar to that of the mature endosperm extract [17] when extracted by the TBS/clean-up method (ca. 9%, compared to ca. 11% for the mature endosperm).
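To make the practical consequence of these yields concrete, the sketch below estimates how much starting tissue would be needed to recover a given amount of protein. It assumes that the quoted yields are expressed as mass of protein per mass of starting tissue powder; the function name and the 1 mg target are hypothetical.

```python
def tissue_needed_mg(target_protein_mg, yield_percent):
    """Tissue mass (mg) needed to recover `target_protein_mg` of protein,
    assuming the yield is w/w of the starting tissue powder."""
    return target_protein_mg / (yield_percent / 100.0)


# Approximate yields quoted in the text (percent)
yields = {"immature endosperm": 1.5, "embryo": 9.0, "mature endosperm": 11.0}

for tissue, pct in yields.items():
    print(f"{tissue}: {tissue_needed_mg(1.0, pct):.0f} mg tissue per 1 mg protein")
```

Under this assumption, the immature endosperm would require roughly six to seven times more tissue than the embryo or mature endosperm for the same amount of protein.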
A comparison of the 1-D band profile on the SDS-PAGE of the embryo extract with that of the mature endosperm showed many similarities, but also some noticeable differences, such as an absence of strongly stained bands at ca. 20 kDa and 40 kDa, and more numerous bands at low molecular weights, under 30 and 20 kDa (Figure 2). In the case of the immature endosperm, the 1-D profile is more similar to the mature endosperm than to the embryo, but was still found to differ from the profiles of both tissues. Compared with the mature endosperm extract, the immature endosperm extract most notably does not present a high number of protein bands around the 20 kDa range. The cluster of bands around 50 kDa is similar to that in both the mature endosperm and the embryo, and the immature endosperm's profile of bands in the 60-90 kDa range seems more similar to that of the mature endosperm than to that of the embryo.

Lotus Immature Endosperm Proteins Identified by 1-DGE and MS/MS Analyses
The 1-DGE separation (SDS-PAGE) of proteins in an extract, followed by MS/MS analysis, is part of the so-called "bottom-up" approach to proteomics, a methodology in which proteins are proteolytically digested into peptides prior to mass spectrometric analysis, and the ensuing peptide masses and sequences are used to identify the corresponding proteins. This simple approach is a useful method for performing large-scale analyses of complex samples [21]. The purified extract of lotus immature endosperm proteins was separated by SDS-PAGE, divided into eight fractions, analyzed by LC-MS/MS, and matched against a green plant database, as detailed in the Experimental Section. Amongst all fractions, more than 500 protein matches with at least two confirmed peptide fragment matches were identified, of which 333 were unique protein matches. Different database matches that were likely to refer to the same protein in the sample, such as two or more matches for the same protein but from different database organisms, were grouped together based on taxonomical proximity and similarity of identified peptide sequences. Finally, 122 non-redundant (nr) protein matches were listed, along with the number of repeated matches found for each one (Table 1), with the protein match listed being the one with the highest score amongst its group of similar proteins.
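The reduction from raw matches to unique matches to nr matches described above can be sketched as a three-step filter. This is only an illustrative sketch, not the procedure actually used in the study: the record fields, accession names and scores are hypothetical, and the `group` label is assumed to have been assigned beforehand (in the study, grouping was based on taxonomical proximity and similarity of the identified peptide sequences).

```python
from collections import defaultdict


def collapse_matches(matches, min_peptides=2):
    """Collapse raw database matches into non-redundant (nr) protein matches."""
    # Step 1: keep matches supported by at least `min_peptides` peptides
    confident = [m for m in matches if m["peptides"] >= min_peptides]

    # Step 2: the same accession found in several gel fractions counts once;
    # keep its best-scoring occurrence
    unique = {}
    for m in confident:
        best = unique.get(m["accession"])
        if best is None or m["score"] > best["score"]:
            unique[m["accession"]] = m

    # Step 3: group matches judged to be the same protein and report the
    # highest-scoring member plus the number of matches in the group
    groups = defaultdict(list)
    for m in unique.values():
        groups[m["group"]].append(m)

    nr = []
    for label, members in groups.items():
        rep = max(members, key=lambda m: m["score"])
        nr.append({"group": label, "accession": rep["accession"],
                   "score": rep["score"], "n_matches": len(members)})
    return sorted(nr, key=lambda r: r["score"], reverse=True)


# Hypothetical records (not data from the study)
demo = [
    {"fraction": 1, "accession": "HSP70_speciesA", "group": "HSP70", "score": 310, "peptides": 5},
    {"fraction": 2, "accession": "HSP70_speciesA", "group": "HSP70", "score": 280, "peptides": 4},
    {"fraction": 2, "accession": "HSP70_speciesB", "group": "HSP70", "score": 150, "peptides": 2},
    {"fraction": 5, "accession": "EF1A_speciesC", "group": "EF1A", "score": 120, "peptides": 3},
    {"fraction": 6, "accession": "ACT_speciesD", "group": "actin", "score": 60, "peptides": 1},
]

for row in collapse_matches(demo):
    print(row)
```

In this toy input, five raw matches reduce to three unique accessions and then to two nr entries, mirroring on a small scale the reduction from more than 500 matches to 333 unique and 122 nr matches reported for the immature endosperm.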

Lotus Embryo Proteins Identified by 1-DGE and MS/MS Analyses
The 1-DGE-MS analysis of the lotus embryo protein extract was performed following the same methodology, green plant database, and parameters as for the immature endosperm extract. The purified extract of lotus embryo proteins was separated by SDS-PAGE, divided into eight fractions, analyzed by LC-MS/MS, and matched against a green plant database, as above. From the initial results, more than 500 protein matches with at least two confirmed peptide fragment matches were identified. After removing duplicate results from different gel fractions, 373 unique protein matches remained. After grouping results likely to be the same protein in the sample, based on protein taxonomy and similarity of identified peptide sequences, 141 nr protein matches were listed (Table 2).

Comparative Analysis of Lotus Seed (Immature Endosperm, Mature Endosperm, and Embryo) Proteins
As is to be expected, many proteins were found in common among the immature endosperm and embryo tissues, as well as with the mature endosperm previously analyzed [17]. Amongst all three seed tissues, a total of 206 nr proteins were identified against the plant database (Figure 3). Of these, 31 (15%) were common to all three tissues, 40 (19%) were unique to the immature endosperm, and 65 (32%) were unique to the embryo; only 14 (7%) were exclusively found in the mature endosperm. The larger share of embryo-only proteins is likely a consequence of the embryo tissue being much more involved in plant metabolism, and it is therefore expected to express a larger number of functional proteins than the endosperm, which, especially in its mature phase, has nutrient storage as its primary function. The immature endosperm, as a developing tissue, also expresses a larger number of proteins than its mature form, and shares a significant number of proteins with the embryo, 35 (17%) of the identified ones. Proteins common to only the mature and immature endosperm amounted to 5% of the identified ones (the same as between the mature endosperm and embryo). However, considering that both the immature endosperm and the immature embryo are much softer and have a higher water content than their mature stages, it is possible that some of the proteins identified in the immature endosperm that are shared with the embryo originated from the embryo and diffused through the endosperm, despite the care taken during sample preparation to remove embryo fragments and the endosperm immediately around them.

Figure 3 (caption, partial). (b) Listing of the total and nr protein matches found for each lotus seed tissue analyzed; * see reference [17].
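The percentages quoted in this comparison can be checked with a few lines of arithmetic. The sketch below assumes that each count refers to an exclusive region of the three-way comparison (for example, that the 35 proteins shared by the immature endosperm and embryo exclude those common to all three tissues); the dictionary labels are ours.

```python
TOTAL_NR = 206  # nr proteins identified across all three tissues

# Counts quoted in the text, assumed to be mutually exclusive categories
counts = {
    "all three tissues": 31,
    "immature endosperm only": 40,
    "embryo only": 65,
    "mature endosperm only": 14,
    "immature endosperm + embryo only": 35,
}


def share(count, total=TOTAL_NR):
    """Percentage of the total, rounded to the nearest integer."""
    return round(100 * count / total)


shares = {label: share(n) for label, n in counts.items()}
for label, pct in shares.items():
    print(f"{label}: {counts[label]} ({pct}%)")

# Proteins not covered by the categories above; under the stated assumption
# they belong to the two remaining pairwise overlaps with the mature endosperm
remaining = TOTAL_NR - sum(counts.values())
print(f"remaining: {remaining} ({share(remaining)}%)")
```

Under this assumption the five quoted categories account for 185 proteins, leaving 21 (about 10%) for the two pairwise overlaps involving the mature endosperm, which agrees with the roughly 5% reported for each.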

Functional Significance of the Identified Seed Proteins
Gene ontology data (biological processes, molecular functions and cellular localization) for all identified proteins were obtained from the UniProtKB database, using the EMBL-EBI (www.ebi.ac.uk) search tool (Table 3). Analysis of the annotations pertaining to the immature endosperm revealed that functions related to protein synthesis (translation, protein folding and polymerization, etc.), general metabolism (amino acid, carbon fixation) and carbohydrate metabolism (glycolysis, etc.) are all considerably represented, with the proteins in the first category being relatively more numerous (Figure 4). On the other hand, the embryo proteome shows a considerable prevalence of proteins involved in protein synthesis, followed by carbohydrate and general metabolism processes (Figure 5).

Table 3. List of all 206 non-redundant (nr) proteins found across the three tissues of the lotus seed (embryo, immature endosperm and mature endosperm). Columns: Protein Accession, Protein Description.

Biological Function of the Identified Seed Proteins
Furthermore, the nr protein matches were also classified according to their broader biological function [22,23], divided into 10 categories: metabolism, energy, cell growth/division, transcription, protein synthesis/destination, transporters, cell structure, signal transduction, stress response, and unclassified (Figure 6). A comparison of the distribution of protein functionality between the seed immature endosperm and embryo, and the previous results obtained from the mature endosperm, shows that the immature endosperm and embryo have a functionality profile quite similar to that of the mature endosperm. However, in the embryo the identified proteins related to general cell housekeeping functions (non-energy metabolism, cell growth, transcription, transport, and signaling) were slightly more apparent than in the immature endosperm. In contrast with the mature endosperm, both the immature endosperm and the embryo show a larger percentage of identified proteins related to protein synthesis. This correlates well with the fact that these tissues are either in a growing phase (immature endosperm) or have growth as their main function (embryo). The mature endosperm, on the other hand, having energy and nutrient storage as its primary function, has the larger share of its proteins related to energy metabolism. A common element across all the lotus seed tissues is the large presence of stress-/defense-related proteins.

Lotus Seed Proteome Compared with Other Seed Proteomes
Unlike some seeds, such as tomato, where the non-germinating embryo and endosperm were shown to have very similar proteomes [24], the analysis of lotus seed proteomes showed some remarkable differences in the identified proteins and their functions between the non-germinating embryo and the mature endosperm. Contrary to other seed proteomes, such as Jatropha curcas [23] and sugarbeet [25], the lotus embryo in its pre-germination stage did not seem to have a considerably higher expression of metabolism- and energy-related proteins compared to the mature endosperm. Structural proteins, however, did seem to be at least slightly more represented in the endosperm, as in the case of J. curcas. Compared with other embryo proteomes, such as Brassica campestris [26] and sugarbeet, the lotus embryo appears to have a larger percentage of proteins related to protein synthesis in comparison to primary and energy metabolism, as well as a much greater presence of defense-related proteins. We further discuss below the key proteins identified in this study.

Key Proteins of the Lotus Immature Seed Endosperm
Contrary to the mature endosperm, the key functional proteins identified in the lotus immature endosperm mostly consisted of proteins related to plant growth and development (Figure 7). Amongst the identified proteins were several transcription proteins (cell division control and transcription factors), translation (ribosomal) proteins, post-translational modification proteins (elongation factors and ubiquitins) and nutrient production proteins (RuBisCO subunits and sucrose synthase). Many stress response- and plant defense-related proteins were also present in the immature endosperm. Of these, the largest subgroup is the heat shock response proteins (high- and low-molecular weight heat shock proteins (HSPs), as well as chaperone and annexin proteins). Anti-oxidative stress proteins (peroxidases, endoplasmin, and monodehydroascorbate reductase) are also present,

Key Proteins Previously Identified in the Lotus Mature Seed Endosperm
In the case of the mature endosperm proteome [17], the two most significant groups of proteins identified were related to energy/carbohydrate metabolism, and to stress response and plant protection (Figure 8). In the first group, several proteins that are part of glycolysis, gluconeogenesis, the citric acid cycle and starch metabolism, as well as other carbohydrate metabolism proteins, were identified. Of the stress response proteins, HSPs, along with other heat response proteins (chaperones, annexin), constituted the most numerous category. Anti-oxidative stress proteins were not greatly represented. Of note is the identification of storage proteins (such as globulins, castanins) for the mature endosperm by 2-D MS and N-terminal sequencing, but not by 1-D MS [17,27], which might indicate a possible detection gap of this technique.

Figure 8. Key functional proteins identified in the lotus seed mature endosperm, and subdivided according to their role in plant metabolism. * for original protein lists, see reference [17].

Proteome Changes between Mature and Immature Stages of the Endosperm
Despite both being endosperm tissue samples, protein extracts from the mature and immature seed presented notably different proteome compositions (Figure 9). This reflects the changes the endosperm undergoes during the maturation process, where it develops from a soft wet tissue to a dry one with a large amount per weight of both carbohydrates and proteins [6]. The endosperm's main role in the seed is as a nutrient storage tissue, so it is expected that during the maturation phase these nutrients are produced for later storage, hence the larger number of functional proteins related to the protein and carbohydrate synthesis categories. In the mature endosperm, a large percentage of the total protein content is expected to be seed storage proteins (SSPs). Although not many SSPs were identified by MS analysis of the mature endosperm, several possible matches were found by N-terminal sequencing analysis [27]. The prevalence of carbohydrate metabolism proteins amongst the identified functional proteins in the mature endosperm could be a result of production in the late maturation stage, with such proteins playing a quasi-dormant role in managing the nutrient content of the seed before and during germination.

Key Proteins of the Lotus Seed Embryo
In the case of the embryo proteins identified by database matching, the distribution of key proteins was similar to that of the immature endosperm, in that they can be divided into the same main groups: proteins related to plant growth, and proteins responsible for plant protection and germination vigor (Figure 10). The first group includes the same subgroups of transcription, translation, and post-translation proteins, as well as nutrient production proteins. In the case of stress/defense-related proteins, heat shock response proteins were again the most numerous in the embryo (12 HSPs, mostly of high molecular weight, five chaperone proteins and two annexins). However, the embryo also contained a larger number of anti-oxidative stress proteins, including L-ascorbate peroxidase, catalase, monodehydroascorbate reductase, superoxide dismutase [Mn], and endoplasmin. S-adenosylmethionine synthase and adenosylhomocysteinase (also found in the endosperm tissues), two proteins from the active methyl cycle, which is of great importance to plant metabolism as well as to the plants' nutritional value [28], were also identified in the lotus embryo.

Figure 10. Key functional proteins identified in the lotus seed embryo, and subdivided according to their role in plant metabolism.

Conclusions
Analysis of protein extracts from the lotus seed embryo and immature seed endosperm was performed following 1-DGE separation in conjunction with LC-MS/MS analysis. This "bottom-up" proteomics analysis, based on the SDS-PAGE technique, has been shown to be a good approach for identifying the lotus seed proteins [17]. For both tissues, a large number of proteins were identified by database matching. A total of 141 nr protein matches were identified in the embryo, and 122 in the immature endosperm. Together with the 66 proteins previously identified for the mature endosperm, a total of 206 nr proteins have been identified to date.
The combined datasets are a resource in themselves towards a complete proteomics analysis of lotus seeds and plants. By producing more extensive datasets, these results help toward forming a complete proteomic picture of the lotus seeds. The analysis of protein makeup and functionality across different tissues within the seed also permits a comparison of metabolic functions across different tissues and developmental stages of the lotus seed, as well as allowing for the comparison with similar tissues from other plants. Furthermore, the identification of proteins of interest (such as key proteins in the metabolism, or proteins that confer resistance against stress or germination vigor) opens up several possibilities for more specific studies on these proteins and their possible use in producing transgenic varieties of interest.
Future work will strive both to expand the lotus proteome to other developmentally important tissues, such as seedling and rhizome, and to isolate and characterize functional proteins of interest in the seed proteome. Moreover, 2-DGE-MS analysis of individual proteins, especially by de novo proteome analysis techniques, coupled with genome comparison, can help obtain more detailed sequences of lotus-specific proteins, since the high taxonomical distance of the lotus in relation to other modern plants hinders the achievement of higher homology values when database-matching proteins.