Seed Storage Protein, Functional Diversity and Association with Allergy

Plants are essential for humans as they serve as a source of food, fuel, medicine, oils, and more. The major elements that are utilized for our needs exist in storage organs, such as seeds. These seeds are rich in proteins, which show a broad spectrum of physiological roles and are classified based on their sequence, structure, and conserved motifs. With improved knowledge of sequences and structures, we have acquired better insights into seed proteins and their roles. However, we still lack a systematic analysis of the functional diversity within each family and its association with allergy. This review puts together the information about seed proteins, their classification, and diverse functional roles along with their associations with allergy.


Introduction
Plant seeds play a vital role in human life as they satisfy around half of the world's dietary protein requirements [1]. Apart from meeting dietary needs, seed proteins play a fundamental role in germination, cellular growth and development, thiamine accumulation [2], nutrient storage [3] and the regulation of hormone levels [4]. Studies have shown that seed proteins also play a critical role in the endurance of extreme dryness or drought-like conditions [5], activity against microbes and fungi [6,7], hemagglutination activity [8], plant defense [9], ribosome inhibitory activity [10] and many more. Therefore, seed proteins not only serve as a warehouse for proteins during germination, but they also perform numerous metabolic and structural roles.
Based on their function, seed proteins are traditionally classified as housekeeping proteins and storage proteins. While housekeeping proteins are involved in metabolism, storage proteins provide building blocks and energy for germination. With advancements in the field, along with many structural studies, the conventional classification of seed proteins has been amended. Now, the classification is based on structural motifs, sequence, and physiological function. Although the structural folds are evolutionarily conserved, proteins that belong to the same family show diversity in their functions.
Automated structure-based comparison has improved our understanding and identified novel functions for seed proteins. For example, 7S vicilin and 11S legumin from the cupin family are known to have a variety of physiological functions, ranging from plant defense and oxidative stress responses to serving as a metabolite source and showing antihypertensive activity, and they can also trigger allergic reactions [11][12][13][14]. Members of the prolamin family, such as 2S albumins, non-specific lipid transfer proteins (nsLTPs), protease inhibitors and others, also show functional diversity [15,16]. 2S albumins play a crucial role in polyamine metabolism, as they are rich in sulfur, and they have been shown to induce allergic reactions. Likewise, various inhibitors possess antifungal, antitumor, antimicrobial and actin-crosslinking activities [17,18]. Apart from the usual prolamin family functions, nsLTPs have the ability to transfer different lipids [19]. These lipids help with plant defense, as they form a protective hydrophobic layer on the aerial organs [19].
Despite all of these advancements, an understanding of the biological function of many seed proteins is yet to be revealed. We still lack a systematic comparison of the functional diversity and associated structural motifs. This review summarizes our understanding of the structure-based functional diversity in seed proteins and their associated physiological role and allergenic potential. It also gives an overview of the biochemical and molecular characteristics of food allergens that make them capable of inducing or triggering an immune response.

Classification of Seed Storage Proteins
Seed storage proteins are conventionally categorized into different superfamilies and families at the molecular level [20]. Earlier, Osborne classified them into four categories based on their solubility: water-extractable albumins (2S); globulins (7S and 11S), which are soluble in dilute salt solution; alcohol-soluble prolamins; and glutelins, which are soluble at mildly acidic or basic pH [21,22]. In wheat, prolamins (gliadin) and glutelins (glutenin) form the major gluten components. More recently, these proteins were classified into three superfamilies based on their amino acid sequence conservation, 3D structure and biological activity (Table 1). The prolamin superfamily is rich in the amino acids proline and glutamine, hence the name prolamins [21]. Members of this superfamily have limited sequence identity, share eight conserved cysteine residues (CXnCXnCCXnCXCXnCXnC), and are rich in α-helices [23]. The prolamin superfamily is subdivided into various families such as the non-specific lipid transfer proteins (nsLTPs), 2S albumins and cereal prolamins [24].
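The eight-cysteine skeleton quoted above (CXnCXnCCXnCXCXnCXnC) can be screened for with a regular expression. The following is a minimal sketch rather than a validated classifier. It assumes that each Xn spacer is one or more non-cysteine residues and that the X in CXC is a single non-cysteine residue. The function name and the toy sequences are hypothetical and are not real proteins.

```python
import re

# Assumption: every "Xn" spacer is one or more non-cysteine residues, and the
# single "X" in the CXC motif is exactly one non-cysteine residue.
EIGHT_CYS_SKELETON = re.compile(
    r"C[^C]+C[^C]+CC[^C]+C[^C]C[^C]+C[^C]+C"
)


def has_eight_cysteine_skeleton(sequence: str) -> bool:
    """Return True if the sequence contains the C-Xn-C-Xn-CC-Xn-CXC-Xn-C-Xn-C pattern."""
    return EIGHT_CYS_SKELETON.search(sequence.upper()) is not None


# Hypothetical toy sequences used only to illustrate the pattern.
toy_match = "AACAAAACAAAACCAAAACACAAAACAAAACAA"
toy_no_match = "ACACACACACACACAC"  # eight cysteines, but no adjacent CC pair

print(has_eight_cysteine_skeleton(toy_match))
print(has_eight_cysteine_skeleton(toy_no_match))
```

A match only indicates that the cysteine spacing is compatible with the prolamin skeleton; assigning a protein to a family would still require sequence and structural evidence.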
The cupin superfamily consists of globular seed storage proteins that are characterized by a β-barrel fold [25]. They can have either a single cupin domain or two cupin domains (bidomain cupins). Bidomain cupins have two β-barrel folds and can assemble into a trimer, a hexamer or a higher oligomer. For example, 7S vicilins are trimers with no disulfide bonds, whereas 11S legumin-type globulins have disulfide bonds and are hexamers that can be cleaved into two trimers. They are named 7S or 11S based on their sedimentation coefficients. Globulins have been studied in detail in many plant seeds such as eggplants (Solanum melongena) [11,26], broad beans (Vicia faba) [27], peas (Pisum sativum) [28] and French beans (Phaseolus vulgaris) [29].
The Bet V1 (or pathogenesis-related) superfamily comprises pathogenesis-related proteins (PR proteins), cereal inhibitors of alpha-amylases, cytoplasmic disease resistance-related proteins, Kunitz-type protease inhibitors and more [30]. The members of this family fold in a β-α2-β6-α format, where the C-terminal helix is wrapped by an antiparallel β-sheet. They have a large hydrophobic core that binds a broad spectrum of ligands, such as phytohormones, siderophores, flavonoids and alkaloids. This family is known to have more than 15 structures with non-identical sequences [31]. The physiological function of this family is still under investigation, but to date, it appears to be mainly governed by the bound ligands [31].

Structural Studies on Seed Storage Proteins
An increase in the number of protein sequence and structural studies has made the creation of systematic and scientific databases possible. It is for this reason that prolamins, cupins and plant pathogenesis-related proteins (Bet V1) are described as superfamilies, while legumins, vicilins, nsLTPs and albumins are described as families [32]. Some proteins, such as profilins, expansins and chlorophyll-binding proteins, are still not completely classified into any specific group. In this section, examples of the structural properties associated with members of the different superfamilies are described (Figure 1).


Structural Features of Prolamin Superfamily
Many structural variations are known in this superfamily; however, its members share eight conserved cysteine residues that form disulfide bonds, along with the unusual CC and CXC motifs [19,21]. These unusual signature motifs facilitate the identification of members of this superfamily, which includes 2S albumins, nsLTPs and other cereal prolamins.
nsLTP: Non-specific lipid transfer proteins (nsLTPs) are known as one of the major plant allergen families. As the name suggests, they are associated with lipid transport in plants, where the lipids are bound in a hydrophobic pocket within the protein [33]. nsLTPs have conserved cysteines and disulfide bonds, are rich in α-helices, and have a high pI [20]. These properties make them capable of triggering an allergenic response once they reach the gastrointestinal system [34]. nsLTPs are divided into two types, Type I (9 kDa) and Type II (7 kDa), depending upon polypeptide chain length [35]. Along with the difference in polypeptide length, the disulfide pairing differs: nsLTP I has disulfide bonds between cysteines 1-6, 2-3, 4-7 and 5-8, whereas nsLTP II has 1-5, 2-3, 4-7 and 6-8 [36].
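The difference in disulfide pairing between the two nsLTP types can be made explicit by writing each pairing as a set of cysteine index pairs and comparing them. This is an illustrative sketch only. The cysteines are numbered 1 to 8 in sequence order, and the 5-8 bond for Type I is an assumption here: it is the only pairing left for the two remaining cysteines once 1-6, 2-3 and 4-7 are formed. The dictionary and function names are hypothetical.

```python
# Cysteines are numbered 1-8 in sequence order; each tuple is one disulfide bond.
DISULFIDE_PAIRS = {
    # Assumption: the 5-8 bond is inferred from the two cysteines left unpaired
    # by the 1-6, 2-3 and 4-7 bonds.
    "type_I": {(1, 6), (2, 3), (4, 7), (5, 8)},
    "type_II": {(1, 5), (2, 3), (4, 7), (6, 8)},
}


def compare_pairings(first: str, second: str) -> dict:
    """Return the bonds shared by two nsLTP types and the bonds unique to each."""
    a, b = DISULFIDE_PAIRS[first], DISULFIDE_PAIRS[second]
    return {
        "shared": sorted(a & b),
        f"only_{first}": sorted(a - b),
        f"only_{second}": sorted(b - a),
    }


result = compare_pairings("type_I", "type_II")
for label, bonds in result.items():
    print(label, bonds)
```

Under these assumptions the 2-3 and 4-7 bonds are common to both types, and the swap involves cysteines 1, 5, 6 and 8.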

Originally, nsLTPs were believed to have only a lipid transfer role; however, we now know that they perform various functions, including cutin and wax metabolism, seed development and germination, responses to stress factors, cell wall growth and calmodulin binding [37][38][39][40][41]. For instance, pepper nsLTP is produced during high salinity, drought or low-temperature stress, as well as after wounding or pest attacks [42,43]. Similarly, barley, sunflower and sugar beet nsLTPs can inhibit bacterial and fungal growth [44][45][46]. Moreover, studies on A. thaliana show the critical role of nsLTPs in forming a hydrophobic layer on plant aerial organs for protection [47]. Beyond these physiological functions, the nsLTP from peach peel was identified as an allergen and named Pru p 3 [48]. The LTPs from Rosaceae fruits (peaches, apricots, cherries, plums and pears), Solanaceae (potatoes, tomatoes and eggplants) [10,26], Brassicaceae (cabbages and mustard) and even legumes and cereals are categorized as pan-allergens [49][50][51]. Unlike other plant allergens, these LTPs can trigger specific IgE antibodies, and they are, therefore, also called true food allergens [34,52,53,54].

Structural Features of Cupin Superfamily
The cupin superfamily is known to have a beta-barrel fold, and it is characterized by the signature motifs G(X)5HXH(X)3,4E(X)6G and G(X)5P(X)4H(X)3N, which are known as motif 1 and motif 2, respectively, where H and E stand for histidine and glutamate. The presence of these histidine-rich motifs facilitates metal binding, as seen in the case of germin and other globulins [63]. Exceptions are seen when histidine is absent in motif 1 [63]. The members of the cupin family are resistant to proteolysis and thermal degradation, increasing their ability to be immunogenic [64]. 7S vicilin and 11S legumins are two major members of this family.
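The two cupin signature motifs translate directly into regular expressions, which gives a quick way to locate candidate motif positions in a sequence. The sketch below makes several assumptions: X is read as any residue, the spacer before E in motif 1 is read as three or four residues, and only non-overlapping matches are reported. The function name and the toy sequence are hypothetical.

```python
import re

# Assumptions: "X" is any residue; the spacer before E in motif 1 is read as
# three or four residues; matches are reported without overlaps.
CUPIN_MOTIFS = {
    "motif_1": re.compile(r"G.{5}H.H.{3,4}E.{6}G"),  # G(X)5HXH(X)3,4E(X)6G
    "motif_2": re.compile(r"G.{5}P.{4}H.{3}N"),      # G(X)5P(X)4H(X)3N
}


def find_cupin_motifs(sequence: str) -> dict:
    """Return 0-based start positions of each cupin motif found in the sequence."""
    seq = sequence.upper()
    return {
        name: [match.start() for match in pattern.finditer(seq)]
        for name, pattern in CUPIN_MOTIFS.items()
    }


# Hypothetical toy sequence: one copy of each motif joined by short linkers.
toy = "MK" + "GAAAAAHAHAAAEAAAAAAG" + "LL" + "GAAAAAPAAAAHAAAN"
print(find_cupin_motifs(toy))
```

Because histidine can be absent from motif 1 in some members, a sequence without a hit is not necessarily outside the superfamily; the patterns are best treated as a first-pass filter.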

Structural Features of Bet V1 (Pathogenesis-Related) Superfamily
The Bet V1 family is a recently classified family. It is also called the pathogenesis-related (PR) family, as these proteins are produced upon pathogen attack. The first member of this family from tobacco, P14a, was identified in 1995. The NMR structure of the PR-1 protein shows that it adopts an α + β topology and has two hydrophobic core regions. Unlike PR-1, the PR-5 protein comprises three domains. Domain I has β-sheets of seven to ten strands, whereas domain II has large, disulfide-rich loops that stabilize the β-sheet structure. Although there is sequence variation, this loop is conserved among the proteins of this family [73][74][75][76]. Domain III, on the other hand, forms a small loop and has two disulfide bonds [73]. This class also contains a long C-terminal α-helix (α3), which is bordered by antiparallel β-sheets (from β1 to β7). Another member of this family, PR-10, has a deep, 30 Å, Y-shaped hydrophobic pocket that facilitates ligand binding [77,78]. A few examples of the crystal structures from this family are 4RYV, 4PSB, 4Q0K, 4N3E, 4JHH, 4JHI, 4JP6 and 4JHG [79][80][81][82][83].

Physiological Function of Seed Storage Proteins
Seed proteins perform a variety of functions, from germination to oxidative stress tolerance and resistance, and some are even allergenic. This section reports some of the known biological functions performed by seed proteins.

Germination
The primary function of seeds is to provide nutrients to the growing seedling during germination [84]. Studies have shown that during germination, the total protein concentration gradually reduces from zero to three [85] as they keep serving essential amino acids [86]. The gradual reduction ensures the continuous nutrient needs during the different phases of the germination process [87]. Various aspects of development are regulated by the key phytohormone, abscisic acid (ABA), including stress adaptations [88][89][90]. ABA signaling is modulated by different phosphatases and kinases [91]. Similarly, the hydrolysis of storage protein during germination is performed by proteases and peptidases [92]. Storage lipids, on the other hand, facilitate malate production, which is required for fatty acid synthesis [93].

Nutrient Accumulation
Seeds behave as the nitrogen and carbon sinks of plants, as they hold protein reserves that are mobilized during germination. These reserves play a vital role in regulating various metabolic processes, cellular growth and development, and nutrient accumulation, and they serve as a source of energy. Several pathways are regulated during germination to improve nutrient accumulation. For example, the seed storage protein AmA1 increases the total protein concentration along with the tuber yield of potatoes [2].

Thiamine Storage
A few seed globulins are characterized as thiamine storage proteins due to their high affinity for thiamine. Extensive studies have been conducted on maize, peas and oats towards understanding the thiamine metabolism. It is found that thiamine plays an important role in key pathways such as the pentose phosphate cycle, glycolysis and the citric acid cycle [1]. Studies have shown that the thiamine binding properties reduce as the seed germinates. These proteins are found in metabolically inactive and unphosphorylated forms [94]. During germination, thiamine phosphate synthases and thiamine pyrophosphokinase convert thiamine into thiamine pyrophosphate [95][96][97]. Thus, thiamine phosphate synthase regulates the total amount of thiamine during germination.

Plant Defense Proteins
Plants have evolved to have resistance against pathogen attacks. For example, the thick cell wall of plants acts as a barrier against such attacks. Studies have shown that plants also have innate resistance mechanisms. Upon a pathogen attack, the plant triggers different responses, such as the synthesis of molecules such as phytoalexins, or cell bursting. Studies have also shown that seed proteins known as pathogenesis-related (PR) proteins or plant defense proteins play a vital role in providing resistance against pathogens [8,98]. To date, at least 13 different pathogenesis-related proteins have been identified, for example, Chitin Binding Protein (CBP, PR4), Glycine-Histidine Rich Protein, Pathogenesis-related (PR) protein 1, Chitinases (PR3), β-Glucanase (PR2), Thaumatin-Like Protein (TLP, PR5) and more [8].

Sugar-Binding Proteins
Lectins are identified as sugar-binding proteins. They are also called haemagglutinins due to their ability to agglutinate red blood cells [7,99]. Lectins are mostly oligomers [100], as observed in Glycine max, Pisum sativum, Arachis hypogaea, Lathyrus ochrus and Griffonia simplicifolia. Three-dimensional structural studies have given insights into the atomic interactions between the proteins and the carbohydrates [101,102]. Lectins can also bind physiologically relevant non-carbohydrate ligands such as cytokinins, auxins and porphyrins [103]. Studies have also shown that lectins play a critical role in regulating the indole acetic acid (IAA) levels in plants [3]. IAA can exist in free or bound states in seeds: it is most active in its free state and inactive when bound. The structural studies on ConM, a lectin from Canavalia maritima, show its role in controlling IAA availability during seed germination [3].

Antimicrobial Role
Plants undergo abiotic and biotic stress at different times of their life cycle. To combat this, they produce toxic compounds, low molecular weight peptides and other molecules. These low molecular weight antimicrobial peptides (AMPs) are responsible for the plants' defenses. In general, AMPs are 10-15-amino-acid-long cationic peptides. The sequence, structure, disulfide bonds and hydrophobic nature of AMPs give them the ability to destroy microbes through different mechanisms [104,105]. AMPs interact with the phospholipids of the plasma membrane and with other intracellular or extracellular sites to prevent microbial attack [106]. A few well-characterized AMPs are snakins, thionins and defensins [107]. A few studies have shown that AMPs form pores in the membrane, resulting in the leakage of ions and metabolites or in depolarization. LJAMP1 and MiAMP2 are antimicrobial proteins of the 2S albumin family identified from Leonurus japonicus and Macadamia integrifolia, respectively [5,108].

Ribosome-Inactivating Proteins (RIPs)
As the name suggests, ribosome-inactivating proteins (RIPs) act on ribosomes [9,109]. RIPs are RNA N-glycosidases that can perform site-specific deadenylation, thereby inactivating the ribosomes [110,111]. RIP activity is also observed on many non-ribosomal nucleic acid substrates [112][113][114]. In plants, RIPs have a role in the defense mechanisms of plant cells [115].

Stress Tolerance
Storage proteins show desiccation tolerance by removing all of the water content [4] and free radicals to combat adverse conditions [116]. Osmotically active compounds synthesized by plants, such as osmotin, induce cell tolerance in saline conditions [117]. Like osmotin, sugars such as trehalose act as osmoprotectants. Proteins such as late embryogenesis abundant (LEA) proteins help in fighting against harsh conditions [118,119]. Other proteins such as dehydrins, glutathione S-transferases, heat shock proteins (HSP), disease-resistance proteins and peroxidases are stress-related proteins that regulate plant embryo development, as seen in castor, rice and Vitis spp. [120][121][122].

Antioxidative Properties
Protein degradation is an important event for the plant that occurs during different stages of development. This degradation not only happens during growth and germination, but also in pathogen attacks, programmed cell death and senescence. This regulated protein degradation is therefore linked to oxidative stress conditions [123][124][125]. Reactive oxygen species (ROS), which are produced as a result, cause protein carbonylation, an irreversible oxidation process that leads to functional impairment. The degradation of these modified proteins occurs via proteases, which are called antioxidant proteins, thereby maintaining normal physiological functions. An abundant natural antioxidant is phytic acid, which can chelate various ions such as zinc, magnesium, iron and calcium [126,127]. Apart from this, it inhibits iron-driven ROS formation and lipid peroxidation [126,128,129]. Phytic acid is also known for increasing the viability of plant tissues.

Antihyperglycaemic and Antitumor Activity
There is growing interest in lupin-based products, especially as functional foods or nutraceuticals. One of the protein fractions, gamma-conglutin, has a proven ability to control glycaemia and cholesterolemia [130]. Recent studies in soybeans and mung beans have identified proteins that have antihyperglycaemic activities [131,132]. Various functional peptides have been identified in buckwheat, which show antihypertensive and antitumor activity [1,117]. Studies have shown that germinated fenugreek seeds have the potential to increase the survival rate of mice with pancreatic cancer.

Seed Storage Proteins and Association with Allergy
Along with their important physiological roles, seed storage proteins also show allergenic properties. Members of the cupin and prolamin families are among the groups of proteins that are associated with food allergies [24]. A food allergy is an immune response to some foods that are considered to be foreign upon ingestion. This occurs when the body's immune system starts treating harmless food as a harmful entity, and this triggers an immune response [133,134]. This immune response could be either IgE or non-IgE mediated, and it may mimic food hypersensitivity. This reaction is mainly due to some inherent property of the food. Eggs, milk, wheat, crustacean shellfish, tree nuts, fish, peanuts and soya are the eight major food allergens [134,135]. Recently, sesame was identified as the ninth major food allergen [136]. The symptoms of food allergy vary from mild to acute, and they are sometimes life threatening. These symptoms depend upon the localization of the triggered mast cells, and therefore, they can be cutaneous (rash and eczema), respiratory (asthma) or gastrointestinal (vomiting and diarrhea) (Table 2) [137].

Table 2. Types of hypersensitive reaction and symptoms [137]. Columns: S. No., type of reaction, symptoms.

In recent years, a large number of three-dimensional structures of seed allergenic proteins have been determined and deposited in the Protein Data Bank. This helps in visualizing the surface topology and exposed residues, which further helps in the identification of epitopes. Furthermore, structural studies of ligand-bound protein complexes have shed light on how ligand binding modulates the allergenic property. In one recent study, the authors compared the three-dimensional structures and two-dimensional proximity plots of approximately 40 proteins and indicated that allergenic proteins can be classified into four major groups based on their folds [138]. Briefly, Group 1 comprises proteins that have antiparallel beta strands without helical structure. Serine proteases and soybean-type trypsin inhibitors were placed in this category. Group 2 proteins have alpha helices along with beta strands that are tightly associated, as seen in profilin and aspartate protease. Group 3 proteins also have a mixture of alpha helices and beta strands, but the association is not strong (e.g., lactalbumin). The last group consists of all of the proteins that are rich in alpha helices, for example, nsLTP and 2S albumin.
The key features that make these proteins allergens are the molecular properties associated with them. These physicochemical and biochemical properties, which are listed below, are used to characterize food allergens.

Ligand or Metabolite Binding
One of the features that allergens possess is the stability that allows them to manifest their allergenic potential. Due to natural ligand binding, the polypeptide chain stays intact even in harsh conditions: the mobility and accessibility of the backbone are reduced, which improves thermal stability and protects the protein against proteolysis. Food allergens are known to bind natural ligands ranging from metal ions to metabolites, lipids and steroids. These ligands generally stabilize the three-dimensional structure by occupying mostly buried cavities [139], or they sometimes bind superficially by interacting at the surface [24]. A variety of small molecules including, but not limited to, flavonoids and phytohormones are found in the hydrophobic core of allergenic pathogenesis-related class 10 (PR-10) proteins. The effect of ligands is best seen in parvalbumin, where the absence of calcium triggers conformational changes that result in the loss of IgE epitopes [139]. Likewise, a wide array of ligands, including retinol and its analogs, are found in members of the lipocalin family. Similar to the PR-10 class allergens, nsLTPs also possess a hydrophobic tunnel for lipid binding, which accommodates lipophilic molecules including long-chain fatty acids (LCFA), steroids, sphingolipids and hydrophobic drugs [140]. Recently, nsLTPs were also shown to exhibit non-canonical binding of lipids encompassing the epitopes.

Lipids or Lipid-Membrane Interactions
Another property that food allergens show is association with cell membranes. Seed allergens can aggregate or interact with phospholipid vesicles, bypassing gastrointestinal degradation. Other than the non-specific lipid transfer proteins (nsLTPs) that can bind lipids, thionins and thaumatin-like proteins (TLPs) from the pathogenesis-related class can also interact with cell membranes, resulting in depolarization and leakage [141]. Similarly, 2S albumins, 7S vicilins and 11S globulins can interact with lipids, forming emulsified structures.

Protein Stability and Mobility
To show immunogenic properties, an allergen needs high thermal and gastrointestinal stability [142]. As mentioned above, allergenic proteins can evade proteolysis through their resistance to proteolytic enzymes. The presence of disulfide bonds and compact three-dimensional structures, along with bound metabolites and ligands, is responsible for this resistance and stability. These properties help the protein to escape the harsh environment of the GI tract and reduce its mobility. No single motif can define an allergenic nature; however, most allergens have disulfide bonds, which enable high thermal stability even in extreme pH conditions [143,144], for example, 2S albumins, nsLTPs, and amylase and trypsin inhibitors.

Glycosylation
Another characteristic that allergens show is post-translational modification, i.e., glycosylation. The presence of sugar moieties on the protein plays an important role in stabilizing the protein's quaternary structure. Since N-glycan-specific IgE antibodies have been discovered, it is assumed that the carbohydrate part of a glyco-allergen can trigger IgE antibody production. These specific antibodies can further activate basophils in vitro. Studies on Solanum lycopersicum have shown that these basophils can release histamine in response to the glyco-allergen Lyc e 2 [145]. Glycosylation can affect protein stability, as observed in the 7S vicilin of peanuts, Ara h 1 [34,146]. Ara h 1 is one of the well-studied 7S vicilins; it is termed an isoallergen and is glycosylated in nature [34,146].

Repeated Structures, Aggregates and Glycation
Other factors, such as repetitive structures, aggregation and glycation, also affect allergenic sensitization. Many food allergens, such as prolamins, globulins and tropomyosin, show repetitive structures and form oligomers, which imparts thermal stability. Members of the cupin family are the best examples of proteins that aggregate and form higher oligomers. In contrast, a few proteins become allergenic upon thermal processing at low water levels, such as roasting [147]. For example, peanut protein becomes insoluble during roasting due to a modification that occurs through the Maillard reaction. In this reaction, the sugar moiety reacts with the protein amino group and forms Amadori compounds, eventually resulting in advanced glycation end products. Studies have shown that this glycation increases the allergenic activity of peanuts [139].

Conclusions
This review highlights the functional diversity among the members of the seed storage proteins and how otherwise beneficial seeds can sometimes show allergenic behavior. The structural and biological properties that confer stability against proteolytic digestion are the main cause of this immunogenic property of seed proteins. This systematic analysis can thus be utilized further to improve the dietary value of seeds.