Ensuring Global Food Security by Improving Protein Content in Major Grain Legumes Using Breeding and ‘Omics’ Tools

Grain legumes are a rich source of dietary protein for millions of people globally and thus a key driver for securing global food security. Legume plant-based ‘dietary protein’ biofortification is an economic strategy for alleviating the menace of rising malnutrition-related problems and hidden hunger. Malnutrition from protein deficiency is predominant in human populations with an insufficient daily intake of animal protein/dietary protein due to economic limitations, especially in developing countries. Therefore, enhancing grain legume protein content will help eradicate protein-related malnutrition problems in low-income and underprivileged countries. Here, we review the exploitable genetic variability for grain protein content in various major grain legumes for improving the protein content of high-yielding, low-protein genotypes. We highlight classical genetics-based inheritance of protein content in various legumes and discuss advances in molecular marker technology that have enabled us to underpin various quantitative trait loci controlling seed protein content (SPC) in biparental-based mapping populations and genome-wide association studies. We also review the progress of functional genomics in deciphering the underlying candidate gene(s) controlling SPC in various grain legumes and the role of proteomics and metabolomics in shedding light on the accumulation of various novel proteins and metabolites in high-protein legume genotypes. Lastly, we detail the scope of genomic selection, high-throughput phenotyping, emerging genome editing tools, and speed breeding protocols for enhancing SPC in grain legumes to achieve legume-based dietary protein security and thus reduce the global hunger risk.


Introduction
Alarming trends of anthropogenic climate change and environmental deterioration jeopardize global crop yields, resource distribution, and ecosystems, resulting in global food insecurity and undernourishment in the growing human population [1]. An estimated 840 million people globally will be undernourished by 2030 [2]. The COVID-19 pandemic will have compounded this figure, increasing the food-related hunger crisis. Dietary protein is an essential macronutrient for human growth and development, with infants requiring 1.52 g per kg body weight per day and adults recommended 0.80 g per kg body weight per day [3]. Apart from micronutrient deficiency, malnutrition from dietary protein deficiency causes 'marasmus', 'kwashiorkor', anemia, impaired immunity, and 'environmental enteric dysfunction', which are most prevalent in developing and low-income countries, especially in southern Asia and sub-Saharan Africa [4-6]. Most of the people residing in these regions predominantly consume maize, sorghum, and cassava in their daily diets, which are rich in starch but insufficient in protein [6,7]. Thus, many people, especially infants, inhabiting these regions do not consume the required daily protein, affecting their overall growth and development [5,6]. Notably, Europe imports 70% of the plant-based protein consumed by its human population [8], a trend that the increasing global human population will further exacerbate. Breeding crops, especially legumes, with high-quality traits such as SPC is a promising approach for overcoming these challenges. Grain legumes are one of the richest sources of plant-based dietary protein, providing essential amino acids and supplying the increasing demand for protein-based human diets [9]. Grain legume seeds, popularly known as 'poor man's meat', are the cheapest protein source [10-12].
In addition, legume-based protein could be instrumental in minimizing greenhouse gas emissions, helping to protect the environment [13]. Screening genetic variability for protein content in various legume germplasm and crop wild relatives is the first step to identifying high-protein grain legumes for the development of high-yielding, high-protein legumes. A classical genetics-based approach could identify the inheritance pattern of high-protein gene(s) in various legumes. Advances in genomics have enabled the dissection of the genetic architecture of QTLs/gene(s) in various legumes through biparental mapping and genome-wide association studies. Moreover, the availability of complete reference genome assemblies and pangenomes of various legumes could assist in underpinning high-protein genomic regions at the individual or species level. Likewise, advances in functional genomics have enabled the discovery of various candidate genes that improve legume protein content and their precise function. Proteomics and metabolomics can improve our understanding of various complex pathways, molecular networks, and metabolites underlying high-protein grain legumes. Non-destructive phenomics approaches could be instrumental for screening and identifying high-protein lines with high efficiency. Emerging technologies such as genomic selection, rapid generation advancement, and genome editing could be harnessed to improve SPC, eradicate malnutrition related to dietary protein deficiency, and meet the United Nations Sustainable Development Goal 2.

Grain Legumes as an Important Source of Dietary Protein
Grain legumes vary in their protein content, due to fundamental limitations on the components a seed must contain to be viable. Many grain legumes have 25-40% SPC, and it may be difficult to raise that number much beyond 40% (see Table 1).

Table 1. Seed protein contents and deficient amino acids in major grain legumes.

Crop | Scientific Name | Range of Grain Seed Protein Content | References | Deficient Amino Acids
Chickpea | Cicer arietinum L. | 17-22% before dehulling; 25.3-28.9% after dehulling | [14,15] | Methionine, cysteine, threonine, and valine [16]
Lentil | Lens culinaris Medik. | 20.6-31.4% | [17] | Methionine, cysteine [18]
Lupin | Lupinus albus L. | 35-44% | [19,20] | Alanine, tryptophan [21]

Chickpea seed contains two main proteins, globulin (11S legumin and 7S vicilin) and albumin, with low amounts of glutelins.
Cowpea (Vigna unguiculata L. Walp.) is a 'multi-functional' grain legume widely used for human consumption. It helps mitigate the challenges of malnutrition in sub-Saharan Africa, and tropical and sub-tropical regions globally [52,53]. Cowpea SPC ranges from 15 to 25% [33,34] (see Table 1). Cowpea storage proteins are abundant in lysine and tryptophan but deficient in methionine and cysteine [53]. Globulins are the most abundant storage protein fraction of cowpea grain, followed by albumins, glutelins, and prolamin [54].
White lupin (Lupinus albus L.) seeds are a rich reservoir of protein, containing up to 44% protein [19,62], with two major classes of protein: albumin (15%) and globulin (85%) [63]. The globulin protein comprises α-, β-, γ-, and δ-conglutins [20]. Although white lupin seed proteins have some allergenic effects, the seeds are low in antinutritive properties compared with other grain legumes such as pea and soybean [62,64]. Moreover, white lupin seed contains higher amounts of some important amino acids (lysine, phenylalanine, arginine, and leucine) than soybean, rendering it a high-demand grain legume from a nutritional point of view [65].
Soybean (Glycine max (L.) Merr.) is rich in protein, ranging from 35 to 45%. It is deficient in methionine [22,23] but has sufficient lysine to overcome the lysine deficiency of cereals [66]. In 2018, it was estimated that soybean alone contributed 70% of the global protein meal [67].
Mung bean (Vigna radiata L.) contains easily digestible protein and several essential micronutrients [68]. It is an excellent source of protein but is deficient in sulfur-containing amino acids (methionine and cysteine) [69]. Due to its ease of digestibility relative to other legumes [70] and hypoallergenic properties, mung bean is used as a weaning food for infants [71]. Moreover, mung bean is a good meat substitute for vegetarians and those who cannot afford animal-based dietary protein [12].
Pigeon pea (Cajanus cajan (L.) Millsp) seeds contain 20-22% protein and play an essential role in providing plant-based dietary protein to the vegetarian population in India, thus ensuring protein-based food security [73].
Improving SPC is a primary objective in soybean breeding programs; however, progress has been limited by the negative relationship between SPC and both grain yield and oil content [24,130]. For example, Bandillo et al. [131] and Warrington et al. [132] reported a highly negative correlation between the soybean SPC allele and seed oil content, with oil content falling by 1% for every 2% increase in SPC.
Hence, harnessing the available genetic variability for SPC requires large-scale screening of landraces, crop wild relatives (CWRs), and grain legume germplasm held in gene banks across the globe.
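The protein-oil trade-off reported above for soybean can be made concrete with a short calculation. The sketch below assumes the relationship is linear over the range of interest, which the cited studies do not necessarily claim; the function name and the default ratio (1% oil lost per 2% SPC gained, i.e., 0.5) are illustrative choices taken from the figure quoted in the text.

```python
def expected_oil_change(spc_increase_pct: float, oil_loss_per_spc: float = 0.5) -> float:
    """Return the expected change in seed oil content (percentage points).

    Assumes a linear trade-off: each 1-point gain in seed protein content (SPC)
    costs `oil_loss_per_spc` points of oil. The default of 0.5 reflects the
    "1% oil per 2% SPC" figure quoted in the text; linearity is an assumption.
    """
    if oil_loss_per_spc < 0:
        raise ValueError("oil_loss_per_spc must be non-negative")
    return -oil_loss_per_spc * spc_increase_pct


if __name__ == "__main__":
    # Hypothetical breeding targets, in percentage points of SPC.
    for delta in (1.0, 2.0, 4.0):
        print(f"SPC +{delta:.1f} -> oil {expected_oil_change(delta):+.1f}")
```

Under this assumption, a breeder targeting a 4-point SPC gain would budget for roughly a 2-point oil loss, which is why alleles that break the negative correlation are of particular interest.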

Mendelian Inheritance of Seed Protein Content in Legumes
Several researchers have worked out the genetics of SPC based on Mendelian genetics in various grain legumes [142-144]. Considering pea storage proteins (legumin and convicilin), Matta and Gatehouse [145] mapped the legumin gene (Lg-1), behaving as a single Mendelian gene with five alleles on LG7, and the convicilin gene (Cvc), behaving as a single Mendelian gene on LG2, using seeds developed from 1238 × 1263, 110 × 807, and 110 × 851 F2 crosses. Subsequently, Mahmoud and Gatehouse [146] explained the monogenic inheritance of another pea storage protein gene, vicilin (Vc-1), controlled by two codominant genes located on LG7 using an F2 cross from 360 × 611.
Perez et al. [147] revealed the genetic basis of high and low SPC in pea using the genetics of seed shape (round vs. wrinkled). They found that round-seeded pea plants (RR/RbRb) had low SPC with low albumin content, while those with recessive alleles (rr/rbrb) had high SPC and high albumin content [147]. High heritability of protein content and its control by a few gene(s) is an opportunity to improve protein content in cowpea [92]. Moreover, diallel crosses of six populations derived from two high-protein lines and two high-yielding soybean lines revealed a significant negative correlation between protein content and yield in the high protein × high protein population but a significant positive correlation between protein content and yield in the high yielding × high yielding population [148]. In pigeon pea, an analysis of F1 and F2 progenies derived from crosses involving four parents revealed a minimum of 3-4 genes controlling protein content [149]. The authors concluded that the low protein trait is partially dominant over the high protein trait.
Various studies have reported a significant effect of environment on SPC [150-152]. In soybean, this effect reflects the involvement of multiple genes and the quantitative nature of the SPC trait [150,151]. In chickpea, an F2 segregating population developed from ICC5912 (blue flowered) × ICC17109 (white flowered) revealed the quantitative nature of the SPC trait and its high negative correlation with seed yield and seed size [78]. A 5 × 5 half diallel cross of cowpea lines revealed the presence of additive and non-additive gene effects for SPC. High seed albumin, prolamin, and globulin were associated with positive effects of the dominant gene, while high SPC and glutelin content were associated with recessive genes [153]. In lentil, Kumar et al. [154] also reported the quantitative nature of the SPC trait.
High genetic variation in lentil seed storage protein resulted from high G × E interactions exhibiting moderate heritability (31.3%) [152].
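To illustrate how strong G × E interaction can hold heritability at a moderate level, the following sketch computes broad-sense heritability on an entry-mean basis from variance components, using the standard expression H² = σ²g / (σ²g + σ²ge/e + σ²ε/(e·r)), where e is the number of environments and r the number of replicates. The variance components below are hypothetical and are not taken from the lentil study cited above; the function name is likewise illustrative.

```python
def entry_mean_heritability(var_g: float, var_ge: float, var_err: float,
                            n_env: int, n_rep: int) -> float:
    """Broad-sense heritability on an entry-mean basis.

    var_g   : genotypic variance
    var_ge  : genotype-by-environment interaction variance
    var_err : residual (error) variance
    n_env   : number of environments
    n_rep   : number of replicates per environment
    """
    if n_env < 1 or n_rep < 1:
        raise ValueError("n_env and n_rep must be at least 1")
    phenotypic = var_g + var_ge / n_env + var_err / (n_env * n_rep)
    if phenotypic <= 0:
        raise ValueError("phenotypic variance must be positive")
    return var_g / phenotypic


if __name__ == "__main__":
    # Hypothetical variance components with a large G x E term.
    h3 = entry_mean_heritability(1.0, 4.0, 6.0, n_env=3, n_rep=2)
    h6 = entry_mean_heritability(1.0, 4.0, 6.0, n_env=6, n_rep=2)
    print(f"3 environments: H2 = {h3:.2f}")
    print(f"6 environments: H2 = {h6:.2f}")
```

Because the interaction and error terms are divided by the number of environments, testing in more environments raises the heritability of entry means, which is one reason multi-environment phenotyping is emphasized for SPC.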

QTL Mapping for Seed Protein Content
Advances in grain legume genomics have facilitated the identification of underlying QTLs controlling SPC using biparental mapping populations in various grain legumes [118,119,155-157].
Few studies have uncovered QTLs controlling SPC in chickpea. However, one study that phenotyped recombinant inbred lines (RILs) derived from ICC995 × ICC5912 across four environments and used a genotyping by sequencing approach delineated one major effect QTL q-3.2 for SPC that explained 44.3% of the phenotypic variation (PV) on LG3 [158].
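The "phenotypic variation explained" (PV) statistic quoted for QTLs can be illustrated with a deliberately simplified single-marker regression, in which PV is the coefficient of determination (R²) between marker genotype and phenotype. This is only a sketch: the cited studies use interval-mapping software with multi-environment data, and the genotype coding (0/1 for the two parental alleles in RILs) and SPC values below are hypothetical.

```python
def pve_single_marker(genotypes, phenotypes):
    """Proportion of phenotypic variance explained (R^2) by one marker.

    genotypes  : numeric allele codes per line (e.g., 0 = parent A, 1 = parent B)
    phenotypes : trait values per line (e.g., seed protein content in %)
    """
    n = len(genotypes)
    if n != len(phenotypes) or n < 3:
        raise ValueError("need two equal-length sequences with at least 3 lines")
    mean_g = sum(genotypes) / n
    mean_p = sum(phenotypes) / n
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    syy = sum((p - mean_p) ** 2 for p in phenotypes)
    sxy = sum((g - mean_g) * (p - mean_p) for g, p in zip(genotypes, phenotypes))
    if sxx == 0 or syy == 0:
        raise ValueError("marker and phenotype must both vary")
    # Squared Pearson correlation equals R^2 for simple linear regression.
    return (sxy ** 2) / (sxx * syy)


if __name__ == "__main__":
    # Hypothetical RIL data: four lines per parental allele, SPC in percent.
    marker = [0, 0, 0, 0, 1, 1, 1, 1]
    spc = [18.0, 19.0, 20.0, 19.0, 22.0, 23.0, 21.0, 22.0]
    print(f"PV explained: {100 * pve_single_marker(marker, spc):.1f}%")
```

A major-effect QTL is one for which this proportion is large for the marker nearest the locus; minor QTLs each explain only a few percent and are more easily masked by environmental variation.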
An evaluation of a Terese × K586 RIL population in five different environments identified 14 SPC QTLs located on LGI, LGIII, LGIV, LGV, LGVI, and LGVII [119]. The study identified the underlying candidate gene for the QTL on LGI as the Rgp gene (cell wall synthesis) and two underlying candidate genes for the QTL on LGV as Ls (GA biosynthesis) and Rbcs4 (encoding small Rubisco subunit) [119].
In soybean, the SPC trait is controlled by multiple alleles and highly influenced by G × E interactions [150]. More than 300 QTLs contributing to SPC in soybean have been reported (http://www.soybase.org, accessed on 10 May 2022) [161]; they reside across all chromosomes, but the major SPC QTLs are on chromosomes 5, 15, and 20. Diers et al. [155] first reported a major QTL governing high SPC on chromosome 20 in a population developed from crossing cultivated and wild soybean, which was later mapped to a 3 cM region on LGI (Nichols et al., 2006) [156]. The location of this QTL was subsequently narrowed to 8 [167]. Of these identified QTLs, qPro_20 was stable across the four tested environments. SSR, DArT, and DArTseq analysis of five RIL-based mapping populations for high and low SPC and one high × high SPC population identified two major QTLs controlling SPC on LG15 and LG20 in soybean [168]. Furthermore, bulked segregant analysis of four high × low SPC mapping populations unveiled novel SPC-controlling genomic regions on LG1, LG8, LG9, LG14, LG16, LG17, LG19, and LG20 [168]. An assessment of soybean RILs developed from Linhefenqingdou × Meng 8206 in six different environments identified 25 SPC QTLs explaining up to 26.2% PV [169]. Of the identified QTLs, qPro-7-1 was highly stable across all tested environments. Recently, Fliege et al. [137] cloned a major SPC-governing QTL (cqSeed protein-003) and elucidated the underlying causative candidate gene Glyma.20G085100, encoding a CCT domain protein. Thus, efforts are needed to fine map or clone major QTLs controlling SPC in other grain legumes to delineate the underlying candidate gene(s) and their function for genomic-assisted breeding to improve SPC in grain legumes.
Abbreviations: AFLP = amplified fragment length polymorphism; SNP = single nucleotide polymorphism; SCAR = sequence characterized amplified region; CAPS = cleaved amplified polymorphic sequence; RAPD = random amplified polymorphic DNA; SSR = simple sequence repeat; ISSR = inter simple sequence repeat.

Underpinning Genomic Region/Haplotypes Controlling High Protein Content through GWAS
Traditional biparental QTL mapping for obtaining genetic recombinants controlling complex traits such as protein content is limited due to the incorporation of only two parents in the crossing program. However, the increased capacity of next generation sequencing technology to derive single nucleotide polymorphism molecular markers in association with advanced phenotyping facilities has facilitated the development of numerous genetic recombinants and identification of the underlying plausible candidate genomic regions controlling protein content in various grain legumes using GWAS [81,174,183,186]. Jadhav et al. [81] performed association mapping for SPC using SSR markers on a panel of 187 chickpea genotypes (desi, kabuli, and exotic). Nine significant marker trait associations (MTAs) for SPC were uncovered on LG1, LG2, LG3, LG4, and LG5, explaining 16.85% PV. A recent GWAS using high-throughput SNP markers on 140 chickpea genotypes subjected to drought and heat stress to shed light on MTAs with various nutrients uncovered 66 (non-stress), 46 (drought stress), and 15 (heat stress) MTAs for SPC [199], which could be used to identify high-protein lines for improving SPC in chickpea.
A GWAS relying on multilocation and multi-year phenotyping of a large set of pea germplasm representing diverse regions across the globe was undertaken to identify significant MTAs for agronomic and quality traits, including protein content [174]. Two significant MTAs controlling SPC were identified: Chr3LG5_138253621 and Chr3LG5_194530376. GWAS using 16,376 SNPs in 332 chickpea genotypes (desi and kabuli) delineated seven genomic loci controlling SPC and explaining 41% combined PV [170]. The authors also validated five SPC-controlling genes in a RIL-based mapping population ICC 12299 × ICC 4958, encoding cytidine (CMP), deoxycytidylate (dCMP) deaminases, ATP-dependent RNA helicase DEAD-box, and zinc finger protein. An earlier comprehensive GWAS of 298 soybean lines using Illumina Infinium and GoldenGate assays identified 17 significant genomic regions controlling SPC [180]. Among the SPC-controlling genomic regions, LG20 was important as it contained six candidate genes Glyma20g19680, Glyma20g21030, Glyma20g21080, Glyma20g19630, Glyma20g19620, and Glyma20g21040 in the 2.4 Mbp interval. Another GWAS performed on 139 soybean lines revealed eight significant regions contributing to SPC on LG5, LG8, LG10, LG14, LG16, LG19, and LG20 [183]. In addition, a major QTL qPC19 controlling SPC on LG19 in the 42.3 to 44.2 Mb interval explained 10.3% PV [183]. Likewise, an assay using SoySNP660k BeadChip in 144 soybean lines developed from four-way RILs identified eight candidate genes controlling SPC: Glyma.03G100800, Glyma.10G207300, Glyma.12G019300, Glyma.12G112900, Glyma.14G081600, Glyma.18G028600, Glyma.18G07110, and Glyma.18G071300 (Zhang et al. [186]; see Table 3). 
A comprehensive GWAS in a collection of 877 soybean accessions, tested in five different environments in the Midwest and southern USA using the SoySNP50K iSelect BeadChip [188], identified significant genomic regions for SPC that coincided with previous QTL/genomic regions identified on chromosomes 15 and 20 [161,166]. Three SNPs identified within 91 kb overlapped the 118 kb genomic region of a meta-QTL controlling SPC and seed oil content previously reported by Van and McHale [161]. Some important candidate genes identified in these genomic regions (Glyma.15g049100, Glyma.15g049200, Glyma.15g050100, and Glyma.15g050600) participate in partitioning carbon and regulating protein content (Lee et al. [188]; see Table 3). The authors also elucidated eight novel genomic regions controlling methionine, cysteine, lysine, and threonine contents. A GWAS using whole genome sequencing data of 631 soybean accessions combined with a biparental QTL analysis uncovered a pleiotropic gene, GmSWEET39 (encoding a sugar transporter), controlling SPC and seed oil content in soybean [164]. The authors also reported that a 2 bp (CC) deletion in Glyma.15G049200 underlying the GmSWEET39 allele rendered high seed oil content and low SPC.
A comprehensive association and linkage analysis surveyed 985 soybean accessions, including wild species, landraces, and old and modern cultivars, to capture haplotypic variation in the high SPC locus cqProt-003 on chromosome 20 [200]. The study uncovered significant trait-associated genomic regions within a 173 kb linkage block containing three causal candidate genes: Glyma.20G084500, Glyma.20G085250, and Glyma.20G085100 [200]. Of these, Glyma.20G085100 (containing a 304 bp deletion and trinucleotide insertions) was tightly linked with the high protein content phenotype [200].
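Declaring an MTA "significant" in a GWAS depends on a multiple-testing threshold, because thousands of markers are tested at once. The sketch below applies a plain Bonferroni correction as one common and conservative choice; the studies cited above may have used other criteria (for example, false discovery rate control), and the marker names and p-values in the example are hypothetical.

```python
import math


def bonferroni_threshold(n_markers: int, alpha: float = 0.05) -> float:
    """Per-marker p-value cutoff that keeps the family-wise error rate at alpha."""
    if n_markers < 1:
        raise ValueError("n_markers must be at least 1")
    return alpha / n_markers


def significant_mtas(p_values: dict, alpha: float = 0.05) -> list:
    """Return marker names whose p-value passes the Bonferroni cutoff, sorted by name."""
    cutoff = bonferroni_threshold(len(p_values), alpha)
    return sorted(name for name, p in p_values.items() if p <= cutoff)


if __name__ == "__main__":
    # 16,376 SNPs is the panel size mentioned in the text for one chickpea GWAS.
    cutoff = bonferroni_threshold(16376)
    print(f"cutoff p = {cutoff:.2e}, -log10(p) = {-math.log10(cutoff):.2f}")

    # Hypothetical association results for three markers.
    results = {"snp_a": 1e-4, "snp_b": 0.02, "snp_c": 0.5}
    print(significant_mtas(results))
```

With dense SNP panels the cutoff becomes very strict, so smaller panels such as those with 139-144 lines are likely to detect only loci with relatively large effects on SPC.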

Functional Genomics Shedding Light on Causal Candidate Gene(s) Contributing Seed Protein Content in Grain Legumes
In the last decade, unprecedented advances in RNA sequencing have expedited functional genomics research, especially transcriptome analysis for discovering trait gene(s), in various grain legumes [197]. Numerous studies have elucidated various SPC-contributing candidate gene(s) and their functional roles in grain legumes, notably through cDNA cloning-based functional characterization of genes encoding storage proteins such as pea seed albumin (PA1, PA1b) [201] and the conglutin family in narrow leaf lupin [202]. Functional characterization of genes encoding storage protein in narrow leaf lupin by sequencing cDNA clones from developing seed identified 11 new storage protein (conglutin family)-encoding genes [202]. Transcriptome analysis via RNA-seq shed light on 16 conglutin genes encoding storage protein in the Tanjil cultivar of narrow leaf lupin [203]. Conglutin gene expression is similar in lupin varieties of the same species but distinct between species [203]. In soybean, functional genomic analysis via gene expression profiling identified 329 differentially expressed genes underlying the qSPC_20-1 and qSPC_20-2 QTL regions accounting for SPC using a QTL-seq approach [197]. Of the nine candidate genes underlying these QTL regions, Glyma.20G088000, Glyma.20G111100, and Glyma.20g087600 were functionally validated and identified as the most promising candidate genes controlling SPC [197]. RNAi technology, a robust functional genomic tool, offered novel insight into the regulatory role of Glyma.20g085100, which harbors a transposon insertion, in the SPC-controlling genomic region of soybean [137]. Reduced expression of Glyma.20g085100 using RNAi enhanced the protein level in the low-protein Thorne soybean genotype [137].
Most functional genomics studies identifying SPC-controlling candidate genes with their putative function in major legumes have involved soybean; thus, studies should focus on elucidating candidate genes and deciphering the molecular mechanism for improving SPC via functional genomics in other grain legumes.

Proteomics and Metabolomics Shed Light on the Genetic Basis of High Seed Protein Content in Legumes
Proteomics helps us understand the entire set of proteins produced at a specific time under a particular set of conditions in an organism or cell [204]. This approach could be used to discover novel seed storage proteins and investigate the molecular basis of enhancing SPC in various legumes [205]. A novel protein known as methionine-rich protein was discovered in soybean using a two-dimensional (2D) electrophoresis technique [205]. Later, a 2D-PAGE proteomic tool distinguished wild soybean (G. soja) from cultivated soybean based on high storage proteins (beta-conglycinin and glycinin), detecting 44 protein spots in wild soybean and 34 protein spots in cultivated soybean, which helped in identifying high-protein soybean genotypes [206]. Combined SDS-PAGE and MALDI-TOF MS analysis in LG00-13260, PI 427138, and BARC-6 soybean genotypes revealed enhanced accumulation of beta-conglycinin and glycinins and thus high grain protein content compared to Williams 82 ([207]; see Table 4). A combined SDS-PAGE and MALDI-TOF MS analysis, comparing protein content in nine soybean accessions with Williams 82, revealed significant protein content differences in seed 11S storage globulins [208]. In common bean, proteome analysis of lines deficient in seed storage proteins (phaseolin and lectins) revealed elevated sulfur amino acid content due to increased legumin, albumin 2, and defensin [209]. Santos et al. [210] characterized the protein content of 24 chickpea genotypes using a proteomics approach to explore genetic variability in storage protein. High-performance liquid chromatography analysis indicated the presence of sufficient genetic variability for SPC, with some genotypes rich in seven amino acids. In pea, a mature seed proteome map of a diverse set of 156 proteins identified novel storage proteins for enhanced SPC [211].
Protein/metabolite feature | Technique | Reference | Genotype
High beta-conglycinin and glycinins | Two-dimensional electrophoresis, SDS-PAGE | [207] | LG00-13260
High 11S storage globulins | SDS-PAGE, MALDI-TOF, two-dimensional electrophoresis | [208] | PI407788A
High storage protein | 2D-PAGE | [206] | Wild soybean
Asparagine, free 3-cyanoalanine, and L-malic acid | GC-TOF/MS | [216] |

An iTRAQ-based proteomics analysis of CX (low SPC) and LX (high SPC) faba bean genotypes revealed differentially abundant proteins involved in amino acid metabolism [56]. Furthermore, a KEGG analysis suggested that valine, leucine, histidine, and β-alanine metabolism were significantly enriched by differentially abundant proteins [56].
Likewise, metabolomic studies help us understand various metabolic pathways and metabolites controlling protein accumulation during seed development [217]. A meticulous amino acid profiling study using contrasting high and low SPC soybean lines revealed that the ability of embryos to assimilate nitrogen and synthesize storage proteins determines SPC accumulation [217]. Further, the authors reported that high SPC at maturity is related to increased accumulation of asparagine in developing cotyledons.
A metabolomics study using GC-TOF/MS in contrasting seed protein soybean lines showed a high abundance of metabolites (asparagine, aspartic acid, glutamic acid, free 3-cyanoalanine) that were positively associated with SPC and negatively associated with seed oil content [216]. However, various sugars (sucrose, fructose, glucose, mannose) had negative associations with seed protein and oil content [216]. Saboori-Robat et al. [218] undertook metabolite profiling of common bean genotypes differing in S-methylcysteine accumulation in seeds and found that S-methylcysteine accumulates as γ-glutamyl-S-methylcysteine during seed maturation, with a low accumulation of free S-methylcysteine. Amino acid profiling of Valle Agricola, a nutritionally rich chickpea genotype cultivated in southern Italy, revealed that 66% of the total amino acids comprised glutamic acid, glutamine, aspartic acid, phenylalanine, asparagine, lysine, and leucine, while ~40% comprised histidine, valine, isoleucine, leucine, methionine, and threonine [219]. Further advances in metabolomics could improve our understanding of various cellular metabolism networks and pathways related to SPC in legumes. Thus, integrating various 'omics' tools and emerging novel breeding approaches could assist in developing protein-fortified grain legumes (see Figure 1).

Progress of Genetic Engineering and Scope of Genome Editing for Improving SPC in Grain Legumes
Numerous studies have used genetic engineering to improve the essential amino acid content of grain legumes by manipulating genes encoding amino acid-rich proteins [220][221][222]. Many of these examples concern sulfur-containing amino acids. Chiaiese et al. [223] introduced a sunflower seed albumin transgene, which encodes a methionine- and cysteine-rich protein, into chickpea to improve seed methionine content. The transgenic chickpea seed accumulated more methionine than the control. Likewise, Molvig et al. [224] improved seed methionine content in narrow leaf lupin by introducing the sunflower seed albumin transgene. However, cysteine-rich storage proteins, especially conglutin delta, declined in narrow leaf lupin seed due to low expression of the cysteine-encoding gene ([225]; see Table 5). Introducing the Bertholletia excelsa methionine-rich 2S albumin gene into common bean enhanced seed methionine content by more than 20% over non-transgenic plants [220]. Improving sulfur-containing amino acids, such as methionine and cysteine, in soybean has been a research priority, made possible by introducing genes encoding the 15 kDa [226], 27 kDa [227], and 11 kDa [221,228] δ-zein proteins from maize. Despite some successes in using transgenes to enhance SPC in grain legumes, regulatory bodies prohibit or restrict the commercial use of these genetically engineered grain legumes due to health and environmental safety concerns. To overcome these stringent restrictions on genetically modified crops, rapidly evolving genome editing technologies could help develop enhanced-protein grain legumes without introducing foreign genes.
Genome editing technologies have already been used to improve quality traits in various crop plants, such as increased fragrance and low gluten, starch, or oleic acid contents (for details, see [231]). However, the use of genome editing for SPC fortification in grain legumes remains limited; future studies could adopt these powerful technologies to improve SPC by editing gene(s) such as those encoding proteins rich in essential sulfur-containing amino acids or those governing storage proteins.

Whole Genome Resequencing and Pangenome Sequencing for Elucidating Novel Structural Variants Related to High SPC across the Genome
Current breakthroughs in genome sequencing technologies have facilitated the sequencing of the global germplasm of various crops, including legumes, to uncover novel structural variants (SVs) such as presence/absence and copy number variations prevailing at the genome level [232,233]. An analysis combining association and biparental mapping using whole genome resequencing (WGRS) data of 631 soybean genotypes discovered a pleiotropic QTL on chromosome 15, underlain by the sugar transporter gene GmSWEET39, controlling SPC and seed oil content [164]. The authors suggested that a 2-bp (CC) deletion in the underlying causative gene, Glyma.15G049200, reduced SPC and enhanced seed oil content. Likewise, a pangenomic approach can describe the full complement of genes in the 'core genome' and 'accessory genome' to capture structural variation at the species level that is not available in a single reference genome assembly [232]. Pangenome assemblies have been reported in chickpea [233], pigeon pea [234], soybean [235] and mungbean [236]. Thus, future construction and annotation of pangenomes for different grain legumes could reveal SPC-related structural variations that are missing from the available reference genome assemblies, expediting the development of grain legumes with enriched protein.
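The distinction between the 'core' and 'accessory' genome can be made concrete with a minimal sketch. It assumes that gene presence/absence calls are already available for each accession; the accession names, gene-family identifiers, and the function name partition_pangenome are hypothetical and are not taken from any of the cited pangenome studies.

```python
from collections import Counter


def partition_pangenome(presence):
    """Split gene families into core and accessory sets.

    presence maps an accession name to the set of gene-family IDs
    detected in that accession's assembly (hypothetical input format).
    """
    n_accessions = len(presence)
    # Count in how many accessions each gene family is present.
    counts = Counter(gene for genes in presence.values() for gene in genes)
    # Core: present in every accession. Accessory: present in only some.
    core = {gene for gene, c in counts.items() if c == n_accessions}
    accessory = set(counts) - core
    return core, accessory


# Hypothetical presence/absence calls for three accessions.
presence = {
    "acc_A": {"g1", "g2", "g3"},   # treated here as the single reference
    "acc_B": {"g1", "g2", "g4"},
    "acc_C": {"g1", "g2", "g3", "g5"},
}

core, accessory = partition_pangenome(presence)

# Accessory gene families that a single-reference analysis would miss.
missing_from_reference = accessory - presence["acc_A"]
```

In this toy input, the gene families absent from the chosen reference are the kind of variation that a single reference genome assembly cannot report, which is why candidate SPC-related genes may only surface once a pangenome is available.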

Non-Destructive Phenomics Approach for Quantifying High Protein Content in Grain Legumes
Several high-throughput phenotyping approaches have been developed to bridge the genotyping and phenotyping gap for various quality traits, including protein content [237][238][239]. Advances in high-throughput non-destructive phenotyping approaches such as hyperspectral technologies, near-infrared reflectance spectroscopy, and nuclear magnetic resonance have enabled the phenotyping of various biochemical attributes in cereal and legume seeds, including protein content, with high accuracy and efficiency [237][238][239][240][241]. For example, Raman spectroscopy has been used to measure SPC in soybean [237]. Earlier, near-infrared reflectance spectroscopy was used to screen high-protein soybean genotypes [242,243]. Thus, non-destructive high-throughput phenotyping approaches could save time when screening high-SPC lines.

Genomic Selection and Rapid Generation Advances for Selecting High SPC Lines to Increase Genetic Gain
Unprecedented advances in genome-wide molecular marker development allow the use of genomic selection (GS) to predict the genetic merit of progenies for complex traits without observing their phenotypic values. A prediction model is developed in a 'training population' with known phenotypic observations and is then used to calculate genomic-estimated breeding values for individuals in large target populations [244]. The benefit of GS for improving genetic gain could be harnessed by increasing selection intensity (i) and selection accuracy (r), and reducing the breeding cycle length (L) in the breeder's equation: $\Delta G = R = h^{2}S = \sigma_a \times i \times r / L$, where $\Delta G$ = genetic gain, $R$ = response to selection, $h^{2}$ = heritability, $S$ = selection differential, and $\sigma_a$ = additive genetic standard deviation (the square root of the additive genetic variance). Notable instances of using GS as a substitute for phenotypic selection for complex traits include grain yield under moisture stress in chickpea [245], common bean [246], cowpea [247], and pea [248,249], and cooking time in common bean [250]. However, GS has had limited application for selecting high SPC genotypes in legumes [251]. An rrBLUP model was used to predict SPC in 306 pea genotypes derived from three RILs, tested in three autumn seasons in northern and central Italy, to determine any advantage of GS over phenotypic selection for SPC [251]. The mean predictive ability of GS for SPC was 0.53. Future studies could use GS to improve SPC and select various grain legume progenies with high SPC without phenotyping.
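The per-unit-time form of the breeder's equation can be illustrated with a short calculation. This is a minimal sketch: every numerical value below is a hypothetical assumption chosen only to show the arithmetic, not an estimate from any of the cited studies, and the function name genetic_gain_per_year is introduced here for illustration.

```python
def genetic_gain_per_year(sigma_a, intensity, accuracy, cycle_years):
    """Expected genetic gain per year: (sigma_a * i * r) / L."""
    if cycle_years <= 0:
        raise ValueError("cycle_years must be positive")
    return sigma_a * intensity * accuracy / cycle_years


# Hypothetical inputs (assumptions, not measured values).
sigma_a = 1.0  # additive genetic SD of SPC, in percentage points
i = 2.0        # selection intensity

# Phenotypic selection: higher accuracy, but a longer breeding cycle.
conventional = genetic_gain_per_year(sigma_a, i, accuracy=0.8, cycle_years=4)

# Genomic selection: lower accuracy, but a much shorter cycle.
genomic = genetic_gain_per_year(sigma_a, i, accuracy=0.5, cycle_years=1)

print(f"conventional: {conventional:.2f} percentage points per year")
print(f"genomic:      {genomic:.2f} percentage points per year")
```

Under these assumed values, the shorter cycle more than offsets the lower selection accuracy. Whether this holds for SPC in a given legume depends on the accuracy and cycle length actually achieved in that breeding program.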
Likewise, the emerging benefits of speed breeding techniques could be harnessed by using optimum light intensity, photoperiod and temperature to enhance the rate of photosynthesis, resulting in early flowering and plant maturity, thus shortening the breeding cycle [252]. Speed breeding protocols have been established in chickpea, lupin, lentil, pea, soybean, and faba bean [253][254][255][256][257]. Further optimization of speed breeding protocols could fast-track improvements in various traits of breeding importance, including SPC, in grain legumes for sustaining global food security.

Fundamental Constraints on Seed Protein Content
As the offspring of plants, seeds are subject to several fundamental trade-offs that impact their size and composition. Seeds have fundamental required components, such as cell walls, and some amount of carbohydrates, lipids, and nucleic acids to make a viable embryo. Consequently, there are limits to potential selection on protein content. For example, long-term selection on maize seed oil content has shown limits to the power of selection (e.g., [258]). Over the past two or more decades, ecologists have increasingly conceptualized these trade-offs as part of an economic spectrum, which influences the range of traits observed in leaves [259,260], stems [261,262] and roots [263]. As a dispersal unit, seeds are able to travel farther if they are smaller, but establish more readily if larger [264]. In many individual legume crops, wild relatives have presumably been under millennia of selection for these trade-offs in seed size and composition, limiting genetic variation and architecture. However, few researchers have linked evolutionary and ecological limits on seed composition to breeding efforts, or looked carefully at how they impact seed protein content. Seed size is generally an important covariate of seed protein content, although its role differs somewhat among grain legumes.
Recent elegant work in chickpea suggests that these constraints are real and shape contemporary genetic diversity in seed size and composition. Chickpea has a QTL hotspot for seed size, leaf size, drought responses, and other "Vigour" traits. Nguyen and colleagues have recently fine-mapped this QTL [265,266], showing it to be due to variation in a TIFY gene, which mutant studies in Arabidopsis have shown to impact seed size. Natural variation at this locus suggests it contributes significantly to a trade-off between seed size and seed number, among parents that also differ in seed protein content.

Conclusions and Future Perspective
The growing human population is facing increasing malnutrition-related problems such as dietary protein deficiency, especially in underprivileged and developing countries. Supplying protein-rich legumes improved through plant breeding and molecular breeding approaches could minimize the rising challenge of hunger and malnutrition-related problems. Moreover, improved grain legume dietary protein could be an important and economically viable alternative to high-cost animal-based dietary protein. Protein biofortification of major grain legumes will help satisfy the daily needs of human dietary protein in underprivileged and developing countries. Accurate characterization of the various crop gene pools and landrace haplotypes with genetic variation for SPC needs urgent attention to accelerate SPC improvement in legumes. Harnessing the benefits of pre-breeding approaches could play a pivotal role in introgressing gene(s)/QTLs regulating high protein content from CWRs into high-yielding low-protein elite legume cultivars [96]. Recent advances in genomics, genome-wide association mapping, and whole genome resequencing approaches and the availability of complete genome and pangenome sequences in various legume crops could help pinpoint the causative alleles/QTLs/haplotypes/candidate genes controlling high protein at the genome level, enabling genomics-assisted selection for improving protein concentration in grain legumes. Likewise, functional genomics, proteomics, and metabolomics could enrich our understanding of the complex molecular networks controlling improved protein content in various grain legumes. Selecting protein-rich grain legume genotypes in assessed germplasm or segregating progenies is challenging as most protein-estimating processes are based on destructive methods. Thus, high-throughput non-destructive methods are important for selecting high-protein legume genotypes.
Likewise, genomic selection and rapid generation advances could be important for selecting high-protein progenies and rapidly developing protein-dense legumes. To overcome the challenges of transgenic technology, genome editing will help us manipulate and edit gene(s) governing high protein content at specific locations on legume genomes to enhance SPC. Capitalizing on these modern breeding tools, we should be able to identify grain legumes with improved protein content without compromising yield, even though these two traits have a strong inverse relationship [123]. Hence, the amalgamation of these approaches could help combat the growing protein-based malnutrition and lower the hunger risk, ensuring sustainable human growth globally.