Wheat (Triticum aestivum L.) is a staple food for more than a third of the world's population. Wheat grain is a rich source of starch (carbohydrate); therefore, wheat is primarily considered a source of energy [1
]. However, wheat grain also contains moderate amounts of dietary protein, which determines, to a large extent, both end-use quality and grain price [2
]. Wheat grain total protein content (GPC) ranges from 9 to 15% of the dry weight [3
]. Although GPC depends primarily on the genotype, the environment and the genotype × environment interaction also play an essential role in grain protein accumulation [5].
Nitrogen fertilization is the critical environmental factor that affects protein accumulation; if nitrogen fertilization stays constant, increased yield often results in decreased protein content because of nitrogen dilution by the large biomass [6
]. Furthermore, under stress conditions, grain protein content tends to be higher compared to either irrigated or nitrogen-limited conditions [10
]. Water deficit increases grain total protein content, but it decreases grain yield [11
]. Wheat genotypes with higher yield potential tend to have lower protein content and vice versa. Several explanations for the negative relationship between grain protein content and yield have been proposed [13
]. However, some wheat genotypes deviate from this relationship, i.e., they produce both high yield and high grain protein content [14
]. That deviation implies that the nitrogen supply to the grains was increased without an associated reduction in grain yield [15].
Exploring genetic resources to identify wheat genotypes with high grain protein content is the most efficient way to improve the nutritional value of wheat grains [16
]. Wheat breeders were successful in selecting genotypes with a high total protein content that were generated from cultivated materials such as “Atlas,” “Atlas66” and “Nap Hall” [17
]. Previous studies reported higher GPC in landraces and wild relatives compared to modern wheat genotypes [18
]. A wild emmer (Triticum turgidum var. dicoccoides) genotype identified in Israel, “FA15-3”, was found to accumulate 40% protein when given adequate nitrogen fertilization [20
]. An allele of the high grain protein content gene GPC-B1, originally identified in wild emmer wheat, was transferred to a spring wheat genotype and increased grain total protein content by 3% [21
]. The GPC-B1 allele accelerates senescence and increases mobilization of nitrogen, zinc, and iron to the developing grains [22
]. Thus, accessions containing this allele most likely will contain high protein as well as high iron and zinc [23
]. However, most modern tetraploid and hexaploid wheat genotypes have lost a functional allele of GPC-B1. During the last decade, several QTLs for GPC were mapped on chromosomes 5A, 5D, 2D, 2B, 6A, 6B and 7A using association mapping (AM) and biparental populations [24
]; these QTLs were validated and used in marker-assisted selection to improve GPC.
Marker-assisted selection (MAS) has been identified as one of the promising avenues to improve wheat total protein content and grain yield [32
]. The critical step in MAS is to identify molecular markers associated with desirable phenotypic traits using AM or biparental populations [33
]. Association mapping (AM) can be applied to structured populations [34
], so a broad spectrum of germplasm can be incorporated [34
]. However, the successful application of association mapping requires comprehensive phenotypic and genotypic data. The dramatic decrease in genotyping costs [37
], together with the availability of high-throughput phenotyping technologies such as near-infrared spectroscopy (NIR), makes AM a viable approach for large populations [38
]. Furthermore, a robust sequence and annotation of the wheat genome are now available [39
] owing to the latest developments in genomic technologies. This might allow researchers to identify new loci associated with GPC and dissect the genetic architecture of GPC in wheat.
Three strategies were adopted to select for high GPC and grain yield, i.e., selecting for high grain protein alone, selecting for high grain protein within highest yielding genotypes, and using an index to simultaneously select for both protein and yield [40
]. In the current study, the most recent developments in genotyping and phenotyping technologies were applied to identify genomic regions associated with GPC and select accessions with high and low grain protein content using a worldwide collection of spring wheat accessions.
3.1. Grain Protein Content (GPC)
Normal distribution and homogeneity of variance for grain protein content (GPC) were observed across the four environments (two seasons and two water regimes). Thus, a combined analysis of variance across environments was conducted. The combined analysis of variance for GPC indicated a highly significant effect (p-value < 0.01) of the environments, genotypes, and genotype × environment interaction (Table 1). Broad-sense heritability estimates were 0.49 and 0.60 under well-watered and water deficit conditions, respectively. Furthermore, the broad-sense heritability estimate across years and water regimes (the four environments) was 0.64. Under well-watered conditions, the lsmeans of GPC ranged from 5.96 to 17.11% with a mean of 10.15% in the 2016 growing season, and from 6.88 to 17.43% with a mean of 9.67% in the 2017 growing season. Under water deficit conditions, on the other hand, GPC ranged from 11.12 to 18.5% with a mean of 14.9% in 2016, and from 9.8 to 18.3% with a mean of 13.97% in 2017. Although no significant difference was detected between the means of the growing seasons, the difference between the lsmeans of the water regimes was highly significant, based on HSD at the 0.01 probability level. Overall, our results indicated that water deficit increased GPC by 29% across the two growing seasons (Figure 1).
Furthermore, the correlation between GPC obtained under well-watered conditions and that obtained under water deficit conditions across all genotypes was positive and significant (r = 0.23, p-value = 0.01). The first quartiles for GPC across growing seasons (the cut-off for the lowest 25%) under well-watered and water deficit conditions were ≤8.36 and ≤13.41, respectively (Figure 1
). The third quartiles (the cut-off for the highest 25%) under well-watered and water deficit conditions were ≥11.35 and ≥14.66, respectively. The first and third quartiles were used as criteria to classify the genotypes into low and high GPC genotypes. Therefore, genotypes with GPC ≤8.36 under well-watered and ≤13.41 under water deficit conditions were defined as low protein genotypes. On the other hand, genotypes with GPC ≥11.35 under well-watered and ≥14.66 under water deficit conditions were defined as high protein genotypes. Grain protein content (GPC) for all genotypes under well-watered and water deficit conditions (Figure 2
) indicated that 166 genotypes (7.8%) had high protein content under both well-watered and water deficit conditions. Another 200 genotypes (9.47%) were classified as low protein genotypes under both well-watered and water deficit conditions. The top 20 accessions with the highest GPC under well-watered and water deficit conditions are presented in Table 2
, in which no overlapping accessions between the two water regimes were detected. Of the top 20 accessions detected under well-watered conditions, nine were landraces. On the other hand, 18 landraces were present among the top 20 accessions detected under water deficit conditions. Overall, the estimated lsmean for the landraces (882 accessions) under well-watered conditions was 10.9, which was 11.22% higher than the overall average of all other accessions (Table 2
). Additionally, under water deficit conditions the average GPC for the landraces was 15.04, which was 7.9% higher than the overall average of all other accessions. Overall, our results indicate that moisture has a significant impact on GPC accumulation in wheat. Landraces had higher GPC than the other germplasm used in the current study.
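The quartile rule used above to label genotypes can be illustrated with a minimal Python sketch. The cut-offs are the ones reported in the text; the accession names, the toy lsmeans, and the function name `classify_gpc` are hypothetical and serve only to show how a genotype is assigned to the high or low class when it meets the criterion under both water regimes.

```python
# Quartile cut-offs reported in the text (GPC, % of dry weight).
WW_Q1, WW_Q3 = 8.36, 11.35    # well-watered: lowest / highest 25%
WD_Q1, WD_Q3 = 13.41, 14.66   # water deficit: lowest / highest 25%


def classify_gpc(ww_gpc, wd_gpc):
    """Return 'high', 'low', or 'unclassified' for one genotype.

    A genotype is 'high' (or 'low') only if it passes the third-quartile
    (or first-quartile) cut-off under BOTH water regimes.
    """
    if ww_gpc >= WW_Q3 and wd_gpc >= WD_Q3:
        return "high"
    if ww_gpc <= WW_Q1 and wd_gpc <= WD_Q1:
        return "low"
    return "unclassified"


# Hypothetical accessions: (well-watered GPC, water-deficit GPC).
toy_lsmeans = {
    "ACC_A": (12.10, 15.20),
    "ACC_B": (8.00, 13.00),
    "ACC_C": (11.50, 13.90),
    "ACC_D": (9.80, 14.10),
}

classes = {name: classify_gpc(ww, wd) for name, (ww, wd) in toy_lsmeans.items()}
for name, label in classes.items():
    print(name, label)
```

In this toy set, ACC_C exceeds the well-watered cut-off but not the water deficit one, so it is left unclassified, which mirrors the requirement that a genotype be consistent across both regimes.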
3.2. Association Mapping for Grain Protein Content
A total of 3215 mapped SNPs were used to estimate the extent of linkage disequilibrium (LD) in the 2111 wheat accessions. Only SNP loci with MAF ≥0.05 and missing values ≤10% were used to estimate r² across all SNPs. The estimates of r² for all pairs of SNP loci were used to determine the rate of LD decay with genetic distance. Across the three wheat genomes, i.e., A, B and D, using only markers with significant r² (p-value = 0.001), the LD ranged from 0 to 0.35. Overall, LD declined to 50% of its initial value at about 8 cM (Supplementary Materials, Figure S1). Eigenvector decomposition of the kinship matrix was used to investigate the population structure among accessions. The first principal component (PC) accounted for less than 1% of the total variance (Supplementary Materials, Figure S2). Nevertheless, GWAS models with the kinship matrix (K matrix, supporting information Figure S3) and zero, one, two or three PCs were compared using the Bayesian information criterion (BIC). The results indicated a noticeable difference between the four models. Additionally, the first model, i.e., the one with no PCs, produced the highest BIC value, given that the largest is the best [55]. Therefore, we report the results of association mapping using only the K matrix, which accounted for most of the stratification among accessions.
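As a rough illustration of the marker filtering and pairwise r² estimation described above, the following Python sketch applies the MAF ≥0.05 and missing ≤10% thresholds and computes r² for a pair of SNPs. It assumes inbred lines with alleles coded 0/1 and missing calls stored as NaN, and it takes r² as the squared Pearson correlation between allele codes; the software actually used in the study is not stated in this section, and the function names and toy genotypes are hypothetical.

```python
import numpy as np


def minor_allele_frequency(geno):
    """MAF for one SNP with alleles coded 0/1; NaN marks a missing call."""
    called = geno[~np.isnan(geno)]
    p = called.mean()
    return float(min(p, 1.0 - p))


def passes_filter(geno, min_maf=0.05, max_missing=0.10):
    """Apply the MAF and missing-data thresholds given in the text."""
    missing_rate = float(np.isnan(geno).mean())
    return missing_rate <= max_missing and minor_allele_frequency(geno) >= min_maf


def r_squared(snp_a, snp_b):
    """Squared correlation between two SNPs over accessions called at both."""
    keep = ~np.isnan(snp_a) & ~np.isnan(snp_b)
    r = np.corrcoef(snp_a[keep], snp_b[keep])[0, 1]
    return float(r ** 2)


# Toy genotypes for eight hypothetical accessions.
snp_1 = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
snp_2 = np.array([0, 0, 0, 1, 1, 1, 1, 0], dtype=float)
# A rare variant: one minor allele among 40 accessions (MAF = 0.025).
rare_snp = np.array([1] + [0] * 39, dtype=float)

print(passes_filter(snp_1), passes_filter(rare_snp))
print(round(r_squared(snp_1, snp_2), 3))
```

Plotting such r² values against the genetic distance between each marker pair would give the kind of LD decay curve that the text summarizes as a decline to 50% of the initial value at about 8 cM.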
Association mapping analysis was conducted on each environment separately (two growing seasons and two water regimes). Genome-wide association mapping (GWAS) indicated that 46 SNP markers were significantly linked with GPC. The significant SNP markers were located on chromosomes 1A (12 SNPs), 1B (12 SNPs), 1D (7 SNPs), 6A (6 SNPs), 6B (7 SNPs) and 6D (3 SNPs) (Figure 3 and Figure 4). Out of the 46 significant SNP markers, ten were linked with GPC under both well-watered and water deficit conditions in at least one growing season. Three SNP markers (IWA3169, IWA3501, and IWA7937) were significantly linked with GPC across the four environments (the 2016 and 2017 growing seasons under well-watered and water deficit conditions) (Table 3). Four markers (IWA6649, IWA6787, IWA3481 and IWA4351) were linked with GPC in three environments (2016 well-watered, and 2016 and 2017 water deficit conditions) (Table 3). Together, these results indicate that some loci were significantly associated with GPC in wheat irrespective of water status.
Under well-watered conditions in both growing seasons, seven SNP markers (IWA5150, IWA4643, IWA4754, IWA3923, IWA6466, IWA6467, and IWA5986) were significantly linked with GPC. On the other hand, under water deficit conditions in both growing seasons, six SNP markers (IWA7191, IWA8199, IWA7345, IWA3446, IWA7288, and IWA7287) were significantly linked with GPC. In contrast, 14 markers (IWA4753, IWA4678, IWA4644, IWA4506, IWA4163, IWA3738, IWA5020, IWA5019, IWA5018, IWA4598, IWA4551, IWA4552, IWA4962, and IWA4730) were significantly linked with GPC in only one growing season under well-watered conditions. Another ten markers (IWA7616, IWA3624, IWA6673, IWA7007, IWA8551, IWA6610, IWA6611, IWA7480, IWA7048, and IWA7050) were significantly linked with GPC under water deficit conditions in only one growing season. The repeatability of the GPC-associated loci across two seasons under a given water treatment suggests the feasibility of using or developing markers in LD with these loci.
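The repeatability screen described above amounts to counting, for each marker, the number of environments in which it was significant. The Python sketch below shows one way to do this; it uses only a small subset of the marker names from the text, assigned to environments as the text describes, and the environment labels and the function name `count_environments` are hypothetical.

```python
from collections import Counter


def count_environments(hits_by_env):
    """Count in how many environments each marker was significant."""
    counts = Counter()
    for markers in hits_by_env.values():
        counts.update(set(markers))  # set() guards against duplicate entries
    return counts


# Subset of significant markers per environment (WW = well-watered,
# WD = water deficit), following the pattern described in the text.
hits_by_env = {
    "2016_WW": {"IWA3169", "IWA3501", "IWA7937", "IWA6649", "IWA5150"},
    "2017_WW": {"IWA3169", "IWA3501", "IWA7937", "IWA5150"},
    "2016_WD": {"IWA3169", "IWA3501", "IWA7937", "IWA6649", "IWA7191"},
    "2017_WD": {"IWA3169", "IWA3501", "IWA7937", "IWA6649", "IWA7191"},
}

counts = count_environments(hits_by_env)

# Markers significant in every environment.
stable = sorted(m for m, n in counts.items() if n == len(hits_by_env))
# Markers repeatable across both seasons within a single water regime only.
ww_only = sorted((hits_by_env["2016_WW"] & hits_by_env["2017_WW"])
                 - hits_by_env["2016_WD"] - hits_by_env["2017_WD"])
wd_only = sorted((hits_by_env["2016_WD"] & hits_by_env["2017_WD"])
                 - hits_by_env["2016_WW"] - hits_by_env["2017_WW"])

print(stable, ww_only, wd_only)
```

Set intersections keep the regime-specific lists explicit, which makes it easy to separate loci that are stable across water status from those repeatable under one regime only.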
Protein content is an essential compositional trait in wheat, which has a broad impact in the food industry concerning human nutrition and health. Consequently, breeding for enhanced end-use quality is one of the essential breeding goals in wheat. However, GPC in wheat is positively affected by water deficit compared to well-watered conditions [10
]. In this study, we sought to evaluate a comprehensive spring wheat collection for grain protein content (GPC) and to locate genomic regions associated with GPC under well-watered and water deficit conditions using a GWAS approach.
The most striking observation in this study was the weak but positive and significant correlation between GPC obtained under well-watered conditions and under water deficit conditions (r = 0.23). That weak correlation implies a strong genotype × environment interaction, in which genotypes responded differently to the water treatments. The increase in GPC under water deficit conditions could be mainly due to higher rates of grain nitrogen accumulation and lower rates of carbohydrate accumulation. High moisture, on the other hand, may decrease GPC through dilution of nitrogen by carbohydrates [56
]. An increase in grain protein and gluten content in response to water deficit, compared to well-watered conditions, was also reported for winter wheat in a previous study [57
]. The current study, as well as previous reports, indicated a significant effect of environment (moisture and growing seasons) on wheat GPC accumulation. Analysis of variance indicated a significant effect of moisture, genotype, and genotype × environment interaction on GPC in wheat, suggesting that GPC is a complex trait influenced by several factors. The significant genotypic effect observed in this study also indicated a wide range of variation in GPC accumulation among the wheat accessions used. Moreover, 366 wheat genotypes (166 with high GPC and 200 with low GPC) performed consistently across water regimes, which implies that GPC accumulation in these genotypes was less responsive to moisture.
Genotypic variation results from allelic differences in genes that lead to different responses to environmental conditions [58
]. Furthermore, landraces serve as a valuable genetic resource that might provide new alleles for the improvement of economically important traits such as GPC [19
]. Results reported herein showed that landraces outperformed cultivated genotypes concerning GPC. These findings agree with previous reports [59
] in which 121 landraces, 101 obsolete cultivars, and modern wheat cultivars were evaluated for GPC under the same environmental conditions, and landraces had higher total protein content compared to other studied accessions. Grain quality of some wheat landraces should be of particular interest because much broader diversity can be found in landraces compared to modern wheat cultivars [61
]. Additionally, most of the organic wheat production systems rely on cultivars that were developed for high-input production systems [60
]. Wheat landraces have been developed mostly in environments with low nutrient availability; they represent a source of variation for selection of genotypes adapted to cropping systems with low fertilizer input [61
]. In the current study, we identified 224, 214 and 70 wheat landraces that were found to have high GPC under well-watered, water deficit and both conditions, respectively. Our results and previous reports indicated that GPC depends mainly on genotype, environment, and genotype × environment interaction [59
]. However, the response mechanism that modifies protein accumulation under water deficit conditions is still unclear. Recently, a putative mechanism underlying the increased accumulation of storage proteins in wheat endosperm under water deficit was provided by Chen et al. [63
]. They identified four differentially expressed miRNAs induced by drought stress that may affect the development of protein bodies in caryopsis by regulating the expression levels of target genes involved in protein biosynthesis pathways.
One of the primary goals of this study was to locate significant genomic regions that control the accumulation of GPC which might shed light on the genetic architecture of GPC and the protein accumulation mechanism. The genome-wide association mapping analysis, applied in the current study, using the kinship (K) matrix in a mixed model indicated that K matrix was adequate in accounting for population structure [64
]. Also, these results agree with those of Zhao et al. [65
], who found that K models were adequate for genome-wide association mapping. Furthermore, the K model was more effective in reducing the false-positive rate than the Q + K model. Linkage disequilibrium (LD) was estimated using r² among all pairs of SNP loci; r² in this study was 0.09, which is higher than that obtained by Breseghello and Sorrells [66
] and 0.019 reported by Neumann et al. [67
] because of their smaller population sizes, although the number of marker pairs was similar. This indicates that population size might have an impact on LD.
Genome-wide association analysis (GWAS) was conducted on each environment separately to measure the repeatability of the significant SNPs and the effect of moisture on the genomic regions controlling GPC. Several SNPs were significantly linked to GPC under well-watered conditions but not under water deficit conditions, and vice versa. Moreover, ten QTLs were linked with GPC under both well-watered and water deficit conditions. The GWAS analysis suggested a significant role of genotype × environment interaction in detecting GPC-associated loci. Genome-wide association studies using diverse wheat germplasm have successfully detected GPC-associated loci in durum wheat [68
], and bread wheat lines [69
]. Thus, the SNPs associated with GPC under water deficit or well-watered conditions in this study might provide useful molecular information for wheat breeders seeking to incorporate specific QTLs to increase GPC in low-input or drought-stressed environments. Around 50% of the significant SNPs detected in the current study were on group 1 chromosomes (1A, 1B and 1D), where copies of Glu-B1 genes reside [70
genes were previously reported to contribute about 24.6 and 19.5% of the total phenotypic variation for sedimentation volume (which determines gluten strength and, in turn, the cooking quality of pasta) [2
]. Several SNP loci in LD with sedimentation volume were recently discovered on chromosomes 1A and 1B in durum wheat [68].
Together, these results emphasize the importance of using diverse worldwide germplasm to dissect the genetic architecture of GPC in wheat and to identify accessions that might be potential parents in wheat breeding programs. An ongoing multi-year, multi-replication study using 406 accessions identified in the current study is being conducted to evaluate these genotypes for yield and to validate the GPC-associated loci detected herein. GPC estimates under well-watered and water deficit conditions were used as the selection parameter to reduce the number of accessions from 2111 to 406. Reducing the number of accessions will allow us to investigate in depth other wheat quality aspects, such as the concentrations (soluble and insoluble) of glutenin, α/β and γ gliadin, and albumin/globulin, in addition to the total protein for high and low GPC genotypes.