Utilization of Evolutionary Plant Breeding Increases Stability and Adaptation of Winter Wheat Across Diverse Precipitation Zones

Abstract: Evolutionary plant breeding (EPB) is a breeding method that was used to create wheat (Triticum aestivum L.) evolving populations (EPs), both bi-parental and composite-cross populations (BPPs and CCPs). EPB relies on natural selection and bulking of seed to select the most adaptable, diverse population in an environment by increasing the frequency of favorable alleles in a heterogeneous population. This study used seven EPs to evaluate the ability of EPB to improve agronomic, quality, and disease resistance traits and adaptability across different precipitation zones. The populations were tested in field trials in three diverse locations over 2 years. Least significant differences showed that the performance of the EPs depended on their pedigree; the EPs were statistically similar to, and in some cases out-performed, their respective parents with regard to grain yield, grain protein concentration, and disease resistance. Stability models including Eberhart and Russell's Deviation from Regression ($S^2_{d_i}$), Shukla's Stability Variance ($\sigma^2_i$), Wricke's Ecovalence ($W_i$), and the multivariate Additive Main Effects and Multiplicative Interaction (AMMI) model were used to evaluate the adaptability of the EPs and their parents. The BPPs and CCPs demonstrated significantly greater stability than the parents across precipitation zones, confirming the capacity of genetically diverse EPs to adapt to different environments.


Introduction
The basis of mass selection is credited to the Danish biologist W. Johannsen in 1903 [1]. Mass selection is one of the oldest methods of selection and has long been used by farmers, who saved seed from desirable plants to plant the following year. It works at the population level by increasing the frequency of desirable genes [1]. For self-pollinated crops, selection starts with planting a heterogeneous population in a field, and plants are then selected on a visual basis. The seeds from the selected plants are bulked and replanted for the next generation. This cycle is repeated until a desired homogeneous genotype is obtained. The homogeneous genotype is then grown in replicated, maintained plots in order to properly select the superior genotype [1]. Mass selection is used in conjunction with the bulking of seed [2]. Bulk breeding is one of the simplest methods of developing recombinant inbred lines [2]. It is accomplished by bulking seeds from an F2 population that was created by allowing the F1 hybrid to self-pollinate. The F2 seeds that are bulked are grown

Genetic Markers
Genetic markers for each EP at the F8 generation and their respective parents, along with EPs 101 and 107 from the F2 generation, were analyzed using the 9K iSelect BeadChip assay [20]. DNA was extracted from each EP and parent by growing 12 seeds to the three-leaf stage and cutting 2.5 cm of the newest leaf. Twelve seeds were used to capture most of the heterozygosity in the F8 and F2 generations. Extraction followed the standardized methods of the DNeasy 96 Plant Kit (Qiagen). After extraction, DNA concentration was determined and each sample was diluted to 40 ng/µL; the 12 samples were then combined in equal proportions to capture any heterogeneity that may still have existed in the population. Allele frequency and heterozygosity were measured for each population; markers were then filtered at a missing-data rate of 20%, and monomorphic markers were removed. Population structure and genetic diversity were analyzed using model-based clustering with "mclust" and principal component analysis with "prcomp" in R [21]. The genotype data presented in this study are available in Supplementary File 1 (File S1).
Sustainability 2020, 12, 9728
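The marker filtering step described above can be sketched as follows. This is a minimal illustration in Python rather than the R workflow used in the study. The function name `filter_markers`, the genotype coding (0, 1, 2 for allele dosage, `None` for a missing call), the marker names, and the reading of the 20% threshold as "drop markers with more than 20% missing calls" are all assumptions made for the example.

```python
from typing import Dict, List, Optional

Call = Optional[int]  # assumed coding: 0/1/2 allele dosage, None = missing call


def filter_markers(
    genotypes: Dict[str, List[Call]], max_missing: float = 0.20
) -> Dict[str, List[Call]]:
    """Keep markers with an acceptable missing rate that are polymorphic.

    `genotypes` maps a marker name to one call per sample (population or parent).
    """
    kept = {}
    for marker, calls in genotypes.items():
        missing_rate = sum(c is None for c in calls) / len(calls)
        if missing_rate > max_missing:
            continue  # too many missing calls
        observed = {c for c in calls if c is not None}
        if len(observed) < 2:
            continue  # monomorphic: every observed call is identical
        kept[marker] = calls
    return kept


# Toy data: 5 samples, 4 hypothetical markers.
toy = {
    "m1": [0, 2, 2, 0, 1],        # polymorphic, complete -> kept
    "m2": [2, 2, 2, 2, 2],        # monomorphic -> removed
    "m3": [0, None, None, 2, 0],  # 40% missing -> removed
    "m4": [0, 2, None, 2, 0],     # 20% missing, polymorphic -> kept
}
filtered = filter_markers(toy)
print(sorted(filtered))
```

Filtering on missingness before testing for polymorphism means a marker is judged monomorphic only from the calls that were actually observed.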

Experimental Design
Once the EPs reached the F8 generation, they were subsampled and grown in field trials for evaluation in the 3 locations under conventional management systems. Trials were conducted in 2 growing seasons, 2010-2011 and 2011-2012, to allow multiyear analyses. Field trials were planted in randomized complete block designs (RCBD) with 3 blocks and 33 entries. Supplementary Table S1 lists the entries in each field trial. The entries included the 7 EPs from the initial F3 generation, specific to each location, as F3 controls; that is, the F3 EPs tested in each location were the source populations from which the F8 EPs were developed by the EPB method. The F8 EPs developed from each population in each location were also included in every field trial and accounted for 21 entries (7 populations developed in each of the 3 locations). The remaining 5 entries were the parents of the EPs, included as parental controls, with the exception of the parent Sorbas.
Pullman was the high-precipitation location, with a recorded 423.67 mm and 496.32 mm of rain in 2011 and 2012, respectively (Table 2). Lind was the low-precipitation location, with a recorded 163.83 mm and 308.10 mm of rain in 2011 and 2012, respectively (Table 2). Central Ferry is an irrigated location that received 600 mm of irrigation each year in addition to 219.5 mm and 352.0 mm of precipitation in 2011 and 2012, respectively. Precipitation in Pullman was similar in both years, whereas Lind recorded almost twice as much rain in 2012 as in 2011. Precipitation and temperature were recorded using the AgWeatherNet system provided by Washington State University [22]. Precipitation and temperature for Central Ferry were not available from AgWeatherNet; therefore, divisional data were obtained from the National Climatic Data Center for the averages in the central basin division in Washington State [23].
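The entry structure and block randomization of the design can be illustrated with a short sketch. It assumes only what is stated above (7 F3 controls, 21 F8 EPs, and 5 parents in 3 blocks). The function `rcbd_layout`, the seed, and the entry labels are hypothetical placeholders and do not reproduce the actual field plans.

```python
import random
from typing import Dict, List


def rcbd_layout(entries: List[str], n_blocks: int, seed: int = 0) -> Dict[int, List[str]]:
    """Randomize every entry once within each block (RCBD)."""
    rng = random.Random(seed)
    layout = {}
    for block in range(1, n_blocks + 1):
        order = list(entries)
        rng.shuffle(order)  # independent randomization per block
        layout[block] = order
    return layout


# Entry counts follow the text: 7 F3 controls + 21 F8 EPs + 5 parents = 33.
# The labels themselves are placeholders, not the study's entry names.
entries = (
    [f"F3_control_{i}" for i in range(1, 8)]
    + [f"F8_EP_{i}" for i in range(1, 22)]
    + [f"parent_{i}" for i in range(1, 6)]
)
layout = rcbd_layout(entries, n_blocks=3)
print(len(entries), {block: len(plots) for block, plots in layout.items()})
```

Each block contains all 33 entries exactly once, which is what allows block effects to be separated from genotype effects in the analysis.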

Data Collection
Field trials were evaluated in 3 locations under conventional management systems. A total of 8 traits were recorded for each entry: 3 agronomic traits, 3 quality traits, and 2 stripe rust resistance traits. The phenotype data presented in this study are available in Supplementary File 2 (File S2).

Agronomic Data
In total, 3 agronomic traits were measured for each plot. Heading date was recorded as the Julian calendar date when 50% of the plot reached Feekes growth stage 10.1 [24]. Plant height was measured in cm from the base of the plants to the top of the fully exposed head. Grain yield (kg ha⁻¹) was collected by harvesting each plot using a Wintersteiger® Nurserymaster combine (Ried im Innkreis, Austria). Grain yield, heading date, and plant height were measured in all locations and blocks, with the exception of one population in the third block in Central Ferry in 2012, where data were lost.

End-Use Quality Data
A total of 3 quality traits were measured: test weight, grain protein content, and kernel hardness. Each trait was measured in at least one block for each environment in 2012 and in every block in each environment in 2011, except that data from one parental line were lost in Central Ferry in 2011. Test weight was measured in kg hL⁻¹; it is an indicator of wheat quality and correlates with easier processing and flour yield [1]. Kernel hardness was measured using the Single Kernel Characterization System 4100 (Perten Instruments, Springfield, IL, USA) and is a unitless measure taken as the average of 300 kernels; lower values indicate greater softness. Grain protein content was measured using a Perten DA 7000 NIR analyzer (Perten Instruments, Springfield, IL, USA) and is reported in g kg⁻¹ on a 12% moisture basis.

Disease Evaluation
The disease traits measured were stripe rust (Puccinia striiformis Westend. f. sp. tritici Erikss.) infection type and stripe rust disease severity. Recording of these traits depended on natural infection and on stripe rust incidence at the time of observation. Some trials had 2 observations for these traits, which were identified with sequential numbers. Where stripe rust was present in the field at earlier growth stages, the first recording was taken at flag leaf emergence and the second after anthesis. Where stripe rust was not present at earlier growth stages, a single recording was taken right after anthesis to measure adult plant resistance. No stripe rust observations were recorded in Lind in 2012. In Pullman in 2012, only the first replication was recorded, and in Lind in 2011, there was no recording in the third block for the second set of observations. Stripe rust infection type was recorded on a 0-9 scale, with infection type 9 being the most susceptible reaction [25]. Stripe rust severity was recorded as a percentage of the total area of the leaf using a modified Cobb Scale [26]. The latest observation for each trait in each location was used for analysis.

Statistical Analysis
Analysis of variance (ANOVA), least significant difference (LSD), Pearson correlations, and the coefficient of variation (CV) were calculated for each trait using a linear model across years and locations in a randomized complete block design (RCBD) [27]. The analyses were performed in R, and the package "agricolae" was used for LSD tests [28]. Figures were created using the package "ggplot2" in R [21]. The linear model used for each trait across locations and years is represented as follows:
$$\text{response} = \mu + G_i + L_j + Y_k + B_r + e$$
where response is the trait of interest; $\mu$ is the overall mean; $G_i$ is the effect of the ith genotype; $L_j$ is the effect of the jth location; $Y_k$ is the effect of the kth year; $B_r$ is the effect of the rth block; and $e$ is the residual error [29]. For individual locations, the same model was used without the location factor $L_j$. Broad-sense heritability was calculated using the above model as a linear mixed model with random effects for genotype, locations, and years using the formula:
$$H^2 = \frac{\sigma^2_g}{\sigma^2_g + \dfrac{\sigma^2_{gl}}{l} + \dfrac{\sigma^2_{gy}}{y} + \dfrac{\sigma^2_{gly}}{ly} + \dfrac{\sigma^2_e}{rly}}$$
where $\sigma^2_g$, $\sigma^2_{gl}$, $\sigma^2_{gy}$, $\sigma^2_{gly}$, and $\sigma^2_e$ are the genotype, genotype by location interaction, genotype by year interaction, genotype by location by year interaction, and error variance components; $r$ is the number of replicates; $l$ is the number of locations; and $y$ is the number of years in the analysis [29].
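As a numerical illustration of the heritability calculation, the sketch below assumes the standard entry-mean form, in which each interaction or error variance component is divided by the number of locations, years, and replicates over which a genotype mean is averaged. The function name and the variance component values are invented for the example; only the design constants (3 blocks, 3 locations, 2 years) come from the text.

```python
def broad_sense_heritability(
    var_g: float,
    var_gl: float,
    var_gy: float,
    var_gly: float,
    var_e: float,
    n_rep: int,
    n_loc: int,
    n_year: int,
) -> float:
    """Entry-mean broad-sense heritability from variance components.

    H2 = var_g / (var_g + var_gl/l + var_gy/y + var_gly/(l*y) + var_e/(r*l*y))
    """
    phenotypic = (
        var_g
        + var_gl / n_loc
        + var_gy / n_year
        + var_gly / (n_loc * n_year)
        + var_e / (n_rep * n_loc * n_year)
    )
    return var_g / phenotypic


# Design constants from the text: 3 blocks, 3 locations, 2 years.
# The variance components below are made-up numbers for illustration only.
h2 = broad_sense_heritability(
    var_g=4.0, var_gl=1.5, var_gy=1.0, var_gly=3.0, var_e=9.0,
    n_rep=3, n_loc=3, n_year=2,
)
print(round(h2, 3))
```

Because the interaction and error terms are divided by the number of environments and replicates, adding locations or years raises the heritability of genotype means even when the underlying variance components are unchanged.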
Univariate and location value statistics were calculated using RGxE programs in R. Univariate stability models, including the joint linear regression model with Eberhart and Russell's Deviation from Regression ($S^2_{d_i}$), Shukla's Stability Variance ($\sigma^2_i$), and Wricke's Ecovalence Stability Index ($W_i$), were used to assess genotype stability across locations and years; they were computed with code adapted from RGxE and with the function "stability.par" in the package "agricolae" in R [21,28,30-35]. The multivariate model, Additive Main Effects and Multiplicative Interaction (AMMI), was used to calculate stability values, which were computed and plotted using the formula suggested by Purchase et al. [30] with the function "index.AMMI" in the "agricolae" package in R [28-30]. The selection indices calculated were Kang's Yield Stability Statistic ($YS_i$), using RGxE in R, and the AMMI Yield Stability Index (AYSI), using the function "index.AMMI" in the package "agricolae" [28,35]. All stability statistics were recorded as rankings. Genotype plus genotype-by-environment (GGE) biplots were produced using the "metan" package in R [36].
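To make two of the univariate statistics concrete, the following sketch computes Wricke's ecovalence and Shukla's stability variance from a genotype-by-environment table of means, using their textbook definitions. It is an illustrative Python re-implementation under those definitions, not the RGxE or "agricolae" code used in the study, and it omits significance testing. The function names and the toy yield values are assumptions.

```python
from typing import List


def wricke_ecovalence(y: List[List[float]]) -> List[float]:
    """Wricke's ecovalence W_i for a genotype x environment table of means."""
    g, e = len(y), len(y[0])
    grand = sum(map(sum, y)) / (g * e)
    g_mean = [sum(row) / e for row in y]
    e_mean = [sum(y[i][j] for i in range(g)) / g for j in range(e)]
    # W_i is the genotype's share of the interaction sum of squares.
    return [
        sum((y[i][j] - g_mean[i] - e_mean[j] + grand) ** 2 for j in range(e))
        for i in range(g)
    ]


def shukla_variance(y: List[List[float]]) -> List[float]:
    """Shukla's stability variance, derived from the ecovalence values.

    Requires at least 3 genotypes and 2 environments.
    """
    g, e = len(y), len(y[0])
    w = wricke_ecovalence(y)
    total = sum(w)
    denom = (e - 1) * (g - 1) * (g - 2)
    return [(g * (g - 1) * wi - total) / denom for wi in w]


# Toy table: rows are genotypes, columns are environments (made-up values).
yield_means = [
    [5.0, 6.0, 7.0],  # responds in parallel with genotypes 2 and 3
    [4.0, 5.0, 6.0],
    [6.0, 7.0, 8.0],
    [3.0, 7.0, 5.0],  # crossover-type response
]
w = wricke_ecovalence(yield_means)
s2 = shukla_variance(yield_means)
print([round(v, 3) for v in w])
print([round(v, 3) for v in s2])
```

For both statistics a smaller value indicates a more stable genotype, so ranking the values in ascending order gives stability rankings of the kind reported in the study.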

Genetic Marker Analysis
The maximum allele frequency change between a population and its respective parent was 3% (Table S2). The largest difference in heterozygosity between the populations and their parents was an increase of 15% in the F8 generation. Model-based cluster classification and optimal cluster number for the BPPs showed five distinct clusters (Table S3). The five-cluster solution shows that populations from the same cross are not necessarily grouped together. For example, the Central Ferry and Pullman 104 populations are clustered in cluster 3 with the 105 populations, whereas the Lind 104 population (8BPL104) is clustered with the 101 and 102 populations in cluster 1 (Table S3). The exception is the 107 populations, which are all grouped together in cluster 2 (Table S3).
The five clusters are displayed in the principal component analysis biplot in Figure 1. Clusters 1 and 3 contain the majority of the EPs, which are grouped according to the precipitation adaptation of their parents. Cluster 1 mostly contains populations 101 and 102, which are derived from Eltan and Lewjain, both low-precipitation-adapted parents. Cluster 3 mostly contains populations 104 and 105 along with the common parent WA007933 and the parents Albion and Stephens, which are both high-precipitation-adapted. The parents cluster according to the precipitation zone for which they were selected and bred (Figure 1).
The results suggest that the EPs are genetically grouped by pedigree rather than by the location in which they were developed. The only exception is the BPP 8BPL104, whose pedigree derives from high-precipitation-adapted parents but which was genetically similar to the low-precipitation-adapted populations 101 and 102.

Trait Analysis
The multilocation ANOVA and the LSD multiple comparisons between individual EPs and parents showed that the year, location, and genotype main effects were significant for every trait, with the exception of year for grain yield and protein concentration (Table 3). Heritability was high for all traits except infection type (0.38; Table 3). The significant genotype effects and high heritability suggest that genotypes can be discriminated and compared for all traits. The CV was low for each trait except infection type and disease severity, and the R² value was high for all traits (Table 3), suggesting reliable results for the majority of the traits analyzed.

Table 3 abbreviations: PROT: grain protein concentration (g kg⁻¹; 12% moisture basis); HARD: kernel hardness; IT: stripe rust infection type; SEV: stripe rust disease severity (%); LSD: least significant difference; CV: coefficient of variation; DF: degrees of freedom. "*" p-value < 0.05; "**" p-value < 0.01; "***" p-value < 0.001.
Multiple comparisons were completed for the EPs and parents that were included in each trial. Multiple comparisons for grain yield, grain protein concentration, and disease severity were plotted to represent the agronomic, quality, and stripe rust resistance traits, respectively (Figure 2A-C). Means followed by a common letter are not significantly different by the LSD test at the 5% level of significance. Across locations, the common parent WA007933 and the parent Stephens (5716 kg ha⁻¹ and 5503 kg ha⁻¹, respectively) yielded statistically higher than the EPs, with the exception of the CCP 8BPL106 (5200 kg ha⁻¹). The parent Eltan was the fourth-highest-yielding genotype (5137 kg ha⁻¹) but was not statistically different from the top nine EPs.
For soft white wheat, lower protein is preferable. There were only a few statistical differences between the EPs and parents. The parent Stephens had statistically lower grain protein concentration than most of the EPs, the exception being the CCP 8BPL106. The EPs 8BPL107 and 8BPCF107 had statistically higher grain protein concentration than all other EPs and parents. For disease severity, only the common parent WA007933 displayed statistically lower severity than the EPs, and all three of the BPP 107 populations were among the five lowest values. There was no apparent advantage for EPs developed in specific environments or for the CCPs (103 and 106). Instead, the EPs performed statistically like their parents. For example, population 107 performed very similarly for all three traits (Figure 2A-C), regardless of the environment in which it was selected.
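For reference, the letter groupings in these comparisons rest on Fisher's least significant difference. A common textbook form, assuming every genotype mean is based on the same number of observations $n$ (which holds only approximately here, because some plot data were lost), is:

```latex
\mathrm{LSD}_{0.05} \;=\; t_{0.025,\,\mathrm{df}_e}\,\sqrt{\frac{2\,\mathrm{MS}_e}{n}},
\qquad
\text{genotypes } a \text{ and } b \text{ differ if } \left|\bar{y}_a - \bar{y}_b\right| > \mathrm{LSD}_{0.05}
```

Here $\mathrm{MS}_e$ is the error mean square from the ANOVA and $\mathrm{df}_e$ its degrees of freedom. These symbols are notation introduced for this illustration and do not appear in the original text.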

Location Analysis
Location means and correlations in Table 4 show a high discriminating ability for Pullman for grain yield, heading date, and protein concentration. A significant correlation indicates that a location closely represents the average location, so genotypes perform similarly for the trait in question at that location and in the average of all locations [37]. A significant value therefore represents poor discriminating ability for a trait between locations. Lind and Central Ferry have high significance for all traits, and therefore poor discriminating ability, with the exception of grain yield and heading date (Table 4). In Lind, the correlations for grain yield and heading date are significant, but less so than in Central Ferry and for the majority of the traits.
Table 4. Location means and Pearson correlation significance levels for agronomic, quality, and disease resistance traits in each location relative to the mean of all locations in Washington.
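The location statistic can be illustrated with a short sketch that correlates the genotype means at each location with the genotype means averaged over all locations. The function names and the numbers are hypothetical; the location names are reused only as labels, and no significance test is included.

```python
from math import sqrt
from typing import Dict, List


def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation coefficient between two equal-length lists."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def location_correlations(means: Dict[str, List[float]]) -> Dict[str, float]:
    """Correlate each location's genotype means with the all-location average."""
    n_geno = len(next(iter(means.values())))
    overall = [
        sum(vals[i] for vals in means.values()) / len(means) for i in range(n_geno)
    ]
    return {loc: pearson(vals, overall) for loc, vals in means.items()}


# Made-up genotype means (four genotypes) at three locations.
toy_means = {
    "Pullman": [7.0, 6.5, 6.0, 5.0],
    "Lind": [3.0, 3.4, 3.1, 3.5],
    "Central Ferry": [5.5, 5.0, 4.8, 4.0],
}
r = location_correlations(toy_means)
print({loc: round(value, 2) for loc, value in r.items()})
```

Note that each location contributes to the overall mean it is compared against, so these correlations are biased upward when only a few locations are available.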
ANOVA and LSD tests were completed for all traits in each location. In the individual location comparisons, the F3 EPs were included along with the F8 EPs and their parents. Multiple comparisons for grain yield, grain protein concentration, and disease severity were plotted to represent the agronomic, quality, and stripe rust resistance traits, respectively.
For the trials in Pullman, year and genotype had significant effects for each trait, with the exception of year for grain yield and genotype for infection type (Table 5). The CV was low and the R² was high for each trait, with the exception of infection type and disease severity (Table 5), suggesting reliable results for the majority of the traits analyzed in Pullman.
In Lind, year and genotype were significant for each trait except infection type and disease severity, for which no observations were taken in 2012 (Table 5). In Central Ferry, year and genotype effects were also significant, with the exception of genotype for test weight, grain protein concentration, and disease severity, and of year for disease severity. Again, the CV and R² were relatively high in both locations for most traits.
For grain yield, most EPs were not significantly different in Lind, but significant differences were seen in Central Ferry and Pullman (Figure 3A-C). The parent Stephens was the highest-yielding genotype in Pullman and Central Ferry (7857 kg ha⁻¹ and 3609 kg ha⁻¹) and the lowest-yielding in Lind (3149 kg ha⁻¹) (Figure 3A-C). In contrast, the EPs showed average stability in all environments rather than the extremes displayed by the parents, with the EP 8BPL102 yielding the highest in Lind (3889 kg ha⁻¹) (Figure 3B).
The location in which an EP was developed did not necessarily determine its performance. The F8 EPs derived from Pullman did perform the best in Pullman (Figure 3), but they also performed similarly to their parental pedigrees. The highest-yielding EPs (8BPCF105 and 3BPP105) were derived from the highest-yielding genotype, Stephens, and the lowest-yielding EPs, the 104 populations, were derived from the lowest-yielding genotype, Albion (Figure 3A-C). Similarly, the majority of the highest-yielding EPs in Central Ferry were selected in Central Ferry.
Including the F3 EPs in the individual locations allowed comparison with the F8 EPs to determine the effect of the EPB method. The F8 EPs did not necessarily out-yield the F3 EPs in their respective environments. However, several F8 EPs did yield higher than the corresponding F3 EPs from the location where they were developed. The trend showed that F8 EPs whose pedigrees were adapted to the respective environment performed better. For example, in Lind, all the F8 BPPs and CCPs (populations 103 and 106) with low-precipitation-adapted pedigrees (101, 102, and 103) yielded higher than their F3 counterparts (Figure 3B). The BPPs and the CCPs showed no overall trend in mean performance for agronomic, quality, or disease resistance traits.
Grain protein concentration displayed similar results across all environments, presumably because of its higher heritability (Table 3) and the smaller environmental effect on genotype performance (Figures 4 and 5). Multiple comparisons for grain protein concentration and disease severity showed higher performance of an EP in the environment in which it was developed. In Pullman, 7 of the top 10 genotypes for grain protein concentration were EPs developed in Pullman (Figure 4).
Disease severity showed a trend of similar pedigree performance across all environments for the EP 107 (Figure 5). For the remaining EPs, however, selection pressure was also a factor in EP performance. For example, the EPs developed in Lind performed worse when grown in Central Ferry or Pullman, where they were subjected to higher stripe rust pressure.
When LSD tests were applied between the overall groups (parents, BPPs, CCPs, and F3 EPs), there were few statistical differences (Figure 6). In Central Ferry, Pullman, and across all locations, the parents yielded statistically higher than the BPPs, but not higher than the CCPs or F3 EPs (Figure 6A). In Lind, however, the BPPs yielded statistically higher than the F3 EPs and had a higher mean than the other types of EPs (Figure 6A). Grain protein concentration differed statistically between the CCPs and BPPs and their parents in all locations except Pullman. The CCPs had statistically lower grain protein concentration than the F3 EPs in Central Ferry, and the BPPs had statistically lower grain protein concentration than the F3 EPs in Lind (Figure 6B). Disease severity was statistically higher for the parents in Lind, in Pullman, and across all locations (Figure 6C). With the exception of grain yield and grain protein concentration in Central Ferry, there were few statistical differences between BPPs and CCPs (Figure 6A-C).

Stability Analysis
Stability analysis rankings for grain yield, grain protein concentration, and disease severity are shown in Table 6. Multiple stability analyses were used to appropriately account for all genotype-by-environment effects [38]. For Deviation from Regression and Shukla's Stability Variance, low rankings (that is, higher rank numbers) and statistical significance indicate unstable genotypes. Low rankings for Wricke's Ecovalence and AMMI Stability Value (ASV) also indicate unstable genotypes.
Table 6. Stability analysis rankings for Deviation from Regression ($S^2_{d_i}$), AMMI Stability Value (ASV), Wricke's Ecovalence ($W_i$), and Shukla's Stability Variance ($\sigma^2_i$) across all locations and years for grain yield, grain protein concentration, and stripe rust disease severity in Washington.
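The ASV and its conversion to ranks can be sketched as follows. The sketch uses the ASV formula usually attributed to Purchase et al., which weights the first interaction principal component (IPCA1) score by the ratio of the IPCA1 to IPCA2 sums of squares. It takes the IPCA scores and sums of squares as given inputs rather than fitting the AMMI model, and all function names and numeric values are invented for illustration.

```python
from math import sqrt
from typing import List


def ammi_stability_value(
    ipca1: List[float], ipca2: List[float], ss_ipca1: float, ss_ipca2: float
) -> List[float]:
    """AMMI Stability Value per genotype from the first two IPCA scores."""
    weight = ss_ipca1 / ss_ipca2  # relative contribution of IPCA1 to the interaction
    return [sqrt((weight * a) ** 2 + b ** 2) for a, b in zip(ipca1, ipca2)]


def rank_ascending(values: List[float]) -> List[int]:
    """Rank 1 = smallest value (most stable); ties are not handled specially."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        ranks[i] = rank
    return ranks


# Hypothetical IPCA scores for three genotypes and interaction sums of squares.
asv = ammi_stability_value(
    ipca1=[0.1, -0.8, 0.3], ipca2=[0.2, 0.1, -0.5], ss_ipca1=6.0, ss_ipca2=3.0
)
ranks = rank_ascending(asv)
print([round(v, 3) for v in asv], ranks)
```

A genotype close to the origin of the IPCA1 by IPCA2 plane has a small ASV and therefore receives a high stability ranking (a low rank number).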
The parents Eltan, Stephens, and WA007933 were consistently unstable for yield in comparison to the EPs across all four stability analyses (Table 6). For Deviation from Regression, the genotypes with the lowest rankings (18-19 and 22-26) were significant and included three BPPs developed in Central Ferry (8BPCF102, 8BPCF105, and 8BPCF107) and one CCP developed in Lind (8BPL106). Results were similar for Shukla's Stability Variance, for which only one BPP (8BPCF107) and two parents (Stephens and WA007933) were significant. Wricke's Ecovalence and ASV gave very similar rankings. The genotypes with the highest stability rankings across the analyses were three EPs developed in Pullman: 8BPP107 for Wricke's Ecovalence and ASV, 8BPP106 for Deviation from Regression, and 8BPP101 for Shukla's Stability Variance.
The stability results for grain protein concentration and disease severity were similar to each other but differed from those for grain yield. For grain protein concentration, only three genotypes were significantly unstable for Deviation from Regression (8BPL104, 8BPL107, and Lewjain), and no genotypes were significant according to Shukla's Stability Variance. As for grain yield, the parents displayed lower stability for Wricke's Ecovalence and ASV. The genotypes with the highest stability for grain protein concentration were 8BPP107 and 8BPP106 for Shukla's Stability Variance and Wricke's Ecovalence, and 8BPCF105 was the highest-ranked genotype for both Deviation from Regression and ASV. For disease severity, Eltan and 8BPP102 were significant for Deviation from Regression and Shukla's Stability Variance, along with 8BPCF106 for Shukla's Stability Variance. Lewjain was the lowest-ranked genotype for Wricke's Ecovalence and ASV. The highest stability ranks were held by 8BPP101, 8BPP106, 8BPL104, and 8BPCF102 for Deviation from Regression, Shukla's Stability Variance, Wricke's Ecovalence, and ASV, respectively.
The stability analysis for the EPs and parents confirms the results seen in Figures 3-6. Overall, the parents were the most unstable genotypes for grain yield, and the EPs consistently ranked the highest for stability of grain yield, grain protein concentration, and disease severity (Table 6). As with the LSD results (Figures 3-6), there was no apparent advantage of BPPs over CCPs, with some BPPs and some CCPs displaying high stability across traits and analyses.
The stability of the EPs relative to their parents is shown visually in GGE biplots (Figure 7A). A mean vs. stability biplot displays both the mean and the stability of a genotype. The further a genotype falls from the x-axis, shown with a dotted line, the less stable it is [39]. The majority of the EPs are centered around the origin of the graph (Figure 7A). The lower stability of the parents (Eltan, Stephens, and WA007933) and of the EPs 8BPCF103 and 8BPCF107 is shown by the large distances between their points and the x-axis. The higher yield of the parents is also seen in the ranking of the genotypes, with Stephens and WA007933 the two furthest right on the x-axis. GGE biplots also allow visualization of genotype performance across locations. In which-won-where biplots, environments and genotypes that fall between the same two dotted lines have similar environmental and genotypic effects [39]. In the which-won-where view of the GGE biplot (Figure 7B), the parents Stephens and WA007933 performed the best in Central Ferry and Pullman, respectively. Additionally, both Pullman trials had a similar effect on genotype for grain yield as Central Ferry in 2011.

Selection and Indices
Comparisons based on the rank of means, Kang's Stability Statistic, and the AMMI Yield stability index can be made for selection purposes, accounting for performance and stability for grain yield and for negative selection for grain protein concentration and disease severity. Kang's Stability Statistic also denotes a (+) for selected stable genotypes, and (-) can be used for negative selection. For grain yield, the top three genotypes for rank of mean and stability indexes were WA007933, Stephens, and the CCP 8BPL106 (Table 7). Negative selection against stable high grain protein concentration and high disease severity can be completed using the same methods as for grain yield, but since lower grain protein concentration and disease severity are preferred, the (+) in Kang's Stability Statistic can be used for negative selection. In this regard, the common parent WA007933 has the fourth-highest rank in grain protein concentration and can be negatively selected for high grain protein concentration, whereas the parent Stephens is the only genotype with a (-), showing a stable lower grain protein concentration, and can be selected on that basis. Because the common parent has a high grain protein concentration, the majority of the high grain protein concentration genotypes are BPPs and CCPs and can thus be negatively selected. The opposite is seen for disease severity, with the common parent WA007933 conferring stable low disease severity with a (-) and the other parents conferring higher disease severity. Therefore, BPPs and CCPs confer better disease resistance, and parents can be negatively selected. (+) and (−) are selected genotypes for above and below the population mean, respectively.
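The idea of combining a performance rank with a stability rank can be sketched as follows. This is a minimal illustration that assumes one common formulation, in which the AMMI stability value (ASV) is computed from the first two interaction principal component (IPCA) scores and the index is the sum of the ASV rank and the mean rank; the exact index used for Table 7 may differ. All input values and function names are hypothetical.

```python
import math


def asv(ipca1, ipca2, ss_ipca1, ss_ipca2):
    """AMMI stability value: distance from the biplot origin, with the
    IPCA1 score weighted by the ratio of the IPCA1 and IPCA2 sums of squares."""
    return math.sqrt((ss_ipca1 / ss_ipca2 * ipca1) ** 2 + ipca2 ** 2)


def ranks(values, reverse=False):
    """Rank 1 goes to the smallest value (or the largest when reverse=True).
    Ties are broken by input order, which is a simplification."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=reverse)
    result = [0] * len(values)
    for rank, i in enumerate(order, start=1):
        result[i] = rank
    return result


def rank_sum_index(trait_means, asv_values, higher_is_better=True):
    """Sum of the trait rank and the ASV rank; a lower index is preferred.

    Use higher_is_better=False for traits where low values are wanted,
    such as disease severity.
    """
    trait_rank = ranks(trait_means, reverse=higher_is_better)
    stability_rank = ranks(asv_values)  # rank 1 = lowest ASV = most stable
    return [t + s for t, s in zip(trait_rank, stability_rank)]


# Hypothetical example with three genotypes.
mean_yield = [7.0, 6.0, 5.0]
asv_values = [3.0, 1.0, 2.0]
print(rank_sum_index(mean_yield, asv_values))
```

A rank-sum index of this kind can still favor a high-yielding but unstable genotype, because a first-place rank for the mean can offset a poor stability rank.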

Discussion
This study evaluated the EPB method in its ability to increase performance and stability in wheat. EPB presents an alternative plant breeding method that uses natural selection and bulking of seed. It has been shown to increase yield and is based on a mass selection technique used by farmers throughout the history of agriculture [8,9]. It has primarily been used in developing countries that allow farmers with limited resources to mitigate their losses for a certain pathogen or disease. Instead of committing all resources into the development of the release of one or two genotypes every year, EPB allows breeders to plant a mixture of pedigrees to fill the need that normal commercial genotypes do not meet. Therefore, the EPB genotypes, when given enough time for random mass selection to improve the genotypes in the selected environments, require minimal inputs and mitigate the potential impact of variable climatic conditions via conservation of favorable alleles in the heterogeneous populations [3]. Examples of this would be the changing races of stripe rust or the effects of drought or climate change. EPB focuses on genetic diversity in a crop mixture that can withstand the variation in environmental effects, rather than relying solely on chemical inputs.

Genetic Diversity
The genetic diversity in the BPP and CCP populations is the core of the EPB method that promotes stability [3]. Analysis of genetic markers for differences in allele frequencies and genetic diversity in populations can be a useful approach to detect natural selection pressure [40]. In a previous study, differences in genetic diversity based on simple sequence repeat (SSR) markers could not be detected between CCP populations, and the authors concluded that the management system did not drive allele frequency changes [5]. This can be seen in our results. The BPPs were not clustered with their respective parents; they were, however, clustered with the common parent, WA007933. The differentiation between clusters of genetic diversity is driven by pedigree rather than by the location in which a population was derived. This may be accounted for by the limited genetic variability in BPPs and by genetic differentiation due to the random assortment of alleles during the initial crossing rather than allele frequency changes due to the environment in which they were selected. The grouping of populations based on pedigree in our results indicates minimal change in allele frequency and an increased rate of heterozygosity.
This observation, in which the BPPs were clustered together without their respective parent, was also seen with CCPs in Soliman and Allard [12]. The clusters represent the heterozygosity from the crossing of the two parents, as stated above. This is supported by the minimal change in allele frequency but the increase in heterozygosity in our results. The genetic diversity and heterozygosity present in the population represent the potential response of the population to selection forces, natural or artificial. Natural selection is a slow process that favors alleles that contribute to the fitness of the genotype. Natural selection also drives allele frequency to genetic equilibrium or stability, in which alleles are not entirely removed from the population [41]. This is the driving force behind the maintenance of the genetic diversity needed for an appropriate response to the environment. Since genetic diversity is the central requirement for EPB, these populations need genetic diversity and heterozygosity to retain the capacity to evolve.

Performance
Agronomic, quality, and disease-resistance traits are important factors in developing genotypes. Yield during periods of drought favors bulk populations over uniform genotypes. The difference in the performance of the genotypes in differing locations represents the large genotype-by-environment interaction in the precipitation environments. The parent Stephens has the highest mean yield in environments with higher precipitation (Pullman and Central Ferry) but has the lowest mean yield in Lind, which has the lowest mean precipitation. This illustrates the unstable performance of commercial varieties that are selected for specific environments. However, the EPs generally show less variation in performance over differing precipitation environments and thus exhibit higher stability and adaptability. Natural selection will favor high-yielding genotypes in environments with fluctuating biotic and abiotic selection pressures [9]. The higher-yielding genotypes are strongly favored in natural selection due to a positive correlation between yield ability and fitness components [9]. The abilities of plants to germinate, survive, and produce seed are important fitness components that are strongly acted upon by natural selection, and thus lead to selection within segregating genotypes such as BPPs and CCPs [9].
Both BPPs and CCPs were significantly affected by their pedigrees, with some of the EPs yielding more than their parents. For example, the 104 BPP and 106 CCP out-yielded their parent Stephens in Lind and outperformed their parent Albion in Pullman. The lower grain yield of the EPs compared to their parents has been shown in previous studies [9,13,14]. However, this was in direct opposition to the classical study by Suneson [8]. The contradiction can be attributed to two factors: the length of time the CCPs were submitted to natural selection and the pedigree mixture. CCPs in the Suneson [8] study were submitted to a range of 12 to 29 generations, a much larger timeframe than the eight generations the EPs in this study were submitted to. Natural selection is a slow process driven by the fitness of genotypes, which can be affected by both abiotic and biotic stress and is dependent on the environment [41]. The larger number of generations would allow the forces of natural selection to change the allele frequencies more than in the EPs in our study. Thus, it would partially explain the higher performance of the Suneson CCPs, which had the time to both accumulate and fix favorable alleles.
The CCPs in the Suneson [8] study were composed of between 28 and 30 parents, a vast difference from the two-parent BPPs and three-parent CCPs in our study. A higher number of parents greatly increases the genetic variability in the base population, thus allowing for a larger dispersion of allele frequencies and subsequent genetic gain potential. The BPPs in this study have just two parents, which limits the change in allele frequencies to the range and diversity of the parental pedigrees. The limited number of parents and generations can also explain the lack of statistical differences between BPPs and CCPs. In terms of selection, the genetic variability can be quickly reduced and reach the limits of selection, causing genetic gain to no longer occur. This occurrence is common in breeding programs and is the reason recurrent selection is needed to further genetic gain [42].
Quality traits are similar to yield in that the EPs performed on a pedigree basis, but the distinction between these two traits is seen in natural selection [9]. Natural selection will not directly improve the quality of a population unless the trait is genetically linked to a gene that is positively influenced by natural selection [9]. There is also a trade-off arising from the suggested negative correlation between yield and protein [43]. In soft white wheat, this is not as important an issue as in hard red wheat, in which high protein is ideal. As a breeder selects genotypes with a higher yield, they must also take into account any reduction of grain protein. Therefore, including a high-protein parent in the creation of a BPP or CCP will not only account for a trade-off between grain yield and grain protein concentration but also account for the lack of natural selection pressure on yield [9]. Bulk populations have been shown to maintain the mid-parent value for six quality traits even after 6 years [9]. This mid-parent value is exhibited in our results and follows the same trend as yield [43].
Breeding for disease resistance is a major goal for most breeding programs because of the effect disease has on yield and quality performance. Disease resistance is commonly illustrated by single genes conferring vertical resistance. In stripe rust, the average resistance conferred by a single gene is only 3.5 years [44]. The use of mixtures of genotypes with different resistance genes in multilines or the pyramiding of several genes into a single genotype are strategies to delay the breakdown of resistance [45]. Natural selection has been shown to be effective in bulk populations [46,47]. This is explained by a higher fitness of resistant plants in which the resistant genotypes produce more seeds than susceptible genotypes. Therefore, a gradual push towards resistance by natural selection is effective. Our results show lower disease severity in the EPs than the parents with the exception of the common parent WA007933. This indicates a midparent value of most EPs due to a mixture of susceptible and resistant pedigrees.
Even though the EPs were not the most resistant genotypes, the effect of higher fitness for resistance in CCPs has been shown to reverse susceptibility after 45 generations [48]. This reversal suggests the possibility for the EPs in our study to increase their level of resistance, given more time and selection pressure. With more time comes more opportunity for meiotic events to break up linkage groups and reveal more genetic variance, which has been shown in the Illinois long-term selection experiment in maize [49]. As noted previously, a longer period also allows for more exposure to biotic stresses that create differences in fitness for natural selection to act upon. The effect of selection pressure is hinted at in our results: the EPs that were developed in Lind, which had low disease incidence and selection pressure, performed worse when grown in Central Ferry and Pullman, which have higher precipitation and therefore higher disease incidence.

Stability
Genotype-by-environment interaction occurs when genotypes differ in their relative performance across environments. Plant breeders handle this change in rank performance in three ways. They can ignore it and select the genotype that performs the best in all environments. They can reduce it by partitioning environments into similar subgroups and focusing on each subgroup separately [42]. The last and one of the most common methods in breeding programs is exploiting the effect of the environment and selecting a genotype for each environment. Exploiting genotype-by-environment interaction is the current breeding method for overcoming the significant difference between precipitation zones in inland Washington. The significant differences in both temperature and precipitation in inland Washington have led to selection of uniform genotypes for each environment. An example of this is the precipitation adaptations for the different parents (Table 1). The effect of precipitation on different locations can clearly be seen in the different performances of pedigrees in different locations and in the differences in LSD values and significant differences. Lind is a low-precipitation location and greatly limits the performance of genotypes for grain yield, grain protein concentration, and disease severity. The lack of significant differences between genotypes in our study creates a challenge when trying to select superior genotypes. The lack of significant differences decreases the potential of transgressive segregation and response to selection. Large environmental differences and performances of genotypes create the need for multiple breeding populations and result in a considerable strain on resources for breeding programs that must operate two different breeding pipelines. The strain creates a need for stable genotypes that can be grown and exploited in every environment, which would free up resources to develop a single breeding pipeline with a larger and more diverse breeding population to increase genetic gains rapidly.
Most breeders use the term stability to characterize genotypes that show relatively constant yield independent of the environment [50]. Stability analysis assesses the genotype performance relative to other genotypes in different environments [42]. Stability analysis identifies genotypes that perform similarly in diverse environments, and genotypes are referred to as 'stable' or 'unstable.' Multiple stability analyses need to be used because no single stability measure can adequately explain genotype performance over multiple environments [51]. Stability analyses can be correlated for a trait in question, and therefore different types of stability analysis must be used [52]. The four major benefits of genetic diversity that aid in stability are complementation, cooperation, compensation, and capacity [3].
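To make the regression-based view of stability concrete, the following is a minimal sketch of an Eberhart and Russell style analysis on a table of genotype-by-environment means: each genotype is regressed on an environmental index, and the deviation mean square is reported alongside the slope. The pooled-error correction is left as an optional argument because it requires replicated plot data, and the input values are hypothetical rather than taken from this study.

```python
def eberhart_russell(means, pooled_error=0.0):
    """Return (slope b_i, deviation from regression S2d_i) per genotype.

    means[i][j] is the mean of genotype i in environment j.
    pooled_error is the pooled error mean square divided by the number of
    replicates; it defaults to 0.0 here as a simplifying assumption.
    Needs at least 3 environments and environments that differ in mean.
    """
    p, q = len(means), len(means[0])
    if q < 3:
        raise ValueError("need at least 3 environments")
    grand = sum(sum(row) for row in means) / (p * q)
    # Environmental index: environment mean minus grand mean (sums to zero).
    index = [sum(means[i][j] for i in range(p)) / p - grand for j in range(q)]
    ss_index = sum(v * v for v in index)
    results = []
    for row in means:
        geno_mean = sum(row) / q
        slope = sum(y * v for y, v in zip(row, index)) / ss_index
        dev_ss = sum((y - geno_mean - slope * v) ** 2 for y, v in zip(row, index))
        results.append((slope, dev_ss / (q - 2) - pooled_error))
    return results


# Hypothetical means: every genotype responds linearly to the environment,
# so all deviations from regression are zero and only the slopes differ.
demo = [
    [4.0, 6.0, 8.0],
    [5.0, 6.0, 7.0],
    [3.0, 6.0, 9.0],
]
for slope, s2d in eberhart_russell(demo):
    print(f"b = {slope:.2f}, S2d = {s2d:.4f}")
```

The slope and the deviation capture different aspects of stability: a slope near 1 indicates an average response to improving environments, while a small deviation indicates that the response is predictable. This is one reason several statistics are usually reported together.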
In accordance with previous studies, the BPPs and CCPs conferred higher stability for agronomic, quality, and disease resistance traits [10,12,53,54]. Deviation from Regression has been used to measure stability in barley CCPs, which were significantly more stable than the commercial genotypes across environments. The ability of CCPs to maintain yield stability is reliant on the genetic diversity and heterozygosity of the population [9]. The effect of compensation can explain the stability occurring in the EPs. Compensation is supported by the drastic change in yield ranking of the parents from the high to the low precipitation environments, as seen in Figure 3. As one parent fails in the low-precipitation environment, the other parent, which performs better in that environment, compensates for the reduction in yield and maintains the mean of the population. Compensation is indicated by the extreme performance of Stephens, while the common parent, WA007933, has a more consistent performance.
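The buffering side of this argument can be illustrated with simple arithmetic. The sketch below assumes, purely for illustration, that a population yields the proportion-weighted mean of its components in each environment; true compensation, in which the better-adapted component occupies space left by the failing one, would raise the population above this baseline. The yields and names are hypothetical.

```python
def mixture_yield(component_yields, proportions):
    """Proportion-weighted mean yield of a mixture in each environment.

    component_yields[k][j] is the yield of component k in environment j.
    Assumes no interaction between components (a simplification).
    """
    n_env = len(component_yields[0])
    return [
        sum(w * ys[j] for w, ys in zip(proportions, component_yields))
        for j in range(n_env)
    ]


def yield_range(yields):
    """Spread of yield across environments; smaller means more consistent."""
    return max(yields) - min(yields)


# Hypothetical yields in a high- and a low-precipitation environment.
parent_a = [9.0, 3.0]  # excels with high precipitation, fails in the dry site
parent_b = [5.0, 6.0]  # more consistent across both sites
population = mixture_yield([parent_a, parent_b], [0.5, 0.5])

print(population, yield_range(parent_a), yield_range(population))
```

Even under this conservative assumption, the population varies far less across environments than the more extreme parent, which is the pattern described for Stephens and the EPs derived from it.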
The stability of BPPs and CCPs also derives from the stabilizing forces of natural selection. Natural selection favors genes at intermediate frequencies and drives allele frequency to equilibrium and stability. The equilibrium is also known as genetic homeostasis and resists changes in allele frequency [37]. Low gene frequencies reduce genetic variability and increase environmental sensitivity [41]. The broader genetic base and heterozygosity of the EPs allow for a wider range of response to environmental pressure. The low genetic variation in the parental genotypes is driven by the fixation of favorable alleles and narrow genetic variation within the population. This creates uniform genotypes that do not have the genetic variation to deal with and respond to environmental fluctuations or novel stress factors. The uniform genotypes may not have alleles capable of the appropriate response. The inclusion of just one more parent in the CCPs compared to the BPPs was not shown to add enough genetic variability to differentiate between the two types of EPs.
The stability of BPPs and CCPs displayed over the diverse precipitation zones in inland Washington is an indication of the EPs' ability to handle fluctuating environmental conditions. These results support the conclusions reached for another breeding method, termed evolutionary participatory plant breeding (EPPB), that was also examined across diverse precipitation zones in inland Washington [7]. These evolutionary breeding approaches have proven able to develop EPs that can be maintained via continued selection under changing conditions, whether these arise from diverse environments or from climate change.

Selection
For most plant breeders, both yield and stability are essential goals that allow high performance while facing abiotic and biotic stresses [9]. Stability is meaningful in selection when it is associated with high yield or excellent performance for other traits. A common misconception about selecting for both yield and stability is that yield is reduced when selection is based on criteria other than yield [34]. The stability component of Kang's Stability Statistic is based on Shukla's Stability Variance statistic and provides the contribution of a genotype to the total genotype-by-environment interaction attributable to all genotypes in the trial [34]. Flores et al. [50] found Kang's Stability Statistic and AMMI to be useful tools for simultaneously selecting for yield and stability. The selection indices still ranked and selected the high-yielding parents despite their lower stability. The indices nevertheless proved useful for negative selection against high protein and disease severity, showing differential performance among all EPs.
Differences in performance between the parents and the EPs can be explained by the type of selection they were submitted to. The natural selection used in the creation of the BPPs and CCPs selects for fitness, which is defined as the contribution of genes to the next generation [41]. The underperformance of the EPs in comparison to their adapted parents is due to the unfavorable alleles still present in the population. In natural selection, unfavorable genes are not removed from the population as fast as in artificial selection because of the lower intensity of the selection forces. This slower process may explain the lack of change in allele frequency and the higher genetic diversity in the EPs. As long as a few plants pass on the unfavorable alleles, they will stay in the population, thus reducing the performance of the population. However, if the unfavorable alleles contribute to lower fitness, the allele frequency of the unfavorable alleles will be reduced. Since the change in alleles is due to the fitness of the genotypes, the selection intensity is generally less than in artificial selection. This results in a slower process to drive allele frequency change for specific environmental adaptation, which can occur due to the fixation of favorable alleles.
Artificial selection was the method of production for the parents. In artificial selection, only genotypes showing transgressive segregation are selected, and the genotypes not selected do not contribute any progeny to the next generation [41]. Artificial selection rapidly accumulates alleles to increase performance and response to selection. However, this can result in the complete loss of alleles and reduction in genetic diversity needed to respond to fluctuating environmental influences. The reduced response explains the low stability of the parental genotypes and creates a need to breed genotypes for specific environments, as mentioned previously.
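The contrast between the two modes of selection can be sketched with a deterministic single-locus model. This is a deliberately simplified illustration: it treats largely inbred plants as carrying one of two alleles, ignores drift, outcrossing, and linkage, and uses hypothetical selection coefficients rather than values estimated in this study.

```python
def next_frequency(p, s):
    """Frequency of the favorable allele after one generation.

    p is its current frequency; carriers of the unfavorable allele have
    relative fitness 1 - s (s = 0 is no selection, s = 1 means they leave
    no progeny at all).
    """
    mean_fitness = p + (1.0 - p) * (1.0 - s)
    if mean_fitness == 0.0:
        return 0.0  # no favorable allele present and no survivors
    return p / mean_fitness


def trajectory(p0, s, generations):
    """Allele frequency at generation 0 through the given generation."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_frequency(freqs[-1], s))
    return freqs


# Five generations, the span between an F3 and an F8 population.
weak = trajectory(0.5, 0.05, 5)   # mild fitness difference (assumed value)
strict = trajectory(0.5, 1.0, 5)  # unselected plants contribute no progeny

print([round(f, 3) for f in weak])
print([round(f, 3) for f in strict])
```

Under the mild fitness difference the favorable allele rises only gradually and the alternative allele remains common, whereas strict culling fixes the favorable allele in a single generation and removes the alternative allele entirely, along with any value it might have in another environment.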
The BPPs and CCPs performed best in the environment they were selected in but were greatly influenced by their pedigree. The performance of the EPs has been reported in previous studies [55,56]. The EPs' performance, coupled with the grouping of pedigrees, exhibits the ability of the EPB method to improve the performance of a population without significantly changing allele frequency outside of their original pedigree. This is not to say that natural selection did not change allele frequency. There was a significant improvement of the F8 populations with respect to their initial F3 populations. However, the F8 populations performed better than the F3 populations only when the EPs' pedigrees were also adapted to the same environments they were selected in. Therefore, natural selection can be effective but also works in conjunction with the pedigree of the population. These observations can be explained by the narrow genetic base of having just two parents, both adapted to the same environment, and by the slow and stabilizing change in allele frequency due to natural selection.
Our results show that natural selection selected for higher-performing genotypes for a specific environment but did not overcome the parental pedigree and its performance. In our study, the BPPs and CCPs consist of pedigrees with parents adapted to the same environment. If the EPs consisted of parents adapted to different environments, they would potentially have a higher genetic diversity and the capacity for natural selection to increase favorable alleles and performance regardless of their parental pedigree, as seen in Suneson [8].
One of the reasons the improvement of the F8 populations over the F3 populations under natural selection depended on pedigree adaptation is that pedigrees adapted to an environment have been selected over time to accumulate favorable alleles for a higher fitness in that environment. Therefore, since natural selection is a slow process, unfavorable alleles from poor-performing pedigrees or pedigrees not adapted to the environment may still be present in the EPs and reduce their performance. The EPs with adapted pedigrees would already have a majority of favorable alleles for the environment. Those favorable alleles would have higher fitness and rapidly drive the change in allele frequency and performance closer to their adapted parental genotypes. If given more time, the unfavorable alleles may be removed from the EPs and eventually increase their performance, similar to the observations in Suneson [8].
This study showed that the BPPs and CCPs performed better in the environment in which they were developed, and their higher stability demonstrated the ability of the EPB method to develop populations that are stable without sacrificing performance. These observations for both the CCPs and BPPs create a valuable resource for breeders, allowing the creation of varieties that can be selected in one environment and perform well across varying environments. The EPs will allow the consolidation of breeding populations and reduce the need for a large number of environments for testing. With the right selection of parents, the EPB method can be a valuable tool for breeders to focus their breeding programs without having to divide their resources among different precipitation zones. The potential for adaptation presented by this study shows the capability of the EPB method to develop stable BPPs and CCPs as genetic resources and germplasm from which future pure-line genotypes can be developed, or for release as BPPs and CCPs themselves.

Conclusions
The EPB method significantly improved F8 populations with respect to their initial F3 populations in just 5 years of natural selection, as long as these populations contained pedigrees that were also adapted to the same environments they were selected in. Both BPPs and CCPs were significantly affected by their pedigrees and did not always out-yield their parents, but the EPs performed better in the environment in which they were developed. The EPs showed significant stability across precipitation zones and out-performed their respective parents in that regard. The EPB method was shown to increase stability and performance of both BPPs and CCPs similarly. This study displayed the validity of EPB for creating genetic diversity adaptable to different precipitation zones and as a valuable resource for breeding programs.