Development of a Statistical Crop Model to Explain the Relationship between Seed Yield and Phenotypic Diversity within the Brassica napus Genepool

Plants are extremely versatile organisms that respond to the environment in which they find themselves, but a large part of their development is under genetic regulation. The links between developmental parameters and yield are poorly understood in oilseed rape; understanding this relationship will help growers to predict their yields more accurately and breeders to focus on traits that may lead to yield improvements. To determine the relationship between seed yield and other agronomic traits, we investigated the natural variation that already exists with regards to resource allocation in 37 lines of the crop species Brassica napus. Over 130 different traits were assessed; they included seed yield parameters, seed composition, leaf mineral analysis, rates of pod and leaf senescence and plant architecture traits. A stepwise regression analysis was used to model statistically the measured traits with seed yield per plant. Above-ground biomass and protein content together accounted for 94.36% of the recorded variation. The primary raceme area, which was highly correlated with yield parameters (0.65), provides an early indicator of potential yield. The pod and leaf photosynthetic and senescence parameters measured had only a limited influence on seed yield and were not correlated with each other, indicating that reproductive development is not necessarily driving the senescence process within field-grown B. napus. Assessing the diversity that exists within the B. napus gene pool has highlighted architectural, seed and mineral composition traits that should be targeted in breeding programmes through the development of linked markers to improve crop yields.


Introduction
Crop yield is a complex trait determined by a number of contributing environmental and genetic factors. Mathematical modelling approaches are one way of synthesising information and simplifying the complex interactions that exist throughout a plant's life cycle to gain information about the most appropriate target traits that could increase yields. The rationale is that, if used as early predictors of final yield, models have the potential to inform growers of alterations in farming practices that, if implemented early in the season, could increase yields. Models exist for the three major cereal crops grown worldwide (rice, wheat and maize) [1][2][3], and all aim to predict yield in the face of environmental or genetic variation. Crops such as oilseed rape (canola, rapeseed, colza) are of increasing economic importance, yet they have an indeterminate growth habit compared to cereals, having been domesticated for only 4000 years compared to the 10,000 years for wheat, and therefore, require a dedicated approach in order to determine yield components that are of use to growers and breeders. Several models of yield prediction in oilseed rape have been generated previously [4][5][6][7][8].
As with all models, they have limitations; for instance, only applying data from optimum growth conditions, using just one variety of oilseed rape and/or being solely based on parameters derived from the literature. Hence, whilst modelling oilseed rape growth provides a useful tool to evaluate the parameters affecting yield, to date, this technique is still in its infancy and has not had the power of the diversity trial used in the present study to inform the modelling approach.
Oilseed rape (Brassica napus) is a crop species primarily harvested for its oil-containing seeds within temperate regions and is globally ranked as the third leading source of plant oil after soybean (Glycine max) and oil palm (Elaeis guineensis) [9]. The oil is used in the food industry, but the seed and remaining biomass material also have a number of other roles, including use as a protein meal for the feed industry and as a feedstock for biofuels. Its success as an oilseed can be attributed to the fact that its seeds are composed of ~25% protein and ~50% oil (w/w); the oil has a desirable composition of almost entirely unsaturated fatty acids [10], of which the primary components are oleic acid (C18:1), linoleic acid (C18:2) and linolenic acid (C18:3) [11]. Brassica napus (AACC) is an amphidiploid species arising from a spontaneous hybridisation between B. rapa (AA) and B. oleracea (CC) [12]. Compared to other crops, such as wheat, oilseed rape has only recently undergone a domestication event [13]; this is highlighted by the increase in its cultivation as a crop between 1961 and 2013 when there was a 481% global increase in the area of oilseed rape harvested, contrasting with a 7% increase in wheat over the same period [14]. This observation could be partly attributed to the fact that oilseed rape is often used as a break crop within a wheat rotation, yet within these few decades, seed quality has already been improved by reducing the concentration of erucic acid and glucosinolates (GSLs), which were believed to be anti-nutritional seed components [15]. This may have been achieved at a cost, as one study linked Quantitative Trait Loci (QTL) for seed yield in winter oilseed rape to high GSL concentration, revealing that reducing the latter has the potential to negatively impact seed yield [16].
As crops become domesticated, there is strong selection against the natural variation in plant development and architecture within the population, leading to increasingly uniform monocultures that are suited to mechanised harvest at a single time point. In general, the growth habit, yield, pest/pathogen defence system and response to environmental conditions throughout the growing season largely depend on the genotype of the accession sown [17].
In order to maximise crop yield, a plant within a field-based canopy has to balance the extent of vegetative growth, the length of the photosynthetic period, the number of reproductive structures in which it invests and the amount of resource (protein, lipid, carbohydrate) that it imports into each seed [18]. It is also important that the plants within a canopy retain the adaptive capacity (plasticity) to respond to environmental changes and exploit any extra resources that may accrue during the growing season [19]. The optimal ideotype of oilseed rape is yet to be defined; the crop currently produces extensive above-ground vegetative biomass and hence has a low harvest index (the ratio between harvestable yield and total plant biomass). Our hypothesis is that plants with high yield in terms of seed mass per plant achieve this as a consequence of more pods per plant rather than a positive increase in resource accumulation within individual seeds or pods. Whilst the vegetative structure of the plant contributes towards increased pod production by providing more sites at which pods can be formed, it can also reduce the amount of incident light reaching photosynthetic components lower down in the canopy. Even the relatively small petals are believed to block out ~60% of the photosynthetically-active radiation (PAR) [20].
The timing of leaf senescence is not predetermined, but is influenced by both environmental [21] and genotypic factors [22,23], which will ultimately affect both the number of fruits that develop and the extent to which the seed will fill. However, the relationship between senescence and yield is a complex one and not fully understood [24], especially in oilseed rape. The role of the pod wall extends far beyond that of a protective organ [18,25], as the pods are themselves photosynthetic units that senesce, contributing 50%-60% of the final dry mass of the mature plant [26]. It has been noted that Arabidopsis accessions that naturally senesce early have a greater number of pods per plant compared to late-senescing accessions (although the nutritional composition and weight of these seeds were not stated), indicating a greater investment in reproductive as opposed to vegetative development [22,23]. Some Arabidopsis mutants that exhibit a delayed senescence phenotype, such as the abscisic acid insensitive (abi3) mutant, develop more pods per plant compared to the wild-type [27]. This delay was predicted to enhance yields by providing a longer photosynthetic period in which to synthesise photoassimilates for subsequent re-allocation into reproductive structures. Whilst this might increase carbon assimilation in the seed, nitrogen (N) remobilisation becomes delayed in late senescing Arabidopsis phenotypes [28], so while delaying senescence might result in higher yields, the trade-off is that the seeds contain less protein [17]. Hence, when choosing ideal traits for crops, there is confusion over whether maximum yield will be achieved from fast or slow senescing lines [29].
The diversity trials performed as part of the Oilseed RapE Genetic Improvement Network (OREGIN) project [30,31] sampled in the current study provided a robust basis to begin modelling, as they represent the genetic diversity that exists across the domesticated B. napus gene pool, therefore providing useful insight to direct future breeding programmes. The present study developed statistical models of yield across two years that were able to elucidate the most important architectural and quality traits related to yield and that provide a predictive model of yield based on parameters that could be measured prior to pod formation and then acted on to maximize yield.

Development of an Efficient Statistical Model to Explain Yield
Using data for 45 architectural and physiological traits measured in Year 1 of the field trial, forward selection was used to develop a deterministic statistical model to explain log[yield]. Data from only the varieties present in both years of the trial were included. Yield was defined as the weight of seeds harvested per plant. The varieties were grown under two nitrogen treatments in Year 1, but only in residual soil nitrogen in Year 2; residual N was found to be 18-22 kg ha⁻¹ in both years, and high N in Year 1 was determined as 148 kg ha⁻¹. A two-factor ANOVA of Year 1 data, comparing the significance of variety and nitrogen on yield, showed that both terms are significant factors in predicting yield, with variety explaining 44.5% of the variance and nitrogen explaining 23.1% (Table S1). A comparison by ANOVA using variety and year terms showed that variety was significant (explained 37% of the variance), but year was not (explained 0.7% of the variance), thus validating our approach of comparing data across years. Variety was excluded as a significant term from the analysis in order to examine which traits consistently contribute to high or low yield across different accessions (Table 1). The model explained over 50% of the yield variance with the inclusion of five trait terms: chlorophyll a from canopy leaves, seed linolenic acid content, number of seeds per pod, the rate of N applied to the plot and the glucosinolate content of the seeds. The effect of N was positive overall and confirmed that addition of N increases yield. However, N treatment (high/low) did not segregate the dataset into two distinct groups according to yield; some varieties grown on low N had higher yield than others grown on high nitrogen and vice versa (Figure 1a).
A two-way ANOVA of log[yield] against variety and N showed that the size of the effect differs between varieties, and for most varieties in the present study, the effect of N on seed yield per plant is not significant (Figure 1b), although if more years/blocks were available, other varieties in the middle of the graph may also show a significant response.
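The forward selection procedure described in this section can be illustrated with a minimal Python sketch. It is not the authors' implementation: the stopping rule (a minimum gain in R², `min_gain`), the trait names and the simulated data are all assumptions made for illustration, and the selection criterion used in the study is not stated in this section. The simulated effects only mirror the reported directions (positive for seeds per pod, negative for seed linolenic acid).

```python
import random


def solve(a, b):
    """Solve the linear system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def r_squared(columns, y):
    """R^2 of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    k = len(design[0])
    xtx = [[sum(row[p] * row[q] for row in design) for q in range(k)] for p in range(k)]
    xty = [sum(row[p] * y[i] for i, row in enumerate(design)) for p in range(k)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def forward_select(traits, y, min_gain=0.01):
    """Greedily add the trait with the largest R^2 gain; stop when the gain < min_gain."""
    chosen, best_r2 = [], 0.0
    remaining = set(traits)
    while remaining:
        scores = {name: r_squared([traits[c] for c in chosen] + [traits[name]], y)
                  for name in remaining}
        name = max(sorted(scores), key=scores.get)
        if scores[name] - best_r2 < min_gain:
            break
        chosen.append(name)
        remaining.remove(name)
        best_r2 = scores[name]
    return chosen, best_r2


# Simulated plots (hypothetical values, not trial data).
rng = random.Random(0)
n_plots = 40
traits = {
    "seeds_per_pod": [rng.gauss(20, 4) for _ in range(n_plots)],
    "seed_linolenic": [rng.gauss(9, 1) for _ in range(n_plots)],
    "unrelated_trait": [rng.gauss(0, 1) for _ in range(n_plots)],
}
log_yield = [0.05 * s - 0.2 * l + rng.gauss(0, 0.05)
             for s, l in zip(traits["seeds_per_pod"], traits["seed_linolenic"])]

chosen, r2 = forward_select(traits, log_yield)
print(chosen, round(r2, 3))
```

Because each step conditions on the terms already selected, a trait that is strongly correlated with an earlier term adds little further R² and tends to be left out.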

Genetic Factors Are Stronger Predictors of Yield than Environmental Factors within Year
In order to explore models that could predict yield independently of N application, and in preparation for characterising Year 2 data where only one N treatment was included in the trial, statistical models were developed independently for low and high N treatments (Figure 2 and Table 2). Both models performed slightly less well than the combined model (Figure 1a). Each shared some covariates with the combined model, with seeds per pod and seed linolenic acid common between the combined and low N models and log[seed glucosinolate] common between the combined and high N models. There were no common covariates between the low and high N models. Having seed compositional traits as core terms in both the combined and separated models, as opposed to the terms consisting entirely of growth and development traits, suggests that the combined model is applicable to a range of growing environments. The six terms in the low N model accounted for 52.67% of the variance in yield. The low N treatment demonstrated the importance of senescence when resources are limited; a higher percentage of green pods at the stages examined showed a positive and significant (p < 0.05) relationship with yield (Table 2). Variety was excluded as a candidate for this model; however, in a separate model, where only variety was considered, it explained 44.50% of the variation in yield (across all plots), rising to 60.92% in high N plots and 78.43% in low N plots. Therefore, although N does have an impact on yield, the genetic basis of yield (i.e., variety) is a much more powerful predictor.
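One common way to express the share of yield variation explained by variety alone is the ratio of the between-variety sum of squares to the total sum of squares, which equals the R² of a model with variety as its only term. The sketch below assumes that definition; the function name, variety labels and values are hypothetical and are not taken from the trial.

```python
from collections import defaultdict


def variance_explained_by_group(groups, values):
    """Return SS_between / SS_total for a one-way layout (R^2 of a group-only model)."""
    by_group = defaultdict(list)
    for g, v in zip(groups, values):
        by_group[g].append(v)
    grand_mean = sum(values) / len(values)
    ss_total = sum((v - grand_mean) ** 2 for v in values)
    ss_between = sum(len(vs) * (sum(vs) / len(vs) - grand_mean) ** 2
                     for vs in by_group.values())
    return ss_between / ss_total


# Two plots per hypothetical variety; values stand in for log[yield].
varieties = ["A", "A", "B", "B", "C", "C"]
log_yield = [1.0, 1.2, 2.0, 2.2, 3.0, 3.2]

share = variance_explained_by_group(varieties, log_yield)
print(f"variety explains {100 * share:.1f}% of the variation")
```

Running the same calculation separately on the high N and low N plots would give the per-treatment percentages quoted above, under the same assumed definition.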

Developing an Improved Model to Explain Yield
Application of the Year 1 low N model to Year 2 data (the crop was only grown under low N in Year 2) showed that the model was poor at explaining yield, being able to only account for 25.48% of yield variation (Table 3, Figure 3a). This is half of the yield variation explained by the same model in Year 1 (52.67%) and demonstrates that within-year variation in yield is dominated by genetic factors, whilst between-year variation is strongly influenced by the environmental conditions to which the plants were exposed. However, although the estimates of variation for the terms used in the model are different between years, we see the same directional effects, i.e., there is a negative estimate of variation for seed linolenic acid and a positive estimate of variation for seeds per pod.
To develop the model further, a second forward selection process was performed to determine which traits best explained log[yield] in Year 2. Plants in Year 2 were grown on residual N only, and additional traits were recorded. As predicted, variety was highly significant (p = 0.009) in predicting log[yield], but as with Year 1, this was not included as a term in the following analyses in order to determine traits that contribute to the yield difference observed between varieties. Seed compositional traits, such as linolenic acid, total seed oil and seed protein content, were important factors; however, total vegetative mass (excluding seeds) was the most significant term, explaining 73.70% of the variation in log[yield] (Table 3, Figure 3b). Seed oil was shown to be a positive term, whereas protein content had a negative impact on yield, demonstrating that the oil:protein ratio is important in determining total yield. In total, the forward selection model accounted for 95.12% of the variation in log[yield].
Given that total above-ground biomass at harvest was such an important predictor of log[yield], we investigated this trait further and used forward selection to identify the traits and variables that explained above-ground vegetative biomass at harvest (Table 3, Figure 3c). A consequence of including a trait in the model is that other traits correlated with it are less likely to be significant themselves and are not included in the model; hence, none of the terms explaining biomass appear in the yield model, as the biomass term acts as a proxy for them all. The first three predictors of vegetative biomass were all architectural traits with positive effects, but these three traits only explain 77.74% of the variation in vegetative biomass, suggesting that other traits listed in both models make a small, but significant contribution to yield. The first four terms in the biomass model (stem area, number of branches on the primary stem, plant height and pod carotenoid content at Week 41) were all measured pre-harvest, thus providing a means of predicting yield.

Ten-Fold Cross-Validation of the Year 2 Models
In order to investigate how robust these models are to new data, we divided the plots from Year 2 at random into ten groups (folds). We then fitted the models (estimated the parameters) using nine of the folds and then applied the model to the remaining fold and measured the percentage of variance explained. We repeated this procedure for each of the ten folds in turn. Over the 10 folds, the percentage of variance explained by the yield model ranged from 83.28%-97.90%, with an average of 93.61%. The same approach was applied to the "Vegmass" model, where the average percentage of variance explained was 67.57%. As expected, there is a slight drop in the percentage of variation explained compared to the original model that used all of the available data, but the decrease is not substantial, suggesting that the models are robust to new data obtained from plants grown under similar environmental conditions.
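The cross-validation procedure above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than the analysis that was run: a single hypothetical predictor (vegetative biomass) replaces the multi-term models, the data are simulated, and "percentage of variance explained" on a held-out fold is taken to mean 1 − SS_residual/SS_total within that fold.

```python
import random


def fit_line(x, y):
    """Least squares intercept and slope for a single predictor."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((xi - mx) ** 2 for xi in x)
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return my - slope * mx, slope


def cross_validate(x, y, k=10, seed=0):
    """Return the held-out variance explained for each of k random folds."""
    idx = list(range(len(y)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]  # k groups of near-equal size
    scores = []
    for held_out in folds:
        held = set(held_out)
        train = [i for i in idx if i not in held]
        intercept, slope = fit_line([x[i] for i in train], [y[i] for i in train])
        obs = [y[i] for i in held_out]
        pred = [intercept + slope * x[i] for i in held_out]
        mean_obs = sum(obs) / len(obs)
        ss_res = sum((o - p) ** 2 for o, p in zip(obs, pred))
        ss_tot = sum((o - mean_obs) ** 2 for o in obs)
        scores.append(1.0 - ss_res / ss_tot)
    return scores


# Simulated plots (hypothetical values, not trial data).
rng = random.Random(1)
biomass = [rng.uniform(50, 150) for _ in range(80)]
log_yield = [0.02 * b + rng.gauss(0, 0.1) for b in biomass]

scores = cross_validate(biomass, log_yield)
print(f"mean variance explained: {100 * sum(scores) / len(scores):.1f}%")
print(f"range: {100 * min(scores):.1f}% to {100 * max(scores):.1f}%")
```

Each plot is held out exactly once, so the ten scores together describe how well parameters estimated on nine-tenths of the data transfer to unseen plots from the same season.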

Seed Oil Content Rather Than Protein Drives Seed Yield
A correlation matrix was used to visualise positive and negative relationships between different traits (Figure 4). The matrix showed (factors mentioned in the text below are boxed in blue in Figure 4) that the oil:protein ratio was correlated positively with seed number per pod (0.756) and negatively with seed packing density per pod (−0.702). This appeared to be driven by the lipid component, since there was a negative relationship between protein content and seed weight per pod (−0.627). This suggests that seed yield gain for an individual plant was driven by increased oil content. There was also a strong negative relationship between pigment (chlorophyll and carotenoid) content of the pod and protein/sugars (range from −0.239 to −0.579). This implies that senescence may drive the remobilisation of sugars and proteins into the developing seed and that retention of pigment in the pod during development does not necessarily imply that resources will be translocated into seeds at a later stage. Flowering window has a non-significant relationship with yield and biomass. The relationship between the majority of the leaf minerals analysed and yield was also non-significant, the exceptions being canopy leaf potassium, which was strongly negatively correlated with yield (−0.432), total biomass (−0.403) and vegetative biomass (−0.375) and early leaf phosphorus, which was positively correlated with yield (0.281), total biomass (0.384) and vegetative biomass (0.408).
Multi-Dimensional Scaling (MDS) was used to project the distances between traits in a schematic diagram (Figure 5). The correlation between each pair of traits was converted into a distance, where a correlation of ±1 was given a distance of zero and a correlation of zero was given a distance of one. We then used MDS to show the best two-dimensional projection of these distances; traits that are positioned close together are highly correlated. Traits that the models (Table 3) deem useful for predicting yield and/or biomass are circled, and colour is used to group traits according to their type: yield, architecture, leaf chlorophyll, pod chlorophyll, progression of plant development, oil/protein content and minerals. Biomass, the most important predictor of yield, was located relatively close to the yield term, and biomass itself is positioned closely to the architectural traits nearby. The next two predictors from the linear yield model, seed protein and canopy leaf chlorophyll b content, are located further away from the yield term. Leaf mineral analysis makes no contribution to the yield models.
Perhaps surprisingly, traits relating to floral development are not important for the yield model, so the date of first flower opening, photoperiod, flowering duration and time to pod shatter had no significant impact on yield.
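The correlation-to-distance conversion used for the MDS projection is consistent with d = 1 − |r|, which gives zero for r = ±1 and one for r = 0. The exact function between those two anchor points is not stated, so the linear form is an assumption. The sketch below builds only the distance matrix that an MDS routine would take as input; the MDS step itself is not implemented here, and the trait names and toy values are hypothetical.

```python
import math


def pearson(a, b):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def correlation_distance(r):
    """Map a correlation to a distance: |r| = 1 -> 0, r = 0 -> 1 (assumed linear)."""
    return 1.0 - abs(r)


# Toy trait values chosen so that the correlations are exactly +1, -1 and 0.
traits = {
    "vegetative_biomass": [1, 2, 3, 4, 5],
    "seed_yield": [2, 4, 6, 8, 10],
    "seed_protein": [5, 4, 3, 2, 1],
    "flowering_window": [1, -1, 0, -1, 1],
}

names = sorted(traits)
dist = {(p, q): correlation_distance(pearson(traits[p], traits[q]))
        for p in names for q in names}

for p in names:
    print(p, [round(dist[(p, q)], 3) for q in names])
```

Using the absolute value means that strongly negatively correlated traits sit as close together in the projection as strongly positively correlated ones, so proximity in the diagram indicates strength of association, not its sign.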

Figure 5.
Multi-dimensional scaling of all traits measured from Brassica napus. The correlation between each pair of traits was converted into a distance, where a correlation of ±1 was given a distance of zero and a correlation of zero was given a distance of one; a two-dimensional projection of these distances is shown in the figure. Traits that are close together in this space are highly correlated. Traits are coloured according to the type of trait: measures of yield (black), architecture (red), lower leaf and upper leaf chlorophyll (green), pod chlorophyll (blue), timing of plant development (cyan), NIRS data (pink), early leaf and canopy leaf chlorophyll (yellow) and early leaf and canopy leaf minerals (grey). Traits used in the models for predicting yield and/or vegetative biomass are circled.

Analysis of Varieties to Determine Those That Produce High Yields and Highlight Gaps in Breeding Potential
Having established which traits are important for predicting yield (Table 3), next we investigated which varieties displayed the traits contributing to high yield. For this analysis, we used the mean of both blocks for each variety for each trait contributing to the models for yield and above-ground vegetative biomass at harvest (Table 3). Each of the traits was scaled to have a mean = 0 and variance = 1. For the cluster analysis, we took the Euclidean distance between each pair of varieties and combined varieties using Ward's method, creating three groups from the whole dataset (Figure 6). A combined approach was then taken to include the other traits shown to be important in our earlier models and to cluster them all against the varieties studied (Figure 6b). Varieties are ordered according to the cluster analysis shown in Figure 6a. The top cluster contains all of the highest yielding varieties and the bottom cluster all the lowest yielding varieties. High seed protein, seed linolenic acid content, chlorophyll b content of the canopy leaf and early leaf calcium content were all negatively correlated with yield; however, it should be noted that canopy leaf chlorophyll b was a positive term within the statistical model for log[yield]. This is because the correlation is relatively weak with yield (0.11), biomass (0.03) and protein (0.09), so canopy leaf chlorophyll b only becomes an important term once biomass and protein are taken into account. Conversely, the architectural plant traits of raceme area and plant height were positively correlated with yield, together with lipid content and pod weight for all varieties, except Darmor, Rameses and Canard.
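The clustering steps described above (scale each trait to mean 0 and variance 1, measure Euclidean distance, merge with Ward's method, cut at three groups) can be sketched as follows. This is a small stand-alone illustration, not the software used in the study: the variety labels and trait values are hypothetical, and Ward's criterion is implemented directly as the increase in within-cluster sum of squares caused by each candidate merge.

```python
import math


def standardise(column):
    """Scale a trait to mean 0 and (sample) variance 1."""
    n = len(column)
    mean = sum(column) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in column) / (n - 1))
    return [(v - mean) / sd for v in column]


def ward_clusters(points, n_clusters):
    """Agglomerative clustering: merge the pair that adds the least within-cluster SS."""
    dims = len(points[0])
    clusters = [[i] for i in range(len(points))]

    def centroid(members):
        return [sum(points[i][d] for i in members) / len(members) for d in range(dims)]

    def merge_cost(a, b):
        # Ward's criterion: |A||B| / (|A| + |B|) * squared centroid distance.
        ca, cb = centroid(a), centroid(b)
        sq_dist = sum((u - v) ** 2 for u, v in zip(ca, cb))
        return len(a) * len(b) / (len(a) + len(b)) * sq_dist

    while len(clusters) > n_clusters:
        pairs = [(merge_cost(clusters[i], clusters[j]), i, j)
                 for i in range(len(clusters))
                 for j in range(i + 1, len(clusters))]
        _, i, j = min(pairs)
        clusters[i] = clusters[i] + clusters[j]
        del clusters[j]
    return clusters


# Hypothetical variety means for two traits (not trial data).
names = ["V1", "V2", "V3", "V4", "V5", "V6"]
vegetative_biomass = [10, 11, 20, 21, 30, 31]
oil_protein_ratio = [1.0, 1.1, 2.0, 2.1, 1.5, 1.6]

scaled = list(zip(standardise(vegetative_biomass), standardise(oil_protein_ratio)))
clusters = ward_clusters(scaled, n_clusters=3)
groups = [[names[i] for i in members] for members in clusters]
print(groups)
```

Standardising first matters because Euclidean distance would otherwise be dominated by whichever trait has the largest numerical range.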

Figure 6.
The study included 35 different plant varieties. The mean of both blocks was used for each variety for each trait; each of the traits was scaled to have a mean = 0 and variance = 1, and then cluster analysis was performed by taking the Euclidean distance between each pair of varieties and combining varieties using Ward's method. The distance between varieties is defined as equal to the sum of the squares of differences across all traits analysed in Figure 5. (b) Analysis of the traits contributing to the yield and vegetative biomass models clustered by variety.
In the model for explaining yield, the top two terms were vegetative biomass (positive) and total protein (negative). Total seed oil also (positively) contributed to yield. Therefore, the oil:protein ratio and biomass traits were used to illustrate the clustering of different varieties in two-dimensional space (Figure 7). Canard produced the highest yield, but had a comparatively low oil:protein ratio. It also had a very poor score for raceme stiffness and has been observed to lodge readily in the field; therefore, although Canard had a high vegetative biomass, the architecture was not sufficiently robust. The varieties from the cluster coloured grey also had high yield, but show a higher oil:protein ratio than other clusters, meaning that the yield increase is achieved by assimilating more oil in the seeds. The other groups had lower vegetative biomass, lower oil:protein ratio and lower yield, with Ningyou having the poorest phenotype of all of the varieties. Yield gaps are apparent (green squares on Figure 7) where breeding and selection, or agronomy, could be used to target plants with high oil content from a smaller vegetative biomass (by changing the harvest index to put more resources into seeds), or large plants with high yields of oil-rich seeds, or large plants with low oil content that could potentially be used as biomass crops.

Figure 7.
Varietal distribution across the Brassica napus diversity trial in two-dimensional space using vegetative biomass and oil:protein ratio. The yield of each variety is shown by the radius of the circle; colours are used to identify the varieties within each cluster as determined in Figure 6. Green boxes indicate yield gaps where no varieties currently exist.

Traits Measured Early in Crop Development Are Also Good Predictors of Yield
Our earlier model for explaining yield (Table 3) relied on many traits that can only be observed at, or close to, the time of harvest. While that model is useful for selecting the variety for growing in subsequent seasons, a model that forecasts yield during the growing period would be more useful. Traditional early observations, such as date of flowering and leaf chlorophyll retention, had very poor correlation with seed yield (Figure 8).
Observations in the field of the highest and lowest yielding varieties showed that there was very little visual difference in colour of pods on the main raceme from May-July (Figure 9a), although the absolute leaf chlorophyll levels in April demonstrated clear differences between the varieties (Figure 8). However, when the canopy was viewed as an entirety, there were clear differences in senescence between the high and low yielding varieties (Figure 9b) with high yielding plants having a slower senescing canopy. When the extreme lines (defined as the top ten lines for seed yield, retention of pod chlorophyll and biomass) were analysed, only two high yielding cultivars, Canard and Victor, also had high biomass and slow pod senescence. In other cases, high biomass lines overlapped with high yielding lines, but retention of pod chlorophyll did not correlate well (Figure 10a).

Figure 8. Scores for nine traits associated with resource allocation and yield for the thirty-five varieties used in this study, ordered by yield (outer ring) and coloured from largest (red) to smallest (green) in each ring. Averages of all replicates over both blocks were used. White spaces = missing data.
Observations in the field of the highest and lowest yielding varieties showed very little visual difference in the colour of pods on the main raceme from May to July (Figure 9a), although the absolute leaf chlorophyll levels in April differed clearly between the varieties (Figure 8). However, when the canopy was viewed as a whole, there were clear differences in senescence between the high and low yielding varieties (Figure 9b), with high yielding plants having a slower senescing canopy. When the extreme lines (defined as the top ten lines for seed yield, retention of pod chlorophyll and biomass) were analysed, only two high yielding cultivars, Canard and Victor, also had high biomass and slow pod senescence. In other cases, high biomass lines overlapped with high yielding lines, but retention of pod chlorophyll did not correlate well (Figure 10a).
Table 4 (notes). Residual standard error: 0.2048 on 69 degrees of freedom; multiple R-squared: 0.6369; adjusted R-squared: 0.6054; significance codes: 0.001 '***'; 0.01 '**'; 0.05 '*'.
Figure 10. (a) [...] showing overlap between them. Temple is one of the most commonly grown varieties, yet it is atypical in linking chlorophyll retention to high yield. In general, biomass and yield overlap (big plants = more seed) and varieties with a delayed senescence phenotype do not yield highly. (b) Model for predicting yield using traits that were observed at or before the time of flowering. The model was developed using nine terms and applied to Year 2 data collected from plants grown on low nitrogen (22 kg ha⁻¹) plots in a randomised block design. R² = 0.6369.

It was essential to develop a model that can predict final yield prior to visual indicators of yield differences. The forward selection procedure was repeated, but only allowing covariates that were unrelated to flowers or pods in the model. The fitted model (Table 4, Figure 10b) explains 63.69% of the variance in yield, compared to 95.12% in the earlier model (Table 3). This reduction was due to the omission of information regarding the later stages of plant development. However, the model still offers a good method for predicting yield approximately 15 weeks prior to harvest, i.e., approximately April of the same year. The model comprises two of the best architectural variables (stem area and numbers of branches on the primary stem) that were used to explain vegetative biomass in the earlier model. Canopy leaf chlorophyll b content, which was in the earlier model for predicting yield, was also included in the predictive model (Table 4). Interestingly, two leaf mineral concentration traits were also selected, canopy leaf potassium and canopy leaf magnesium.
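As a quick consistency check on the summary statistics of the predictive model, the sketch below relates multiple and adjusted R-squared through the usual formula, adjusted R² = 1 − (1 − R²) × df_total / df_residual. The values 0.6369, 0.6054 and 69 residual degrees of freedom are those reported with Table 4. The total degrees of freedom are not stated in the text, so the back-calculated figure is an inference, and the function names are illustrative only.

```python
def adjusted_r2(r2: float, df_total: int, df_resid: int) -> float:
    """Adjusted R-squared from multiple R-squared and degrees of freedom."""
    return 1.0 - (1.0 - r2) * df_total / df_resid


def implied_df_total(r2: float, adj_r2: float, df_resid: int) -> float:
    """Solve the adjusted R-squared formula for the total degrees of freedom."""
    return (1.0 - adj_r2) * df_resid / (1.0 - r2)


# Values reported for the predictive model (Table 4).
R2, ADJ_R2, DF_RESID = 0.6369, 0.6054, 69

# Assumption: the total degrees of freedom are inferred, then rounded
# to a whole number; they are not given in the text.
df_total = round(implied_df_total(R2, ADJ_R2, DF_RESID))
print(df_total, round(adjusted_r2(R2, df_total, DF_RESID), 4))
```

If the inferred total is right, recomputing the adjusted value from it should agree with the reported 0.6054 to within rounding of the published figures.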

Discussion
Seed yield is a complex trait influenced by numerous interacting variables that are governed by both genetic and environmental factors. Several studies have sought to understand the contribution of individual traits to oilseed rape yield [4,6,32-35] or to model yield such that it can be predicted over future growing seasons [5,7,8,36]. However, the studies disagree as to which traits most strongly influence yield, be it the number of pods per plant [34], the time to maturity [6] or the duration of flowering [32]. One reason for these discrepancies is that the number or type of traits measured was limited, or that only a few cultivars were used, which constrained the genotypic variation that could be observed.

Greater Branching Density Drives Yield Improvement
Oilseed rape architectural traits accounted for the greatest percentage of variation in above-ground biomass at harvest. Significant positive correlations were observed between both total above-ground biomass and vegetative biomass and seed yield. Thus, although partitioning more resources into vegetative development could be seen to divert resources away from reproductive development, a larger above-ground biomass with increased branching, supported by a larger stem and taller plants, gives rise to more available positions upon which pods can develop. In this study all plots were planted at the same seed rate within year, but it is well established that lower seed rates can be used to produce plants with more branches than those in tightly-packed canopies [37].
Furthermore, an increase in seed yield per plant does not seem to be the result of heavier seeds, as judged by the lack of correlation between total seed yield and Thousand Grain Weight (TGW), a result that was also observed in Arabidopsis [17] and kale [38]. This also supports our hypothesis that increased yield in terms of seed mass per plant is mainly a consequence of more pods rather than an increase in resource accumulation within individual seeds or pods. Whilst seed yield per plant is positively correlated with plant height, this is not a trait that breeders would wish to emphasise, as tall plants are generally more susceptible to lodging [39], which can decrease seed yields by up to 16% [40]. For instance, transgenic dwarf oilseed rape lines have been found to have increased yields compared to their taller counterparts [41], and therefore, the breeding target would be to shorten the internode length whilst maintaining or increasing branch number on each plant. This is most likely to be achieved by targeting the plant hormone signalling network, which mediates the genetic regulation of responses to the environment, enabling individual plants to change their architecture over time in response to the external conditions [42]. Cytokinins promote the emergence of axillary buds into fully-formed branches, whereas strigolactones inhibit branching. Plants are sufficiently sophisticated that branching can be temporally or spatially regulated within an individual plant; indeed, several authors regard plants as a system of competing populations of redundant organs since no individual pod, leaf or flower can be regarded as indispensable [43], and it is known that oilseed rape produces more flowers than are required to obtain maximum yield [35].

Increasing Seed Oil:Protein Ratio
The model also showed that seed oil and protein were important drivers of yield; indeed, the oil:protein ratio (derived from the seed protein and oil terms used in Model 3b) explains a further 8.1% of the variation within yield, once biomass is taken into account. Grami et al. [44] also established that seed oil content is negatively correlated with protein content; therefore, increasing the oil fraction of the seed would naturally enhance the oil:protein ratio and, according to our model, potentially help increase yields independently of changing architectural traits, although the mechanism by which this conversion occurs is presently not clear. A recent study using Arabidopsis populations established that seed carbon and N content were antagonistic and that % N was negatively correlated with yield [45], as it is in the present study. In Arabidopsis, the number of seeds and pods per plant were closely related to the concentration of seed oil and protein within the seeds [19]. It was shown for soybean that a 1-kg −1 increase in oil content will usually lead to a 2-kg −1 decrease in protein due to a negative genetic correlation between the two yield components [46]; therefore, changing the oil:protein ratio towards oil production would naturally influence seed yield when defined as the total weight of seeds per plant. None of the lines studied have both a high above-ground biomass and a high oil:protein ratio, a combination that would be predicted to produce even greater yields than those recorded. The lines that show the greatest above-ground biomass (and highest seed yields) could also have their vegetative material utilised for the production of bioethanol and biogas, easing competition with food crops for land and resources. 
Whilst this technology is still being developed, especially in the case of appropriate pre-treatments to extract sugars from the lignocellulosic plant cell walls, yields of 20.4 g methane 100 g⁻¹ DM and 10.9 g ethanol 100 g⁻¹ DM have been reported for oilseed rape [47]. Although these values are much lower than those for sugarcane (37 g ethanol 100 g⁻¹ DM) [48], biofuel remains a useful by-product of oilseed rape production.
In the current study, there was only limited variation between the lines in terms of seed total oil content (45%-58%), but the range of individual fatty acids was highly variable between varieties (Table S1). Many of the oilseed rape varieties grown commercially are High Oleic acid, Low Linolenic acid (HOLL) and the current population contained HOLL and non-HOLL varieties. Seed linolenic acid was consistently and negatively associated with yield; a reflection of the breeding programme that has already taken place to increase yield simultaneously with developing varieties that produce healthy, low trans-fat and stable frying oils and have good disease resistance [49]. It also demonstrates that our model is robust and a true reflection of the genetic basis of yield. Transgenic or mutation (TILLING) technologies would appear to be the most effective method of manipulating the composition of fatty acids within the seed such that it fits the purpose of use, be it for the production of lubricants in industry, or of edible vegetable oils [50], or as a way of producing omega 3-rich oils to enter the human food chain [51]. For example, expressing the yeast gene glycerol-3-phosphate dehydrogenase (gpd1) under the control of a seed-specific promoter increased B. napus seed oil content by 40% [52].

Early Leaf P Status Correlates with Final Yield
The predictive yield model showed that the traits that explain most of the variation associated with yield are architectural traits; beyond this, the concentration of minerals such as canopy leaf potassium (K), early leaf phosphorus (P) and early leaf N significantly influenced yield. Previous work surrounding the effect of P fertilisation on seed yields was inconclusive, with reports of increasing P having no effect on total seed oil and protein concentration [53,54], decreasing seed linoleic acid concentrations [55] or increasing total seed oil concentrations [56]. Therefore, more research is necessary to clarify the effect of P on yield. One hypothesis is that early leaf P reflects good access to this poorly mobile nutrient in the soil and reflects a strong root system capable of supplying resources during crop development. Given that early leaf P status is a trait that can be measured before the pods have begun to develop, it could provide a useful early indicator of seed yield. Equally, raceme width at the time of flowering was a good indicator of biomass and capable of predicting 36.64% of yield. Similarly, a study in maize found that measurements of raceme width early in the growing season also provided a good predictor of final yield [57]; these early indicators of yield could be used by growers to modify inputs and optimise yield in order to maximise profit margins.

Number of Pods Determines Yield
The model presented here used the weight of seeds per plant as a measure of yield; however, we also explored whether other parameters of yield, which are used commercially, such as TGW or the number of seeds per pod, could be used as a measure of final seed yield. From the correlation matrix, it became evident that traits previously used as a proxy for seed yield per plant, such as the number of seeds per pod [58], seed weight per pod [59], pod length [60] and even TGW, were not significantly correlated with seed yield per plant in this study (Figure 4). This finding indicates that current commercially-used measures, such as TGW, may not be the most appropriate indicator of yield for breeding and selection purposes. Only the numbers of seeds per plant and pods per plant were significantly correlated with seed yield per plant. This indicates that the pod in B. napus is a fairly conserved organ and that a plant will preferentially invest in the production of more pods per plant as opposed to increasing the number or mass of seeds within an individual pod. Once a seed contains the minimum amount of resources necessary to ensure germination viability, then there is no advantage for a plant in investing more resources beyond this in an individual seed, and a better survival strategy is to make more seeds in different pods, a view also reported in a study on Arabidopsis resource allocation [19]. Similarly, the addition of N was found to increase oilseed rape yields through the production of more pods as opposed to affecting individual seed or pod weight [33], something also observed in Year 1 of the current trial where plants were grown on high and low N.

Flowering Start Date and Duration Do Not Affect Seed Yield
Oilseed rape has been bred to be self-fertile, but it can also be pollinated by wind and insects; the latter was found to enhance seed oil content quality, potentially due to the ability of insects to optimise the timing of fertilisation for the plant [61]. In Arabidopsis grown under controlled conditions, initially under short days and then switched to long days to induce flowering, the plants that flowered later had a lower yield [17]. In contrast, in the current field-based study, the flowering start date and flowering window were related neither to the number of pods per plant, nor to seed yield. Thus, increasing the flowering duration does not appear to increase pollination efficiency, perhaps because the general decline in pollinators within agricultural systems [62] means that the additional flowers are insufficiently exploited. The earlier flowering of lines with a long flowering duration may also have increased the unwanted presence of pollen beetle pests (Meligethes spp.), which are attracted by the visual and olfactory properties of oilseed rape flowers [63]. Thus, within the current field trial, the unopened buds on plants of those lines that flowered earlier may have been at greater risk of feeding damage than those on plants not yet flowering, especially if pollen resources were scarce due to high intraspecific competition, and this may partly explain why flowering earlier or over a longer period did not afford a yield advantage.

The Relationship between Plant Senescence and Seed Yield
In this study, neither chlorophyll concentration nor senescence of the pods was related to seed yield per plant or other yield parameters. This is in contrast to previous work on Arabidopsis [64] and wheat [65] where delaying leaf senescence led to an increase in seed yield. The majority of photosynthetic tissue in an oilseed rape plant is raceme material; therefore, a greater biomass would also provide a larger photosynthetic surface area through an increased number of stems. In the present study, canopy leaf chlorophyll b content (a proxy measure for the stage of senescence) had a positive relationship with final yield, suggesting that the leaf resources are mobilised into the developing plant prior to abscission via a route that contributes to seed yield; hence, the parameter was included in the predictive model. Chlorophyll b content is a measure of light-harvesting complex status rather than of reaction centre complexes, and light-harvesting complexes (LHC) are second only to Rubisco as sources of recyclable protein [66]. LHC also tend to be in excess, so the chloroplast might be able to lose significant quantities of chlorophyll b and associated proteins without a negative effect on photosynthesis. In cereals, photosynthesis of the canopy is largely responsible for the carbon yield of seeds, whereas seed N is mobilised from senescing vegetative tissues [67]. Stems have also been shown to play a pivotal role in B. napus seed filling by contributing 31% of the carbon contained within seeds [68]. Experiments in which soybean pods were removed from mature plants showed that N fixation continued to take place in the roots [69]; elevated levels of N represent a 'metabolic sink' for photosynthate, thus altering carbohydrate metabolism and sugar signalling in source leaves. Leaves therefore accumulated starch products that would normally be mobilised to the pods.
De-podding Arabidopsis showed that restricting the number of sinks increased the concentration of lipids in the remaining pods [19]; since B. napus loses leaves relatively early in its reproductive cycle, it seems reasonable to conclude that canopy photosynthesis is unlikely to contribute to oil/protein content in the seed, but it may make a contribution to carbon accumulation. However, as discussed in Bennett et al. [18], Arabidopsis is a weed that treats each pod as an individual unit, and local remobilisation of photosynthates into the seeds makes sense as it is important that each pod shatters as soon as the seed attains viability in order to stand the best chance of reproductive success. In contrast, the artificial selection that occurs within crop species has sought to synchronise seed maturation across the plant; therefore, the photosynthetic window of a pod is less closely related to yield and shatter since the pod sits in 'suspended animation' waiting for the whole plant to be ready to shatter. The extent of developmental coordination is poorer in oilseed rape compared to other crop species that have been domesticated for longer, such as cereal crops [70,71], but artificial selection has started to influence the loss of weedy reproductive traits. Other authors have found that seed maturity, as opposed to pod wall maturity, drives pod shatter in oilseed rape [72], highlighting the uncoupling of pod wall senescence in modern varieties from shattering and seed yield.
An increase in the number of pods per plant does not affect the rate of pod senescence or pod development in oilseed rape, giving credence to the idea that pod reproductive development is not necessarily driving the senescence process, but it is temporally fixed under growing conditions that impart only minimal stress on the plant. Just as leaf senescence has been shown to naturally occur in an age-dependent manner [73,74], so whole plant senescence also appears to be controlled by the age of the plant, an assumption borne out by the fact that there was little variation in the lifespan of different oilseed rape lines despite their genetic diversity. Plant longevity appears to be connected to plant mass, with small plants having a shorter lifespan than larger ones; however, within-species lifespan is very similar to keep the population stable [75]. An as yet unknown biological system in B. napus controls the timing of whole plant senescence and accordingly lifespan. Thus, whilst senescence is the visible output of an internal biological clock, the number of pods, plant size and seed yield are the consequences of resource availability.

Materials and Methods
In an effort to understand better the physiological basis governing seed yield in oilseed rape, 35 cultivar lines of B. napus, showing substantive divergence in seed yield per plant, were assessed over two seasons for 133 different traits, covering developmental, architectural and nutritional parameters, generating a powerful dataset for exploring the factors underpinning yield.

Site Description and Meteorological Conditions
The experiment was carried out over two growing seasons. Weather parameters (rainfall, maximum and minimum temperature) for the experimental period (2009-2011) and the 30-year average were taken from the Rothamsted Research meteorological station (Figure S1). Rainfall patterns differed greatly between the two years; e.g., June rainfall in Year 2 was approximately four-fold higher than in the same period in Year 1.

Crop Management and Field Trial Design
The field trials were arranged in a randomised complete block design and used two replicate plots per treatment. In Year 1 (2009-2010), two soil N supplies were investigated: low (= residual) N (~18 kg ha⁻¹) and high N (148 kg ha⁻¹). The high N treatment was applied as a split dose; 30 kg ha⁻¹ N was applied in the autumn (29 September 2009), and 50 kg ha⁻¹ N was subsequently applied on each of two dates in the spring (12 March 2010 and 7 April 2010), giving a maximum of 148 kg ha⁻¹ total available N. The Year 2 trial used only residual nitrogen (22 kg ha⁻¹) across the whole plot. In Year 1, seeds were drilled on 28 August 2009, and each replicate plot (1 × 1.5 m) was drilled in four rows with 120 seeds per plot. In Year 2, 125 plots were drilled on 6 September 2010, but due to bad weather conditions, the drilling of the remaining 307 plots had to be postponed by one week; ANOVA showed no significant differences in yield parameters as a result of this split drilling time. Year 2 plots were 1.8 × 1.5 m, drilled with 160 seeds per plot in 14 rows. Pre-germination herbicides and fungicides were applied to both trials at the manufacturers' recommended rates, with pests, diseases and volunteer cereals/weeds controlled according to standard agronomic practice. In Year 1, all of the 61 winter cultivars planted were sampled, and this included 48 winter oilseed rape (WOSR) lines, 8 fodder/forage/salad kales, 4 swedes and 1 synthetic line. In Year 2, of the 84 WOSR varieties planted, a subset of 35 lines was sampled; lines were selected if they were also present in Year 1 of the trial and had transcriptome sequencing information available.

Chlorophyll Measurements
Chlorophyll and carotenoid concentration were analysed in early and canopy leaf samples as a proxy measure of senescence, and in Year 2, additional sampling was made from leaf disks from upper and lower leaves and whole pods sampled at set intervals throughout the growing season, March-June and May-July, respectively. The early leaves represented the earliest not fully-expanded leaves, and the canopy leaves were the youngest true leaves on the plant when sampled on 2 February 2011. The lower leaf (sampled 11 March, 6 April and 3 May 2011) was the 5th leaf up from the base of the main raceme, and the upper leaf (sampled 3 May 2011 and 6 June 2011) was that immediately subtending the lowermost pod. All pods were sampled from the main raceme. Chlorophyll was extracted and analysed in the leaf and pod material using the methods described in Bennett et al. [29]. The progression of chlorophyll and carotenoid loss from the leaves and pod walls was calculated by determining the % difference from the previous sampling date. The pod photosynthetic period was calculated as the time from the start of flowering until pod wall chlorophyll fell below 10% of its maximum.
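The two derived senescence measures described above can be sketched as follows. This is a minimal illustration rather than the authors' code: the function names, the sign convention (negative values indicate loss), the restriction of the 10% search to samples taken at or after the peak, and all dates and chlorophyll values in the example are assumptions.

```python
from datetime import date
from typing import List, Optional, Tuple


def percent_change_from_previous(values: List[float]) -> List[float]:
    """Percentage difference of each measurement from the previous sampling date."""
    return [100.0 * (cur - prev) / prev for prev, cur in zip(values, values[1:])]


def pod_photosynthetic_period(
    flowering_start: date,
    samples: List[Tuple[date, float]],
    threshold: float = 0.10,
) -> Optional[int]:
    """Days from flowering start until pod wall chlorophyll first falls
    below `threshold` x its maximum. Returns None if that never happens."""
    peak_index = max(range(len(samples)), key=lambda i: samples[i][1])
    peak_value = samples[peak_index][1]
    # Assumption: only samples taken at or after the peak are considered.
    for sampled_on, chlorophyll in samples[peak_index:]:
        if chlorophyll < threshold * peak_value:
            return (sampled_on - flowering_start).days
    return None


# Hypothetical pod wall chlorophyll series (arbitrary units).
pod_samples = [
    (date(2011, 5, 10), 40.0),
    (date(2011, 5, 31), 50.0),
    (date(2011, 6, 21), 20.0),
    (date(2011, 7, 12), 4.0),
]
changes = percent_change_from_previous([v for _, v in pod_samples])
period = pod_photosynthetic_period(date(2011, 4, 10), pod_samples)
print(changes, period)
```

Because sampling was at set intervals, the period computed this way is only resolved to the nearest sampling date.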

Leaf Mineral Analysis
Early and canopy leaves from ten plants per plot were dried overnight at 80 °C and stored in a controlled environment room at 15 °C, 15% humidity. Before milling, the leaf samples were re-dried overnight at 60 °C. Plots were analysed in duplicate for their mineral content, and a Kjeldahl digestion was used to analyse total calcium (Ca), potassium (K), magnesium (Mg), manganese (Mn), nitrogen (N) and phosphorus (P) using the methods described in Broadley et al. [76]. Aqueous extracts were subsequently used to measure the nitrate concentration [76].

Plant Development
The plots were walked periodically between March and July. Growth stage [77] was visually scored over the whole plot and graded accordingly based on 25% of the plants in a plot reaching a particular stage. Flowering duration was defined as the difference between the start date of flowering (when 25% of the plot had reached GS60) and the date when >95% of the plants within the plot had no more flowers.
Pod development was also visually scored across the whole plot and categorized per plot on the following scale: 0: plot still in flower; 0.5: immature pods; 1: mature green pods fully expanded; 1.5: 25% yellow pods; 2: 50% yellow pods (yellow/green); 2.5: 75% yellow pods; 3: 100% yellow pods; 3.5: 50% brown pods; 4: 100% brown pods just before shatter. A plot was recorded as shattering when 25% of the pods in the whole plot were seen to have dehisced.
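The plot-level scoring rules above lend themselves to a small lookup table and a date calculation. The following sketch assumes each field visit records the fraction of plants that have reached GS60 and the fraction that have finished flowering. The record layout, the function name and the example dates are hypothetical.

```python
from datetime import date
from typing import List, Optional, Tuple

# Pod development scale as described in the text.
POD_SCORES = {
    0.0: "plot still in flower",
    0.5: "immature pods",
    1.0: "mature green pods fully expanded",
    1.5: "25% yellow pods",
    2.0: "50% yellow pods (yellow/green)",
    2.5: "75% yellow pods",
    3.0: "100% yellow pods",
    3.5: "50% brown pods",
    4.0: "100% brown pods just before shatter",
}

# (visit date, fraction of plants at GS60 or later, fraction with no more flowers)
Visit = Tuple[date, float, float]


def flowering_duration(visits: List[Visit]) -> Optional[int]:
    """Days between flowering start (>= 25% of plants at GS60) and the
    first visit at which > 95% of plants had no more flowers."""
    start = next((d for d, at_gs60, _ in visits if at_gs60 >= 0.25), None)
    end = next((d for d, _, finished in visits if finished > 0.95), None)
    if start is None or end is None:
        return None
    return (end - start).days


# Hypothetical visit records for one plot, in date order.
plot_visits = [
    (date(2011, 4, 4), 0.10, 0.00),
    (date(2011, 4, 11), 0.30, 0.00),
    (date(2011, 5, 9), 1.00, 0.60),
    (date(2011, 5, 23), 1.00, 0.97),
]
duration_days = flowering_duration(plot_visits)
print(duration_days)
```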

Plant Architecture
Several plant architectural traits were measured at the time of harvest. Five plants were selected at random from each plot and returned to the laboratory for assessment of the following traits: plant height, length of the raceme containing pods, the total number of branches and the number of branches arising from the main raceme only. Pod spacing on the main raceme was calculated by dividing the length of the raceme containing pods by the number of pod sites on the raceme, i.e., including sites where pods had subsequently aborted. Raceme area was calculated using a calliper to measure two raceme diameters 10 cm above the ground at 90° to each other and multiplying these values together. Raceme stiffness was visually scored on a scale of 1-9 where: 1, all plants prostrate; 2, main stems leaning at 20°-30° to the horizontal; 3, main stems leaning at 45° to the horizontal; 4, main stems leaning at 70°-80° to the horizontal; 5, main stems leaning slightly, but branched canopy bent over at 10° to the horizontal; 6, main stems leaning slightly, but branched canopy leaning at 20°-30° to the horizontal; 7, main stems leaning slightly, but branched canopy leaning at 45° to the horizontal; 8, main raceme erect with branched canopy leaning at 70°-80° to the horizontal; and 9, all plants completely erect. The above-ground biomass was calculated from the dry weight of 5 plants per plot at harvest; the vegetative mass was calculated from the above-ground biomass minus the seed yield per plant.
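The derived architectural traits defined above reduce to three short calculations, sketched here for a single plant. The function names, the units (cm for lengths, mm for diameters, g for masses) and the example values are assumptions. Note that the raceme area, as defined in the text, is the simple product of the two diameters rather than a geometric cross-sectional area.

```python
def pod_spacing(podded_raceme_length: float, pod_sites: int) -> float:
    """Length of raceme bearing pods divided by the number of pod sites
    (including sites where pods aborted)."""
    return podded_raceme_length / pod_sites


def raceme_area(diameter_a: float, diameter_b: float) -> float:
    """Product of two diameters measured at 90 degrees to each other."""
    return diameter_a * diameter_b


def vegetative_mass(above_ground_biomass: float, seed_yield: float) -> float:
    """Above-ground biomass minus seed yield, both per plant."""
    return above_ground_biomass - seed_yield


# Hypothetical single-plant measurements.
spacing_cm = pod_spacing(60.0, 48)        # cm per pod site
area_mm2 = raceme_area(12.0, 10.0)        # mm x mm
veg_mass_g = vegetative_mass(50.0, 12.5)  # g
print(spacing_cm, area_mm2, veg_mass_g)
```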

Pod Physiology
Several measures of pod physiology were taken at harvest from pods on the main raceme at Stage 4 (brown and ready to dehisce) just before the plots were harvested. In each case, ten replicate pod samples were taken per plot, and there were four plots per line. Pod length was determined by measuring from the start of the bivalve to the end of the beak, and the seed cavity length was also determined (the length of the pod minus the beak). The seeds were removed from the pod and aborted seeds discarded before the remainder was counted. The seed packing density was determined by dividing the pod cavity length by the number of seeds per pod. The weight of seeds per pod was determined by placing the seeds in a controlled environment room at 10 °C, 15% humidity until they had reached a constant weight. Seed area was calculated using a MARVIN seed scanner (Gta Sensors GmbH, Germany).
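As a small worked illustration of the pod measurements, the sketch below computes the seed cavity length and the seed packing density for one pod. The function names, the use of millimetres and the example values are assumptions. As defined in the text, the "density" is a cavity length per seed, so larger values mean more loosely packed seeds.

```python
def seed_cavity_length(pod_length: float, beak_length: float) -> float:
    """Pod length (start of bivalve to end of beak) minus the beak."""
    return pod_length - beak_length


def seed_packing_density(cavity_length: float, seeds_per_pod: int) -> float:
    """Pod cavity length divided by the number of (non-aborted) seeds."""
    return cavity_length / seeds_per_pod


# Hypothetical pod: 70 mm long with a 10 mm beak and 24 filled seeds.
cavity_mm = seed_cavity_length(70.0, 10.0)
packing_mm_per_seed = seed_packing_density(cavity_mm, 24)
print(cavity_mm, packing_mm_per_seed)
```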

Yield
At harvest, ten plants per plot were bagged and dried at 20 °C. The plants were then threshed and winnowed before the seeds were placed in a controlled environment room at 15 °C, 15% humidity until they had reached a constant weight. Seed yield was subsequently determined as total seed weight per plant. To calculate the MARVIN Thousand Grain Weight (TGW), 100 harvested seeds per plot (post threshing, winnowing and storage) were placed into the MARVIN seed scanner with a ±10 seed tolerance; any seed that was not representative of the batch (i.e., damaged or discoloured) was removed by hand prior to the analysis, and the seeds were subsequently weighed. The following calculations were used to determine the MARVIN TGW, TGW, number of seeds per plant, number of pods per plant and harvest index.

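\[
\begin{aligned}
\text{MARVIN TGW (g)} &= \frac{\text{Weight of seeds (g)}}{\text{Number of seeds}} \times 1000 \\[4pt]
\text{TGW (g)} &= \frac{\text{Seed weight per pod (g)}}{\text{Number of seeds per pod}} \times 1000 \\[4pt]
\text{Number of seeds per plant} &= \text{Seed yield per plant (g)} \div \frac{\text{TGW (g)}}{1000} \\[4pt]
\text{Number of pods per plant} &= \frac{\text{Seed yield per plant (g)}}{\text{Seed weight per pod (g)}} \\[4pt]
\text{Harvest index} &= \frac{\text{Seed yield per plant (g)}}{\text{Total above-ground biomass (g)}}
\end{aligned}
\]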
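The yield calculations can be expressed directly in code. The sketch below follows the five definitions given in this section; the function names and the example measurements are hypothetical.

```python
def thousand_grain_weight(seed_weight_g: float, seed_count: float) -> float:
    """Weight of a seed sample scaled to 1000 seeds. Used for both the
    MARVIN TGW (scanned sample) and the TGW (seeds from one pod)."""
    return seed_weight_g / seed_count * 1000.0


def seeds_per_plant(seed_yield_g: float, tgw_g: float) -> float:
    """Seed yield per plant divided by the weight of a single seed (TGW / 1000)."""
    return seed_yield_g / (tgw_g / 1000.0)


def pods_per_plant(seed_yield_g: float, seed_weight_per_pod_g: float) -> float:
    """Seed yield per plant divided by seed weight per pod."""
    return seed_yield_g / seed_weight_per_pod_g


def harvest_index(seed_yield_g: float, above_ground_biomass_g: float) -> float:
    """Seed yield per plant as a fraction of total above-ground biomass."""
    return seed_yield_g / above_ground_biomass_g


# Hypothetical plant: 10 g seed yield, 40 g above-ground biomass,
# pods holding 20 seeds weighing 0.1 g in total.
tgw = thousand_grain_weight(0.1, 20)
n_seeds = seeds_per_plant(10.0, tgw)
n_pods = pods_per_plant(10.0, 0.1)
hi = harvest_index(10.0, 40.0)
print(tgw, n_seeds, n_pods, hi)
```

Note that the number of seeds and the number of pods per plant are both derived from seed yield per plant rather than counted, so they will correlate with yield by construction.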

Seed Composition
Seed oil, fatty acid, protein, moisture and glucosinolate content were determined by Near Infrared Spectroscopy (NIRS) using the methods described in Kelly et al. [78]. In the current study, seed samples from five plants per plot were analysed.

Statistics
Statistical analyses were performed using the statistical software package R (R Core Team, Florham Park, New Jersey, USA, 2013), except for the yield mean per plant and standard error of the mean, which were calculated using IBM SPSS version 19 (IBM Corp., Armonk, NY, USA). For the Pearson's correlation analysis of all traits, with 9180 pairwise tests, a Bonferroni multiple comparison correction was applied making the cut-off value for significant correlations ±0.496. To determine the main traits influencing seed yield per plant and above-ground biomass, linear regression models were developed using forward selection based on the 133 traits listed in Supplementary Table S1. Starting with the null model (no traits included), forward selection identifies the most significant trait to include in the model based on an F-test. The process was repeated multiple times, at each stage selecting the most significant model with the n + 1 variable compared to the current model with n variables based on the F-test. The process was stopped when none of the remaining traits significantly improved the estimation of the response variable at the 5% significance level. Some traits, including yield, underwent a log transformation prior to inclusion in the model to improve the linear relationship between the response variable and the other traits. After selecting the traits identified in Table 3 and standardising, the traits and lines underwent hierarchical clustering using Ward linkage, to compile the dendrogram, heat map and scatter diagram (Figures 4 and 6).
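The forward selection procedure described above can be sketched in a few dozen lines. This is an illustration in plain Python, not the R code used in the study. One deliberate simplification: a fixed F-to-enter threshold (`f_to_enter`, an assumed value of 4.0) stands in for the 5% F-test, because computing the exact p-value needs an F-distribution function from a statistics library. The trait names and data are synthetic.

```python
from typing import Dict, List


def solve(a: List[List[float]], b: List[float]) -> List[float]:
    """Solve a x = b by Gaussian elimination with partial pivoting.
    Raises ZeroDivisionError if the system is singular."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def rss(y: List[float], columns: List[List[float]]) -> float:
    """Residual sum of squares of least squares of y on intercept + columns."""
    design = [[1.0] + [col[i] for col in columns] for i in range(len(y))]
    p = len(design[0])
    xtx = [[sum(r[j] * r[k] for r in design) for k in range(p)] for j in range(p)]
    xty = [sum(r[j] * y[i] for i, r in enumerate(design)) for j in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(bj * xj for bj, xj in zip(beta, r)) for r in design]
    return sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))


def forward_select(y: List[float], traits: Dict[str, List[float]],
                   f_to_enter: float = 4.0) -> List[str]:
    """Add, one at a time, the trait with the largest partial F statistic,
    stopping when no remaining trait reaches f_to_enter."""
    selected: List[str] = []
    current = rss(y, [])  # null model: intercept only
    while len(selected) < len(traits):
        df_resid = len(y) - len(selected) - 2  # after adding one more trait
        if df_resid <= 0:
            break
        best_name, best_f, best_rss = None, 0.0, current
        for name, column in traits.items():
            if name in selected:
                continue
            cand = rss(y, [traits[s] for s in selected] + [column])
            f_stat = (current - cand) / (cand / df_resid)  # cand must be > 0
            if best_name is None or f_stat > best_f:
                best_name, best_f, best_rss = name, f_stat, cand
        if best_name is None or best_f < f_to_enter:
            break
        selected.append(best_name)
        current = best_rss
    return selected


# Synthetic example: the response depends on x1 and x2 but not on x3.
traits = {
    "x1": [1, 1, 1, 1, -1, -1, -1, -1],
    "x2": [1, 1, -1, -1, 1, 1, -1, -1],
    "x3": [1, -1, 1, -1, 1, -1, 1, -1],
}
noise = [1, -1, -1, 1, -1, 1, 1, -1]
y = [10 + 3 * a + b + 0.1 * e for a, b, e in zip(traits["x1"], traits["x2"], noise)]
chosen = forward_select(y, traits)
print(chosen)
```

In the synthetic example, the strongest trait enters first, the second enters once the first is accounted for, and the uninformative trait is left out. In the study itself, some traits (including yield) were log-transformed before this step.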

Conclusions
We have developed a forward stepwise multiple linear regression model to determine the traits most important in influencing seed yield per plant in B. napus. From the 133 traits recorded, that could have potentially influenced seed yield, above-ground biomass and protein were found to be the most important factors, together accounting for 94.36% of the variation in seed yield. Increasing our understanding of resource allocation strategies could lead to improved plant breeding via marker-assisted selection, and this would have important consequences for crop breeding targets and meeting future food security demands. It would appear that year-on-year environmental weather conditions have a bigger impact than soil conditions, since the Year 2 model fitted to Year 1 data was less good than the same model fitted to different soil nitrogen regimes in Year 1. However, the refined model developed in Year 2 included more informative parameters that provided more robust indicators of yield. We therefore propose that this model would require adjustment for climate and farm-specific soil mineral content, but that it could provide the basis of a model for oilseed rape yield across the country.
Overall, our findings demonstrate that there is a diverse array of resource allocation strategies within the B. napus gene pool, and different approaches will need to be applied depending on which yield components are to be targeted. Whilst biomass and protein are the main contributors towards the variation observed in seed yield, the traits that underpin these variables, such as raceme width for biomass and early leaf mineral concentration, could provide useful early indicators of seed yield, as well as potential indirect targets for improving yields.
Supplementary Materials: The following are available online at www.mdpi.com/2073-4395/7/2/31/s1: Figure S1: Monthly figures for (a) total rainfall and (b) mean maximum and minimum temperature at Rothamsted Research; Table S1: Analysis of variance table for (a) variety and nitrogen terms on Year 1 data and (b) variety and year terms on data from both years; Table S2: Mean, range (min and max), standard deviation (SD) and coefficient of variation (CV) of the physiological and biochemical traits measured in 35 B. napus lines.