Deep Phenotyping of Yield-Related Traits in Wheat

The complex formation of grain yield (GY) is related to multiple dry matter (DM) traits; however, due to their time-consuming determination, they are not readily accessible. In winter wheat (Triticum aestivum L.), both agronomic treatments and genotypic variation influence GY in interaction with the environment. Spectral proximal sensing is promising for high-throughput non-destructive phenotyping but was rarely evaluated systematically for dissecting yield-related variation in DM traits. Aiming at a temporal, spectral and organ-level optimization, 48 vegetation indices were evaluated in a high-yielding environment in 10 growth stages for the estimation of 31 previously compared traits related to GY formation—influenced by sowing time, fungicide, N fertilization, and cultivar. A quantitative index ranking was evaluated to assess the stage-independent index suitability. GY showed close linear relationships with spectral vegetation indices across and within agronomic treatments (R2 = 0.47–0.67 ***). Water band indices, followed by red edge-based indices, best used at milk or early dough ripeness, were better suited than the widely used normalized difference vegetation index (NDVI). Index rankings for many organ-level DM traits were comparable, but the relationships were often less close. Among yield components, grain number per spike (R2 = 0.24–0.34 ***) and spike density (R2 = 0.23–0.46 ***) were moderately estimated. GY was mainly estimated by detecting total DM rather than the harvest index. Across agronomic treatments and cultivars, seasonal index rankings were the most stable for GY and total DM, whereas traits related to DM allocation and translocation demanded specific index selection. The results suggest using indices with water bands, near infrared/red edge and visible light bands to increase the accuracy of in-season spectral phenotyping for GY, contributing organ-level traits, and yield components, respectively.


Introduction
In wheat, grain yield formation is influenced by various growth factors, including sowing date, fungicide intensity, N fertilization, and genotypic potential [1-8]. However, treatment effects differ strongly between environments, and interactions between these agronomic measures are relevant [2,9-12]. Therefore, numerous field trials are required to assess the optimum level of agronomic treatments in specific cropping regions. On the other hand, plant breeders are compelled to screen large numbers of genotypes for their yield potential under field conditions [13]. However, owing to the time- and cost-intensive determination of further traits, most of the trials focus on grain yield (GY) and a few other traits, such as plant height, anthesis date, and disease incidence. Thus, the primary effects of the considered treatments are often not addressed, although addressing them could optimize the breeding process [14] or allow cultivar-specific fertilization and fungicide strategies [15,16]. Tested in common trials, these strategies can be further optimized when accounting for the characteristics of the cultivar and the cultivar group, i.e., line and hybrid cultivars. The use of non-destructive high-throughput phenotyping techniques could improve the understanding of GY-related mechanisms and allow [...]

[...] were 1.5 m wide and 6.4 m long. The cultivars differed in yield components, disease susceptibility, and phenology. The sowing date was September 28 for SD1, and October 23 for Cont and RF. Each cultivar was treated at two N levels, applied in two doses (N1: 60 kg N ha−1; N2: 120 kg N ha−1) within each of the three main plots (MPs), and in four replicates per N level, resulting in four replicates for each factor combination (MP*N*Cv). Foliar fungicide was applied twice in Cont and SD1 plots but not in RF plots.
The study year was characterized by overall favorable weather conditions with above-average temperatures and radiation during March, May and June in spite of the below-average conditions in April. Conversely, precipitation was above-average in April and July but below-average in May and June, leading to mild drought stress and accelerated senescence in some cultivars during the grain filling phase. See [25] for details on the field trials.
Biomass was sampled at mid-flowering as well as at physiological maturity and manually separated into flag leaves, flag leaves-1 (second leaf layer from above), 'other leaves' (remaining leaves), culms including leaf sheaths, and spikes. After final sampling, all plots were harvested (August 01-02) using a combine harvester. In addition to organ-level DM traits, derived plant traits were calculated, including yield components, the harvest index (HI), post-anthesis assimilation (PAA), dry matter translocation (DMT), DMT efficiency (DMTEff), contribution of PAA to grain filling (CPAA) as well as N utilization efficiency (NutEff) and N use efficiency (NUE) for total dry matter and GY at maturity, respectively. Although NutEff and NUE included total N uptake (Nup) and fertilized N in the calculation, they represent normalized DM traits. Additionally, anthesis date and plant height were included. Refer to [25] for details and Table S1 for a list of all traits.

Spectral Measurements and Data Preparation
Spectral measurements were performed on 10 measurement days during the main growth stages, from leaf development at the end of March until hard dough ripeness in mid-July (Table 1). The measurements were conducted with the PhenoTrac 4 mobile sensor platform [42], using a hyperspectral bidirectional passive spectrometer (tec5, Oberursel, Germany), measuring at a nominal resolution of 3.3 nm between 300 and 1000 nm. The distance to the canopy was approximately 80 cm, and plot boundaries were excluded. A recording frequency of 5 Hz along with the RTK-based localization (real-time kinematic global positioning system; Trimble, Sunnyvale, CA, USA) allowed gapless coverage of the plots. The spectra were smoothed using a five-band moving average filter [44] to remove spectral noise. According to previous studies and similar to [47], 48 spectral vegetation indices (SVIs) were selected from literature (Table S1). The indices were grouped per included spectral range (visible light VIS, red edge RE and near-infrared NIR), with VIS < 700 nm, 'extended' RE: 700-765 nm and NIR > 765 nm (Table S2, Figure 1). The upper RE boundary was higher than that in the common definition to include indices with a NIR/RE band closer to the RE than the normalized difference vegetation index (NDVI; NIR = 780 nm), because the [...]

[Figure caption, beginning missing] [...] The spectra are colored, indicating differences in grain yield from the lowest yield (yellow) to the medium yield (green) and the highest yield (blue).
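The smoothing, band grouping, and index calculation described in this section can be illustrated with a minimal Python sketch. It is not the processing code used in the study. The edge handling of the moving average, the red band at 670 nm, and the reflectance values are assumptions made only for illustration; the NIR band at 780 nm and the range boundaries are taken from the text.

```python
def moving_average(values, window=5):
    """Centered moving average over `window` neighboring bands.

    Assumption: at the spectrum edges the window is truncated to the
    bands that exist; the text does not state the edge handling.
    """
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def band_group(wavelength_nm):
    """Assign a wavelength to VIS (< 700 nm), 'extended' RE (700-765 nm) or NIR (> 765 nm)."""
    if wavelength_nm < 700:
        return "VIS"
    if wavelength_nm <= 765:
        return "RE"
    return "NIR"


def normalized_difference(refl_a, refl_b):
    """Generic normalized difference index of two reflectance values."""
    return (refl_a - refl_b) / (refl_a + refl_b)


# Hypothetical smoothed reflectances; 670 nm as the red band is an assumption.
reflectance = {670: 0.04, 780: 0.46}
ndvi = normalized_difference(reflectance[780], reflectance[670])
```

In this toy example the NDVI combines an NIR band (780 nm) with a VIS band (670 nm), so it would fall into the NIR/VIS group of the classification above.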

Statistical Analysis
For each sampling date, SVIs were tested using simple linear regression with the DM traits across the values of the four replicates. The data analysis was conducted in R 3.4, using the lm function (R Core Team, 2017). The coefficient of determination (R²) was used to compare the relationships. Significance levels correspond to p < 0.001 (***), p < 0.01 (**) and p < 0.05 (*). To assess the influence of the contributing treatments on the trait estimation, the relationships were compared for different data subsets to consider different treatment combinations either in agronomic factor trials or in breeding yards: (i) full data, (ii) combined Cont and SD1 data ('Cont_SD1'), (iii) combined Cont and RF data ('Cont_RF'), and (iv) six main plot*N level (MP*N) combinations (Figure 2). Considering the MP*N blocks as possible trial environments created by the main plot (MP) treatments in combination with N levels for assessing genotypic variation, as used by plant breeders, the results of these blocks were averaged and compared for the aggregated, averaged data (n = 6). Testing using different datasets aimed at assessing the potential for trait estimation under various influencing conditions. Thus, conditions in (iv) can simulate conditions for phenotyping genotypic variation, whereas those in (i)-(iii) are referred to as 'agronomic' conditions. The relationships were compared regarding the estimation potential by trait, the index ranking and optimum measurement stages.
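The per-date screening of trait~index relationships can be sketched as follows. The study used the lm function in R; this standard-library Python version is only an illustration of the same idea. For a simple linear regression, R² equals the squared Pearson correlation. The function names, the dictionary layout, and all numeric values are hypothetical, and p-values are assumed to come from the regression software rather than being computed here.

```python
def r_squared(x, y):
    """R² of the simple linear regression y ~ x (squared Pearson correlation)."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0  # no variation, no relationship to report
    return sxy ** 2 / (sxx * syy)


def significance_stars(p_value):
    """Map a p-value to the significance codes used in the text."""
    if p_value < 0.001:
        return "***"
    if p_value < 0.01:
        return "**"
    if p_value < 0.05:
        return "*"
    return ""


def best_relationship(trait_values, svi_values):
    """Return the (index, date) key with the highest R² and that R².

    `svi_values` maps (index_name, date) to one index value per plot.
    """
    scored = {key: r_squared(x, trait_values) for key, x in svi_values.items()}
    best_key = max(scored, key=scored.get)
    return best_key, scored[best_key]


# Hypothetical plot-level data for four plots on one date.
grain_yield = [7.0, 8.0, 9.0, 10.0]
svi_values = {
    ("NWI_5", "07/01"): [0.1, 0.2, 0.3, 0.4],
    ("NDVI", "07/01"): [0.8, 0.9, 0.8, 0.9],
}
best_key, best_r2 = best_relationship(grain_yield, svi_values)
```

Running the same search separately on each data subset (full data, Cont_SD1, Cont_RF, and each MP*N block) would give the subset-specific R² values that are compared in the text.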

Agronomy 2020, 10, x FOR PEER REVIEW 5 of 20

Figure 2. Datasets used for testing trait~index relationships. The three 'agronomic' datasets correspond to the data of the whole trial (full data), the combined main plots control (Cont) and 'reduced fungicide' (RF -> Cont_RF) as well as the combined main plots Cont and sowing date 1 (SD1 -> Cont_SD1). Main plot*N (MP*N) represents the testing within the subplots as the interactions of main plots (MP) and N fertilization level (N1: 60 kg N ha−1; N2: 120 kg N ha−1). n denotes the number of included data points, which slightly differed after the removal of some outlier plots for all dates. Panel labels: agronomic approach with full data (n = 139), Cont_SD1 (n = 91) and Cont_RF (n = 96); MP*N approach with RF_N1, RF_N2, Cont_N1, Cont_N2, SD1_N1 and SD1_N2.
To overcome the influence of differing growing conditions and the date-specific index ranking, indices were quantitatively ranked by their normalized performance for each trait in each dataset. The across-dates (n = 10) mean and maximum coefficient of determination (R²) values of each index were normalized to the trait-specific average mean and maximum R² from all SVIs within each of the three 'agronomic' datasets and the MP*N data, respectively. A value > 1 indicated a comparative advantage of the index for the trait under consideration. Thereafter, the relationships (R²) and the index rankings were compared between datasets. Consequently, the within-dataset mean and maximum rankings were summed up (i) across the three 'agronomic' datasets and (ii) for the MP*N dataset, respectively, to achieve a more robust ranking across contributing treatments. The mean- and maximum-based rankings were combined by summing the rankings for identifying one index per trait. Considering a selection of indices robust towards date-specific suitability as more important, the mean-based ranking was double-weighted. These weighted mean/maximum-rank sums (WMMRS) were used to identify one trait-specific optimum index from the 'agronomic' approach, irrespective of the R² level achieved, with WMMRS of < 9 indicating below-average and WMMRS of > 9 above-average index performance for a specific trait. The performance of the WMMRS-based best indices was compared over time both in the agronomic datasets and the MP*N dataset to validate the transferability of the index selection between agronomic and breeding trials. In addition, the stability of the agronomic rankings was compared with the WMMRS-rankings of the MP*N approach using Spearman's rank correlation coefficient.
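One possible reading of the WMMRS calculation is sketched below in Python. It is an interpretation of the description above, not the authors' code: the 'rankings' are treated as normalized R² ratios (value > 1 means better than the index average), the mean-based ratio is weighted twice, and the sums run over the datasets. Under this reading, three datasets give an average of 3 × (2 + 1) = 9, which matches the stated threshold. The function names, the `scale` parameter, and the R² values are hypothetical.

```python
def normalized_scores(r2_by_index):
    """Normalize each index's mean and max R² (across dates) to the index average."""
    means = {idx: sum(v) / len(v) for idx, v in r2_by_index.items()}
    maxima = {idx: max(v) for idx, v in r2_by_index.items()}
    avg_mean = sum(means.values()) / len(means)
    avg_max = sum(maxima.values()) / len(maxima)
    norm_mean = {idx: m / avg_mean for idx, m in means.items()}
    norm_max = {idx: m / avg_max for idx, m in maxima.items()}
    return norm_mean, norm_max


def wmmrs(r2_by_dataset, mean_weight=2.0, scale=1.0):
    """Weighted mean/maximum rank sum per index for one trait.

    `r2_by_dataset` maps dataset name -> {index name -> list of R² per date}.
    `scale` is an assumed helper, e.g. 3.0 to put a single MP*N matrix on the
    same numeric level as the sum over three 'agronomic' datasets.
    """
    totals = {}
    for r2_by_index in r2_by_dataset.values():
        norm_mean, norm_max = normalized_scores(r2_by_index)
        for idx in r2_by_index:
            score = mean_weight * norm_mean[idx] + norm_max[idx]
            totals[idx] = totals.get(idx, 0.0) + score
    return {idx: scale * total for idx, total in totals.items()}


# Hypothetical R² values of two indices on two dates in three datasets.
toy_r2 = {
    "full": {"NWI_5": [0.20, 0.60], "NDVI": [0.10, 0.30]},
    "Cont_SD1": {"NWI_5": [0.30, 0.45], "NDVI": [0.25, 0.35]},
    "Cont_RF": {"NWI_5": [0.10, 0.65], "NDVI": [0.05, 0.40]},
}
scores = wmmrs(toy_r2)
average_score = sum(scores.values()) / len(scores)
```

Because every normalized ratio averages to one across indices, the average WMMRS is fixed by the number of datasets and the weights, so values above or below nine can be compared between traits even when the absolute R² levels differ.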

Optimized Index and Date Selection Considering the Contributing Treatments
The best relationships found for all trait*index combinations indicated considerable differences in index suitability.


Grain Yield
Both for the full data (Figures 3a, 4a and 5a) and for the average results from the six MP*N combinations (Figures 3b, 4a and 5a), all NIR-based water band indices showed a strong advantage for grain yield (GY; DM grain) and total DM. In contrast, several other indices performed similarly well in the Cont_SD1 and Cont_RF subsets (Figure S1). Still, the NWI_5 ranked highest: its weighted mean/max rank sum (WMMRS) of 15 in the agronomic ranking was two thirds better than the average of the tested index ensemble (WMMRS = 9) and clearly above the best non-water band index (R780_740; WMMRS = 11; Figures 7 and 8). In contrast, the NWI_5 and the best NIR/RE indices ranked similarly for total DM at maturity, with a slight advantage of the NDRE_770_750 (WMMRS = 13). In all four data subsets, one of these two traits was the best estimated among the direct plant traits. The NWI_5 reached coefficients of determination (R² ***: p < 0.001) of 0.67, 0.47, 0.65, and 0.50 for GY in the full dataset, Cont_SD1, Cont_RF and from the average of the six MP*N combinations, respectively (Table 2; Figure 4a), corresponding to RMSE-values of 404-549 kg ha−1 (not shown). For all subsets, the best measurement dates were either 07/01 (month/day; early dough ripeness) or 06/26 (milk ripeness) for GY (Table 2). Figure 5a depicts the seasonal R²-values in the four datasets for GY from all indices and the highlighted WMMRS-based best index NWI_5. The blue lines demonstrate a clear advantage of the NIR-based water band indices over all other indices during the most suited phase at grain filling. For all datasets, the phase of booting/ear emergence/anthesis was less suited. For the full data, Cont_SD1, and MP*N, moderate relationships (R² ≈ 0.30-0.40 ***) are visible already during tillering and stem elongation. However, this was not confirmed in Cont_RF, and the index ranking was not stable in this phase.
For total DM (Figure 5b), all datasets except Cont_RF confirmed the advantage of the water band indices only during the later grain filling, with a slight outperformance over many other indices. Instead, the rank-best index NDRE_770_750 was more suited during the vegetative phase in all datasets. R²-values (≈0.50 ***) were highest in Cont_SD1 already during stem elongation, whereas, as for GY, only the post-anthesis phase exhibited good relationships in Cont_RF and MP*N. With GY being a multiplicative function of total DM and its relative allocation to the grain (harvest index; HI), relationships with HI were tested as well, but showed low potential (Figure 5c). Only the R787_765 yielded useful relationships at dough ripeness in Cont_RF (R² = 0.37 ***) and in the full data (R² = 0.27 ***), as well as in MP*N at anthesis, but not within all MP*N blocks (Figure 4c).

[Figure caption] Maximum coefficients of determination (R²) found for all index*trait combinations from 10 measurement dates. Gray lines delimit the index and trait groups. See Figure S1 for results in Cont_SD1 and Cont_RF.

[Figure caption, beginning missing] [...] Table 2) with rank-based best indices. Colored thin lines correspond to linear regressions in the MP*N subsets, dashed blue and red lines to Cont_SD1 and Cont_RF, respectively, and dashed gray lines to the full datasets. R²-values are colored accordingly. Refer to Table 2 for the significance level.

Further Direct DM Traits
In all four datasets, the level of the best relationships for organ-level DM traits was comparable for almost all NIR/VIS and RE-based indices (Figure 3 for the full data and MP*N; Figure S1 for Cont_RF and Cont_SD1). The R²-level was markedly higher in the MP*N subset and Cont_SD1 than in the full data and Cont_RF, and tended to be higher for maturity traits than for anthesis traits. Notably, the DM of stems and spikes and total DM at anthesis were hardly detectable in the full data, Cont_SD1 and Cont_RF (R² always < 0.20; Figure 8). The DM of leaves was better detected, both in the 'agronomic' datasets and in MP*N (Figures 3 and 6; Table 2). At anthesis, the relationships were closer for the flag leaf and flag leaf-1 than those for 'all leaves' and the lower leaf layer; however, they were approximately similar for maturity and anthesis leaf DM traits. Indices including green bands (GNDVI and R780_R550) ranked best for 'all leaves', 'other leaves' and 'flag leaf-1'. The index rankings for most direct DM traits were comparable between the 'agronomic' and MP*N approaches.

Table 2. Best spectral vegetation index (SVI) for each trait based on the weighted mean/max rank sums (WMMRS) from the 'agronomic' approach and its highest seasonal R²-value reached on the optimum date (month/day) in the different datasets: 'Full data', 'Cont_SD1', 'Cont_RF' as well as the six main plot*N level combinations (MP*N). Considering the six MP*N blocks as different environments, the results for MP*N are based on the average R² matrices from the six MP*N subsets. Due to the slightly different number of data points, the significance for MP*N was re-calculated based on the significance thresholds in the six subsets. Trait abbreviations: dry matter (DM), anthesis (Ant), maturity (Mat), harvest index (HI), post anthesis assimilation (PAA), contribution of post anthesis assimilation to grain filling (CPostAA), DM translocation (DMT), DM translocation efficiency (DMTEff), grain number per spike (GNS), thousand kernel weight (TKW), nitrogen utilization efficiency (NutEff), and nitrogen use efficiency (NUE). The significance levels correspond to p < 0.001 (***), p < 0.01 (**) and p < 0.05 (*).

[Table body not available; column headings: Seasonal Best R²-Value, Optimum Date]

Derived DM Traits
No index assessed PAA, CPostAA, DMTEff, or DMT at a useful level, neither in the 'agronomic' datasets nor in MP*N. Among the yield components, moderate relationships were found for grain number per spike (GNS) in all datasets from the BGI (R² = 0.24-0.34 from ear emergence to milk ripeness), as well as for the thousand kernel weight (TKW) only in the full dataset (R² = 0.25) and Cont_RF (R² = 0.43; both during early dough ripeness) from the NWI_2 (Figure 6; Table 2). Spike density was better detected by the PSSR index in Cont_SD1 (R² = 0.46) and in MP*N (R² = 0.40) than in the full data and Cont_RF, with R²-values always peaking at early milk ripeness (06/21). As for GNS, the BGI ranked highest for yield per spike (Table 2). Moderate relationships were found for kernels per m² in all datasets, always being best during stem elongation (05/17); however, the NIR-based water band indices were better suited during grain filling in MP*N (Figure 6). With the N use efficiency (NUE) relating GY or total DM to fertilized N, R²-values (Figure 3) and index rankings were identical to those for GY and total DM, respectively, in the MP*N approach. In contrast, less close relationships were found in the full data (max. R² = 0.38 for NUE_grain and R² = 0.26 for NUE_total; Table 2), only during later grain filling and with best relationships from the R787_765, RVSI, and TCARI_OSAVI indices. The 'internal' conversion efficiency, N utilization efficiency (NutEff), yielded better relationships with respect to grain DM than for total DM and was best assessed in Cont_SD1 (R² = 0.46; Table 2). Thus, for a given red edge inflection point (REIP)-value, NutEff was lower in N2 than in N1 and lower in RF than in the other main plots, respectively (not shown).

Index Ranking According to Traits and Datasets
For most traits, a good agreement was observed between maximum-and mean-based index rankings (example for GY in Figure 7), indicating that indices with the highest potential on the best dates (maximum ranks) were also better suited on less favorable dates. Aiming at an unequivocal index ranking, both measures were combined to weighted mean-maximum-based rank sums (WMMRS) by summing both rank values.

Figure 8 shows the WMMRS-values calculated for all evaluated trait*index combinations, for the combined rankings from the three 'agronomic' datasets ('full data', Cont_SD1 and Cont_RF; a), as well as for the rankings from the MP*N data (b), which were multiplied by three for direct comparison at the same numeric level. For each trait, the value of nine corresponds to the average ranking across SVIs. Due to some strong upper outliers, all values > 15 were colored in the same blue shading for a better contrast of the other values. Considered irrelevant, rankings for trait*index combinations that did not exceed an R² threshold of 0.20 are not shown (white cells). For the following traits, no index exceeded this threshold for the agronomic datasets: the DM of spikes and stems as well as total DM at anthesis, and the derived DM traits PAA, CPAA, DMTEff, and DMT. The mean- and maximum-based rankings (Figure 7) and the combined rankings confirm the clear outperformance of the NIR-based water band indices, particularly for GY (WMMRS > 13; Figure 8), but also high rankings for total DM at maturity as well as for the yield components kernels per m² in both rankings, TKW, and yield per spike in the agronomic ranking. However, for the other direct DM traits, most NIR/VIS indices except the EVI, MCARI1, MCARI2, and the MTVI2, and most RE indices, except the R787_765, TCARI_OSAVI, MCARI, DD, and PSRI, yielded clearly higher and mostly similar rankings.
The R787_765 and the TCARI_OSAVI reached high rank sums (WMMRS > 17) for the NUE traits in the agronomic ranking but not in the MP*N ranking, and the R787_765 for HI in the agronomic ranking. Overall, only few indices reached relevant relationships (R² > 0.20) in both rankings for most derived DM traits. In addition, this was observed partly for the direct DM traits in the agronomic approach, but the index rankings differed less in the MP*N approach. For each trait, the rankings from both matrices were correlated against each other to assess whether an index selection optimized on one dataset can be transferred to the other. As visible from the correlations between rankings (Table 3), the index rankings were relatively stable between both approaches for most direct DM traits, with Spearman's ρ < 0.80 only for stem DM at both sampling dates, and flag leaf DM at maturity. Among the other traits, stable rankings (ρ > 0.86 ***) were found for the yield components GNS, spike density, and kernels per spike, as well as for plant height. Moderately stable rankings (ρ = 0.70 ***−0.82 ***) were found for PAA, NutEff total, NutEff grain, and anthesis date, but substantially differing rankings for the other traits.
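The comparison of the two rankings by Spearman's ρ can be sketched in a few lines of Python: both score vectors are converted to ranks (ties receive average ranks) and the Pearson correlation of the ranks is computed. The function names and the WMMRS values are hypothetical and serve only to illustrate the calculation; significance testing is not included.

```python
import math


def average_ranks(values):
    """Rank values from 1..n; tied values receive the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    return sxy / math.sqrt(sxx * syy)


def spearman_rho(a, b):
    """Spearman's rank correlation: Pearson correlation of the average ranks."""
    return pearson(average_ranks(a), average_ranks(b))


# Hypothetical WMMRS values of four indices for one trait in both approaches.
agronomic_wmmrs = [15.0, 11.0, 9.5, 6.0]
mpn_wmmrs = [14.0, 12.5, 7.0, 8.0]
rho = spearman_rho(agronomic_wmmrs, mpn_wmmrs)
```

A ρ close to one for a trait would mean that the index order found in the 'agronomic' datasets is largely preserved within the MP*N blocks.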

Table 3. Spearman's ρ and significance level of the correlations between the agronomic and the MP*N index rankings (Figure 8).

Discussion
This study aimed at assessing the influence of the optimized selection of SVIs and of measurement stages/dates on the estimation of various DM traits, which influence the formation of GY as a function of the contributing treatments.

In-Season Estimation of Grain Yield and Contributing DM Traits
Grain yield can only be indirectly estimated by spectral measurements, both with respect to the temporal shift and the detected information at the organ level [42]. Although SVIs are primarily influenced by leaf area, chlorophyll content and leaf area distribution [48], seasonal influences are important as well, particularly post-flowering, which is the most important phase for grain filling [8], although it is influenced by progressive senescence. Therefore, this study included multiple DM traits that contribute to grain filling for elucidating the GY~SVI relationships, and multiple measurement dates, given that the trial treatments influenced the traits in different phases [25]. Because of the effects of sowing date on the early development in terms of DM traits, canopy cover, and SVIs [49], and the positive yield effect of early sowing (SD1) [25], moderate relationships were found already during leaf development and tillering, but not in the datasets without varying sowing dates, corroborating results in a breeding population [26]. Moreover, given that R²-values again decreased to a minimum until ear emergence/anthesis, as also found in [26], this early discrimination confirms the spectral detection of early vigor [49,50], but should not be generalized for the prediction of grain yield. This seasonal pattern is in line with the previous GY estimation across the full trial data [51] and with the spectral estimation of the N traits [47]. Concerning the seasonal pattern in Cont_SD1, the SVI*date interaction (Figure 5) confirms the relative weakness of water band indices during the early phase [49] for 'planar', unsaturated canopies, being in line with [47].
However, already at tillering, the differentiation with a good performance of the water band indices and of most RE indices, but the failure of NIR/VIS indices, possibly indicates the advantage of a stronger sensitivity of the former groups for overcoming saturation in dense canopies, as also observed for wheat breeding lines [26], N fertilization in wheat cultivars [51], drought-stressed wheat [39] and barley [52]. The usefulness of the RE indices was in line with previous studies on cultivar discrimination during grain filling [53,54] and with the analysis of the N traits in the same experiment, for which, however, water band indices ranked relatively lower [47]. The overall suitability of the early dough and milk ripeness stages indicates that delayed senescence increased GY under favorable maturation conditions, as frequently observed [26,28,31,38,47]. The close relationships between yield and water band SVIs have been obtained on similar sites with multi-year wheat and barley experiments [39,55] and were further evidenced for wheat as well under non-drought conditions [26,51]. Unlike for GY, notably, NIR/RE indices clearly outperformed the water band indices for total DM during the vegetative phase. In turn, the better suitability of the water band indices for GY during grain filling indicates an advantage for senescence-influenced canopies. Water band indices ranked clearly behind 'structural' NIR/VIS and RE indices for leaf DM (Figure 6), indicating that the detection of leaf area index (LAI) is not sufficient for GY estimation, and the advantage for GY may be related to the detection of canopy water mass or rather senescence status. During grain filling, leaf senescence was captured by RGB-imaging and correlated with all possible spectral band combinations in normalized vegetation indices. 
On all dates, indices that included water band information significantly correlated with the leaf senescence status and did so mostly better than NIR/VIS and RE-combinations (not shown). Thus, lower GY in RF than in Cont and in N1 than in N2 was associated with accelerated leaf senescence.
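The screening of all band combinations against leaf senescence can be illustrated with a minimal sketch. It assumes that the normalized indices take the form (Ri - Rj) / (Ri + Rj) and that the relationship is summarized by the squared Pearson correlation. The function name, the three band labels and the synthetic data are hypothetical and are not taken from the study.

```python
import numpy as np


def ndi_r2_matrix(reflectance, trait):
    """R2 between a trait and (Ri - Rj) / (Ri + Rj) for every band pair.

    reflectance: array of shape (n_plots, n_bands), all values > 0.
    trait: array of shape (n_plots,), e.g., an image-based senescence score.
    Returns an (n_bands, n_bands) matrix; the diagonal is NaN.
    """
    reflectance = np.asarray(reflectance, dtype=float)
    trait = np.asarray(trait, dtype=float)
    n_bands = reflectance.shape[1]
    r2 = np.full((n_bands, n_bands), np.nan)
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue  # an index of a band with itself is always zero
            ndi = (reflectance[:, i] - reflectance[:, j]) / (
                reflectance[:, i] + reflectance[:, j]
            )
            r = np.corrcoef(ndi, trait)[0, 1]
            r2[i, j] = r ** 2
    return r2


# Synthetic illustration only (assumed values, not measured data):
# band 0 responds to the senescence score, bands 1 and 2 are pure noise.
rng = np.random.default_rng(0)
n_plots = 40
senescence = rng.uniform(0.0, 1.0, n_plots)
refl = np.column_stack([
    0.05 + 0.10 * senescence + rng.normal(0.0, 0.005, n_plots),
    0.45 + rng.normal(0.0, 0.01, n_plots),
    0.40 + rng.normal(0.0, 0.01, n_plots),
])
r2 = ndi_r2_matrix(refl, senescence)
```

Because swapping the two bands only changes the sign of the index, the resulting matrix is symmetric, so in practice only one triangle needs to be inspected or plotted.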
The index rankings for both multiplicative GY components, total DM and harvest index (HI), were not indicative of the index ranking for GY. In spite of a significant reduction in HI by 9% in RF compared to Cont [25], the HI was moderately detected only by the R787_R765 during later grain filling, when the HI is actually determined as a result of the ongoing assimilation and translocation of assimilates. These findings are similar to those for grain and total N uptake and N harvest index, respectively [47]. From a temporal perspective, the response of the pre- and post-anthesis components of GY, i.e., DM translocation (DMT) and post-anthesis assimilation (PAA), could not be retraced by any index, confirming the results for genotypic differences [26]. This response included reduced PAA without fungicide, and increased PAA but reduced DMT and translocation efficiencies in response to N fertilization [25]. On the one hand, these traits are 'accumulated' over time, which apparently cannot be captured by 'snap-shot' measurements on single days. On the other hand, their destructive determination was complex, including the sampling of various traits both at anthesis and maturity and possible errors in determining the exact anthesis time, so that these traits may be less stably referenced [25]. While, similarly to PAA, post-anthesis N uptake was not sufficiently detected, moderate estimations (R2 = 0.26–0.33) had been achieved for N translocation [47].

In-Season Estimation of Yield Components
Kernel number per m2 was the most relevant yield component for explaining genotypic variation in GY in the six MP*N subsets (R2 = 0.65–0.85) [25]. Its moderate estimation from red edge indices in all datasets, already during stem elongation, followed a similar R2 pattern as observed for total DM, possibly indicating an association between both traits [56]. Accordingly, similar correlations for total DM and kernels per m2 were reported for wheat genotypes, whereas no useful relationships were found for spike density, GNS, and TKW. Instead, a more direct estimation is conceivable for spike density through the best-ranking pigment-specific simple ratio PSSR [57], which may detect the bright-colored spikes. In contrast, another study obtained good estimates only for spike density across environments but not for GNS and TKW [58]. Overall, the likely indirect detection of most yield components must be interpreted carefully, considering the growing conditions and contributing treatments.

Suitability of the R787_R765 and TCARI_OSAVI for the Agronomic Approach
Owing to the low reflection difference between both bands, the R787_R765 index values were only slightly higher than one, but the index showed high rankings in the agronomic approach for several traits. These included HI, TKW, yield per spike, NUE and NutEff, along with the similarly performing TCARI_OSAVI. Despite the similar ranking, both indices use substantially different spectral bands (Figure 1), indicating that the initial index grouping based on the spectral regions is not sufficient for predicting similarities in the suitability to estimate the tested traits. The R787_R765 was previously found useful for detecting N concentration in grassland [59] and the TCARI_OSAVI for LAI-insensitive chlorophyll detection [60]. The correlation matrix between all band combinations and image-based leaf senescence status during grain filling (data not shown) indicates that the R787_R765 detected senescence too. While the index was positively correlated with the NDVI during the vegetative stages, the relationships turned negative during grain filling (data not shown). The negative relationships of the R787_R765 (and, inversely, the positive relationships of the TCARI_OSAVI) with HI and its high rankings demonstrate its sensitivity to senescence, as introduced by the agronomic treatments, notably reduced fungicide [25], as well as to their concomitant effect on the plant traits. These findings largely agree with the analysis of the N traits [47]. In contrast, these traits were not reliably detected for the cultivar differences in the MP*N approach.
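For orientation, the two indices can be sketched as follows. The R787_R765 is assumed here to be the simple ratio of the reflectance at 787 nm to that at 765 nm, and the TCARI_OSAVI is written in its widely used form with bands at 550, 670, 700 and 800 nm. The exact band positions used in this study are listed in Table S2 and may differ, so the function names, band choices and reflectance values below are assumptions for illustration only.

```python
def simple_ratio(r_num, r_den):
    """Simple ratio of two reflectance values, e.g., R787 / R765."""
    return r_num / r_den


def tcari_osavi(r550, r670, r700, r800):
    """TCARI/OSAVI in its widely used formulation (assumed band positions)."""
    # TCARI: chlorophyll absorption depth around 670 nm, corrected by R700/R670
    tcari = 3.0 * ((r700 - r670) - 0.2 * (r700 - r550) * (r700 / r670))
    # OSAVI: soil-adjusted NIR/red contrast with the adjustment factor 0.16
    osavi = (1.0 + 0.16) * (r800 - r670) / (r800 + r670 + 0.16)
    return tcari / osavi


# Hypothetical canopy reflectance values (fractions between 0 and 1)
r787_r765 = simple_ratio(0.50, 0.48)
tcari_osavi_value = tcari_osavi(r550=0.10, r670=0.05, r700=0.15, r800=0.50)
```

With two neighboring bands on the near-infrared shoulder, the ratio stays close to one, which matches the narrow value range described above.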

Stability of Index Rankings According to Dataset
This study aimed at comparing the index rankings retrieved from data dominated by variation introduced by 'agronomic treatments' to the index rankings based on data dominated by the difference between cultivars within MP*N subsets. Comparing the rankings on the same datasets holds the advantage of including the same environmental effects and possible errors during sampling and analysis, considering that environmental effects, especially on the derived DM traits, can be substantial [7]. The constant index rankings found for the important traits GY and total DM at maturity are especially promising, as is the case for most 'direct' traits (Table 3; green colored traits in Figure 9) and the corresponding N traits [47]. In contrast, rankings for weakly estimated traits in either of the approaches are less reliable, including traits related to the pre/post-anthesis contribution to grain filling (DMT, DMTEff, PAA and CPostAA), and HI (bottom left quadrant in Figure 9). Several traits, which were moderately estimated (max. R2 > 0.3) in both approaches, showed substantially differing index rankings, including NUE_Mat_grain, stem DM at maturity, NutEff_grain, and anthesis date, as visualized in Figure 9 by blue-colored traits (correlations between rankings < 0.67) in the upper right quadrant. Consequently, these 'more specific' traits would require an index selection more optimized by contributing treatments and/or trial environments, whereas indices selected for green-colored traits are promising for a more robust application.
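The comparison of index rankings between the two approaches can be sketched as a rank correlation per trait. The sketch assumes that indices are ranked by their maximum R2 within each approach and that a Spearman-type correlation is used. The text specifies only the threshold of 0.67, so the choice of correlation, the function names and the example values are hypothetical.

```python
import numpy as np


def ranks(values):
    """Rank 1 is assigned to the highest value; assumes no tied values."""
    order = np.argsort(-np.asarray(values, dtype=float))
    result = np.empty(len(order), dtype=float)
    result[order] = np.arange(1, len(order) + 1)
    return result


def ranking_correlation(r2_a, r2_b):
    """Pearson correlation of the two rank vectors (Spearman without ties)."""
    return float(np.corrcoef(ranks(r2_a), ranks(r2_b))[0, 1])


def is_stable(r2_a, r2_b, threshold=0.67):
    """True if the index rankings of both approaches agree at the threshold."""
    return ranking_correlation(r2_a, r2_b) >= threshold


# Hypothetical maximum R2 of five indices for one trait in both approaches
r2_agronomic = [0.60, 0.55, 0.40, 0.30, 0.20]
r2_cultivar = [0.50, 0.52, 0.35, 0.20, 0.25]
rho = ranking_correlation(r2_agronomic, r2_cultivar)
stable = is_stable(r2_agronomic, r2_cultivar)
```

A trait would then be treated as 'stable' only if it is also estimated sufficiently well in both approaches, since a ranking of uniformly weak indices carries little information.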

Conclusions
The previous use of spectral proximal sensing in field trials, either for plant breeding or for agronomic factor optimization, focused on the in-season estimation of the grain yield potential. Although based on data from only one year, the present study extended spectral methods to further traits, which were rarely tested before, especially across all relevant growth stages.
With respect to the initial research questions, the following conclusions are drawn:
(i) The index-based estimation, notably of GY and its cumulative component, total DM, is promising for applying sensor-based phenotyping. Additional information is gained on further yield-related traits, yet generally at a lower accuracy.
(ii) The comparison of many indices should serve not only to recommend specific indices but also 'index types', as grouped by the included spectral regions. The mostly similar performance of indices within groups indicates close relationships between several indices, making the index selection more robust. This is relevant for transferring the results to sensors differing in band number, placement or narrowness [51]. As in previous analyses, water band indices followed by red edge indices outperformed the NDVI for most traits.
(iii) Overall, the milk ripeness stage was the most promising; however, for estimating the effects of sowing date and N fertilization, as included in the agronomic data, vegetative stages were also indicative. The suitability of the late, senescence-influenced grain filling phase may have been improved by the overall favorable maturing conditions.
(iv) The different agronomic treatments affected the target traits in different ways and in different growth stages, consequently altering the optimum measurement stages and affecting the accuracies in the different datasets.
The present results can contribute to the optimization of sensors and to the selection of measurement dates and specific SVIs, but should be evaluated on further datasets. Notably, the optimization of drone-based sensors can boost proximal sensing methods for field trials [61], but requires further evaluation.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4395/10/4/603/s1: Table S1: List of plant traits considered in this study, grouped by trait groups. Table S2: List of spectral vegetation indices used in this study. 'R' denotes the reflection in indicated wavebands. Figure S1: The index suitability by target trait in the 'agronomic' blocks Cont_RF and Cont_SD1: Maximum coefficients of determination (R2) found for each index × trait combination from 11 measurement dates. Gray lines delimit index and trait groups.
Author Contributions: L.P. and U.S. conceived and designed the experiment, L.P. conducted the experiment, L.P. analyzed the data, L.P. and U.S. wrote the manuscript. All authors have read and agreed to the published version of the manuscript.