High-Resolution Airborne Hyperspectral Imagery for Assessing Yield, Biomass, Grain N Concentration, and N Output in Spring Wheat

: Remote sensing allows fast assessment of crop monitoring over large areas; however, questions regarding uncertainty in crop parameter prediction and application to nitrogen (N) fertilization remain open. The objective of this study was to optimize of remote sensing spectral information for its application to grain yield (GY), biomass, grain N concentration (GNC), and N output assessment, and decision making on spring wheat fertilization. Spring wheat ( Triticum turgidum L.) ﬁeld experiments testing two tillage treatments, two irrigation levels and six N treatments were conducted in Northwest Mexico over four consecutive years. Hyperspectral images were acquired through 27 airborne ﬂight campaigns. At harvest, GY, biomass, GNC and N output were determined. Spectral exploratory analysis was used to identify the best wavelength combinations, the most suitable vegetation indices (VIs) and the best growth stages to assess the agronomic variables. The relationship between the spectral information and the agronomic measurements was evaluated by the coefﬁcient of determination (R 2 ) and the root mean square error (RMSE). The ability of the indices to guide fertilizer recommendation was assessed through an error analysis based on the N sufﬁciency index. GY was better assessed from the end of ﬂowering to the early milk stage by VIs based on the combination of bands from near infrared radiation/visible and from near infrared radiation/red-edge regions (R 2 > 0.6; RMSE < 700 kg ha − 1 ). N output was efﬁciently assessed by a combination of bands from near infrared radiation/red-edge at booting (R 2 > 0.7; RMSE < 9 kg N ha − 1 ). The GNC was better estimated by VIs combining bands in near infrared radiation/red-edge at early milk, but with great variability among the years studied. Some VIs were promising for guiding fertilizer recommendation for increasing GNC, but there was not a single index providing reliable recommendations every year. This study highlights the potential of remote sensing imagery to assess GY and N output in spring wheat, but the identiﬁcation of GNC responsive sites needs to be improved.


Introduction
Wheat (Triticum sp.) responds to nitrogen (N) fertilizer applications by increasing yield and/or grain protein concentration, so fertilization is a common practice among farmers throughout the world [1,2]. In fact, N mineral fertilizers applied to wheat crops account for 17% of total world N fertilizer consumption, the largest percentage of any crop [3]. In addition, excessive N application leads to N losses that contribute to climate change by boosting greenhouse gas emissions and to environmental pollution by increasing N concentration in soil and water [4]. Particularly, at the Yaqui Valley in northwestern Mexico, Assessing grain yield (GY) and GNC months in advance offers multiple opportunities for planning harvest and storage needs since it may save costs and optimize the use of facilities [10,27]. Site-specific prediction of GNC facilitates the implementation of selective harvesting, a technology that identifies areas with higher protein content to maximize the price premium [34]. However, in order to implement all these techniques, a highly reliable pre-harvest forecast of GY and GNC by remote sensing is required. In general, the best prediction of GY and GNC is obtained in late growth stages (i.e., grain filling) but before senescence because this causes substantial reflectance decay [22,25]. Biomass prediction by remote sensing may be used to provide critical inputs to biogeochemical models, which are of particular interest to build regional carbon balance and develop greenhouse gas emission policies [35,36]. Another relevant application of remote sensing that has received little attention is the prediction of the N that is exported out of the field in the grain at harvest (i.e., N output). The N output is most important for evaluating NUE, later used for comparison of genotype performance [37,38] or N balance at the farm or regional scale [39,40]. The error associated with the prediction of these crop variables at harvest is expected to be low; however, the challenges are to determine the best timing of image acquisition and reduce uncertainty.
The general objectives of this study were to (i) optimize canopy spectral information for its application to assess grain yield, biomass, grain N concentration, and N exported in grain (N output); and (ii) improve in-season decision-making on spring wheat fertilization. Specific questions were (i) what are the best indices for estimation of grain yield, biomass, GNC and N output; (ii) when is the best growth stage for assessing these variables; and (iii) how can this information be applied to guide fertilizer recommendations.

Field Experiment and Crop Data Collection
Data were collected in a field experiment conducted by the International Maize and Wheat Improvement Center (CIMMYT) during four wheat (Triticum turgidum L. ssp. durum) cycles, from 2014 to 2017. The experiment was located (27 • Figure 1). The climate in the region is semi-arid (Köppen climate classification Bsh) with variable precipitation rates ( Figure S1). The annual average precipitation is 280 mm and the daily average temperature 24 • C. The soils in the region are coarse sandy-clay, mixed with montmorillonite clay. The topsoil has an average pH of 7.7 and low organic matter content (≈1%). A detailed description of the region is provided in Meisner et al. [41].
The experiment follows a factorial design in blocks with two irrigation levels (two and four post-plant irrigations), two tillage levels (conventional and conservation tillage), and six N treatments (N0: 0; N1: 125 kg N ha −1 split as one-third at sowing and twothirds as topdressing; N2: 250 kg N ha −1 split as one-third at sowing and two-thirds as topdressing; N1b: 125 kg N ha −1 split as two-thirds at sowing and one-third as topdressing; N2b: 250 kg N ha −1 split as two-thirds at sowing and one-third as topdressing; and N2c: 250 kg N ha −1 applied at sowing) ( Figure 1c). The conventional tillage treatment was carried out by incorporating crop residue through multiple disking and then bed formation, and after irrigating the wheat was sown. The conservation tillage is a type of conservation agriculture that consisted in leaving the crop residue on the surface, and having minimum soil disturbance when applying fertilizer in a band and sowing the wheat crop when the soil was dry, and then irrigating. Initially, the plots were replicated three times and split into two residue levels (leaving all crop residues in the field versus removing 40% of the residues), but since no effect was observed in the agronomic variables studied and due to the complexity of interpreting interactions, plots were considered as replications of the same treatment; therefore, for all treatments combinations were six replications (Figure 1c). standard methods 39-10 and 46-11A [42]. The N output (kg N ha⁻ 1 ) was calculated as th product of GY times GNC. At wheat maturity, 50 plants from each plot were harveste and separated into plant components (grain and the rest of the aerial biomass) dried a 75°C for 48 h and weighed. The harvest index (grain/(grain + aerial biomass)) was calcu lated and used to obtain the biomass from the GY of each plot. The durum wheat cultivar CIRNO C2008 was sown at 120 kg seed ha⁻ 1 , planting tw rows of wheat 25 cm apart on top of 80 cm beds. Phosphorous fertilizer was applied befor sowing to ensure P availability at a rate of 46 kg P2O5 ha −1 as triple super phosphate. Weed In the first irrigation level, two post-plant irrigations were applied during the season (flag leaf sheath extending (growth stage (GS) 41) and kernels watery ripe (GS71)) and in the second level four irrigations (tillering (GS23)), flag leaf just visible (GS37), beginning of flowering (GS61) and grain filling (GS73) (Figure 2). Irrigation water was uniformly delivered by furrows and the water applied was ≈120 L m −2 at sowing, and ≈100 L m −2 in each post-plant irrigation event. Fertilizer was applied as urea (46% N) before irrigating at sowing, and when there was a topdressing it was applied at GS41 in the two post-plant irrigation treatments and at GS37 in the four post-plant irrigation treatments ( Figure 2). Due to the differences in wheat development associated with water availability, the crop cycle under two post-plant irrigations was shorter than the four post-plant irrigations.
The plot size was 6.4 m wide (eight beds, each 0.80 m wide) by 10 m long. The sowing of all treatments occurred on the same date, at the end of November or the beginning of December, which is within the optimum planting date for the region (Table S1). Harvest of the central section of each plot (8 m 2 ) was done by an experimental combine (Wintersteiger, Ried im Innkreis, Austria) in late April or early May, depending on the year, and the wheat GY (kg ha −1 ) was recorded. A grain sub-sample was taken for laboratory quality analysis, where GNC (%N) and moisture content (%) were determined by NIR spectroscopy (NIR Systems 6500, Foss, Hilleroed, Denmark) calibrated according to official AACC standard methods 39-10 and 46-11A [42]. The N output (kg N ha −1 ) was calculated as the product of GY times GNC. At wheat maturity, 50 plants from each plot were harvested and separated into plant components (grain and the rest of the aerial biomass) dried at 75 • C for 48 h and weighed. The harvest index (grain/(grain + aerial biomass)) was calculated and used to obtain the biomass from the GY of each plot. were controlled by chemical application to control broad leaves and grassy weed the pre-plant irrigation and also by using mechanical cultivation before the first a irrigation; application of Afflix for aphid control was carried out if needed. Post-plant irrigations applied in the separatio tween indicated growth stages (GS). Fertilization refers to N application as urea, either at p ing (first application) or a as a topdressing (second application).

Hyperspectral Flight Campaign and Data Processing
A weekly/bi-weekly flight campaign took place from January until March eve and only images free of clouds from relevant GSs were selected for spectral analy ure 1b; Table S2). The images were acquired with a push-broom micro-hyperspectr era (Micro-Hyperspec VNIR model, Headwall Photonics, Bolton, MA, USA) (spe gion: 400-850 nm; 250 channels) flying on board a Cessna aircraft with heading solar plane 300 m above ground, yielding 0.3 m of ground resolution.
Georeferencing of each image was done based on corner features of the expe The corner geocoordinates were taken using a global navigation satellite system receiver with the real-time kinematic technique (Trimble R4 GNSS system, T Sunnyvale, CA, USA) and used for georeferencing all orthomosaics, ensuring ove tween images across flight campaigns by having root mean square errors (RMSEs than the image resolution. The micro-hyperspectral instrument was radiometrically calibrated in the lab using derived coefficients with a calibrated uniform light source (integrating CSTM-USS-2000C Uniform Source System, LabSphere, North Sutton, NH, USA) levels of illumination and six integration times. Hyperspectral imagery was atmo cally corrected using the total incoming irradiance at 1-nm intervals simulated w Simple Model of the Atmospheric Radiative Transfer of Sunshine (SMARTS) dev by the National Renewable Energy Laboratory, US Department of Energy. Theref aerosol optical depth was measured at 550 nm with a Micro-Tops II sun photomete Post-plant irrigations applied in the separation between indicated growth stages (GS). Fertilization refers to N application as urea, either at pre-sowing (first application) or a as a topdressing (second application).
The durum wheat cultivar CIRNO C2008 was sown at 120 kg seed ha −1 , planting two rows of wheat 25 cm apart on top of 80 cm beds. Phosphorous fertilizer was applied before sowing to ensure P availability at a rate of 46 kg P 2 O 5 ha −1 as triple super phosphate. Weeds were controlled by chemical application to control broad leaves and grassy weed before the pre-plant irrigation and also by using mechanical cultivation before the first auxiliary irrigation; application of Afflix for aphid control was carried out if needed.

Hyperspectral Flight Campaign and Data Processing
A weekly/bi-weekly flight campaign took place from January until March every year, and only images free of clouds from relevant GSs were selected for spectral analysis ( Figure 1b; Table S2). The images were acquired with a push-broom micro-hyperspectral camera (Micro-Hyperspec VNIR model, Headwall Photonics, Bolton, MA, USA) (spectral region: 400-850 nm; 250 channels) flying on board a Cessna aircraft with heading on the solar plane 300 m above ground, yielding 0.3 m of ground resolution.
Georeferencing of each image was done based on corner features of the experiment. The corner geocoordinates were taken using a global navigation satellite system (GNSS) receiver with the real-time kinematic technique (Trimble R4 GNSS system, Trimble, Sunnyvale, CA, USA) and used for georeferencing all orthomosaics, ensuring overlap between images across flight campaigns by having root mean square errors (RMSEs) lower than the image resolution.
The micro-hyperspectral instrument was radiometrically calibrated in the laboratory using derived coefficients with a calibrated uniform light source (integrating sphere, CSTM-USS-2000C Uniform Source System, LabSphere, North Sutton, NH, USA) at four levels of illumination and six integration times. Hyperspectral imagery was atmospherically corrected using the total incoming irradiance at 1-nm intervals simulated with the Simple Model of the Atmospheric Radiative Transfer of Sunshine (SMARTS) developed by the National Renewable Energy Laboratory, US Department of Energy. Therefore, the aerosol optical depth was measured at 550 nm with a Micro-Tops II sun photometer (Solar LIGHT Co., Philadelphia, PA, USA) in the study area at the time of the flights. SMARTS computes clear sky spectral irradiance, including direct beam, circumsolar, hemispherical diffuse, and total irradiance on a tilted or horizontal plane for specified atmospheric conditions. The algorithms were developed to match the output from the MODerate resolution atmospheric TRANsmission (MODTRAN) complex band models to within 2%, using aerosol optical depth as an input. Parameterizations in the SMARTS model are based on high-resolution (1 cm −1 ) MODTRAN results subsequently smoothed to the SMARTS model wavelength interval [43]. The spectral resolution was 0.5 nm for the 280-to 400-nm and 1 nm for the 400-to 1000-nm ranges of the electromagnetic spectrum. Spectral binning was performed on each mosaic into 7.5 nm FWHM (full width at half maximum) to decrease noise effects, resulting in 62 wavelengths. From these, the 751-, 759-, 766-, and 773-nm wavelengths were removed due to oxygen absorption by the sensor that led to errors in the retrieved reflectance. Finally, 58 wavelengths were used for subsequent analyses.
The imagery data extraction was done by taking the average of the pixels from the two central beds (out of eight beds) of each plot, considering a centralized longitudinal buffer-in 6 m long. The within-plot region of interest was then considered as 1.6 × 6 m. The resulting averaged spectral data set was used for the following statistical analysis.

Spectral Exploratory Analysis Using Narrow-Band Combinations
Two-by-two combinations of all available spectral bands were used to calculate the normalized difference spectral index (NDSI) as where Ri and Rj were the reflectance for wavelengths i and j, respectively [44]. Then linear regression analyses were performed using the NDSI as the predictor of the variables studied for each crop.
Contour maps of the RMSE and coefficient of determination (R 2 ) of the relationships of the NDSI and the variables studied (GY, biomass, GNC, and N output) were used to identify the spectral regions showing the highest predictive potential at different wheat GSs. According to the agronomic data analysis, different years were combined in a single data set to build the contour maps by GS when the interaction among all factors and year was not significant (GY, biomass, and N output) or they were used individually per year (GNC).

Spectral Indices and Assessment of Crop Variables
Vegetation indices based on the spectral regions from the contour maps that showed low RMSE and high R 2 were calculated. A list of the indices calculated from the reflectance bands (Table S3) is supplied as Supplementary Materials (Table S4). Since many indices were highly correlated ( Figure S6), representative indices were selected for the study (Table 1) while others are only shown in Table S4 [9,[45][46][47][48][49][50]. Indices were grouped as greenness or structural when they were based on the normalized ratio between bands from the NIR and the visible, and as chlorophyll indices when narrow bands from the red-edge region were incorporated (Table 1). Moreover, canopy chlorophyll content indices calculated by combining a structural with a pigment-related index were incorporated because they were reported to overcome some of the limitations attributed to structural indices [20,51]. Two VIs based on a direct wavelength ratio were tested because they were reported to predict the studied variables [52,53]. Linear and exponential relationships were assessed between VIs and crop variables, obtaining slightly better adjustment with the linear relationship ( Figure S7). The ability of the indices to assess crop variables was evaluated based on the R 2 and RMSE of the linear relationship at the relevant GS.

Application to N Fertilizer Recommendation
As in the treatments with two post-plant irrigations, the second fertilizer topdressing was spread right after an image acquired at booting (GS41); VIs obtained from this image were used to assess their ability for fertilizer recommendation. A procedure previously used to test hand-held chlorophyll meters [58] and airborne indices [59] was adapted to test the accuracy of the actual VIs. The N2b treatment (250 kg N ha −1 split as two-thirds at sowing and one-third in the first irrigation after sowing) was considered the N dose at which the maximum GNC was attained, and the relative GNC (GNC relative ) for a plot was calculated by dividing the GNC of the plot by the average GNC for the N2b dose in each tillage treatment. To determine the accuracy of each index at distinguishing N-responsive from nonresponsive plots, the nitrogen sufficiency index (NSI) was calculated as [60]: NSI = Index value for tested plot/Average index for N2b plots Correct fertilization occurred when NSI = 1 and GNC relative = 1. Plots in which the NSI > 1 and GNC relative < 1 corresponded to fertilizer underestimation, and plots where the NSI < 1 and GNC relative > 1 corresponded to overestimation. Both underestimated and overestimated plots were considered errors and the accuracy of VIs was reported as the percentage of errors from the total number of plots.

Statistical Analysis
Statistical analysis was performed with R software (version 3.6.3) [61]. The impact of the different factors on the crop variables at measured at harvest (GY, biomass, GNC, and N output) was analyzed using a mixed model adjusted with the package from the lme4 library [62]. The year, irrigation, tillage, and N treatments, and the interaction between them, were considered as fixed factors, whereas the subplot was considered as a random effect for the analysis of variance. Means were separated by the least significant difference (LSD) test (P ≤ 0.05).
To calculate the NDSI and perform the linear regression analysis using the NDSI as the predictor of each crop variable, the "doBy," "plyr", and "pls" packages were used. Then contour maps were generated with the "lattice" package. The correlations between the VIs and the agronomic variables at relevant GSs were assessed by calculating Pearson's correlation coefficients with the "corrplot" package and correlograms were generated with the "corrgram" package.

Agronomic Data
For the GY, biomass, and N output, the interaction among all factors was not significant; therefore, all years were analyzed together (Table S5). In contrast, a strong interaction on GNC was observed among factors and years, so years were analyzed separately.

Grain Yield and Biomass
Conservation tillage increased GY and biomass compared to conventional tillage with two post-plant irrigations, whereas with four post-plant irrigations, no differences were found between tillage treatments, suggesting that conservation tillage increased water availability, particularly during the grain filling period (Figure 3a; Table S5; Figure S9). This effect was clearly observed in three out of the four years, 2017 being the exception with small GY differences between tillage treatments and no differences for biomass. were differentiated with low irrigation, but only with higher irrigation did differen between N doses (N2 > N1) become significant for some of the years. Therefore, the res indicate that the interaction between N and the water effect was relevant.

Grain N Concentration and N Output
A strong interaction effect on GNC was observed among all factors and years (Ta S5; Figure 4). Lower irrigation tended to produce a higher GNC, but the significanc this increase varied from year to year, depending on the N treatment ( Figure 4). In ad tion, N fertilization increased GNC compared to unfertilized wheat (N0). In contrast, sp ting N fertilizer application did not have a significant effect. With low irrigation, dif ences in GNC between N1 and N2 only were present in 2017; whereas with higher irr tion, increasing the N dose (N2 vs. N1) enhanced GNC every year. Wheat grain yield (Mg ha −1 , 12% moisture) (a) and N output (kg N ha −1 ) (b) for combined tillage and irrigation factors, for each N treatment, over the 4 years. Treatment N0 has no fertilizer application; N1 received 125 kg N ha −1 split one-third at sowing and two-thirds in the first post-plant irrigation; N2 received 250 kg N ha −1 split as one-third at sowing and two-thirds in the first post-plant irrigation; N1b received 125 kg N ha −1 split as two-thirds at sowing and one-third in the first post-plant irrigation; N2b received 250 kg N ha −1 split as two-thirds at sowing and one-third in the first post-plant irrigation, and N2c received 250 kg N ha −1 applied at sowing. Small bars indicate standard error.
In general, higher irrigation led to higher GY and biomass, enhancing crop N response (Figure 3a). Unfertilized wheat (N0) and fertilized treatments (either N1 or N2) were differentiated with low irrigation, but only with higher irrigation did differences between N doses (N2 > N1) become significant for some of the years. Therefore, the results indicate that the interaction between N and the water effect was relevant.

Grain N Concentration and N Output
A strong interaction effect on GNC was observed among all factors and years (Table S5; Figure 4). Lower irrigation tended to produce a higher GNC, but the significance of this increase varied from year to year, depending on the N treatment ( Figure 4). In addition, N fertilization increased GNC compared to unfertilized wheat (N0). In contrast, splitting N fertilizer application did not have a significant effect. With low irrigation, differences in GNC between N1 and N2 only were present in 2017; whereas with higher irrigation, increasing the N dose (N2 vs. N1) enhanced GNC every year. S5; Figure 4). Lower irrigation tended to produce a higher GNC, but the significance of this increase varied from year to year, depending on the N treatment ( Figure 4). In addition, N fertilization increased GNC compared to unfertilized wheat (N0). In contrast, splitting N fertilizer application did not have a significant effect. With low irrigation, differences in GNC between N1 and N2 only were present in 2017; whereas with higher irrigation, increasing the N dose (N2 vs. N1) enhanced GNC every year.  Conventional tillage increased GNC compared to conservation tillage when the irrigation dose was low (mainly when wheat was fertilized), whereas with higher irrigation no differences were found between tillage treatments. Similar to GY and biomass, the interaction of GNC with year was due to small differences in 2017.
Higher irrigation led to a greater N output, and no differences were observed between tillage treatments, whereas with low irrigation, conservation tillage led to higher values compared to conventional tillage (Figure 3b). In general, almost all the years studied showed differences among the three N rates, which was more evident in treatments that received a higher irrigation level.
Overall, the large variability in the effect of the management factors on the variables studied and the relevance of the interaction effects suggests that there is an opportunity for remote sensing to be used as a tool for guiding N fertilizer recommendation in season.

Spectral Signatures of the Treatments
Differences in reflectance between fertilized and unfertilized treatments were observed from the beginning of stem elongation, and by the medium milk GS these differences declined, particularly for treatments with lower irrigation ( Figure 5; Figure S10). In general, fertilized treatments showed lower reflectance in the visible region and higher reflectance in the NIR than the unfertilized wheat (N0), leading to a steeper slope in the red-edge. The greatest difference between treatments was observed in the NIR (>750 nm), where higher reflectance was attained ( Figure 5; Figure S10). The NIR reflectance for N0 was always lower than for the fertilized treatments and the maximum reflectance was observed from stem elongation to flowering, with differences between spectral signatures greater at the early milk GS.
served from the beginning of stem elongation, and by the medium milk GS these differences declined, particularly for treatments with lower irrigation (Figure 5; Figure S10). In general, fertilized treatments showed lower reflectance in the visible region and higher reflectance in the NIR than the unfertilized wheat (N0), leading to a steeper slope in the red-edge. The greatest difference between treatments was observed in the NIR (>750 nm), where higher reflectance was attained ( Figure 5; Figure S10). The NIR reflectance for N0 was always lower than for the fertilized treatments and the maximum reflectance was observed from stem elongation to flowering, with differences between spectral signatures greater at the early milk GS.  The same pattern was repeated every year: reflectance for the fertilized treatments was greater for high irrigated treatments than for those with lower irrigation. Conservation tillage led to a higher reflectance compared to conventional tillage, probably because of increased water availability but to a lesser degree than irrigation. Particularly, in 2014 and 2017 the treatments with low irrigation and conservation tillage resulted in similar reflectance to high irrigated conventional tillage. For N0, differences between irrigation and tillage treatments were much lower. Regarding N fractionation, the effect in reflectance was less clear: only slight differences were observed between treatments at certain GSs, and the signatures of the same N dose followed a similar pattern. Overall, these spectral observations were consistent with the agronomic data at harvest, particularly with yield and N output, which encouraged us to undertake a detailed exploratory analysis.

Grain Yield and Biomass
Differences throughout the various GSs were observed for GY, which obtained the highest relationship between the NDSI and the variable between the beginning of stem elongation towards early milk, with the peak at flowering (Figure 6; Figure S11). The best wavelength combination was found between the broad parts of the NIR (Rj, 750-850 nm, except 751, 759, 766, and 773 nm, which were previously removed) combined with parts of the visible (Ri, 398-707 nm) and red-edge regions (Ri, 720-740 nm). These results were followed by band combinations in the visible region (Rj, 574-665 nm) with the green regions (Ri, 530-619 nm) and other peaks in the blue regions (Rj and Ri, 398-486 nm), which were present in the early growth stages (Figure 6; Figure S11). 6). The R followed a similar trend as the RMSE, reaching values > 0.8 for the comb of NIR and visible bands at flowering and 0.7 at stem elongation ( Figure S11). results were found for biomass in the same growth stages, reaching the best comb at flowering with RMSE between 800 and 1400 kg ha −1 and R 2 ranging from 0.4 to ures S12 and S13). re 6. Contour maps of the root mean square error (RMSE, kg ha −1 ) between the normalized difference spectral in SI) (Ri, Rj) and grain yield using the complete combinations of two wavelengths at i and j within the range 400- At flowering, the best relationships were observed, obtaining a RMSE as low as 400-500 kg ha −1 in the combinations of NIR with the visible regions, and between 600 and 700 kg ha −1 if only bands from the visible region were combined. At earlier GSs, the same regions of contour maps showed lower RMSE values with GY (RMSE > 600 kg ha −1 ; Figure 6). The R 2 followed a similar trend as the RMSE, reaching values > 0.8 for the combinations of NIR and visible bands at flowering and 0.7 at stem elongation ( Figure S11). Similar results were found for biomass in the same growth stages, reaching the best combinations at flowering with RMSE between 800 and 1400 kg ha −1 and R 2 ranging from 0.4 to 0.8 ( Figures S12 and S13).

Grain N Concentration and N Output
The spectral exploratory analysis was conducted for all individual years for GNC due to the significant year interaction observed in the agronomic results (Table S5). Accordingly, contour maps showed differences among years in both GS performance and best wavelength combination (Figure 7 and Figure S14). Generally, the relationship between GNC and the spectral bands improved from booting to early milk, but in 2016 it was better at heading than at flowering. A common pattern to all years was that early milk was the best GS to assess GNC, reflecting a RMSE as low as 0.05%N in 2015 and 2016, and 0.15%N in 2014 and 2017. From the medium milk stage on, the relationship declined sharply in all years. At booting, a relevant GS for fertilizer recommendation, some spectral regions presenting a significant relationship with GNC had already appeared and were highly significant in later GSs. tween GNC and the spectral bands improved from booting to early milk, but in 2016 it was better at heading than at flowering. A common pattern to all years was that early milk was the best GS to assess GNC, reflecting a RMSE as low as 0.05%N in 2015 and 2016, and 0.15%N in 2014 and 2017. From the medium milk stage on, the relationship declined sharply in all years. At booting, a relevant GS for fertilizer recommendation, some spectral regions presenting a significant relationship with GNC had already appeared and were highly significant in later GSs. The predominantly high wavelength combinations had some common distribution patterns among years, emphasizing NIR and red-edge regions (707-850 nm) on Rj combined with a broad range on Ri (398-740 nm). Narrower wavelength combinations in the visible range performed well, such as 398-530 nm on Rj and 398-486 nm on Ri as well as 574-665 nm on Rj combined with 398-486 nm and 530-619 nm on Ri. In addition, the area around the green region (Rj and Ri 530 nm) was highlighted in the contour maps of most of the GSs (Figure 7).
The relationship of N output with wavelength combinations was similar to GY and biomass, with the difference that the NIR (Rj, 750-850 nm) together with the red-edge region (Ri, 720-740 nm) showed a low RMSE (Figure 8). These results indicate the excellent predictive ability of these wavelength regions for N output, mainly between the end of stem elongation and flowering, obtaining a RMSE ranging from 8 to 12 kg N ha −1 and R 2 between 0.6 and 0.8 (Figure 8; Figure S15). 574-665 nm on Rj combined with 398-486 nm and 530-619 nm on Ri. In addition, the area around the green region (Rj and Ri 530 nm) was highlighted in the contour maps of most of the GSs (Figure 7).
The relationship of N output with wavelength combinations was similar to GY and biomass, with the difference that the NIR (Rj, 750-850 nm) together with the red-edge region (Ri, 720-740 nm) showed a low RMSE (Figure 8). These results indicate the excellent predictive ability of these wavelength regions for N output, mainly between the end of stem elongation and flowering, obtaining a RMSE ranging from 8 to 12 kg N ha −1 and R 2 between 0.6 and 0.8 (Figure 8; Figure S15).

Grain Yield and Biomass
The structural indices (NDVI and reformed difference vegetation index (RDVI)) were highly correlated with GY and biomass in all GSs ( Table 2). The best statistical performance of these indices was achieved at flowering, when the R 2 > 0.70 and RMSE < 596 and 988 kg ha −1 for GY and biomass, respectively. At the early milk stage, the correlations were

Grain Yield and Biomass
The structural indices (NDVI and reformed difference vegetation index (RDVI)) were highly correlated with GY and biomass in all GSs ( Table 2). The best statistical performance of these indices was achieved at flowering, when the R 2 > 0.70 and RMSE < 596 and 988 kg ha −1 for GY and biomass, respectively. At the early milk stage, the correlations were slightly lower than at flowering but still high in both agronomic variables. The highest correlations occurred at flowering ( Table 2). The chlorophyll indices normalized difference red-edge (NDRE) and red-edge optical reflectance (R 750 /R 710 ), and the canopy index CCC also showed a significant correlation with GY and biomass; in particular, at flowering R 2 > 0.65 in all cases, and the RMSE < 664 kg ha −1 for GY and biomass. Nevertheless, the other structural (normalized green (NG)), chlorophyll (double peak canopy nitrogen index (DCNI), modified chlorophyll absorption ratio index (MCARI)), and canopy (canopy chlorophyll content index (CCCI), and nitrogen planar domain index (NDPI)) indices did not present a good relationship with these agronomic variables. Therefore, VIs that combined NIR and the visible bands were the best GY and biomass predictors. All VIs showed worse predictive ability during the period before flowering.

Grain N Concentration and N Output
The overall VI performance for GNC was the worst among the agronomic variables evaluated, with great variability between years and growth stages (Table 3), agreeing with the results observed in the contour maps (Figure 7).
At booting, only the DCNI and CCCI showed R 2 > 0.45 in 2014 and 2017, whereas the rest of the VIs showed low R 2 in all years. In addition, the R 2 for the DCNI and CCCI was very low and the relationship with GNC not significant in 2015 and 2016. At heading, the relationship between GNC and the VIs tended to improve, performing better on the structural indices NDVI, RDVI, and NG, the chlorophyll indices NDRE and R 750 /R 710 , and the canopy indices CCC and NDPI in 2016 (R 2 > 0. 80 Table 3).
The relationship between the spectral indices and N output was better for the images acquired at the flowering and early milk stages than at the beginning of stem elongation ( Table 4). The chlorophyll indices NDRE and R 750 /R 710 , both the structural (NDVI and RDVI) and the canopy index CCC showed significant linear correlations with N output in all growth stages, obtaining R 2 > 0.7 and RMSE < 12 kg N ha −1 at flowering and early milk ( Table 4). The best performance was obtained with NDRE, which achieved an R 2 of 0.84 and RMSE of 9 kg N ha −1 at flowering. Similar to GY and biomass, the DCNI, MCARI, NG, CCCI, and NDPI indices provided lower correlations than the previous indices for N output. According to these results, the spectral indices calculated from red-edge bands and NIR and red band combinations were more consistent in assessing N output.

Application to N Fertilizer Recommendation for Increasing GNC
For three out of the four years studied, the chlorophyll indices showed lower errors than the rest of the indices in identifying plots that were responsive and nonresponsive to a topdressing N fertilizer application within the low irrigated treatments (Figure 9). The lowest error was achieved using the DCNI in 2014 and 2017 (28% and 17%, respectively) and using the MCARI in 2016 (29%). In 2015, the lowest error was obtained using NG (13%) followed by NDRE (21%).
These results agree with the linear correlations between the indices and GNC (Figure 10). The VIs that incorporated information from the red-edge bands (DCNI, MCARI, and NDRE) and NIR-green combinations (NG) obtained higher R 2 in the years in which they are better able to discriminate GNC responsive and nonresponsive sites (Figure 9). However, although the chlorophyll indices showed potential for N fertilization at booting, a great variability between VIs and years was observed and there was not a single index that performed well every year. Table 3. Coefficient of determination (R 2 ) and root mean square error (RMSE) between hyperspectral indices and Grain N concentration at booting (GS41), heading (GS51), flowering (GS65), and early milk (GS73), over the 4 years studied.

Application to N Fertilizer Recommendation for Increasing GNC
For three out of the four years studied, the chlorophyll indices showed lower errors than the rest of the indices in identifying plots that were responsive and nonresponsive to a topdressing N fertilizer application within the low irrigated treatments (Figure 9). The lowest error was achieved using the DCNI in 2014 and 2017 (28% and 17%, respectively) and using the MCARI in 2016 (29%). In 2015, the lowest error was obtained using NG (13%) followed by NDRE (21%). These results agree with the linear correlations between the indices and GNC ( Figure  10). The VIs that incorporated information from the red-edge bands (DCNI, MCARI, and NDRE) and NIR-green combinations (NG) obtained higher R 2 in the years in which they are better able to discriminate GNC responsive and nonresponsive sites (Figure 9). However, although the chlorophyll indices showed potential for N fertilization at booting, a great variability between VIs and years was observed and there was not a single index that performed well every year.

What are the Best Indices for Estimating GY, Biomass, GNC, and N Output
This study confirms the great potential that remote sensing has for GY and biomass assessment in spring wheat. The highest assessment potential came from band combina-

What Are the Best Indices for Estimating GY, Biomass, GNC, and N Output
This study confirms the great potential that remote sensing has for GY and biomass assessment in spring wheat. The highest assessment potential came from band combinations in the NIR and the red-edge as well as the NIR and visible region; therefore, the so-called structural indices such as the NDVI and RDVI were among the best predictors of GY. The high assessment potential for wheat GY of band combinations from the NIR and red-edge regions was reported in previous studies that discussed the advantage of using narrow bands that capture the abrupt changes in the red-edge region [10,11,63,64]. In the current study, the visible and NIR band combinations provided a high assessment ability. This is in agreement with some of the literature [17,22,25,65] but not with other studies that reported a low relationship in wheat and other crops [19,59,[66][67][68][69]. These low relationships were attributed to differences in canopy cover and soil background. Indices that estimated chlorophyll activity at the canopy level were able to provide a better relationship with crop status and yield in Australia, Italy, and the US [19,57,69]. In the current study, a uniform ground cover was observed in all plots, probably because they were all irrigated. This may be part of the reason for the better performance of structural and canopy indices as predictors of GY, which were consistent during the four years. It is remarkable that despite the significant impact of irrigation level and tillage practices on wheat productivity, GY, and biomass were accurately predicted over the four years. The other reason may be the combination of the genotype and planting method employed in this study. CIRNO C2008, the genotype used in this study, is durum wheat, which tends to have upright leaves as opposed to the floppy leaves in bread wheat. Moreover, this genotype is shorter than most durum cultivars, so these two canopy architecture traits limit the ability of the crop to cover the ground rapidly. In addition, the planting method utilized-two rows 25 cm apart on top of an 80-cm bed leaving a 55-cm gap between rows in different beds-also, limited the ability of the canopy to fully cover the ground and maximize yield [70]. Therefore, the use of a genotype sensitive to wide row spacing reduced the possibility of the VIs reaching saturation.
The GNC response to the N fertilization rate or splitting was more erratic among years than GY. Grain N concentration responded to the highest N rate in various years and treatment combination, whereas splitting fertilization did so only occasionally. When the response occurred, it was usually coupled to a higher reflectance value, particularly in NIR. The higher assessment potential came from band combinations in NIR and rededge at the early milk GS. Some of the years, band combinations from the visible region showed good performance (i.e., 2015, and 2016), but this was not consistent in all four years. This is in agreement with results from Rodrigues et al. [23], who obtained the best relationship between grain protein concentration and band combinations from the red-edge and green region, even if the relationship in the field study was very weak (R 2 ≤ 0.21). Prey and Schmidhalter [11] also emphasized the potential for the red-edge to assess grain N concentration, although lack of consistency between years was mentioned in their study. This variability across studies and years may be associated with the great variation in grain N concentration among genotypes and environmental conditions [30].
A novelty of this study is that it shows the potential of remote sensing for assessing N output, calculated as the product of GY and GNC. Band combinations in NIR and red-edge provided the highest assessment potential; therefore, indices such as NDRE that account for pigment content were among the best predictors. Estimation of N output is useful in forage crops and also an important indicator of N productivity of the agro-ecosystem and key to calculating NUE [40,71].

When Is the Best Growth Stage for Assessing GY, Biomass, GNC, and N Output
The best GS for yield assessment was between the end of flowering and early milk stages. The RMSE that resulted when using reflectance indices from flowering to grain development was as low as 400 kg ha −1 during a four-year period, with the mean of the highest yield obtained at 5250 kg ha −1 . At the late milk stage, the error increased, probably due to the decrease in reflectance associated with crop senescence. Increasing R 2 values between reflectance indices and wheat GY from early growth stages to flowering or grain filling is a common trend reported in the literature. However, differences during the best GSs were mainly genotype-dependent [66]. When using visible/NIR combinations, the RMSE was higher at flowering than at grain filling, probably due to changes in growth and morphological interferences associated with anther exposure [22,72]. However, the red-edge/NIR combination was not affected by these interferences and the lowest RMSE with GY was observed at flowering. These results are in agreement with observations from Prey and Schmidhalter [11], who recommended avoiding anthesis for grain yield assessment, particularly when using broad bands that cannot capture red-edge changes.
For GNC assessment, the best GS was early milk stage. The RMSE of the prediction depended on the year, whereas in 2015 and 2016 the RMSE was as low as 0.05%N, and in 2014 and 2017 was between 0.15 and 0.2%N. Given that the quality criteria for durum wheat ranges between 2.2 and 2.3%N depending on the country, a RMSE > 0.2%N implies high uncertainty in the prediction. Grain N concentration depends on the N accumulated in the biomass at flowering, but it is also strongly influenced by temperature and water access during grain filling [73,74]. High temperature and water stress during grain filling enhance GNC [75,76]. Consequently, the year effect becomes very important even for the same wheat genotype, and even if increases in GNC are often associated with a GY reduction due to a dilution effect, this relationship could be the reverse or not significant [58,77]. In the present study, differences in water availability during grain filling came from both irrigation and the tillage effect, highlighting the need to evaluate if information from crop water status could improve the assessment of GNC. This is precisely one of our current lines of research where we are evaluating crop water status using soil measurements as well as canopy temperature measured with infrared cameras. The aim of this research is to establish a better assessment of GNC by adding water related proxies to our analysis.
Finally, the best growth stages for N output prediction were stem elongation, booting, and flowering. When using reflectance indices from flowering to grain development, the RMSE was as low as 8-9 kg N ha −1 during a four-year period, accounting for about 10% of the N output. At early milk, the combination of NIR and red bands resulted in an error of 11-12 kg N ha −1 , which increased drastically at the late milk stage. Indices were able to capture the N output response to the highest N dose observed in high irrigated treatments, which was not observed in the treatments with low irrigation.

Use of Spectral Information to Guide Fertilizer Recommendation
The application of in-season measurements of crop reflectance for developing strategies that enable site-specific N fertilizer recommendation has been one of the most investigated topics to enhance N use efficiency in the last decades [14,78]. Algorithms based on spectral sensor readings have been developed and used in precision agriculture to recommend management zones with variable fertilizer application rate [17,79]. Fertilizer recommendation made using these algorithms rely on the in-season prediction of (i) GY potential, and (ii) crop response to N application. The spectral exploratory analysis conducted in our experiment proved the predictive accuracy in GY and showed limitations for prediction of GNC. Furthermore, the information obtained by the complete two by two combination of wavelengths presents the possibility to support technological progress, allowing better decision-making in the development of new sensors and indices for specific goals (i.e., improving grain quality).
With the data available in this study, we were able to evaluate the potential of remote sensing to identify plots in which GNC respond to N application from nonresponsive plots. Identification of GNC responsive plots at booting is crucial because this information would allow applying fertilizer only to responsive sites. When the objective is enhancing GNC in spring wheat, common strategies are increasing soil N availability by late applications (i.e., booting or heading) of either fertilizer to the soil or foliar N applications that can be readily taken up [80,81]. These strategies are particularly suitable for remote sensing detection of crop nutritional status given that canopies are already fully developed. Identification of responsive sites provides the opportunity for the use of variable-rate application machines that will deliver the right N rate spatially throughout the field. In this study, the lowest error in the identification of plots prone to respond to an increase in N application after booting was achieved with chlorophyll indices based on the combination of NIR/red-edge bands (DCNI, NDRE, and MCARI), emphasizing the role of the narrow-band type of sensors in precision agriculture. However, it is interesting to note that the simple ratio of NIR and green bands (NG) provided the best relationship with GNC in 2015 and performed better than the other structural indices in the rest of the years. This is in agreement with Daughtry et al. [52], who reported a good response of the NIR/green ratio with chlorophyll content in maize, and showed the potential of some structural indices, easily adaptable to relatively broad bands, to estimate GNC. Nevertheless, there was not a single index that provided reliable recommendations every year and the error in identifying responsive from nonresponsive sites was >20% in two of the years studied. Therefore, there is still a need to clarify if information from other stress factors could reduce the error consistently to levels acceptable for decision making. Furthermore, emerging machine learning techniques based on ensemble methods (i.e., random forest, and neuronal networks) that already showed potential in obtaining robust outcomes in agri-environmental studies [82,83] could be used to combine the information form other stress factors allowing better decision-making in precision agriculture.

Conclusions
Indices based on the combination of bands from NIR and visible regions (NDVI, and RDVI) and from NIR and red-edge regions (NDRE) provided the higher assessment potential of spring wheat grain yield. Grain N concentration and N output were better estimated by indices that combined bands in the NIR and the red-edge. The best growth stage for the yield assessment took place from the end of flowering to early milk, when the RMSE was below 700 kg ha −1 of the yield during a four-year period. For GNC the best growth stage was early milk across treatments. Grain N concentration response to N treatment varied with years and depended on the irrigation level and tillage. The response was usually detected by changes in reflectance, but reliable assessment of GNC in wheat by remote sensing remains challenging. Nitrogen output was better assessed at bootingheading, when the RMSE in the assessment was as low as 8-9 kg N ha −1 during a four-year period (≈10% of the average N output). Overall, broad bands from multispectral sensors could be used to assess wheat yield, but reliable assessment of grain N concentration and N output were obtained using narrow bands from a hyperspectral sensor.
The use of remote sensing to guide fertilizer recommendations for increasing GNC based on structural and canopy-related indices at booting resulted in an error greater than 20% in the identification of responsive sites. Identification of GNC responsive sites was better estimated by chlorophyll indices that combine NIR and narrow red-edge bands such as the DCNI, NDRE, and MCARI, but the simple ratio of NIR and green bands (NG) was also promising. However, the error in identifying GNC responsive sites should be reduced before the application of this approach to precision agriculture.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/rs13071373/s1, Figure S1: Monthly rainfall (blue bars), average maximum temperatures (continuous lines) and average minimum temperatures (dashed lines) during the experiment in the Yaqui Valley (Sonora, Mexico). Table S1: Critical dates (sowing, harvest, fertilization, irrigation) for the field experiment and for the two irrigation blocks during the 4 experimental years. Table S2: Selected dates of image acquisition and the corresponding wheat growth stage (GS) of the two irrigation blocks during the 4 experimental years. Names of the relevant GSs for the analysis are in parentheses in addition to the decimal code. Table S3: Reflectance used to calculate the airborne indices and the exact bands of the hyperspectral image from which the reflectance was extracted. Table S4: Airborne hyperspectral indices calculated in this study. Figure S6: Correlograms of the vegetation indices calculated as in Table S4. The value inside each square of the grid is the coefficient of correlation (r value) at P ≤ 0.01 for different years: (a) 2014, (b) 2015, (c) 2016 and (d) 2017. Figure S7: Linear and exponential relationship between spectral indices (NDVI, R 750 /R 710 , and CCC) and crop variables (yield, biomass and N output) at flowering. The X-axis represents the spectral index values and the Y-axis the crop variable values. Dotted lines are the adjusted linear (black) and exponential (red) model and R 2 its coefficient of determination. Table S5: Results (P-value) of the analysis of variance between irrigation (irr), tillage (till), N treatment (nlevel) and year, and their combined interactions on the wheat grain yield, total above-ground biomass, grain N concentration (GNC) and N output at harvest. Figure S9: Biomass for combined tillage and irrigation factors, for each N treatment, over the 4 years. Small bars indicate SE. Figure S10: Spectral reflectance (%) from 400 nm to 850 nm, for the different N treatments (N0, N1, N1b, N2, N2b, N2c) with conventional tillage, either with two post-plant irrigations (top row), or with four post-plant irrigations (bottom row), at five growth stages, in 2014. Figure S11: Contour maps of the coefficient of determination (R 2 ) between the NDSI (Ri, Rj) and grain yield using the complete combinations of two wavelengths at i and j nm within the range 400-850 nm. At each growth stage, data from the different years were merged to build the figure (4 years for all the GSs and 2 for the end of stem elongation and early milk stages). Figure S12: Contour maps of the root mean square error (RMSE, kg ha −1 ) between the NDSI (Ri, Rj) and biomass using the complete combinations of two wavelengths at i and j nm within the range 400-850 nm. At each growth stage, data from the different years were merged to build the figure (4 years for all the GSs and 2 for the end of stem elongation and medium milk stages). Figure S13: Contour maps of the coefficient of determination (R 2 ) between the NDSI (Ri, Rj) and biomass using the complete combinations of two wavelengths at i and j nm within the range 400-850 nm. At each growth stage, data from the different years were merged to build the figure (4 years for all the GSs and 2 for the end of stem elongation and medium milk). Figure S14: Contour maps of the coefficient of determination (R 2 ) between the NDSI (Ri, Rj) and Grain N concentration using the complete combinations of two wavelengths at i and j nm within the range 400-850 nm, at different growth stages for different years: (a) 2014, (b) 2015, (c) 2016, and (d) 2017. Figure S15: Contour maps of the coefficient of determination (R 2 ) between the NDSI (Ri, Rj) and N output using the complete combinations of two wavelengths at i and j nm within the range 400-850 nm. At each growth stage, data from the different years were merged to build the figure (4 years for all the GSs and 2 for the end of stem elongation and medium milk stages).