Leaf and Canopy Level Detection of Fusarium Virguliforme (Sudden Death Syndrome) in Soybean

Pre-visual detection of crop disease is critical for food security. Field-based spectroscopic remote sensing offers a method to enable timely detection, but still requires appropriate instrumentation and testing. Soybean plants were spectrally measured throughout a growing season to assess the capacity of leaf and canopy level spectral measurements to detect non-visual foliage symptoms induced by Fusarium virguliforme (Fv, which causes sudden death syndrome). Canopy reflectance measurements were made using the Piccolo Doppio dual field-of-view, two-spectrometer (400 to 1630 nm) system on a tractor. Leaf level measurements were obtained, in different plots, using a handheld spectrometer (400 to 2500 nm). Partial least squares discriminant analysis (PLSDA) was applied to the spectroscopic data to discriminate between Fv-inoculated and control plants. Canopy and leaf spectral data allowed identification of Fv infection, prior to visual symptoms, with classification accuracy of 88% and 91% for calibration, 79% and 87% for cross-validation, and 82% and 92% for validation, respectively. Differences in the wavelengths important to prediction by canopy vs. leaf data confirm that the two methods rely on different bases for accurate prediction. Partial least squares regression (PLSR) was used on late-stage canopy level data to predict soybean seed yield, with calibration, cross-validation and validation R² values of 0.71, 0.59 and 0.62 (p < 0.01), respectively, and a validation root mean square error of 0.31 t·ha−1. Spectral data from the tractor mounted system are thus sensitive to the expression of Fv root infection at canopy scale prior to canopy symptoms, suggesting such systems may be effective for precision agricultural research and management.

Soybean (Glycine max (L.) Merr.) is the fourth largest commodity crop in the world in terms of area harvested and production. The anticipated increase in demand for soybean over the next 20 years is expected to result in 50% expansion of soybean cultivated land area [17]. Sudden death syndrome (SDS) was among the top yield-suppressing diseases of soybean in the United States from 1996 to 2007 [18],

Study Area and Experimental Design
Field experiments were conducted during April to October 2016 at the University of Wisconsin Arlington Agriculture Research Station (43 • 32 7 W). Soils at Arlington are Plano fine-silty, mixed, superactive, mesic Typic Argiudoll, and those at Hancock are Plainfield sand, mixed, mesic Typic Udipsamment [23]. The Hancock site was irrigated to mitigate water stress while the Arlington site was not. No fertilization treatments were applied. The Arlington site included two adjacent experiments, one used for the canopy level (proximal) analyses and the second for the leaf measurements used for comparison, with the leaf study replicated at Hancock. The canopy level experiment was a split-split-plot arrangement with four replications arranged in completely randomized blocks, with three planting dates (Table 1) resulting in non-uniform development stages for most of the measuring dates. Fields were planted with two soybean cultivars (i.e., RS213NR2 and AG2136) and three seed treatments (i.e., untreated control, Poncho®/VOTiVO®, and Poncho®/VOTiVO®/ILeVO®) as detailed by Vosberg et al. [23] at a uniform seed density of 346,000 seeds·ha−1. Each sub-subplot consisted of two inoculation treatments (i.e., inoculated with Fv and untreated control) and encompassed two rows 7.6 m long spaced 76 cm apart. The inoculation treatment in the sub-subplot was the experimental unit and from here on will be designated as "plot". The total number of plots was 144. The rows were in a north-south orientation and each row had eight plots along it, with 18 plots in the east-west orientation. There were eight alleys (each four rows wide) for the tractor carrying the Piccolo Doppio. To minimize border effects, the alleys, the borders, any side of the experiment that did not have soybean plants next to it, and the rest of the field were planted with filler soybean plants of cultivar Channel 0906R2 at a seed density of 346,000 seeds·ha−1.
Each of the two leaf level experiments (Arlington and Hancock) had four rows 7.6 m long spaced 76 cm apart. Two of the rows were inoculated [23] with Fv and two rows were not inoculated and defined as untreated control. Both sites were planted with AG2136 cultivar and uniform seed density of 346,000 seeds·ha −1 . A total of 500 soybean plants were tagged to allow repeated measurements through the growing season, with 125 plants per inoculation treatment per site.

Canopy Spectral Data Collection and Processing
The canopy spectral data were obtained through the growing season (Table 1) by a tractor mounted Piccolo Doppio dual field-of-view system [41] equipped with two spectrometers: Flame and NIRQuest (Ocean Optics, Inc., Dunedin, FL, USA). The Flame has a spectral range of 340 to 1022 nm with an optical resolution of 1.33 nm full-width half-maximum (FWHM), interpolated to 1 nm spacing. The range of 400 to 900 nm was used due to noisy data at the longest and shortest wavelengths. The NIRQuest has a spectral range of 900 to 1713 nm with an optical resolution of 6.6 nm FWHM, interpolated to 1 nm spacing, with 900 to 1630 nm used due to noisy data at the longest wavelengths. The ability to obtain upwelling and downwelling data almost simultaneously derives from the use of two fiber optics with diameters of 400 µm and 600 µm, respectively, which together completely cover the 1 mm long and 25 µm wide inlet slit of the Flame spectrometer (Figure 1). An identical set of fiber optics was used for the NIRQuest, which has a 1 mm long and 100 µm wide inlet slit, also fully covered by the two fiber optics.
The upward and downward facing fibers were attached to a custom two-axis gimbal fitted onto the boom to maintain nadir orientation. As detailed by [41], the downwelling (upward looking) fiber viewed the sky through a cosine corrected polytetrafluoroethylene fore-optic and glass dome, while the upwelling (downward looking) bare fiber (view angle of 25 degrees) was fitted with only an optical glass dome (Figure 1). Each optic was fitted with a shutter so that data were obtained via one fiber at a time. Synchronization of shutter activity and spectrometer integration times was controlled by a Raspberry Pi (Raspberry Pi Foundation, Cambridge, UK) single-board computer [41], which also time-stamped each spectral measurement. To geo-reference the spectra, location and time were recorded every second by a GPS receiver (Geo 7x, Trimble, Sunnyvale, CA, USA) in the tractor's cabin, connected to a GNSS antenna mounted on the boom behind the shutter tube (Figure 1). The GPS data were differentially corrected in Pathfinder software (Trimble Inc., Sunnyvale, CA, USA) using the reference base UNAVCO, Iowa County, WI, located at 42°54′51″N, 90°14′53″W, 294 m above sea level. The estimated accuracy ranged from 5 to 15 cm for 99.96% of the corrected positions. Spectral data were collected within two hours of solar noon from a single row per pass, with a total of two passes (i.e., two rows) per plot and collection date. The upwelling fiber was centered on the soybean row 25 cm above the top of the canopy, resulting in a viewed area of ~12 cm diameter at the top of the canopy. The system was on for at least 40 min prior to starting measurements. The Flame integration times were 5 and 15 ms and the NIRQuest integration times were 700 and 1000 ms for downwelling and upwelling, respectively. The integration times were set based on visual examination of the obtained spectral reflectance to maximize the signal while avoiding saturation (defined as 75% of maximum DN count) under variable sky conditions.
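The viewed diameter quoted above can be checked with simple cone geometry. The sketch below is only a rough check, assuming the 25 degree view angle is the full cone angle of a nadir-pointing fiber and ignoring any effect of the glass dome; the function name is hypothetical.

```python
import math


def footprint_diameter_cm(height_cm: float, full_view_angle_deg: float) -> float:
    """Diameter of the circle seen by a nadir-pointing cone at a given height.

    ASSUMPTION: the view angle is the full cone angle and the optic has no
    additional spreading (e.g., from the glass dome).
    """
    half_angle = math.radians(full_view_angle_deg / 2.0)
    return 2.0 * height_cm * math.tan(half_angle)


# Values taken from the text: fiber 25 cm above the canopy, 25 degree view angle.
diameter = footprint_diameter_cm(height_cm=25.0, full_view_angle_deg=25.0)
print(f"{diameter:.1f} cm")  # prints "11.1 cm"
```

Under these assumptions the geometry gives roughly 11 cm, which is of the same order as the ~12 cm reported in the text.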
To minimize the time gap between upwelling and downwelling data collection and between spectrometers, the eight-measurement sequence was: (1) Flame upwelling dark current; (2) NIRQuest upwelling dark current; (3) upwelling radiance by Flame; (4) upwelling radiance by NIRQuest; (5) […]
ArcGIS 10.3 (Environmental Systems Research Institute, Redlands, CA, USA) was used to spatially relate each 1 s GPS time stamp to a plot, relying on a polygon shapefile of the plots.
Time stamps with distances less than 0.22 m from the border of the plot were excluded to ensure that both Flame and NIRQuest obtained targets within the experimental plot. Relative reflectance values for each plot were obtained using a custom Python script (https://github.com/prabu-github/tracolo). Dark current was subtracted from the upwelling radiance and downwelling irradiance spectra, and then the radiance was divided by the irradiance, while accounting for fiber diameter, integration time and field of view (Equation (1)), where R is relative reflectance; DN is the spectral data of one sample (vector) in units of digital number; dark stands for dark current spectral data; up and do stand for upwelling and downwelling, respectively; IT stands for integration time in ms; r stands for the radius of the fiber optic in µm; and FOV stands for the field of view of the fiber optic in degrees. The process presented in Equation (1) was applied to the spectral data from each of the spectrometers separately. The 1 nm relative reflectance data were averaged per plot and smoothed using the Savitzky and Golay [44] filter with a 5 nm window and a polynomial degree of 2; the reflectance was then resampled to 5 nm resolution for each of the spectrometers separately. Resampling resulted in reflectance values for 101 bands ranging from 400 to 900 nm for the Flame and for 147 bands ranging from 900 to 1630 nm for the NIRQuest. Spectral data from each of the spectrometers were then vector normalized by instrument to minimize the disparity in intensity levels [45]. The vector normalized spectra of both spectrometers from the same plot and measuring date were merged at 900 nm.
Merging was performed by subtracting the NIRQuest reflectance value at 900 nm from the averaged reflectance of 880-900 nm and adding this scalar to the normalized reflectance values of all NIRQuest wavelengths, resulting in merged spectra with 247 bands at 5 nm resolution in the range of 400 to 1630 nm.
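The processing chain described above (dark correction, ratio of upwelling to downwelling, vector normalization, merging at 900 nm) can be sketched as follows. This is a minimal illustration and not the tracolo script. It rests on several assumptions: each dark-corrected signal is scaled by integration time, fiber cross-section and field of view (the exact form of Equation (1) may differ); "vector normalized" means scaling to unit Euclidean length; the averaged 880-900 nm reflectance is taken from the Flame; and the Flame value is kept for the shared 900 nm band. All function names are hypothetical and the spectra are synthetic.

```python
import numpy as np

FLAME_WL = np.arange(400, 901, 5)   # 101 bands, 400 to 900 nm
NIR_WL = np.arange(900, 1631, 5)    # 147 bands, 900 to 1630 nm


def relative_reflectance(dn_up, dark_up, dn_do, dark_do,
                         it_up, it_do, r_up, r_do, fov_up, fov_do):
    """Ratio of dark-corrected upwelling to downwelling signal.

    ASSUMPTION: each signal is divided by integration time (ms), fiber
    cross-section (pi * r**2, r in um) and field of view (degrees).
    """
    up = np.asarray(dn_up, float) - np.asarray(dark_up, float)
    do = np.asarray(dn_do, float) - np.asarray(dark_do, float)
    up = up / (it_up * np.pi * r_up ** 2 * fov_up)
    do = do / (it_do * np.pi * r_do ** 2 * fov_do)
    return up / do


def vector_normalize(spectrum):
    """Scale a spectrum to unit Euclidean length (assumed definition)."""
    spectrum = np.asarray(spectrum, float)
    return spectrum / np.linalg.norm(spectrum)


def merge_at_900(flame, nir):
    """Offset the NIRQuest spectrum to meet the Flame spectrum at 900 nm."""
    flame_n, nir_n = vector_normalize(flame), vector_normalize(nir)
    # Scalar: mean Flame value over 880-900 nm minus the NIRQuest value at 900 nm.
    offset = flame_n[FLAME_WL >= 880].mean() - nir_n[0]
    nir_shifted = nir_n + offset
    # Keep the Flame value at 900 nm and drop the duplicate NIRQuest band.
    wl = np.concatenate([FLAME_WL, NIR_WL[1:]])
    merged = np.concatenate([flame_n, nir_shifted[1:]])
    return wl, merged


# Synthetic spectra standing in for plot-averaged, resampled reflectance.
flame = np.linspace(0.05, 0.50, FLAME_WL.size)
nir = np.linspace(0.48, 0.30, NIR_WL.size)
wl, merged = merge_at_900(flame, nir)
print(wl.size, wl[0], wl[-1])  # prints "247 400 1630"
```

For the Flame settings given in the text, a call would use `it_up=15`, `it_do=5`, `r_up=200` and `r_do=300`; the field of view of the cosine corrected downwelling optic is not stated here and would have to be supplied.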

Leaf Spectral Data Collection and Processing
The leaf spectral data were acquired in vivo throughout the growing season (Table 1) using an ASD FieldSpec3 full-range (350 to 2500 nm) spectrometer with leaf clip (Analytical Spectral Devices, Boulder, CO, USA) and a tungsten halogen light source. The spectrometer and light source were on for at least 40 min prior to measuring, and white reference measurements were obtained every 10 min using a Spectralon panel. The adaxial side of the terminal leaflet of each trifoliate leaf was measured three times: right side, left side and at the center far from the petiole (where the leaf vein is relatively thin), with each measurement consisting of an average of 10 spectra. For the vegetative stages, until 5 July 2016, the newest fully developed leaf was measured. During the reproductive stages that followed, the same terminal leaflet was tagged and remeasured on each date (Table 1) until 8 August 2016. The newest fully developed leaf was measured again on the last measuring date at each site because of damage to the relatively old leaves. If the middle leaflet was missing or damaged, one of the other two leaflets was measured. To avoid time of day bias in sampling, measurements were alternated between rows of different treatments. Near the end of the growing season, a natural outbreak of SDS occurred in hundreds of plants ~100 m north of the Arlington experiment, in the filler of the same field. These plants were also measured to obtain example spectra of symptomatic plants (Table 1), including both chlorotic and green areas on the leaflet. These leaves were measured at the top of the canopy. Leaf spectral data were processed using the SpecDAL Python package (https://github.com/EnSpec/SpecDAL), with the final data set using 400 to 2500 nm at 5 nm spacing.

Root Samples for Quantitative Real-Time Polymerase Chain Reaction (qPCR)
Root samples were collected from the leaf level experimental plots to determine the absolute abundance of Fv (Table 1). Root samples were separated from the plant, washed, and dried at 65 °C for at least 48 h. Samples were homogenized to pass through a 0.5 mm sieve and stored at 4 °C until DNA extraction. DNA was extracted and quantified, and qPCR was conducted to obtain the absolute quantity of Fv DNA in root samples, expressed as Fv DNA (pg) per root DNA (ng), following Wang et al. [31]. A total of 79 root samples were analyzed: 16 Fv-inoculated samples from Arlington, 20 control samples from Arlington, 6 samples from the Arlington filler, 21 Fv-inoculated samples from Hancock, and 16 control samples from Hancock.

Normalized Difference Spectral Indices
Normalized difference spectral index (NDSI) is the normalized difference of all possible two-band combinations in a defined spectral range (Equation (2); [46]). The NDSIs were calculated to assess the ability of two-band combinations of canopy spectral data to discriminate inoculation status, and of leaf spectral data to assess Fv root abundance. NDSI values can range from −1 to 1, and the normalization is effective in standardizing the spectral response to observed targets [46].
$$\mathrm{NDSI}_{i,j} = \frac{R_i - R_j}{R_i + R_j} \qquad (2)$$
where R is reflectance at the subscripted wavelength i or j, which represents all possible wavelengths between 400-1630 nm for the Piccolo (247 bands at 5 nm resolution; 30,381 combinations) and 400-2500 nm for the ASD (421 bands; 88,410 combinations). Regression was used to identify the NDSIs best correlated with Fv root abundance, with the coefficient of determination (R²) and root mean square error (RMSE) used as quality metrics. The t-test was used to identify NDSIs that discriminated inoculation status (i.e., 0 for untreated control and 1 for inoculated). The analyses were conducted in R 3.4.1.
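The band-pair counts quoted above follow from n(n − 1)/2 unique pairs. The sketch below computes the index for every pair of one spectrum, assuming the standard normalized difference form (R_i − R_j)/(R_i + R_j). The analyses in the text were run in R; this Python version, its function names and the synthetic spectrum are for illustration only.

```python
import numpy as np


def ndsi_matrix(reflectance):
    """NDSI for every ordered band pair (i, j) of one spectrum."""
    r = np.asarray(reflectance, float)
    ri = r[:, None]   # column vector: band i
    rj = r[None, :]   # row vector: band j
    return (ri - rj) / (ri + rj)


def unique_pairs(n_bands: int) -> int:
    """Number of unordered two-band combinations."""
    return n_bands * (n_bands - 1) // 2


wavelengths = np.arange(400, 1631, 5)                  # 247 Piccolo bands
spectrum = np.linspace(0.05, 0.45, wavelengths.size)   # synthetic reflectance
ndsi = ndsi_matrix(spectrum)

# Upper triangle (i < j) holds each unordered pair exactly once.
i, j = np.triu_indices(wavelengths.size, k=1)
pair_values = ndsi[i, j]
print(pair_values.size, unique_pairs(421))  # prints "30381 88410"
```

Because the matrix is antisymmetric, only the upper triangle needs to be carried into the regression or t-test for each band pair.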

Partial Least Squares
Partial least squares regression (PLSR; [47][48][49]) was applied to assess the ability of canopy spectral data to predict soybean seed yield, and to assess Fv abundance based on qPCR of soybean root powder. PLS discriminant analysis (PLSDA; [50]) was implemented to classify Fv inoculation status of soybean plants at each of the canopy and leaf levels by measurement date and site.
The canopy data had four replicates, one of which was set aside for independent validation. The samples from the other three replicates were internally cross-validated on 100 permutations of the data, using a random 70%/30% split, resulting in 100 calibration and cross-validation models. These 100 calibration models were applied to the set-aside independent validation data, resulting in 100 independent validations. The calibration, cross-validation and independent validation results are each presented as means of the quality metrics. The PLSR models were evaluated by R² and RMSE. The PLSDA models were evaluated by full confusion matrices, total accuracies and Cohen's Kappa. For both model types, PLSR and PLSDA, the variability of the quality metrics was assessed by their standard deviation over the 100 permutations. The leaf data had 25% of the samples randomly set aside for independent validation. The rest of the samples were internally cross-validated on 100 permutations of the data, using a random 70%/30% split, resulting in 100 calibration and cross-validation models. These 100 calibration models were applied to the set-aside independent validation data, resulting in 100 independent validations. The calibration, cross-validation and independent validation results are each presented as means of the quality metrics with their standard deviations, as done for the canopy data. The only exception is the leaf data PLSR analysis for Fv root content, which did not have an independent validation. The standardized coefficients and variable importance in projection (VIP; [48]) statistic of the PLSDA and PLSR models were used to determine the relative importance of different wavelength regions to prediction, and as a basis to interpret the chemical and physical traits driving the empirical models.
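The resampling scheme described above can be sketched independently of the model. The example below is a minimal illustration on synthetic data, not the analysis code used in the study (which was run in R). To stay self-contained it uses a nearest-centroid classifier as a stand-in, which is not PLSDA; the data, function names and the held-out replicate index are all hypothetical.

```python
import numpy as np


def fit_centroids(X, y):
    """Stand-in classifier (NOT PLSDA): one mean spectrum per class."""
    return {c: X[y == c].mean(axis=0) for c in np.unique(y)}


def predict_centroids(model, X):
    classes = np.array(sorted(model))
    dist = np.stack([np.linalg.norm(X - model[c], axis=1) for c in classes], axis=1)
    return classes[dist.argmin(axis=1)]


def accuracy_and_kappa(y_true, y_pred):
    """Total accuracy and Cohen's kappa from the confusion matrix."""
    classes = np.unique(np.concatenate([y_true, y_pred]))
    cm = np.array([[np.sum((y_true == t) & (y_pred == p)) for p in classes]
                   for t in classes], float)
    n = cm.sum()
    p_obs = np.trace(cm) / n
    p_exp = (cm.sum(axis=0) * cm.sum(axis=1)).sum() / n ** 2
    kappa = (p_obs - p_exp) / (1.0 - p_exp) if p_exp < 1.0 else 0.0
    return p_obs, kappa


def repeated_split_validation(X, y, X_val, y_val, n_perm=100, cal_frac=0.7, seed=0):
    """Random 70/30 calibration/cross-validation splits plus a fixed held-out set."""
    rng = np.random.default_rng(seed)
    n_cal = int(round(cal_frac * len(y)))
    out = {"cal": [], "cv": [], "val": []}
    for _ in range(n_perm):
        idx = rng.permutation(len(y))
        cal, cv = idx[:n_cal], idx[n_cal:]
        model = fit_centroids(X[cal], y[cal])
        out["cal"].append(accuracy_and_kappa(y[cal], predict_centroids(model, X[cal])))
        out["cv"].append(accuracy_and_kappa(y[cv], predict_centroids(model, X[cv])))
        out["val"].append(accuracy_and_kappa(y_val, predict_centroids(model, X_val)))
    # Mean and standard deviation of (accuracy, kappa) over the permutations.
    return {k: (np.mean(v, axis=0), np.std(v, axis=0)) for k, v in out.items()}


# Synthetic stand-in for 144 plots x 247 bands, four replicates of 36 plots.
rng = np.random.default_rng(1)
y_all = np.tile([0, 1], 72)
X_all = rng.normal(0.0, 1.0, (144, 247)) + 0.5 * y_all[:, None]
replicate = np.repeat(np.arange(4), 36)
held_out = replicate == 3   # one replicate set aside for independent validation
summary = repeated_split_validation(X_all[~held_out], y_all[~held_out],
                                    X_all[held_out], y_all[held_out])
for name, (mean, std) in summary.items():
    print(name, "accuracy %.2f +/- %.2f, kappa %.2f +/- %.2f"
          % (mean[0], std[0], mean[1], std[1]))
```

Swapping the two stand-in functions for a PLSDA fit and predict step would leave the resampling and the metrics unchanged.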
Canopy symptoms were identified for the first time on 10 August, and a total of 25 of the 144 plots had visual symptoms during the rest of the season. Therefore, canopy level PLSDA models for 15 August, 22 August and 2 September were calibrated, cross-validated and validated exclusively using plots without canopy symptoms. Where PLSDA models performed similarly for the same (or nearly the same) growth stage, data were merged (leaf and canopy were not mixed) and analyzed to test whether general predictive models could be obtained. For all classification analyses using PLSDA we also implemented randomized models in which the classification variable was assigned randomly to the actual spectra. Classifications were run on these with 100 permutations, and in no case did the randomized datasets generate spuriously accurate classifications (data not shown). All PLSR and PLSDA analyses were conducted in R 3.4.1 [51].

Spectral Identification of Inoculation-Canopy Level
Data from all canopy level measuring dates (Table 1) were analyzed per date as well as pooled in several combinations. The relevant dates and combinations are presented in Table 2. The confusion matrices indicate reasonable and consistent performance across all treatments, suggesting stable spectral models. The full confusion matrices for all canopy measuring dates in Table 1 and models in Table 2 are in Supplementary Materials S1. The seasonal trend of the validation total accuracies is discussed in Section 3.3. The best PLSDA classification of inoculation status overall was for plants during all vegetative stages, resulting in 77% validation total accuracy (Table 2). The best PLSDA classifications of inoculation status for specific dates were based on canopy spectra from 18 July and 26 July, both prior to the appearance of canopy symptoms, producing 82% validation accuracy (Table 2). Of note, the two models (18 and 26 July) show very similar patterns (Figure 2), with the red-edge, at 745 and 750 nm in particular, having similar VIP scores (Figure 3). The model combining the data from 18 and 26 July (Table 2) shows a similar validation total accuracy of 75% and similar standardized coefficients as well as VIP scores (not presented). Judged by the quality of classification, the best stages to obtain spectral data for inoculation analysis are late vegetative and early reproductive.
Averaged canopy spectra for inoculated and control plants (Figure 2) show higher reflectance for inoculated plants starting around 860 nm and continuing to the longer wavelengths, especially in the 1400 nm water absorption band. This is related to lower water content in the inoculated plants. From 750-860 nm, the inoculation treatment generally has lower reflectance values, which we attribute to a smaller, less green canopy caused by Fv [52]. The standardized PLSDA coefficients (Figure 2) corroborate the importance of red-edge and near infrared (NIR) wavelengths (750-900 nm) to the prediction of inoculation status, also evident in the VIP statistic (Figure 3). In support of the NIR importance, an NDSI t-test of all two-band combinations between inoculated and control treatments was applied to the spectral data from all dates together (Table 1). The highest t values, significant at p < 0.05, were for band combinations using the NIR at wavelengths between 755 and 1100 nm (summarized in supplemental Figure S2). The NIR importance was expected: at the canopy level, the NIR is influenced primarily by cumulative plant biomass [53], and Fv root infection negatively affected root and aboveground biomass development [52,54]. The variability in the standardized coefficients indicates the importance of narrow spectral sampling. At the canopy level, the red-edge provides an indication of plant condition that might be related to a variety of traits such as chlorophyll content, water content, leaf area index, seasonal patterns and canopy biomass [55][56][57][58]. The averaged spectrum of non-symptomatic plants in inoculated plots shows higher relative reflectance values in the chlorophyll absorption band [59], which may be explained by lower chlorophyll content resulting from chlorophyll reduction by root infection and possibly by phytotoxins in the canopy [27,60].
The vegetative stages, with multiple measuring dates, might be the appropriate period to assess Fv infection if the pathogen is already established and affecting the root and canopy systems, since other canopy influences (e.g., disease, insect damage and senescence) have less chance to reduce the accuracy of the model than in later stages.
Figure 3. VIP values for the PLSDA models in Table 2 and Figure 2. VIP values greater than a threshold of 1 are considered meaningful.
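The VIP threshold of 1 has a simple basis: the squared VIP values average to 1 across bands, so a value above 1 marks a band that contributes more than average. The sketch below is a minimal illustration on synthetic data, assuming a two-class PLSDA implemented as single-response PLS (NIPALS) on 0/1 labels and the commonly used VIP formula; it is not the R code used in the study, and the function names are hypothetical.

```python
import numpy as np


def pls1_nipals(X, y, n_components):
    """Single-response PLS via NIPALS; returns weights W, scores T, y-loadings q."""
    X = np.asarray(X, float) - np.mean(X, axis=0)
    y = np.asarray(y, float) - np.mean(y)
    n, p = X.shape
    W = np.zeros((p, n_components))
    T = np.zeros((n, n_components))
    q = np.zeros(n_components)
    for a in range(n_components):
        w = X.T @ y
        w /= np.linalg.norm(w)          # unit-length weight vector
        t = X @ w                       # scores
        tt = t @ t
        p_load = X.T @ t / tt           # X loadings
        q[a] = (y @ t) / tt             # y loading
        X = X - np.outer(t, p_load)     # deflate X
        y = y - q[a] * t                # deflate y
        W[:, a], T[:, a] = w, t
    return W, T, q


def vip_scores(W, T, q):
    """Variable importance in projection for each band."""
    p = W.shape[0]
    ss = q ** 2 * np.sum(T ** 2, axis=0)        # y-variance explained per component
    w_norm = W / np.linalg.norm(W, axis=0)
    return np.sqrt(p * ((w_norm ** 2) @ ss) / ss.sum())


# Synthetic example: 300 samples, 50 bands, class driven by bands 10 to 14.
rng = np.random.default_rng(0)
X = rng.normal(size=(300, 50))
y = (X[:, 10:15].sum(axis=1) > 0).astype(float)
W, T, q = pls1_nipals(X, y, n_components=2)
vip = vip_scores(W, T, q)
important = np.flatnonzero(vip > 1.0)   # bands above the usual threshold
print(important)
```

In this toy case the bands that drive the class label are expected to stand out above the threshold, while most noise bands fall below it.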

Spectral Identification of Inoculation-Leaf Level
Data from all leaf level measuring dates and both sites (Table 1) were analyzed to classify inoculated and control treatments per date and site, as well as pooled across several combinations. The confusion matrices indicate reasonable and consistent performance across treatments, suggesting stable spectral models. The full confusion matrices for all leaf measuring dates in Table 1 and models in Table 3 are in Supplementary Materials S3. PLSDA models based on leaf level data were able to discriminate inoculated plants with a high level of accuracy on individual dates across sites, starting at late vegetative stages and continuing to the end of the season in Arlington and to mid-development stages in Hancock. The seasonal trend of the validation total accuracies is discussed in Section 3.3.
Figure 2. Averaged canopy spectral reflectance and PLSDA standardized coefficients per inoculation status: entire spectral range (a-c) and zoom to 750 to 1000 nm (d-f) for models with highest accuracies in Table 2.
[…] (Figure 4a,c). Therefore, data of the same development stages from the two sites were pooled together and analyzed, resulting in total accuracies of ~50% for classifying the two Fv inoculation statuses (data not presented), likely because of fundamental differences between the two field sites that affected the overall phenotype of the plants. For example, water and soil dynamics are dramatically different between Hancock (sandy soils and, therefore, irrigated) and Arlington (fine silty and not irrigated). To test this, we used PLSDA to discriminate between plants at Arlington and Hancock, resulting in total accuracies that are all higher than 91% for calibration, cross-validation and validation models. The important VIP bands are mainly in the visible, red-edge and SWIR water absorptions, showing the influence of environmental conditions including Fv abundance in soybean roots (full confusion matrices and VIP values in Supplementary Materials S4).
As such, under different environmental conditions, phenotypic variation within the same genotype can obscure the ability to implement a general algorithm for disease detection [61,62]. Some of the phenotypic variation in spectral response and in the resulting equations is likely also related to the Hancock plants being irrigated while the Arlington plants were rain fed. Although there is no visible change in reflectance values at 670 nm, chlorophyll reduction by phytotoxins translocated to the canopy [27] or by root infection affecting plant health [52] may explain the difference between the inoculation statuses in reflectance values around 550 nm and in the slope between 550 and 670 nm. Gitelson and Merzlyak [59] reported higher reflectance in the 550 to 670 nm slope for minor chlorophyll reduction, and changes in 670 nm reflectance only after relatively high changes in chlorophyll. These distinctions in the visible wavelengths were apparent despite no visual SDS canopy symptoms being present at the time of sampling, and no visual differences between the treatments in the field. Examining the models presented in Figure 4e, the visible blue, red and red-edge spectral regions were most important in all cases. The water bands at 1400, 1900 and 2500 nm also show minor relevance. The leaf level analyses in general result in different models by date and study site, using VIP and standardized coefficients as diagnostics. The models show general similarities in the regions that are important, and the analyses indicate sensitivity of leaf spectra to Fv inoculation status. To obtain the highest quality of classification, there is a need to analyze specific and sequential development stages for specific site-induced environmental conditions. One of the differences between sites is also the natural abundance of Fv.

Figure 4. The partial least squares discriminant analysis (PLSDA) standardized coefficients and averaged spectra of inoculated and control plants for the Arlington late vegetative model (a) with zoom to the visible range (b), and for the Hancock late vegetative model (c) with zoom to the visible range (d). Variable importance in projection (VIP) values obtained from PLSDA classification models for some of the models in Table 3 (e).

Spectral Assessment of Fv Abundance in Root by Leaf Measurement
We further analyzed the qPCR results to assess whether disease abundance in the root system expressed a range of response in foliar spectra, using PLSR. Spectra appear to show some sensitivity to the degree of infection, with a cross-validation R² of 0.47 and RMSE of 3.2 pg/ng (calibration R² of 0.53 and RMSE of 2.8 pg/ng; Figure 5a). The highest standardized PLSR coefficients of the model (Figure 5b) were in the visible wavelengths, peaking around the chlorophyll absorption band at 670 nm [59]. The standardized coefficients in the visible range were mainly positive, indicating that reflectance in the red wavelengths correlated positively with Fv abundance, meaning that higher reflectance values are related to lower chlorophyll content and higher Fv in roots. The NDSI analysis (Figure 5e) identified the index with the strongest relation to Fv root abundance at 690 and 1390 nm, wavelengths strongly influenced by chlorophyll and water absorption [63], respectively. The differences in Fv abundance between Arlington and Hancock (Figure 5d) are suggested to be partly related to a larger difference between the inoculated and control averaged spectra for Arlington than for Hancock (Figure 4b,d), as seen in the 550 to 670 nm slope. The area of the natural outbreak of Fv 100 m from our experimental plots had much higher qPCR-detected disease in the roots (Figure 5c), so it is likely that larger ranges of infection than shown in our experiment would yield even greater spectral responses due to effects on foliage. Of note, the presence of a natural infection near our experiment makes it possible that Fv infection was present at a low level in the control plants (see Figure 5c), which makes it more remarkable that we were able to discriminate and validate treatments.
To the best of our knowledge this is the first study showing quantitative assessment of Fv abundance in soybean roots by leaf full range (400 to 2500 nm) hyperspectral data. This is one of the few studies to detect Fv infection via foliage measurements prior to visual sudden death syndrome foliar symptoms [60,64]. Roth et al. [60] demonstrated reduced chlorophyll content in soybean leaves prior to visual sudden death syndrome symptoms, using the newly developed MultispeQ sensor [64], which provides credence to our findings. Using hyperspectral sensor and MultispeQ data together, there is the potential to improve the ability to detect Fv abundance prior to visual symptoms.

Piccolo Performance
The ability to spectrally discriminate between Fv-inoculated and control soybean plants varies during the growing season (Figure 6). However, the time of season with the best discrimination quality is consistent. The season starts with low discrimination accuracy, likely because Fv had not yet established itself on the root system sufficiently to affect detectable aboveground properties [52]. As Fv establishes itself in the root system, its effect becomes detectable, perhaps a bit sooner at the leaf level than at the canopy level. In late reproductive stages, the total accuracies of the Arlington canopy and Hancock leaf plots drop, as the Fv effect on the canopy is masked by processes affecting both control and inoculated plants, such as general senescence, other diseases, pests or physical damage to the foliage. Interestingly, the Arlington canopy models excluding plots with canopy symptoms had higher accuracies than the models including symptomatic plots. This supports the ability to discriminate prior to visual canopy symptoms using NIR bands (Figures 2 and 3) and, given the differences between the models, indicates that this capacity for detection is a consequence of non-visual cues. Processes affecting both control and inoculated plants can affect the leaf level as well. For example, the Hancock leaf plot suffered from white mold (Sclerotinia sclerotiorum) infection starting late in the season, while the Arlington leaf plot maintained relatively high total accuracies through the last measuring date. In addition, the qPCR analyses showed significantly higher Fv abundance for the Arlington plot than for the Hancock plot (Figure 5d), suggesting that the Fv effect was much less likely to be masked and, thus, was spectrally detected at Arlington in the late reproductive stage. Both measurement strategies, leaf and canopy, resulted in high-quality discrimination between inoculated and control plots.
Discrimination by each strategy was based on different biophysical properties, captured by different spectral features. The canopy-level data capture differences in aboveground canopy biomass, whereas the leaf-level data capture non-visual differences in pigmentation that are not detected at the canopy level. Both strategies assign some relative importance to bands related to water content. Leaf-level measurements can readily be used for specific locations or to predict the level of root Fv infection, while the Piccolo is spectrally suitable for detecting Fv infection in the field, and being tractor mounted makes it a potential tool for other precision agriculture applications as well.
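The discrimination step discussed above can be illustrated with a minimal PLS-DA sketch: a single-response PLS model (fitted here with the NIPALS algorithm) is regressed on class labels coded 0 (control) and 1 (inoculated), and predictions above 0.5 are assigned to the inoculated class. This is not the code used in the study. The two-band reflectance values, the function names, the 0.5 threshold and the choice of one component are all assumptions made for illustration.

```python
import math


def fit_pls1(x, y, n_components):
    """Fit single-response PLS (PLS1) with the NIPALS algorithm."""
    n, p = len(x), len(x[0])
    x_mean = [sum(row[j] for row in x) / n for j in range(p)]
    y_mean = sum(y) / n
    xr = [[row[j] - x_mean[j] for j in range(p)] for row in x]  # centred spectra
    yr = [v - y_mean for v in y]                                # centred labels
    components = []
    for _ in range(n_components):
        # Weight vector: band-space direction with maximal covariance with y.
        w = [sum(xr[i][j] * yr[i] for i in range(n)) for j in range(p)]
        norm = math.sqrt(sum(v * v for v in w))
        if norm < 1e-12:
            break  # nothing left to explain
        w = [v / norm for v in w]
        t = [sum(xr[i][j] * w[j] for j in range(p)) for i in range(n)]  # scores
        tt = sum(v * v for v in t)
        loading = [sum(xr[i][j] * t[i] for i in range(n)) / tt for j in range(p)]
        q = sum(yr[i] * t[i] for i in range(n)) / tt
        # Deflate X and y before extracting the next component.
        xr = [[xr[i][j] - t[i] * loading[j] for j in range(p)] for i in range(n)]
        yr = [yr[i] - q * t[i] for i in range(n)]
        components.append((w, loading, q))
    return {"x_mean": x_mean, "y_mean": y_mean, "components": components}


def predict_pls1(model, x):
    """Predict the continuous response by replaying the deflation steps."""
    predictions = []
    for row in x:
        residual = [v - m for v, m in zip(row, model["x_mean"])]
        y_hat = model["y_mean"]
        for w, loading, q in model["components"]:
            t = sum(r * wj for r, wj in zip(residual, w))
            y_hat += q * t
            residual = [r - t * l for r, l in zip(residual, loading)]
        predictions.append(y_hat)
    return predictions


def classify(model, x, threshold=0.5):
    """PLS-DA decision rule (assumed): label 1 if the prediction exceeds 0.5."""
    return [1 if v > threshold else 0 for v in predict_pls1(model, x)]


def total_accuracy(observed, predicted):
    """Fraction of samples assigned to the correct class."""
    hits = sum(1 for o, p in zip(observed, predicted) if o == p)
    return hits / len(observed)


# Synthetic two-band reflectance (hypothetical values, not study data).
# Label 0 = control, label 1 = Fv-inoculated.
calibration_x = [[0.50, 0.30], [0.52, 0.31], [0.48, 0.29], [0.51, 0.30],
                 [0.40, 0.30], [0.42, 0.31], [0.38, 0.29], [0.41, 0.30]]
calibration_y = [0, 0, 0, 0, 1, 1, 1, 1]
validation_x = [[0.49, 0.30], [0.39, 0.31]]
validation_y = [0, 1]

model = fit_pls1(calibration_x, calibration_y, n_components=1)
calibration_accuracy = total_accuracy(calibration_y, classify(model, calibration_x))
validation_accuracy = total_accuracy(validation_y, classify(model, validation_x))
print("calibration accuracy:", calibration_accuracy)
print("validation accuracy:", validation_accuracy)
```

In practice the number of components would be chosen by cross-validation and the model would be fitted to hundreds of bands rather than two; the sketch only shows how total accuracy for calibration and validation sets follows from a PLS model fitted to class labels.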
The major effect of disease on crops is reduced yield, so we tested whether the spectral data could also predict the concomitant effects of Fv on yield. Seed yield ranged from 2.02 to 5.21 t·ha⁻¹, with the average seed yield of the inoculated treatment (3.68 t·ha⁻¹) 6% smaller than that of the control treatment (3.91 t·ha⁻¹; statistically significant at p < 0.05; [23]). Canopy spectral data from each of the measurement dates were used to calibrate, cross-validate and validate PLSR models of seed yield (Supplementary Materials S5). The best model, based on the 15 August data, had reasonable R² values of 0.71, 0.59 and 0.62 and RMSE of 0.35, 0.41 and 0.31 t·ha⁻¹ for calibration, cross-validation and validation, respectively (Figure 7a). The important spectral wavelengths for prediction were at the red-edge (Figure 7b,c), supporting observations by Christenson [65]. The red-edge was also important for predicting inoculation status, but the NIR bands were more prominent for disease detection, whereas yield prediction was driven largely by red-edge reflectance (Figure 7b,c). There is a long history of field spectroscopy use for yield prediction. In soy, Ma et al. [66] used a handheld multispectral spectrometer to correlate vegetation indices with soybean seed yield, with R² from 0.44 to 0.80 over a yield range of 1 t·ha⁻¹ to > 4 t·ha⁻¹ (but neither cross-validation nor validation was applied, and RMSE was not reported). Yu et al. [67] reported a correlation coefficient (r) of 0.82 between airborne multispectral imagery data and soybean seed yield. Christenson [65] used canopy spectral data and obtained an R² of 0.58 with an RMSE of 0.7 t·ha⁻¹ for yields ranging from less than 1 t·ha⁻¹ to > 5 t·ha⁻¹. The smallest validation RMSE obtained for the best PLSR model (Supplementary Materials S5) in the current study is 0.31 t·ha⁻¹, less than half of that reported by Christenson [65]. These and other studies show that spectral data are effective for reliable and rapid yield assessments, with reasonable results here in the context of disease infestation. The tractor-mounted Piccolo system provides the spectral range needed to feasibly detect Fv root infection and predict seed yield.
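The model statistics and comparisons quoted above can be reproduced with a few lines of arithmetic. The sketch below defines RMSE and R² and checks two figures from the text: the relative yield reduction of the inoculated treatment and the ratio of the validation RMSE to that of Christenson [65]. R² is taken here as the coefficient of determination, 1 − SS_res/SS_tot, which is an assumption; the study may have used the squared correlation instead. The observed and predicted yields at the end are hypothetical and serve only to exercise the functions.

```python
import math


def rmse(observed, predicted):
    """Root mean square error, in the units of the observations."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def r_squared(observed, predicted):
    """Coefficient of determination, 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def relative_reduction(control, treated):
    """Fractional decrease of the treated mean relative to the control mean."""
    return (control - treated) / control


# Figures quoted in the text (t/ha).
yield_loss = relative_reduction(3.91, 3.68)   # inoculated vs. control mean yield
rmse_ratio = 0.31 / 0.7                       # this study vs. Christenson [65]

# Hypothetical observed and predicted yields (t/ha), not study data.
observed = [2.5, 3.0, 3.5, 4.0, 4.5]
predicted = [2.7, 2.9, 3.6, 3.8, 4.6]

print(f"yield loss: {yield_loss:.1%}")
print(f"RMSE ratio: {rmse_ratio:.2f}")
print(f"example RMSE: {rmse(observed, predicted):.3f} t/ha")
print(f"example R2: {r_squared(observed, predicted):.3f}")
```

Because RMSE is expressed in yield units, it is best compared between studies alongside the yield range each study covered, as is done in the text.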
Some noise remains in the canopy spectral data (Figures 2 and 4b), and its effect on the PLS models is visible in the VIP data (Figures 3 and 4c). This effect is assumed not to interfere with the importance of the spectral regions in the discrimination and prediction models, but rather to affect the importance of specific narrow bands, as in the NDSI selection. Therefore, for future reflectance data preprocessing, the Whittaker smoother [68,69] will be explored for its built-in filter optimization. The Whittaker smoothing algorithm is expected to improve the accuracy of the analyses.
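As a rough illustration of the proposed preprocessing, the following sketch implements a commonly described form of the Whittaker smoother: the smoothed series z minimizes Σ(yᵢ − zᵢ)² + λ Σ(Δ²zᵢ)², which leads to the linear system (I + λDᵀD)z = y, where D is the second-order difference matrix. The smoothing parameter λ is set by hand here; the automatic filter optimization mentioned above is not implemented. The reflectance values, function names and the dense pure-Python solver are assumptions made to keep the example self-contained; a real spectrum with hundreds of bands would call for a sparse or banded solver.

```python
def solve_linear_system(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    aug = [list(a[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        s = sum(aug[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (aug[i][n] - s) / aug[i][i]
    return x


def whittaker_smooth(y, lam):
    """Whittaker smoother with a second-order difference penalty."""
    n = len(y)
    # Start from the identity matrix, then add lam * D^T D one row of D at a time.
    a = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    stencil = (1.0, -2.0, 1.0)  # one row of the second-difference matrix D
    for start in range(n - 2):
        for r, cr in enumerate(stencil):
            for c, cc in enumerate(stencil):
                a[start + r][start + c] += lam * cr * cc
    return solve_linear_system(a, list(y))


def roughness(values):
    """Sum of squared second differences, the quantity the penalty shrinks."""
    return sum((values[i] - 2 * values[i + 1] + values[i + 2]) ** 2
               for i in range(len(values) - 2))


# Hypothetical noisy reflectance over nine adjacent bands (not study data).
noisy = [0.40, 0.46, 0.41, 0.47, 0.43, 0.49, 0.44, 0.50, 0.46]
smoothed = whittaker_smooth(noisy, lam=10.0)
print([round(v, 3) for v in smoothed])
```

Larger values of λ give a smoother curve at the cost of fidelity to the measured spectrum, so narrow spectral features of interest could be attenuated if λ is set too high.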
The Piccolo system provided an efficient method for the collection of plant reflectance by eliminating the need for frequent reference standard measurements. The ability to obtain relative reflectance under changing atmospheric conditions makes the Piccolo system a reasonable approach for field sampling, especially in a research setting. The system utilized in this study collected reflectance from a single row during each pass; additional efforts to streamline the system for more rapid data acquisition would be required to make it practical in a commercial application. For example, future modifications and the deployment of more than one system would enable the measurement of plant reflectance over a larger area, such as the entire width of a ground sprayer. Alternatively, deployment from an unmanned aerial vehicle would also expand the utility of the system. Simultaneous acquisition from both spectrometers would also reduce the minimum time between acquisitions.

Conclusions
Crop phenotyping is a bottleneck in plant research, and rapid methods to assess disease, yield and other traits for large numbers of samples are needed. Seed yield is one of the most critical traits that can be spectrally assessed [2,70], although, unlike disease, which is detectable due to its physiological effects on foliage, detection of yield is based on indirect correlations with physiology. Our study showed promising results for the use of spectral sensing of the canopy and leaves to identify Fv infection of soybean roots where no canopy symptoms are visually present. To the best of our knowledge, this is the first publication that documents the ability to discriminate Fv-inoculated soybean plants from control plants using visible, NIR and SWIR hyperspectral data at the canopy and leaf levels prior to canopy symptoms. We conclude:

• Fv inoculation that accrues in the roots can be spectrally detected in the soybean foliage at the canopy and leaf scales, as demonstrated by distinguishing between inoculated and control plots and plants, respectively.
• Early reproductive stage is the recommended timing for canopy level measurements to distinguish between inoculated and control plots.
• Late vegetative and early reproductive stages are the recommended timings to distinguish between inoculated and control plants at the leaf level.
• Fv abundance in soybean roots can be spectrally assessed by leaf hyperspectral data.
• The dual field-of-view system produced canopy spectral data that allowed us to spectrally distinguish between Fv inoculation treatments and assess seed yield, thus showing feasibility of operation for precision agriculture research and potential commercial applications.
For future work, qPCR samples should be collected over the course of the growing season to provide a better understanding of the spectral discrimination over time.