Comparison of Spectral Reflectance-Based Smart Farming Tools and a Conventional Approach to Determine Herbage Mass and Grass Quality on Farm

Abstract: The analysis of multispectral imagery (MSI) acquired by unmanned aerial vehicles (UAVs) and mobile near-infrared reflectance spectroscopy (NIRS) used on-site has become increasingly promising for timely assessments of grassland to support farm management. However, a major challenge of these methods is their calibration, given the large spatiotemporal variability of grassland. This study evaluated the performance of two smart farming tools in determining fresh herbage mass and grass quality (dry matter, crude protein, and structural carbohydrates): an analysis model for MSI (GrassQ) and a portable on-site NIRS (HarvestLab™ 3000). We compared them to conventional look-up tables used by farmers. Surveys were undertaken on 18 multi-species grasslands located on six farms in Switzerland throughout the vegetation period in 2018. The sampled plots represented two phenological growth stages, corresponding to an age of two weeks and four to six weeks, respectively. We found that neither the performance of the smart farming tools nor the performance of the conventional approach was satisfactory for use on multi-species grasslands. The MSI-model performed poorly, with errors of 99.7% and 33.2% relative to the laboratory analyses for herbage mass and crude protein, respectively. The errors of the MSI-model appeared to be caused mainly by grassland and environmental characteristics that differ from the relatively narrow Irish calibration dataset. The On-site NIRS showed comparable performance to the conventional Look-up Tables in determining crude protein and structural carbohydrates (error ≤ 22.2%). However, we identified that the On-site NIRS determined undried herbage quality with a systematic and correctable error. After corrections, its performance was better than the conventional approach, indicating a great potential of the On-site NIRS for decision support on grazing and harvest scheduling.


Introduction
For many years, indoor housing has been preferred by dairy producers because feeding and caretaking on an individual animal level was simpler and more labor-saving compared to grazing-based systems [1,2]. Currently, due to modern technology that addresses both labor requirements and of the plant material, its particle size, and the moisture content affect the light reflectance [19,21]. The spectral signature of water can mask signals, which are important for the quantification of dry matter quality parameters, such as crude protein concentration [15,24].
Methods that are based on remote sensing are recent alternatives to NIRS for estimating herbage mass and quality of grassland [25-28]. Most of these methods use multispectral imagery collected from satellites or unmanned aerial vehicles [26,29]. Analyzing grasslands with remote sensing is particularly promising because it is non-destructive, can potentially capture large areas, and resolves spatial variability typically better than methods based on point sampling. Unfortunately, multispectral imagery only captures the surface of the vegetation with limited information on the subjacent material. Estimating grassland properties from multispectral imagery might, therefore, be error-prone, particularly for high-biomass swards [26].
Both NIRS and multispectral imaging methods determine herbage quantity and quality indirectly and require calibration against a reference method, such as oven-drying or wet chemical analysis. To achieve high accuracy, the calibrations need to cover a large heterogeneity in grassland types and a large variability of grassland conditions during the vegetation period. If calibrations are applied to grassland types and conditions that are not included in the training data set, the results may not generalize.
The aim of this study was to evaluate two spectral-reflectance-based smart farming tools for determining herbage mass and quality of multi-species grasslands (a portable NIRS and a model to analyze multispectral imagery) and to compare them to the conventional approach of estimating herbage quality with look-up tables. Using grasslands of contrasting botanical composition and multiple harvests during the whole vegetation period, we examined the accuracy of the tools by comparison with laboratory measurements. Further, we investigated the relationships between the apparent errors and the grassland characteristics to identify potential causes. Finally, the smart farming tools were put into context with the conventional approaches for assessing grassland quality.

Grassland Study Sites
Herbage mass and quality parameters were measured on 18 grassland plots, which were distributed over six commercial farms in Switzerland. The studied grasslands are described in Figure 1 and Table S1. Prior to this study, the grasslands were used for intensive silage or hay production, with four to six defoliations per year and regular fertilization. With the exception of two grasslands that were sown less than five years ago (MT3 and SI2, Table S1), they were permanent grasslands. The plant communities comprised between 13 and 27 species (Table S1), with relative abundances of 6-75% ryegrasses, 13-59% other grasses, 1-15% clover species, and 2-48% herbs (Figure 1).

Experimental Design
On each commercial farm, three plots were distributed spatially over the intensively managed grassland area. On each plot, measurements were performed on two subplots with dimensions of 2.2 m × 5.0 m (Figure 2), on which we simulated two defoliation regimes corresponding to intensive grazing and silage production. The first subplot, simulating grazing conditions, was defoliated every two to three weeks, at an early phenological stage. Sampling was conducted at every second defoliation and always at an age of two weeks. The second subplot, simulating silage production, was defoliated every four to six weeks. The defoliation dates corresponded with the sampling dates. These dates were scheduled to meet the early heading phenological stage, that is, a more mature stage than in the grazing simulation. The dates of defoliation, therefore, differed among farms according to the climatic conditions of the sites. The two defoliation regimes are subsequently referred to as "two weeks of growth" for simulated grazing conditions and "four to six weeks of growth" for the silage production system. Both defoliation regimes were planned to be observed on five dates each within the vegetation period. Therefore, we aimed for a total number of 180 observations. However, due to a severe drought during summer and autumn, herbage growth was reduced, and three sampling dates had to be omitted at farms BL and BB. Hence, we obtained a total of 162 herbage samples.
All sampling dates were subsequently assigned to one of the four categories to study the effect of the growth period on herbage mass and quality: April to May (representing the first regrowth after winter dormancy), May to June (representing the second cut), June to August (representing the third and fourth cuts), and August to October (representing the fourth and fifth cuts).

Herbage Sampling
To sample the subplots with the four- to six-week-old herbage, a double-knife motor mower was used to cut the herbage at 5 cm above ground, and fresh herbage mass (HM) of 6.5 m² (central 1.3 m of the 2.2 m wide subplots × 5 m) was determined (Figure 2). A representative sample was taken by stabbing a metal cylinder with a diameter of 5 cm into the pile of cut herbage.
For the first sampling of the two-week-grown herbage, the same method was applied as described above. For the subsequent sampling dates, a randomly selected 1 m² area of the subplot was sampled at 5 cm above ground with an electronic hand shear (2 m² were sampled in case of very low HM). Fresh HM was determined, and the whole sample was used for further analyses.

Multispectral Imagery Model
The multispectral imagery model (MSI-model) was provided on the open access platform GrassQ (www.grassq.com). According to Murphy et al. [30], GrassQ is a holistic precision herbage measurement and analysis system that analyzes the reflectance data captured by unmanned aerial vehicles (UAVs) or satellites above grasslands on a farm parcel level in nearly real time. For this study, the platform was employed between 2 January 2019 and 2 July 2019. The MSI-model development was based on data of UAV surveys above field plots and grazed paddocks containing mainly perennial ryegrass and clover mixtures at Moorepark (Teagasc Research Centre, Fermoy, Cork, Ireland) on six days in 2017 and 2018 [31]. The calibration range is 304.6 to 2435.7 kg dry matter (DM) ha⁻¹ for HM and 126.3 to 247.3 g kg⁻¹ DM for crude protein (CP). The models were derived by stepwise multiple linear regression analysis and are referred to as BM-5 and CPM-5 in Askari et al. [31].
In this study, we chose comparable flight characteristics and used the same sensor as Askari et al. [31]. MSI was acquired at most two hours before cutting the subplots. We used a Parrot Sequoia multispectral camera (Parrot SA, Paris, France) that was mounted on a UAV (quadcopter, DJI Phantom 4 Pro+, DJI, Shenzhen, China). Both a Sequoia sunshine sensor and a calibrated reference panel (Airinov, Paris, France) were employed for radiometric correction. Images were taken during autonomous flights over the experimental plots. The flight missions were planned in the field using the smart device app Pix4D Capture (Version 4.5.0, build 2348, Pix4D, Lausanne, Switzerland). The flight and imagery settings were as follows: flight height of 50 m, flight speed of 5 m s⁻¹, double grid mission, image capturing every 1.9 s, and image overlap of ≥ 80%. Spectral information was recorded for the green, red, red-edge, and near-infrared (NIR) bands that were centered at 550 nm, 660 nm, 735 nm, and 790 nm, respectively. The bandwidth was 40 nm, except for the red-edge, which had a bandwidth of 10 nm. The average ground sampling distance was 5 cm. Due to dense fog and rainy weather, images could not be taken at three sampling dates. Additional data points were not considered for analysis because of low image quality. In total, 15 MSI data points were excluded and 147 remained.
The raw MSI was processed with the photogrammetry software Pix4D Mapper (Version 4.3.31, Pix4D, Lausanne, Switzerland) using default settings and radiometric correction based on reference panel images. The resulting reflectance maps were georeferenced using at least three of the eight ground control points at each experimental plot (Figure 2), which were measured using a real-time kinematic global navigation satellite system (RTK GNSS; Trimble R8, Sunnyvale, CA, USA). After georeferencing, the absolute horizontal error of the reflectance maps was less than 6 cm. The reflectance maps of the four bands (green, red, red-edge, and NIR) were uploaded to the GrassQ platform (www.grassq.com), and the model that uses UAV-acquired MSI reflectance maps was executed (hereafter referred to as MSI-model). The MSI-model returned the CP results (as g kg⁻¹ DM) and HM results (as kg DM ha⁻¹) as mean values per subplot, which we extracted by uploading the spatial boundary coordinates.

On-Site NIRS
The portable NIRS instrument (HarvestLab™ 3000, Deere & Company, Moline, IL, USA), hereafter referred to as On-site NIRS, can be operated as a mobile laboratory either in farm offices or car boots or mounted onto harvest machinery to analyze feed quality in the field and in nearly real time. The instrument consisted of a sensor body with the diode array spectrometer and a sampling unit with a rotating sampling dish above a halogen light source. The system operated with internal black and white references and measured wavelengths between 950 nm and 1530 nm at a spectral resolution of 3-2 nm. Selectable calibration models were available for different fodder crops. We evaluated the system for measuring fresh herbage quality using the calibration version 2017/21/03. The instrument was delivered and put into operation by a service technician approximately six weeks before the start of the study. The calibration models were based on the reference methods described in VDLUFA [32] and were analogous to the wet-chemical analyses in our study. During field sampling, fresh herbage was analyzed using the On-site NIRS in the open boot of a car. The measurement was performed immediately after herbage sampling as described above. In the absence of manufacturer's specifications, a particle length of approximately 5 cm was chosen and obtained by stabbing with a metal cylinder. A representative part of the sampled herbage was placed in the instrument's sampling dish. Care was taken to ensure that the sampling dish was clean and dry. The measurement was repeated three times, with mixing of the subsample in between. Mean values of DM concentration and mean values of CP, crude fiber (CF), acid detergent fiber (ADF), and neutral detergent fiber (NDF) as concentrations in DM were recorded.

Conventional Method
In addition to the two smart farming tools, the herbage quality parameters (DM, CP, CF, ADF, and NDF) were determined by using look-up tables for herbage quality of multi-species grasslands by Daccord et al. [10], hereafter referred to as Look-up Tables. For this purpose, the number and relative abundance of plant species were visually surveyed in April/May and a second time in July/August. The average of both surveys was used to determine the herbage categories according to Table 13.1 in Daccord et al. [10], where the seven categories distinguish between grass-rich, legume-rich, herb-rich, and grass-legume-herb-balanced multi-species grasslands. At the first sampling date in spring, the phenological stage was determined by direct visual observation. The subsequent determinations of the phenological stage were performed using Table 13.2 in Daccord et al. [10], considering the time period since the last defoliation.

Laboratory Analysis
The herbage samples were oven-dried at 60 °C for 48 h and weighed to determine DM as a percentage of fresh matter (FM). The HM in kg DM ha⁻¹ was calculated for each subplot based on the area that was cut and weighed during field sampling. A dried subsample was milled to pass a 1 mm sieve (Brabender, Duisburg, Germany). Between 100 g and 200 g of a milled sample were analyzed using a laboratory NIRS measuring wavelengths in the range of 1000-2500 nm at a spectral resolution of 8 cm⁻¹ (Fourier-transform NIR; NIRFlex N-500 system; Büchi, Flawil, Switzerland). The calibration model for CP, CF, NDF, and ADF is shown in Table S2. The laboratory NIRS was used as a reference method to evaluate the performance of the herbage quality determination by the On-site NIRS and the MSI-model (Table 1). To validate the laboratory NIRS, a milled subsample of one third of the herbage samples was subjected to wet-chemical analyses. The selected samples covered all farms and sampling dates and represented the observed variation in herbage composition well (Table S1). The DM determination was based on ISO 6496:1999 with the following modifications: 1-2 g of sample were heated in a prepASH system (Precisa instruments AG, Dietikon, Switzerland) at 105 °C for 3 h to reach constant weight. The CP was determined using the Dumas method (ISO 16634-1:2008), and CF was analyzed based on ISO 6865:2000 with the modification that 0.5 g of the sample were treated in a FIBRETHERM FT12 system (C. Gerhardt GmbH & Co. KG, Königswinter, Germany). The concentrations of ADF and NDF were measured using the FT12 system following the methods AOAC 973.18 and ISO 16472:2006, respectively, and expressed based on the organic matter content (ash-free dry weight). Prior to the NDF determination, the sample was treated with a heat-stable amylase. Table 2 gives an overview of the wet-chemical methods that were used to validate the laboratory NIRS herbage parameters.
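The conversion from weighed samples to DM and HM described above can be sketched as follows. This is a minimal illustration in Python, not the authors' own calculation script; the function names and the example weights are hypothetical, and only the harvested area of 6.5 m² is taken from the sampling description.

```python
def dry_matter_percent(fresh_g, dried_g):
    """DM as a percentage of fresh matter, from weights before and after oven-drying."""
    return 100.0 * dried_g / fresh_g


def herbage_mass_kg_dm_per_ha(fresh_mass_kg, dm_percent, area_m2):
    """Scale the fresh mass cut from area_m2 to kg DM per hectare (1 ha = 10,000 m²)."""
    dry_mass_kg = fresh_mass_kg * dm_percent / 100.0
    return dry_mass_kg / area_m2 * 10_000.0


# Hypothetical example values, not measurements from this study.
dm = dry_matter_percent(fresh_g=500.0, dried_g=90.0)
hm = herbage_mass_kg_dm_per_ha(fresh_mass_kg=6.5, dm_percent=dm, area_m2=6.5)
print(f"DM = {dm:.1f}% of FM, HM = {hm:.0f} kg DM per ha")
```

With these illustrative numbers, 6.5 kg of fresh herbage from 6.5 m² at 18% DM corresponds to 1800 kg DM ha⁻¹.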

Statistical Analysis
Agreement of the On-site NIRS and the MSI-model with the reference methods was assessed with Lin's concordance correlation coefficient (CCC) [33], which ranges from −1 to +1. CCC considers both the linear correlation between the methods and the distance of the line of best fit from the line of identity (1:1 line). Agreement was considered negligible (CCC ≤ 0.30), slight (0.30 < CCC ≤ 0.50), minor (0.50 < CCC ≤ 0.70), moderate (0.70 < CCC ≤ 0.90), strong (0.90 < CCC ≤ 0.95), and very strong (CCC > 0.95).
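To make the agreement measure concrete, the following Python sketch computes Lin's CCC and assigns the agreement classes listed above. It is an illustration only: the study used the R package epiR, whose small-sample details may differ, and this sketch assumes the 1/n moment estimators of Lin's original definition. The function names and the toy data are hypothetical.

```python
def lins_ccc(x, y):
    """Lin's concordance correlation coefficient, using 1/n moment estimators."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    var_x = sum((v - mean_x) ** 2 for v in x) / n
    var_y = sum((v - mean_y) ** 2 for v in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    # The squared mean difference penalizes any shift away from the 1:1 line.
    return 2.0 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)


def agreement_label(ccc):
    """Agreement classes as defined in the text."""
    if ccc <= 0.30:
        return "negligible"
    if ccc <= 0.50:
        return "slight"
    if ccc <= 0.70:
        return "minor"
    if ccc <= 0.90:
        return "moderate"
    if ccc <= 0.95:
        return "strong"
    return "very strong"


# Toy data: perfectly correlated, but shifted by one unit from the 1:1 line.
reference = [1.0, 2.0, 3.0]
shifted = [2.0, 3.0, 4.0]
print(lins_ccc(reference, shifted), agreement_label(lins_ccc(reference, shifted)))
```

The toy data have a Pearson correlation of 1 but a CCC of only 4/7, which shows how a constant offset reduces concordance without affecting correlation.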
Linear relationships were determined with Pearson's correlation coefficient (r_p). We used ordinary least squares multivariate linear regression to explain the errors of the MSI-model (reference − MSI-model). Significance of the explanatory variables was determined with the F-test using Type III sums of squares (significance when added last). The relative importance of the explanatory variables, i.e., individual contributions to the explained variance of the linear model (R²), was calculated with Lindemann, Merenda, and Gold (LMG) metrics according to Lindemann et al. [34].
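The idea behind the LMG metric can be illustrated with a small Python sketch: the R² gained when a predictor enters the model is averaged over all possible orders of entry, so that the shares add up to the R² of the full model. This is a simplified illustration rather than the relaimpo implementation used in the study. It assumes numeric predictors only (a nominal variable such as plot would have to enter as a group of dummy columns), and all names and data are hypothetical.

```python
from itertools import permutations


def _solve(a, b):
    """Solve the linear system a x = b by Gauss-Jordan elimination with pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * w for v, w in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def r_squared(columns, y):
    """R² of an ordinary least squares fit of y on the columns plus an intercept."""
    n = len(y)
    design = [[1.0] + [col[i] for col in columns] for i in range(n)]
    p = len(design[0])
    xtx = [[sum(row[a] * row[b] for row in design) for b in range(p)] for a in range(p)]
    xty = [sum(design[i][a] * y[i] for i in range(n)) for a in range(p)]
    beta = _solve(xtx, xty)
    fitted = [sum(b * v for b, v in zip(beta, row)) for row in design]
    mean_y = sum(y) / n
    ss_res = sum((yi - fi) ** 2 for yi, fi in zip(y, fitted))
    ss_tot = sum((yi - mean_y) ** 2 for yi in y)
    return 1.0 - ss_res / ss_tot


def lmg(predictors, y):
    """Average sequential R² contribution of each predictor over all entry orders."""
    names = list(predictors)
    shares = dict.fromkeys(names, 0.0)
    orderings = list(permutations(names))
    for order in orderings:
        included, previous = [], 0.0
        for name in order:
            included.append(predictors[name])
            current = r_squared(included, y)
            shares[name] += current - previous
            previous = current
    return {name: total / len(orderings) for name, total in shares.items()}


# Toy data: two uncorrelated predictors with y = 2a + b.
a = [1.0, -1.0, 1.0, -1.0]
b = [1.0, 1.0, -1.0, -1.0]
y = [3.0, -1.0, 1.0, -3.0]
shares = lmg({"a": a, "b": b}, y)
print(shares)  # approximately {'a': 0.8, 'b': 0.2}
```

Because the number of orderings grows factorially with the number of predictors, this brute-force form is only practical for a handful of explanatory variables.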
To determine the systematic error components for the On-site NIRS versus the reference method, we applied Passing-Bablok linear regression, which takes into account the imprecision of both X and Y [35]. The 95% confidence band and the two-sided 95% confidence intervals of slope and intercept were determined using bootstrapping. A significant systematic error between the comparison method and the reference method is indicated when the confidence interval of the intercept does not contain 0 (= constant error) and/or the confidence interval of the slope does not contain 1 (= proportional error).
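The point estimates of a Passing-Bablok regression can be sketched in a few lines of Python. This is a simplified illustration of the shifted-median estimator, not the mcr implementation used in the study: it omits the bootstrap confidence intervals and assumes that the two methods are positively related. Function and variable names, as well as the toy data, are hypothetical.

```python
import math
from statistics import median


def passing_bablok(x, y):
    """Return (intercept, slope) of a Passing-Bablok fit of y on x (no CIs)."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue  # identical points carry no slope information
            slope = math.copysign(math.inf, dy) if dx == 0 else dy / dx
            if slope == -1:
                continue  # slopes of exactly -1 are discarded
            slopes.append(slope)
    slopes.sort()
    count = len(slopes)
    offset = sum(1 for s in slopes if s < -1)  # shift applied to the median position
    if count % 2:  # odd number of slopes: one shifted middle element
        slope = slopes[(count - 1) // 2 + offset]
    else:  # even number: mean of the two shifted middle elements
        slope = 0.5 * (slopes[count // 2 - 1 + offset] + slopes[count // 2 + offset])
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope


# Toy data: y = 1 + 2x with one gross outlier in the last observation.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [3.0, 5.0, 7.0, 9.0, 100.0]
intercept, slope = passing_bablok(x, y)
print(intercept, slope)  # 1.0 2.0
```

In the toy example the outlier does not change the estimates, which reflects the robustness of a median-based slope. A constant error would be indicated by an intercept interval that excludes 0, and a proportional error by a slope interval that excludes 1.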
To investigate the potential for improvement of the On-site NIRS method, we corrected the systematic errors using the Passing-Bablok regression results as follows:
$$Y = I + S \cdot X_{\mathrm{ref}}, \qquad Y_{\mathrm{corrected}} = \frac{Y - I}{S}$$
where Y_corrected and Y are the corrected values and the original On-site NIRS values, respectively; I is the intercept; S is the slope; and X_ref is the value of the reference method. In addition to the correction using the regression results from fitting the complete dataset, we corrected in a leave-one-farm-out fashion, where the regression model fitted to the data from five farms was used to correct the data of the left-out sixth farm (all regression parameter estimates in Table S3).
The software R was employed for all statistical analyses (Version 3.5.3, R Foundation). The R package epiR [36] was employed for CCC analysis. Passing-Bablok regression was performed using the mcr package [37]. Multivariate linear regression models were fitted with the base R function lm. The significance of individual explanatory variables was calculated using the base R function drop1, and the R package relaimpo [38] was used to calculate the LMG relative importance metrics.
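The leave-one-farm-out correction can be sketched in Python as follows. The sketch is illustrative only and rests on two assumptions: the correction inverts a linear relationship between instrument reading and reference value, and the regression is passed in as a function. To keep the example short, an ordinary least squares fit is used as a stand-in for the Passing-Bablok regression applied in the study. Farm labels, values, and function names are hypothetical.

```python
def correct(value, intercept, slope):
    """Remove a constant (intercept) and proportional (slope) error from a reading."""
    return (value - intercept) / slope


def least_squares_fit(x, y):
    """Stand-in fit returning (intercept, slope); the study used Passing-Bablok."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    slope = (sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
             / sum((a - mean_x) ** 2 for a in x))
    return mean_y - slope * mean_x, slope


def leave_one_farm_out(records, fit):
    """Correct each farm's readings with a model fitted to all other farms.

    records: list of (farm, reference, reading) tuples.
    fit: callable taking (references, readings) and returning (intercept, slope).
    """
    corrected = []
    for held_out in sorted({farm for farm, _, _ in records}):
        train = [(ref, val) for farm, ref, val in records if farm != held_out]
        intercept, slope = fit([r for r, _ in train], [v for _, v in train])
        corrected += [(farm, ref, correct(val, intercept, slope))
                      for farm, ref, val in records if farm == held_out]
    return corrected


# Hypothetical readings with a constant error of 50 and a proportional error of 0.5.
pairs = [("A", 100.0), ("A", 200.0), ("B", 150.0), ("B", 300.0), ("C", 120.0), ("C", 260.0)]
records = [(farm, ref, 50.0 + 0.5 * ref) for farm, ref in pairs]
result = leave_one_farm_out(records, least_squares_fit)
for farm, ref, value in result:
    print(farm, ref, round(value, 1))
```

Because the coefficients applied to each farm are estimated without that farm's data, the corrected errors give a more realistic picture of how the correction would transfer to a new farm than a correction fitted to the full dataset.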

Validation of the Laboratory NIRS as a Reference Method
The laboratory NIRS results showed a strong concordance with the wet-chemical analysis results for the herbage quality parameters CP, CF, and NDF (Figure S1). The values of r_p for CP and CF were 0.943 and 0.930, respectively, and the values of CCC were 0.94 and 0.93, respectively. For ADF and NDF, r_p and CCC were slightly lower than for CP and CF (ADF: r_p 0.896, CCC 0.89; NDF: r_p 0.906, CCC 0.91), which shows a moderate concordance for ADF. Nevertheless, the mean absolute percentage error of the laboratory NIRS remained below 5.2% for all parameters (5.15% for CP, 4.06% for CF, 4.90% for ADF, and 3.76% for NDF), which justifies the use of the laboratory NIRS as a reference method for evaluation of the smart farming tools.
Figure 3 shows the range of HM and quality parameters as measured with the reference methods (Table 1); the boxplots visualize their distribution, as observed for the two growth stages and the four growth periods, respectively. Across all measurements, HM ranged from 186 to 5770 kg DM ha⁻¹. The herbage quality ranged from 104 to 443 g DM kg⁻¹ FM and from 86 to 331 g CP kg⁻¹ DM, with structural carbohydrate concentrations from 146 to 306 g CF kg⁻¹ DM, 300 to 567 g NDF kg⁻¹ DM, and 177 to 334 g ADF kg⁻¹ DM. As expected, the average HM for each cut was much higher with the four-to-six-week defoliation regime than with the two-week regime (Figure 3, boxplots show the stages). Conversely, the herbage quality was lower in older herbage, as indicated by the lower CP concentration and the increased concentrations of structural carbohydrates (CF, ADF, and NDF) in the advanced growth stage (4/6 weeks).

Sample Characteristics
CP correlated negatively with HM and with CF, ADF, and NDF (Figure 3). The mean CP increased with progression of the growth period. Conversely, the mean CF, NDF, and ADF concentrations peaked in spring and then decreased as the growth period progressed.

Performance of the MSI-Model
The MSI-models for determining HM and CP showed poor agreement with the reference method. As shown in Figure 4, we measured a slight concordance between the MSI-model and the reference data for HM (CCC: 0.31, r_p: 0.39) and no concordance for CP (CCC: −0.21, r_p: −0.26), even though the data outside of the calibration range of the MSI-model were excluded.
Separately considering the two growth stages, we found a slightly higher concordance for the 2-week herbage (n = 62, CCC: 0.35, r_p: 0.64) compared to the 4/6-week herbage (n = 36, CCC: 0.21, r_p: 0.41) in measuring HM. The HM of the younger herbage (two weeks) was generally overestimated, whereas the older and more mature herbage (4/6 weeks) was mostly underestimated (Figure 4).
Table 3 shows linear models that explain the variation in the observed absolute errors (|reference − MSI-model|) within the calibration range using HM, growth period (four nominal levels), plot (18 nominal levels), and DM. The linear models explained 29% and 59% of the variation in the errors of the MSI-model for HM and CP, respectively (R² in Table 3). The highest contributions to the explained variances in HM_error and CP_error were attributed to the plot variable and HM (HM_ref), respectively. The variable HM_ref was positively related to CP_error and HM_error (sign of the regression coefficient in Table 3). Thus, an increase in HM_ref was associated with higher errors in both CP and HM. The growth period was the second-most important explanatory variable for CP_error.

Performance of the On-Site NIRS
The On-site NIRS determined the fresh herbage quality parameters in large disagreement with the reference methods, oven-drying and the laboratory NIRS (Figure 5). Low concentrations were mostly overestimated, and high concentrations were underestimated. The generally low concordance increased in the order CP < NDF < DM < ADF < CF (CCC: 0.22-0.58). Correlations with the reference were moderate for DM, CP, CF, and ADF (r_p: 0.71-0.85) and slight for NDF (r_p: 0.45). These results indicate that the random contribution to the error was greater for NDF than for other quality parameters determined by the On-site NIRS. Passing-Bablok regression analysis identified significant systematic errors for all parameters. The confidence intervals of the intercepts and slopes did not include 0 and 1, respectively, which indicates that both a constant and a proportional systematic error were present (Figure 5).

Performance of the Look-Up Tables
The agreement between the Look-up Tables and the reference measurements for the selected herbage parameters DM and CP is presented in Figure 6. As expected from the herbage categories available in the Look-up Tables [10], this method clearly differentiated the two growth stages with respect to herbage quality. Nevertheless, the variability in herbage quality within the growth stages was very poorly represented by this method. The herbage quality parameters CF, NDF, and ADF showed similar patterns and are presented in Figure S2.
[Table 4 notes (table body not recovered): Look-up Tables from [10]; * Correction of the systematic error based on Passing-Bablok regression fitted to the full dataset (Figure 5; coefficients in Table S3); † Correction of the systematic error based on Passing-Bablok regression fitted in leave-one-farm-out fashion (coefficients in Table S3); ‡ n = 147 due to 15 missing values; § absolute error, |reference − comparison method|; # absolute percentage error, |(reference − comparison method) × reference⁻¹|.]
Table 4 compares the performance of all applied methods in HM and quality determination. Absolute and absolute percentage errors are listed, where the latter represents a relative error. The identified systematic error of the On-site NIRS was successfully corrected using two approaches, both based on Passing-Bablok regression: one fitted to the full dataset and one fitted in a leave-one-farm-out fashion.
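The two error measures defined in the notes to Table 4 can be written out as a short Python sketch. The function names and the example values are hypothetical and serve only to show the arithmetic; the percentage form multiplies the relative error by 100.

```python
from statistics import mean, stdev


def absolute_errors(reference, estimate):
    """|reference - comparison method| for each sample."""
    return [abs(r - e) for r, e in zip(reference, estimate)]


def absolute_percentage_errors(reference, estimate):
    """|(reference - comparison method) / reference|, expressed in percent."""
    return [100.0 * abs((r - e) / r) for r, e in zip(reference, estimate)]


# Hypothetical CP values in g per kg DM, not data from this study.
reference = [200.0, 150.0, 100.0]
estimate = [180.0, 165.0, 100.0]

abs_err = absolute_errors(reference, estimate)
pct_err = absolute_percentage_errors(reference, estimate)
print(f"absolute error: {mean(abs_err):.1f} ± {stdev(abs_err):.1f} g CP per kg DM")
print(f"absolute percentage error: {mean(pct_err):.1f}%")
```

Because the percentage error divides by the reference value, the same absolute deviation weighs more heavily for samples with low reference values, which is relevant when comparing errors between young herbage with low HM and mature stands.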

Comparison Between Methods
The mean absolute percentage error in measuring the HM and quality decreased in the following order of the determining tools: MSI-model > Look-up Tables ≥ On-site NIRS (Table 4). In determining CP, CF, NDF, and ADF, the On-site NIRS and the Look-up Tables performed comparably, given the relatively large standard deviations. The On-site NIRS determined DM better than the Look-up Tables.
After correcting the systematic error in the On-site NIRS measurement (slopes and intercepts shown in Table S3), the mean absolute percentage errors for quality parameters decreased from 9.3-22.2% to 2.4-7.7% (Table 4). The MSI-model disagreed most with the reference methods, with mean absolute percentage errors of 99.7% and 33.2% in determining HM and CP, respectively. The measurement errors of the tool decreased when considering only values inside the calibration range (HM: n = 98, CP: n = 113, Figure 4), yielding absolute percentage errors of 86.3% (HM) and 29.6% (CP) and absolute errors of 562.5 ± 266.6 kg DM ha⁻¹ and 49.0 ± 43.2 g CP kg⁻¹ DM.

Generalizability of the MSI-Model
Our results emphasize the importance of a comprehensive evaluation of spectral reflectance-based smart farming tools for determining the HM and grass quality. The MSI-model was the tool that showed the largest disagreement with the reference measurements (Table 4, Figure 4). Limited generalizability of the model was presumably the main cause for this discrepancy. The model was calibrated with data obtained from a very small range of grassland types. Indeed, all the grasslands that were used for calibration were perennial ryegrass-white clover mixtures (70:30 ratio) in an early growth stage and located within a single research farm [31]. Thus, the calibration dataset ranged to a maximum of 2436 kg DM ha⁻¹ and 247.3 g CP kg⁻¹ DM. These grasslands represent the major grassland type observed on agriculturally managed land in Ireland [39]. Conversely, the great majority of multi-species grasslands that were considered in our study included three to six grass species, with a distinct set of species at different sites, as well as two clover species and three herb species with relative abundances ranging from 1 to 15% and 2 to 48%, respectively. Observed HM ranged up to 5770 kg DM ha⁻¹, which is more than double the maximum HM in the calibration data. To a lesser extent, CP also exceeded the calibration range, by 84 g kg⁻¹ DM.
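The comparison between the observed value ranges and the calibration range can be checked with a few lines of Python. The range limits and observed extremes are the values reported in the text; the helper function and variable names are hypothetical.

```python
# Calibration ranges of the MSI-model as reported in the text.
HM_RANGE = (304.6, 2435.7)  # kg DM per ha
CP_RANGE = (126.3, 247.3)   # g per kg DM


def in_range(value, bounds):
    """True if a reference value lies within the calibration range."""
    low, high = bounds
    return low <= value <= high


# Observed extremes in this study, as reported in the text.
observed_hm = (186.0, 5770.0)
observed_cp = (86.0, 331.0)

hm_ratio = observed_hm[1] / HM_RANGE[1]   # observed maximum relative to calibration maximum
cp_excess = observed_cp[1] - CP_RANGE[1]  # amount by which CP exceeds the calibration maximum

print(f"maximum HM is {hm_ratio:.2f} times the calibration maximum")
print(f"maximum CP exceeds the calibration maximum by {cp_excess:.0f} g per kg DM")
print([in_range(v, HM_RANGE) for v in observed_hm])
```

A filter of this kind applied to each reference value would yield the subsets of samples inside the calibration range that are used when the model is evaluated without extrapolation.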
When extrapolating beyond the calibration range in terms of HM and CP values, the MSI-model increasingly failed to accurately determine the HM. Therefore, we analyzed the potential sources of error of the MSI-model for the calibration range only (Table 3). Askari et al. [31], who developed the model, obtained reasonable performance when they validated the models for HM and CP with a subset of their ryegrass-clover mixtures excluded during calibration. In contrast, even when considering values within the calibration range only, the model performance was very poor for the Swiss multi-species grasslands of this study. The plot variable, which represents the 18 grasslands that contain the sampled subplots (2 subplots × 5 samples each; Figure 2), accounted for 83.6% and 15.7% of the explained variance in the errors in HM and CP, respectively. This finding indicates that the discrepancy between the two studies is due to a different and more diverse botanical composition and larger environmental variability in the current study.
Furthermore, the performance of the MSI-model was slightly better for the two-week-grown herbage than for the four-to-six-week-grown herbage (CCC: 0.35 versus 0.21). The two-week-grown herbage was presumably morphologically more similar to the calibration data representing mainly young and leafy plant biomass, which is typical for grazing conditions. This might explain the better performance. While the described generalizability problem can be overcome with calibration data that better represent the grasslands to which the model will be applied, there are sources of error inherent to the method that need to be addressed differently.

Difficulty of Remotely Sensing High-Biomass Grasslands by MSI
Remote sensing techniques mainly capture signals from canopy surfaces and progressively fewer signals from lower vegetation layers [26,40]. The poor sensing of the lower layers introduces increasing uncertainty in the determination of grassland properties as stands become taller and more vertically heterogeneous. In our study, the error in determining CP was to a large extent explained by the standing HM, with the CP error increasing as HM increased (Table 3). Consistent with this rationale of disproportionately stronger sensing of the upper, protein-rich leafy layers compared to the lower layers with more low-protein, fiber-rich stems, CP was overestimated at higher standing biomass (Figure S3). The decreasing signal strength from lower vegetation layers can also partially explain the frequently reported asymptotic approach to reflectance saturation with increasing biomass or leaf area index [26,41,42]. Other effects, such as decreasing chlorophyll contents as herbage matures, might also contribute to this saturation effect [43,44].
The means of the single bands and indices that were used in the MSI-model [31] were calculated for the sampled areas and plotted against HM (Figures S4 and S5). There was a pronounced flattening of the curves at approximately 1300 kg DM ha⁻¹ and a low or, in the case of the red and green bands, near-zero slope at higher HM. The observed lack of sensitivity of the MSI-model at high HM was, therefore, likely (co)determined by the saturation effect. Because the saturation effect differs in strength between wavelengths, selecting less affected bands and vegetation indices for use in models might mitigate the saturation issue [40]. The red-edge narrow band between 730 nm and 740 nm and indices that include red-edge information are particularly suited for this purpose [42]. Multispectral sensors that are designed for vegetation monitoring, such as the Parrot Sequoia sensor, therefore include this band. In stepwise variable selection for the MSI-model, Askari et al. [31] also included indices and bands that are particularly suitable to address the saturation problem, i.e., difference indices instead of ratio indices and the green and red-edge bands [40,42]. However, the saturation effect might not have had considerable weight in the selection of variables because the variable selection was based on training data with a limited range in biomass. If the calibration range of the MSI-model is extended toward higher biomass, a re-selection of the variables might be preferable to a simple recalibration of the model.

Perspectives of MSI on Grasslands
Grass height can be modelled using structure-from-motion photogrammetry on overlapping aerial images that are taken from different angles [45,46]. Considering such three-dimensional information as a complement to the spectral information was shown to enhance the performance of models that determine HM and nitrogen content [47]. Because the reconstruction of the three-dimensional sward scene is less affected by spectral saturation, including this information might increase the model performance, particularly for grasslands with high biomass. Many photogrammetry software packages incorporate structure-from-motion [48] and can output a digital surface model as a byproduct of the creation of the two-dimensional reflectance maps that are needed for using the MSI-model. Subtracting a digital terrain model, which represents the soil surface, from the digital surface model yields a map of grass height estimates. Incorporating grass height estimates to improve the MSI-model would, thus, not make the overall workflow more complex for the user but would create the additional requirement of a sufficiently accurate digital terrain model. This terrain model can be obtained with the same structure-from-motion technique during periods of bare soil or very low grass canopy after harvest.
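The height calculation described above can be illustrated with a minimal sketch. It assumes that the digital surface model and the digital terrain model are already co-registered rasters of identical shape, given here as nested lists of elevations in metres. The function name `canopy_height`, the example elevations, and the clipping of negative differences to zero are illustrative assumptions and not part of the GrassQ workflow.

```python
def canopy_height(dsm, dtm, floor=0.0):
    """Per-pixel grass height: surface elevation minus terrain elevation.

    dsm, dtm: nested lists (rows of pixels) with elevations in metres.
    Negative differences, which can only result from reconstruction noise,
    are clipped to `floor` (an assumption made for this sketch).
    """
    return [
        [max(surface - terrain, floor) for surface, terrain in zip(dsm_row, dtm_row)]
        for dsm_row, dtm_row in zip(dsm, dtm)
    ]


# Hypothetical 2 x 2 rasters (metres above sea level)
dsm = [[450.30, 450.25],
       [450.10, 450.05]]   # canopy surface from the flight over the standing sward
dtm = [[450.10, 450.10],
       [450.10, 450.10]]   # soil surface from a flight after harvest

heights = canopy_height(dsm, dtm)
for row in heights:
    print([round(h, 2) for h in row])
```

In this invented example, the lower-right pixel lies slightly below the terrain model and is therefore set to zero, which shows how directly errors in the terrain model propagate into the height estimates.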

Improving the Decision Support Platform for Remote Grassland Assessment
The technical realization of how the calibrated model is employed might introduce additional errors. On the GrassQ platform (www.grassq.com; as of 2 July 2019), area means are calculated by taking the arithmetic mean of the corresponding HM or CP pixel values. This process introduces errors for concentrations, such as CP, which are not simply additive and should instead be aggregated as a weighted mean with the biomass distribution as weights. The error that arises from using the arithmetic mean instead of a weighted mean might be reasonably small for very homogeneous grasslands but rises quickly with increasing heterogeneity and might have contributed considerably to the observed error of the MSI-model in this study. On the GrassQ platform, HM and CP are calculated for each pixel of the reflectance data, which in our study represented an area of approximately 5 cm × 5 cm of grassland. However, the applied MSI-model was calibrated on a much coarser spatial scale, using mean reflectance values of 1.5 m × 5 m plots [31]. The heterogeneity and range of expected reflectance values are presumably broader across the pixels than the range of mean reflectance values observed on the coarse spatial scale of calibration. For example, pixels that represent bare soil will show reflectance values far beyond the calibration range (of mean values). This may introduce errors when applying the MSI-model to heterogeneous multi-species or thin grassland.
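The difference between the two aggregation methods can be made concrete with a small numerical sketch. The pixel values below are invented for illustration, all pixels are assumed to cover the same area, and the function name `area_mean_cp` is hypothetical; the code does not reproduce the GrassQ implementation.

```python
def area_mean_cp(hm_pixels, cp_pixels):
    """Return (arithmetic mean, biomass-weighted mean) of CP over an area.

    hm_pixels: herbage mass per pixel (kg DM per ha), used as weights.
    cp_pixels: crude protein concentration per pixel (g per kg DM).
    Pixels are assumed to be of equal size.
    """
    arithmetic = sum(cp_pixels) / len(cp_pixels)
    weighted = sum(hm * cp for hm, cp in zip(hm_pixels, cp_pixels)) / sum(hm_pixels)
    return arithmetic, weighted


# Two hypothetical pixels: a sparse, protein-rich patch and a dense, protein-poor patch
hm_pixels = [500.0, 3000.0]   # kg DM per ha
cp_pixels = [250.0, 150.0]    # g CP per kg DM

arithmetic, weighted = area_mean_cp(hm_pixels, cp_pixels)
print(f"arithmetic mean: {arithmetic:.1f} g CP per kg DM")
print(f"weighted mean:   {weighted:.1f} g CP per kg DM")
```

In this example, the arithmetic mean (200.0 g kg⁻¹ DM) overstates the CP concentration of the harvested herbage (about 164.3 g kg⁻¹ DM), because the sparse, protein-rich pixel contributes little biomass. For a perfectly homogeneous sward, both means coincide.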

Potential for Correcting Systematic Errors of the On-Site NIRS
The On-site NIRS showed absolute percentage errors between 9.3% and 22.2% in determining grass quality parameters (Table 4). The relatively low performance was largely due to a systematic error. The correlations between the quality parameters and the reference measurements were nevertheless high (r_P ≥ 0.71), except for NDF (r_P = 0.45; Figure 5). The On-site NIRS generally overestimated low concentrations and underestimated high concentrations (Figure 5). Long et al. [22] determined DM concentrations of alfalfa-grass mixtures using the HarvestLab™ 3000, both as a mobile instrument, as in this study, and mounted onto a forage harvester. Similar to our study, they observed a high correlation with the DM measurements obtained by oven-drying and identified systematic errors. However, they reported that, on average, high DM concentrations were overestimated and low DM concentrations were underestimated in the case of the mobile NIRS application (within the DM range of 213–497 g kg⁻¹ FM). Evaluating three mobile NIRS tools in measuring several grass quality parameters, including DM, CP, and ADF, Patton et al. [23] also observed systematic over- and underestimation.
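How a high correlation can coexist with a substantial systematic error may be illustrated by comparing Pearson's correlation coefficient with Lin's concordance correlation coefficient (CCC), both of which are reported in this study. The following is a minimal sketch: the readings are invented, the function name `pearson_and_ccc` is hypothetical, and population moments (division by n) are assumed.

```python
from math import sqrt


def pearson_and_ccc(x, y):
    """Return Pearson's r and Lin's CCC for two paired series.

    CCC = 2 * cov(x, y) / (var(x) + var(y) + (mean(x) - mean(y))**2),
    computed here with population moments (division by n).
    """
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    var_x = sum((a - mean_x) ** 2 for a in x) / n
    var_y = sum((b - mean_y) ** 2 for b in y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / n
    r = cov_xy / sqrt(var_x * var_y)
    ccc = 2 * cov_xy / (var_x + var_y + (mean_x - mean_y) ** 2)
    return r, ccc


# Hypothetical data: the instrument overestimates low and underestimates high values
reference = [100.0, 150.0, 200.0, 250.0, 300.0]
instrument = [150.0, 175.0, 200.0, 225.0, 250.0]

r, ccc = pearson_and_ccc(reference, instrument)
print(f"Pearson r = {r:.2f}, CCC = {ccc:.2f}")
```

Pearson's coefficient only measures how closely the points follow any straight line, whereas the CCC also penalizes deviations of that line from the line of identity. A purely systematic error therefore lowers the CCC but leaves the correlation untouched.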
Instrument response shifts are a known problem in NIRS analysis and have been reported to cause systematic errors [19,49]. A response shift is a difference in the wavelength or reflectance measurements of NIRS instruments that occurs either after a given operation time or between NIR instruments, even if they are technically identical. Response shifts are particularly problematic when transferring NIR calibrations from one instrument to another, for example when producing instruments in quantity. The systematic errors identified in our study are likely explained by these shifts. To address this problem, Long et al. [22] proposed performing frequent wavelength standard measurements via the HarvestLab™ 3000 software, a process that is typically conducted during maintenance service. Chen et al. [50] suggested a simple method for correcting response differences between instruments using a few standardization samples. In our study, the remaining error of the On-site NIRS after correction of the systematic error using Passing-Bablok linear regression was ≤7.7% (Table 4). This remaining error is surprisingly small, in particular for a mobile instrument employed outside controlled laboratory conditions and using a calibration that does not differentiate grassland types. To put this finding in perspective, relative differences in the range of 3.8% (NDF) to 5.2% (CP) were observed between a well-established laboratory NIRS with a calibration tailored for Swiss grasslands and wet-chemical analyses (Figure S1).
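The principle of correcting a systematic error with a Passing-Bablok regression can be sketched as follows. This is a simplified illustration and not the exact procedure of this study. It assumes that the reference values are regressed on the instrument readings and that the fitted line is then applied to the readings. The data and the function name `passing_bablok` are hypothetical, confidence intervals are omitted, and the two series are assumed to be positively correlated.

```python
from math import inf
from statistics import median


def passing_bablok(x, y):
    """Return (intercept, slope) of a Passing-Bablok regression of y on x."""
    slopes = []
    n = len(x)
    for i in range(n - 1):
        for j in range(i + 1, n):
            dx, dy = x[j] - x[i], y[j] - y[i]
            if dx == 0 and dy == 0:
                continue                      # identical points define no slope
            slope = dy / dx if dx != 0 else (inf if dy > 0 else -inf)
            if slope == -1:
                continue                      # slopes of exactly -1 are excluded
            slopes.append(slope)
    slopes.sort()
    n_slopes = len(slopes)
    k = sum(1 for s in slopes if s < -1)      # offset for the shifted median
    if n_slopes % 2 == 1:
        b = slopes[(n_slopes + 1) // 2 + k - 1]
    else:
        b = 0.5 * (slopes[n_slopes // 2 + k - 1] + slopes[n_slopes // 2 + k])
    a = median(yi - b * xi for xi, yi in zip(x, y))
    return a, b


# Hypothetical readings: low values overestimated, high values underestimated
reference = [100.0, 150.0, 200.0, 250.0, 300.0]
instrument = [150.0, 175.0, 200.0, 225.0, 250.0]

a, b = passing_bablok(instrument, reference)
corrected = [a + b * value for value in instrument]
print(f"intercept = {a:.1f}, slope = {b:.1f}")
print(corrected)
```

Because the invented error is perfectly linear, the corrected readings reproduce the reference values exactly. With real measurements, a residual random error remains after the correction, and the regression parameters should be validated on data that were not used for fitting, for example by leaving out one farm at a time.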
The remaining error might partially be caused by limited generalizability to grasslands that are not well represented by the calibration [17]. The question has been raised whether NIRS calibrations should be global or developed for local or plant family- and species-specific applications [51,52]. Interregional calibrations have been successfully employed on dried herbage and were found to be robust when grassland characteristics were similar to the calibration data [17,51]. However, our results indicate that calibrations might not need to be grassland-type-specific. Certainly, some error could be reduced by sample preparation processes such as those performed for laboratory NIRS analyses, i.e., homogenizing the plant material by drying and milling [19]. However, omitting these processing steps is what makes the mobile NIRS attractive for farmers in the first place, because it enables near real-time results to be obtained on-farm [6].

Benefits of Smart Farming Tools
Before aerial MSI for grassland assessment can supersede conventional approaches, such as look-up tables or costly laboratory analyses, methods such as the MSI-model still need intensive development. The On-site NIRS, however, performed as well as the Look-up Tables by Daccord et al. [10] and even substantially better after correction of the systematic error (Table 4). Unlike the Look-up Tables, the On-site NIRS was able to detect seasonal and short-term variation in grassland quality in our study (analysis not shown). In particular, the parameter DM, which reacts to recent weather conditions, was better captured by the On-site NIRS. Besides offering better temporal resolution, the On-site NIRS does not require any botanical knowledge. Moreover, farmers receive results in a timely manner, which is a clear advantage over sending samples to a commercial laboratory for analyses of similar accuracy.

Conclusions
The two smart farming tools for grassland assessment evaluated in this study were in different development stages. The smart farming tool that is based on a model for analyzing MSI performed worse regarding grass quality assessment than the conventional Look-up Tables but showed potential in estimating HM on grasslands with low biomass, i.e., in situations that often pertain to grazing management. We identified that the saturation of reflectance indices is likely to limit the performance for high-biomass grasslands, and we obtained evidence that the model performance was co-determined by grassland characteristics. Further model development might focus on extending the calibration dataset with respect to spatiotemporal diversity and on including three-dimensional information, such as grass height, to mitigate limitations due to saturating reflectance. The second smart farming tool, the On-site NIRS, performed better than or comparably to the Look-up Tables. We found that the error of the On-site NIRS was mainly systematic and, therefore, predictable and correctable, indicating great potential of the On-site NIRS to become a tool that is accurate enough for practical use by farmers in the near future.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/19/3256/s1, Table S1: Characteristics of studied grasslands and observations; Table S2: Characteristics of the calibration model of the laboratory NIRS (Fourier-Transform NIR, N-500, Büchi, Flawil, Switzerland; all herbage parameters as g kg⁻¹ DM); Table S3: Passing-Bablok regression result from fitting the full dataset or fitting five of six farms (left-out farm indicated); Figure S1: Laboratory NIRS measurements versus the validation methods (n = 54, Table 2) for quality parameters in dried herbage. Dashed line is the line of identity (1:1). The open and filled dots represent the two growth stages of the plants: "2 weeks" of growth and "4/6 weeks" of growth. The values of Pearson's correlation coefficient (r_P), Lin's concordance correlation coefficient (CCC), and the mean absolute percentage error (MAPE) are given. CA was not further used to evaluate smart farming tools in the study, because sampling was prone to soil contamination. Data points far from the line of identity were confirmed to not explain measurement errors of other parameters; Figure S2: Herbage quality parameters determined with the Look-up Tables by Daccord et al. [10] versus the reference methods (n = 162). The dashed line is the line of identity (1:1). The open and filled dots represent the two growth stages of the plants: "2 weeks" of growth and "4/6 weeks" of growth; Figure S3: The error of the MSI-model (reference − MSI-model) in determining crude protein (CP) versus the herbage mass (n = 147). The open and filled dots represent the two growth stages of the plants: "2 weeks" of growth and "4/6 weeks" of growth; Figure S4: Reflectance in four spectral bands determined by UAV multispectral imagery versus the herbage mass (n = 147). Values are plot-area means of the corresponding pixel values. Solid line is a smooth LOESS trend line. The open and filled dots represent the two growth stages of the plants: "2 weeks" of growth and "4/6 weeks" of growth. Nine and one far outlying data points are not shown for red-edge and red, respectively; Figure
Funding: This research was funded by the Swiss Federal Office for Agriculture within the project "GrassQ" under the ICT-Agri ERA-Net Call "Enabling Precision Agriculture", Project ID 35779.