Nutrient Prediction for Tef ( Eragrostis tef ) Plant and Grain with Hyperspectral Data and Partial Least Squares Regression: Replicating Methods and Results across Environments

: Achieving reproducibility and replication (R&R) of scientiﬁc results is tantamount for science to progress, and it is also necessary for ensuring the self-correcting mechanism of the scientiﬁc method. Topics of R&R have sailed to the forefront of research agenda in many ﬁelds recently but have received less attention in remote sensing in general and speciﬁcally for studies utilizing hyperspectral data. Given the extremely local environments in which many hyperspectral studies are conducted (e.g., agricultural ﬁeld plots), purposeful attention to the repeatability of ﬁndings across study locales can help ensure methods are generalizable. This study undertakes an investigation of the nutrient content of tef ( Eragrostis tef ), an understudied plant that is growing in importance due to both food and forage beneﬁts, but does so within the context of the replicability of methods and ﬁndings across two study sites situated in di ﬀ erent international and environmental contexts. The aims are to (1) determine whether calcium, magnesium, and protein of both the plant and grain can be predicted using hyperspectral data with partial least squares (PLS) regression with waveband selection, and (2) compare the replicability of models across di ﬀ ering environments. Results suggest the method can produce high nutrient prediction accuracy for both the plant and grain in individual environments, but selection of wavebands for nutrient prediction was not comparable across study areas. The ﬁndings suggest that the method must be calibrated in each location, thereby reducing the potential to extrapolate methods to di ﬀ erent areas. Our ﬁndings highlight the need for greater attention to methods and results replication in remote sensing, speciﬁcally hyperspectral analyses, in order for scientiﬁc ﬁndings to be repeatable beyond the plot level.


Introduction
The reproducibility and replication (R&R) of scientific findings has recently moved to the forefront of research agenda in many fields [1][2][3][4][5] since it has been discovered that findings often cannot be reproduced or replicated [5,6]. While the two "R's"-reproducibility and replicability-are intertwined, there are key differences between their goals. Adopting the definitions from the National Science Foundation [7] and the National Academy of Science, Engineering, and Medicine [8], we define reproducibility as the ability of a researcher to duplicate the results of a prior study using the same data and methods as the original investigator. In short, if a researcher makes the data and methods/code available, another researcher should be able to produce the exact same results. In comparison, replicability is the ability of a researcher to duplicate results using similar methods but with new data.
Achieving R&R is critical for advancing scientific discoveries, yet neither topic has received much attention in geography and the spatial sciences, where investigations tend to be observational instead of experimental or theoretical [9]. R&R has received even less attention in remote sensing (but see [10] for an early take), even though the field is uniquely positioned to contribute to R&R on several fronts. First, there is a rich archive of publicly available remote sensing datasets (e.g., Landsat), supporting opportunities for reproducibility [9,11]. Second, remote sensing studies, and in particular hyperspectral studies, are often situated in extremely local contexts (e.g., agricultural plots) due to the need for ground reference data and the high labor and time costs of operating equipment. Yet, an implicit goal of science is to develop widely applicable methodologies and generalizable findings that can be applied in different contexts. Thus, working toward the replicability of methods and findings across different study areas is important for advancing remote sensing science.
Despite the myriad opportunities for remote sensing scientists to explore R&R issues, very few formal efforts have been documented. One reason is likely because remote sensing scientists often work with large datasets and perform complex spectral and spatial manipulations [12][13][14][15][16], which makes R&R difficult if processing code is not made available. Until recently, many scientific publications did not require code to be submitted as part of the manuscript review process, although this is changing. Replication in remote sensing is also hindered by attributes of local environments, which makes the transfer of results from one landscape to another difficult. However, if we are to develop methodologies that are transferrable across space, it is necessary to begin developing and implementing protocols for testing the R&R of remote sensing studies. One way to do this is to incorporate multi-field, multi-environment analyses into studies to self-test the replicability of methods and results.
Precision agriculture is one field where immediate gains can be made toward testing the replicability of methods while also contributing a larger understanding of the extent of R&R issues in remote sensing. Since the overall goal of precision agriculture is to decrease the ambiguity of decisions required on agricultural lands that are often highly variable [17], the ability to transfer methods and findings from one environment or location to another requires them to be replicable [18]. However, most studies capture data in a single region or location (often in a single crop field) under uniform conditions [12,15], thus limiting their generalizability across environmental or geographical contexts. Furthermore, the implicit assumption is that methods and findings are extendable beyond the single field in which they were tested, but in most cases, no such evidence is provided. Many studies lack basic explanation for environmental variances such as soil, hydrology, and topography that can cause reflectance variations, thereby altering results across space [16]. Ultimately, remote sensing methodologies are of little practical value for precision agriculture if they are developed, tested, and applicable in a single location where these multiple and often confounding factors are held constant.
Partial least squares (PLS) regression has become an accepted technique in vegetation studies using hyperspectral data for estimating a range of biophysical and biochemical properties [19][20][21][22][23]. In situations where the number of independent variables is large and the variables are collinear, which is common with hyperspectral data, multiple linear regression will often overfit the model [24,25]. PLS regression standardizes model construction from the preprocessed hyperspectral data via latent variables, from which the predictive capabilities of the model can be tested. Recently, variations of PLS regression using a waveband selection procedure [13] have been proposed and adopted, but there has been little effort to test the replicability of these methods across environments to determine whether results might be transferable.
The objective of this study is to investigate the replicability of PLS regression methods, including PLS with waveband selection, for predicting nutrient content in plant and grain material across multiple environments. This study addresses gaps in the remote sensing R&R literature by replicating a methodological workflow using hyperspectral data and PLS regression for predicting nutrients in a single crop but in two varying environments in different international contexts to determine the degree to which the methods are replicable. The focus is on Eragrostis tef (tef), a cereal crop primarily grown in Ethiopia, although production has been expanding to other parts of the world due to its versatility and resistance to drought. Tef is ideal for studying replicability because it is grown in different international contexts and is known for being successfully cultivated across differing environments. Additionally, very few hyperspectral analyses have been performed on non-milled grains [26], so this study contributes knowledge in that realm as well.
Tef is a grass (Family: Poaceae) that has received little attention from the remote sensing and precision agricultural communities despite its versatile cultivation characteristics. Tef is thought to be one of the earliest domesticated plants [27], with the center of origin and diversity in Ethiopia/Horn of Africa [28]. It is drought and heat resistant, has a high nutrient content, and is grown for animal feed as well as a staple food crop [29]. While tef can be cultivated across many environments, it is primarily grown in Ethiopia, where it is the most commonly harvested crop, popular for its highly nutritious, gluten-free grain [30][31][32][33][34][35]. Recently, cultivation has been spreading outside the region; in the United States, tef is planted as a sequential forage crop for livestock feed but is currently only grown in a handful of locations [29,36].

Study Sites
This study focuses on four sites in two countries. The two US sites (US1: 21.45 ha; US2: 24.19 ha) are located in Hydro, Oklahoma, which is part of the Central Great Plains ecoregion. The region experiences cold winters (average temperature minimums from 4 to −12 • C) and hot summers (temperatures greater than 38 • C). Precipitation is variable, and temperature changes can be considerable across all seasons. The US sites are located within three kilometers of each other, so the environmental characteristics are similar. Both sites have similar soils (vertisols) and are located at the same elevation (474 m). The two Ethiopia sites (ET1: 0.77 ha; ET2: 1.23 ha) are also in close geographic proximity ( Figure 1). The first site is located in the warm, sub-moist lowlands, while the second site is in the warm, humid lowlands. Soil composition at both sites is similar (vertisols), but the sites are at different elevations (1919 m and 2201 m, respectively).

Spectral Data Collection and Nutrient Processing
Data collection and processing methods included in situ spectral data collection and ex situ laboratory nutrient analyses. The methodology proceeded in four general phases (Figure 2), which are briefly described here and detailed below. First, canopy spectral measurements and plant samples were collected at the four sites immediately prior to peak crop maturity (seed head stage). Next, plant samples were dried and separated in the laboratory where the grain was imaged and both the plant material and grain were subjected to nutrient analyses. Third, the relationship between the nutrients and spectral components were modeled using PLS regression with both the full spectrum and a waveband selection method. Lastly, the replicability of the PLS regression methods were tested through statistical comparisons of model fits and similarity of results. In Phase 1, canopy-level hyperspectral measurements and plant material were collected at each site immediately prior to harvest during peak maturity, which was late June/early July 2016 in the US

Spectral Data Collection and Nutrient Processing
Data collection and processing methods included in situ spectral data collection and ex situ laboratory nutrient analyses. The methodology proceeded in four general phases (Figure 2), which are briefly described here and detailed below. First, canopy spectral measurements and plant samples were collected at the four sites immediately prior to peak crop maturity (seed head stage). Next, plant samples were dried and separated in the laboratory where the grain was imaged and both the plant material and grain were subjected to nutrient analyses. Third, the relationship between the nutrients and spectral components were modeled using PLS regression with both the full spectrum and a waveband selection method. Lastly, the replicability of the PLS regression methods were tested through statistical comparisons of model fits and similarity of results.

Spectral Data Collection and Nutrient Processing
Data collection and processing methods included in situ spectral data collection and ex situ laboratory nutrient analyses. The methodology proceeded in four general phases (Figure 2), which are briefly described here and detailed below. First, canopy spectral measurements and plant samples were collected at the four sites immediately prior to peak crop maturity (seed head stage). Next, plant samples were dried and separated in the laboratory where the grain was imaged and both the plant material and grain were subjected to nutrient analyses. Third, the relationship between the nutrients and spectral components were modeled using PLS regression with both the full spectrum and a waveband selection method. Lastly, the replicability of the PLS regression methods were tested through statistical comparisons of model fits and similarity of results. In Phase 1, canopy-level hyperspectral measurements and plant material were collected at each site immediately prior to harvest during peak maturity, which was late June/early July 2016 in the US In Phase 1, canopy-level hyperspectral measurements and plant material were collected at each site immediately prior to harvest during peak maturity, which was late June/early July 2016 in the US and October 2017 in Ethiopia. For each of the four sites, 40 random points were generated in ArcGIS, providing 80 possible sampling location in each country. These points were then located in the field using a hand-held GPS unit (Trimble Juno 3B, Corvallis, OR, USA). In some cases, locations could not be accessed due to sampling on the days of harvest and attempting to not interfere with the harvesting practices of the respective farmers, so the actual number of samples is below 40 for each field and below 80 for the region. At all sampling sites, canopy spectral data were captured using the same spectroradiometer (FieldSpec Pro FR: Analytical Spectral Devices [ASD], Boulder, CO, USA), measuring reflectance from 350-2500 nm with a spectral sampling width of 1.4 nm from 350-1000 nm and 2.0 nm from 1000-2500 nm. The spectroradiometer was calibrated using a Spectralon diffuse reference panel (Analytical Spectral Devices [ASD], Boulder, CO, USA) approximately every 15 min. The spectralon was held at a distance from the spectroradiometer fiber to ensure no shadows were measured. The spectroradiometer fiber was held 1.2 m above the ground; since the top of canopy was about 0.3 m above ground, the cone of acceptance was 25 • , resulting in a footprint with a diameter of 0.40 m for each sample. This diameter ensured a sufficient mass of plant/grain matter was collected for nutrient testing (10 g of grain; SSSA, 1990; [37]). Reflectance data were captured between 11:00 a.m. and 2:00 p.m. local time under a cloud-free sky. It is important to mention that the same spectroradiometer was operated by the same individual in all locations to ensure the exact same data collection process was reproduced. Each location was imaged five times in succession, and spectra were averaged. Plant material in the footprint of the imaging fiber was then clipped at the base, stored in plastic bags, and placed on ice in a cooler for transport back to the laboratory.
In the laboratory, samples were dried to remove excess moisture, and the grains were separated from the plant using a traditional method of hand threshing with the assistance of a basket weaved surface. The grains from each sample, which measure approximately 1 mm in length, were aggregated in a petri dish to generate a sufficient amount to fully cover the lens of the imaging equipment. The grains were spectrally imaged in a dark room using a contact probe (Contact Probe: Analytical Spectral Devices [ASD], Boulder, CO, USA) with a halogen light source, while not equivalent to the sun, still emitted spectral wavelengths (350-2500 nm) capable of being identified using the same spectroradiometer used in the field ( Figure 3). As with the in situ samples, five spectral readings were collected for each sample, and the values averaged. and October 2017 in Ethiopia. For each of the four sites, 40 random points were generated in ArcGIS, providing 80 possible sampling location in each country. These points were then located in the field using a hand-held GPS unit (Trimble Juno 3B, Corvallis, OR, USA). In some cases, locations could not be accessed due to sampling on the days of harvest and attempting to not interfere with the harvesting practices of the respective farmers, so the actual number of samples is below 40 for each field and below 80 for the region. At all sampling sites, canopy spectral data were captured using the same spectroradiometer (FieldSpec Pro FR: Analytical Spectral Devices [ASD], Boulder, CO, USA), measuring reflectance from 350-2500 nm with a spectral sampling width of 1.4 nm from 350-1000 nm and 2.0 nm from 1000-2500 nm. The spectroradiometer was calibrated using a Spectralon diffuse reference panel (Analytical Spectral Devices [ASD], Boulder, CO, USA) approximately every 15 min. The spectralon was held at a distance from the spectroradiometer fiber to ensure no shadows were measured. The spectroradiometer fiber was held 1.2 m above the ground; since the top of canopy was about 0.3 m above ground, the cone of acceptance was 25°, resulting in a footprint with a diameter of 0.40 m for each sample. This diameter ensured a sufficient mass of plant/grain matter was collected for nutrient testing (10 g of grain; SSSA, 1990; [37]). Reflectance data were captured between 11:00 am and 2:00 pm local time under a cloud-free sky. It is important to mention that the same spectroradiometer was operated by the same individual in all locations to ensure the exact same data collection process was reproduced. Each location was imaged five times in succession, and spectra were averaged. Plant material in the footprint of the imaging fiber was then clipped at the base, stored in plastic bags, and placed on ice in a cooler for transport back to the laboratory. In the laboratory, samples were dried to remove excess moisture, and the grains were separated from the plant using a traditional method of hand threshing with the assistance of a basket weaved surface. The grains from each sample, which measure approximately 1 mm in length, were aggregated in a petri dish to generate a sufficient amount to fully cover the lens of the imaging equipment. The grains were spectrally imaged in a dark room using a contact probe (Contact Probe: Analytical Spectral Devices [ASD], Boulder, CO, USA) with a halogen light source, while not equivalent to the sun, still emitted spectral wavelengths (350-2500 nm) capable of being identified using the same spectroradiometer used in the field ( Figure 3). As with the in situ samples, five spectral readings were collected for each sample, and the values averaged.

Spectral Processing
The raw spectral curves from both the plant (in situ) and grain (ex situ) were smoothed using a Savitsky-Golay filter [38] to reduce noise (SG; Figure 4a). A third-order, 10-band moving polynomial was fitted upon the original spectral signatures [39]. Data within each 5 nm window were averaged (e.g., the value at 600 nm is average of 598-602 nm). First derivatives (hereafter, FD) were computed from the smoothed spectra ( Figure 4b). Computing derivatives allows minor differences in reflectance values to be exploited and permits discrimination of key points along the spectral curves (i.e., inflections and maxima) corresponding to biophysical and biochemical components that would otherwise be difficult to detect [40,41]. Lastly, wavebands associated with atmospheric noise (1290-

Spectral Processing
The raw spectral curves from both the plant (in situ) and grain (ex situ) were smoothed using a Savitsky-Golay filter [38] to reduce noise (SG; Figure 4a). A third-order, 10-band moving polynomial was fitted upon the original spectral signatures [39]. Data within each 5 nm window were averaged (e.g., the value at 600 nm is average of 598-602 nm). First derivatives (hereafter, FD) were computed from the smoothed spectra ( Figure 4b). Computing derivatives allows minor differences in reflectance values to be exploited and permits discrimination of key points along the spectral curves (i.e., inflections and maxima) corresponding to biophysical and biochemical components that would otherwise be difficult to detect [40,41]. Lastly, wavebands associated with atmospheric noise (1290-1495, 1705-2045, and 2355-2500 nm) and splicing points within the spectroradiometer (350-395 and 1005-1015 nm) were removed, resulting in 277 wavebands between 400-2350 nm, at a spectral resolution of 5 nm. The reflectance/derivative values at these wavebands ultimately serve as the independent variables for the statistical analyses discussed below.
Remote Sens. 2019, 11, x FOR PEER REVIEW 6 of 24 1495, 1705-2045, and 2355-2500 nm) and splicing points within the spectroradiometer (350-395 and 1005-1015 nm) were removed, resulting in 277 wavebands between 400-2350 nm, at a spectral resolution of 5 nm. The reflectance/derivative values at these wavebands ultimately serve as the independent variables for the statistical analyses discussed below.

Nutrient Analysis
Since tef serves as both a food (grain) and forage (plant) crop, nutrient analyses were performed on both the plant and grain. All samples were analyzed for calcium (Ca), magnesium (Mg), and protein content (Table 1). For details on Ca and Mg laboratory procedures, readers are directed to the Soil Science Society of America [42] plant analysis guidelines. For details on protein analysis, readers are directed to the National Forage Testing Association's (NFTA) Forage Analysis Procedures [37]. For the US samples, nutrient analyses were performed at the Oklahoma State University Soil, Water, and Forage Analytical Laboratory. For the ET samples, analyses were performed by the Ethiopian Public Health Institute of Addis Ababa. These labs were chosen for their rigor and location as samples could not be transported across country boundaries. Moreover, the procedures used by both institutions followed these established standards [37,42], with aim to minimize impact on results. Ca and Mg values are expressed in ppm mg/kg, while protein values are expressed in percent (%) of total sample weight. All are expressed as dry matter weight. These nutrient data from the plant material and grains ultimately serve as the dependent variable in the PLS regression analyses (discussed below).

Nutrient Analysis
Since tef serves as both a food (grain) and forage (plant) crop, nutrient analyses were performed on both the plant and grain. All samples were analyzed for calcium (Ca), magnesium (Mg), and protein content (Table 1). For details on Ca and Mg laboratory procedures, readers are directed to the Soil Science Society of America [42] plant analysis guidelines. For details on protein analysis, readers are directed to the National Forage Testing Association's (NFTA) Forage Analysis Procedures [37]. For the US samples, nutrient analyses were performed at the Oklahoma State University Soil, Water, and Forage Analytical Laboratory. For the ET samples, analyses were performed by the Ethiopian Public Health Institute of Addis Ababa. These labs were chosen for their rigor and location as samples could not be transported across country boundaries. Moreover, the procedures used by both institutions followed these established standards [37,42], with aim to minimize impact on results. Ca and Mg values are expressed in ppm mg/kg, while protein values are expressed in percent (%) of total sample weight. All are expressed as dry matter weight. These nutrient data from the plant material and grains ultimately serve as the dependent variable in the PLS regression analyses (discussed below).

Analytical Methods
Partial least squares (PLS) regression was implemented to assess the relationship between reflectance (independent variable) and nutrient content (dependent variable) of both the plant and grain. PLS, which can also stand for projection to latent structures [43], was selected over other forms of regression because it accounts for overfitting errors common when analyzing hyperspectral data [13,44]. Briefly, PLS regression finds a set of components (called latent factors) from X, a matrix of predictors collected on the observations, that best predict Y, a matrix of dependent observations [43]. These latent factors, or latent vectors, are orthogonal, and thus explain as much of the covariance between X and Y as possible, often resulting in a smaller number of variables than principal component regression. PLS regression extracts X-scores from the latent variables to construct a model to predict the Y-scores. In PLS, the Xand Y-scores are subject to redundancy analysis that seeks directionality in factor space until the most accurate prediction is found [25,45]. When implementing PLS regression with hyperspectral data, it is important to ensure the number of latent variables does not far exceed the number of independent variables being used, as overfitting can occur [13].

Specification of the PLS Regression Model Using the Full Spectrum
The PLS equation follows a standard regression (Equation (1)): where the response variableŷ is the nutrient value, and the predictor variables x 1 to x k are the reflectance (SG) or derivative (FD) values for bands 1 to k (here, 277). β 1 to β k are the estimated weighted regression coefficients computed directly from the PLS loadings corresponding to the model with the optimal number of latent variables, and ε is the error vector. The optimal number of latent variables (NLV) is determined through a leave-one-out (LOO) cross-validation and assessed through the minimum root mean square error: whereŷ c i represents the predicted response when the model is built without sample i, y c i represents the measured nutrient value for sample i, and n represents the number of samples used in the calibration [46]. Twelve, standard PLS regressions were performed, each corresponding to a dataset in Table 1, and each including the full set of 277 bands (PLS-Full).

PLS Regression with Waveband Selection
A modified form of PLS regression, known as the waveband selection (or iterative stepwise elimination) method, was developed to eliminate noisy and unhelpful predictors in hyperspectral studies [13,44]. Instead of including all 277 wavebands, the number is reduced iteratively [44] by dropping the least important wavebands (similar to stepwise linear regression). Waveband importance (v k ) is determined as: where β k and s k are the regression coefficient and standard deviation corresponding to waveband k. The selection begins with all 277 wavebands, and the waveband contributing least to the model (lowest v k ) is removed. The model (Equation (1)) is then re-run with 276 variables, and so on, until the maximum predictive capability is achieved [47]. A representation of the iterative processes to determine the maximum predictive capability is shown in Figure 5. This version of the model is hereafter referred to as PLS-Wave.

Predictive Ability of PLS Regression
To test the predictive capabilities of the PLS-Full and PLS-Wave models, we implemented a bootstrapping procedure by dividing the data into calibration (65-75%) and validation (25-35%) sets replacing the data of these sets = 1000 times (following [48,49]). After each separation, the models were calibrated (assessed through RMSECV) and then validated using root mean square error of prediction (RMSEP): where represents the predicted nutrient values, represents the measured nutrient values, and n represents the number of samples in the validation subset [46]. Mean coefficient of determination (R 2 ), R 2 standard deviation (R 2 std.), and RMSEP standard deviation (RMSEP std) are also reported for the validation. PLS regressions were performed in Matlab v2016a (MathWorks, Sherborn, MA, USA).

Predictive Ability of PLS Regression
To test the predictive capabilities of the PLS-Full and PLS-Wave models, we implemented a bootstrapping procedure by dividing the data into calibration (65-75%) and validation (25-35%) sets replacing the data of these sets n = 1000 times (following [48,49]). After each separation, the models were calibrated (assessed through RMSE CV ) and then validated using root mean square error of prediction (RMSE P ): whereŷ v i represents the predicted nutrient values, y v i represents the measured nutrient values, and n represents the number of samples in the validation subset [46]. Mean coefficient of determination (R 2 ), R 2 standard deviation (R 2 std.), and RMSE P standard deviation (RMSE P std) are also reported for the validation. PLS regressions were performed in Matlab v2016a (MathWorks, Sherborn, MA, USA).

Replication across Environments
To assess the replicability of the PLS-Full and PLS-Wave regression methodologies for predicting nutrient content across environments, we compared model performance between the US and ET using a difference of means (t-test) for each component-nutrient combination (e.g., grain-calcium, plant-calcium, etc.) between the two sites from the bootstrapped results. To compare the similarity of wavebands selected by the PLS-Wave model, the Jaccard index [50] was used to measure the overlap in selected wavebands compared to the total number of wavebands selected for each site (Equation (5)): where A and B are the set of selected wavebands in the two locations, respectively. A Jaccard value of 1.0 indicates that the models for the two locations overlap completely in terms of the wavebands selected as important for prediction; 0 indicates the two locations share none of the same wavebands.

Plant and Grain Nutrient Analyses
Plant nutrient values for Ca, Mg, and protein were considerably higher for the US compared to ET ( Table 2). Mean Ca was five to six times higher; mean Mg almost four times higher; and mean protein nearly 2.5 times higher in the US. Results were similar for grain nutrients (Table 2), with values in the US typically two to four times higher than ET. Additionally, the standard deviation and ranges for Ca and protein at both the plant and grain level were higher in the US, except for Mg, which showed similar standard deviation and ranges across both locations for plant and grain.

Results for the PLS-Full Model
At the plant level (Table 3, top), the PLS-Full models varied considerably in performance between the two sites with cross-validated calibration coefficients of determination (R 2 CV ) ranging from 0.03 (Mg US) to 0.88 (Mg ET) at the plant level. The validation coefficients of determinations (Mean R 2 ) closely matched the calibration values, except in the case of ET (Ca and Protein (SG)) where the validation R 2 was approximately double that of the calibration. Interestingly, when comparing the US and ET, one generally outperformed the other, but the relationship changed depending on the nutrient. Differences in filtering (SG vs. FD) were largely negligible, with the exception of protein in ET, where the model fit for the FD was about twice that for the SG data (0.41 vs. 0.18). The bootstrapped R 2 values were normally or semi-normally distributed in all cases except for Mg in the US and Ca in ET ( Figure S1, Supplementary Material). Importantly, t-test comparisons for replicability indicate the PLS-Full regression model did not replicate well across the two environments, with significant differences in fit for every nutrient at the plant level (Table 3, top). At the grain level (Table 3, bottom), results were more consistent across the study sites and nutrients. This is believed to be the case in-part to grain spectral values measured in a controlled setting, while plants were measured in a field setting. Calibration R 2 ranged [0.47, 0.93], and validation R 2 ranged [0.49, 0.90] ( Table 3). The results for ET protein using the SG data were a slight outlier though, and when considering only the FD values, the minimums increase to 0.64 and 0.62, respectively. The differences between the SG and FD filters were again negligible, except in the Mg ET case. The t-test comparisons for grain indicate the PLS-Full regression method did not replicate well across the environments.

Results for the PLS-Wave Model
At the plant level, the PLS-Wave models varied in performance in both environments with calibration R 2 ranging [0.11, 0.94] ( Table 4, top). Differences between the SG and FD were negligible in most cases. Fits for Mg in ET and Protein in the US were high (0.88-0.94), but there was little consistency across locations. T-test comparisons show significant differences in model performance for all nutrients/processing methods between the two locations indicating the PLS-Wave regression model did not replicate well in terms of predicting plant nutrients across these two environments. At the grain level, the PLS-Wave models performed generally well for all nutrients across the two locations, with calibration R 2 ranging [0.52, 0.95] ( Table 4, bottom). Fits for protein were lower for ET than US, but fits for Ca and Mg were similar. Bootstrapping resulted in normal and semi-normal distributions for all nutrients and processing ( Figure S2, Supplementary Material). For Ca ET, a bimodal distribution was observed. Upon further investigation of the original Ca ET data, the distribution deviated from a normal distribution and instead was bimodal and skewed to the right, explaining the bimodal and skewed distribution of the R 2 bootstrapping values ( Figure S2, Supplementary Material). T-test comparisons again indicate significant differences in all cases, despite similar R 2 values for some of the nutrient-component combinations, indicating the PLS-Wave regression model did not replicate well across these two environments.
In addition to significant differences in the model fits, the PLS waveband selection procedure also indicated considerable differences in the number and position of important wavebands for nutrient prediction (Figure 6). At both the plant and grain level, Jaccard indices were low (<0.2), with three instances having no overlapping bands (Table 5). For instance, nine wavebands were selected as important for predicting plant-level Ca in the US (SG): One at 760 nm, and eight from 1020-1060 nm ( Figure 6). In contrast, the important bands for predicting Ca from the ET samples (SG) included fifteen bands positioned from 1045-1115 nm. The Jaccard index for this set was 0.069 (Table 5) indicating minimal overlap in the band positioning. In summary, the results from this study indicate that the PLS regression, using both the full model and the waveband selection process, did not generate replicable findings across the two study areas. We found statistically significant differences in model fits for 11 of the 12 comparisons; only the model for grain Mg using the SG filtering (Table 4) was not statistically different. Even though model fits were not comparable, it is still possible for the wavebands identified as important for prediction to be similar. However, the Jaccard index ( Table 5) and plots of the important wavebands for each model ( Figure 6) indicate very little overlap in the important wavebands between the US and ET samples, further diminishing the case for replicability across these two study sites.

Combined Samples for Multiple Environments
Following the single environment analytics, we tested the performance of a PLSR for the two environments (US and ET) combined. We re-ran the analytics with the same methodologies for the PLS-Full and PLS-Wave models using the combined samples (named USET), and found this generally resulted in better model fits compared to the single sites for both calibration and validation for both the PLS-Full (Table 6) and PLS-wave models (Table 7). Calibration and validation coefficients of determination (R 2 ) were all greater than 0.9 for all plant and grain nutrients using both SG and FD data and for both the PLS-Full and PLS-Wave models. RMSE values from the validation were slightly higher than those from the calibration, which is expected, and RMSE p values were all within 11 percent, so the models do not appear to be overfit.

Replicability of Scientific Methods and Findings
Demonstrating that methods and findings can be replicated across studies is critical for the self-correcting mechanism of the scientific method to function properly and is imperative for generating solutions that can be applied widely. Hyperspectral data is commonly used in precision agriculture for predicting biochemical properties of vegetation and nutrients [18,26,[51][52][53], yet the replicability of findings is rarely tested across different study sites. Scalable science is needed to develop solutions that can be applied globally. The adoption of PLS regression has provided a solution to some of the computational challenges when working with high dimensional, hyperspectral data in remote sensing in general and precision agriculture more specifically, but whether these solutions are transferable has not been widely explored. This study used hyperspectral data and in situ samples to build PLS models to predict plant and grain nutrients for tef and test the replicability of those models for predicting across different environments.
We found significant differences in model fits for both the PLS-Full and PLS-Wave models along with differences in the number and locations of wavebands deemed important for prediction with the PLS-Wave models. Differences in the optimal number and location of wavebands for predicting nutrients via plant canopy measurements may be influenced by varying management practices, such as differing irrigation practices, which may lead to variable water content in the crops [41]. The ET fields were rain fed while the US fields were irrigated through to harvest; therefore variable plant water content may have influenced which wavebands were selected, as water can cause access noise in spectral signatures. This noise is particularly prevalent in wavebands that are sensitive to O-H bonds, including the spectral range 971-1400 nm [54][55][56]. While we removed a portion of this range from our data (see Figure 4), it is possible remaining wavelengths were affected by noise making replication difficult.
The number of wavebands selected in the PLS models for ET was often less than the number selected for the US ( Figure 5). This difference may reflect how the PLS models incorporate water-induced noise that may have been present in the US samples. These findings suggest that understanding how hyperspectral remote sensing methods, such as PLS regression, replicate across agricultural environments may require greater controls on the conditions under which crops are being cultivated. In addition to irrigation and other management differences, changes in latitude and sun angle could have led to differences in scattering and light absorption during field data collection [57], which would impact replication comparisons. However, hyperspectral readings for the grain samples were collected in a controlled laboratory setting with the same halogen lamp, so we can assume that any differences in wavebands were not the result of external factors (i.e., sun angles, latitudes, etc.).
When comparing the nutrient content of the grains, there were clear differences between the study areas ( Table 2). These large differences likely result in varying chemical property relationships within the grain, which in turn can result in differential absorption and scattering of electromagnetic energy. The large variance amongst biochemicals within the grain may result in noise for some nutrients as nutrient reflectance properties are often associated with near or similar portions of the electromagnetic spectrum [41].
In short, had we completed this study only in the US, the PLS regression method (both with and without waveband selection) would have produced favorable findings for all three nutrients at the grain level, and for protein at the plant level. Similarly, had we completed this study only in ET, PLS regression would have also produced favorable findings for all three nutrients at the grain level, and for Mg at the plant level. Yet, even where favorable results were found in both environments (all three nutrients at the grain level), the model fits were statistically different, and the wavebands selected were not similar. This finding serves as a cautionary tale that urges researchers to refrain from placing too much confidence on R 2 values, which may not translate to different areas. Even when the R 2 values were superficially high, there were statistical differences in their values. This is not to suggest that hyperspectral investigations with PLS regression are not useful or valuable but rather that caution should be used when translating findings from one site to another.

Combining Samples from Multiple Regions
A valid question arises as to whether combining samples from multiple regions can make models more robust for prediction across environments. Higher performance was obtained when the US and ET samples were combined. However, these higher R 2 values are likely artifacts of a larger spread among the dataset (larger range of levels of nutrients across locations).
It is worth noting that in three instances, the RMSE P from the combined USET dataset was greater than the minimum value of the nutrient in ET (see Table 2). These included Ca and Mg at the plant level for the PLS-Full model ( Table 6) and Ca at the plant level for the PLS-Wave model (Table 7). Upon further investigation, it was determined that two of these samples were outliers (PLS-Full Mg and Ca), where the value was more than two standard deviations from the mean. Thus, these issues do not appear to be widespread or have a large impact on predictive capabilities of the models. However, it is important to keep this point in mind when combining data from differing regions with varying environmental factors and agricultural practices, where there may be large disparities in nutrient content. These disparities may result in the identification of between-site variation, requiring a greater variation of nutrient values to fully establish a generalized model. Nevertheless, the results from the combined dataset are promising because they suggest that increasing within-sample variation can improve PLS model predictability across study areas. It is also worth mentioning that this practice could result in a reduction of accuracy of measurement, especially for field measurements, as the data may take on more noise. In this case, the use of single region analytics may be more beneficial. Nonetheless, improvement of predictability has been a similar finding in past studies that have referred to a need for global (i.e., USET) versus local (i.e., US and ET) modeling within similar measurement collections such as those in soil geochemistry studies [58,59]. Since it is unlikely though that this range of variation would be captured within a single site in many cases, it becomes necessary to combine samples from a variety of locations, and likely from a variety of study sites and research groups. To this point, we recommend building a more open and transparent culture of data sharing within the remote sensing community that can permit data sharing to advance our modeling capabilities and promote development of more generalizable predictive models. We recognize that a shift is already underway in many disciplines to promote data and code sharing, with some journals mandating these components accompany manuscript submission to allow reproducibility checks. Our findings here suggest that sharing data may have broader impacts beyond simply reproducing tests, but rather could result in more robust predictive models.

Non-Milled Grains
Since very few hyperspectral analyses have been performed on non-milled grains [26], we comment briefly on those aspects here. We generally found positive results using hyperspectral data with PLS regression to predict nutrient content of non-milled grains. While the reflectance measurements for these grain samples were collected in a controlled environment (i.e., a dark room using a contact probe), the coefficients of determination were noticeably higher for the grains compared to the plant canopy measurements collected in situ.
An interesting finding emerging from this research is that in the US, several blocks of wavebands corresponding to the VIS-NIR regions were identified as important for protein prediction when using FD values (Table S1, supplementary material). We compared these findings to other studies predicting protein from hyperspectral data [26,56,60]. We do see a considerable amount of overlap in the bands identified in this study for tef, and the bands identified in those studies for forage quality, cooked hams, and cereal grains (rice, oats, and maize), particularly for the FD transformation (Table 8). However, as noted by Talens et al. [56], several of the band ranges found to be important across multiple studies for protein prediction do fall in the range impacted by O-H bonds. So, it is difficult to determine whether these consistencies are true positives or are being affected by other factors.

Potential Limitations
Environmental factors such as sun angle, soil types/moisture, agronomic practices, etc., can contribute to the reflectance and nutrient differences at the plant the level. Given the number of samples collected at each location, this study is not meant to be a representative sampling for all tef varieties at varying growth stages around the world. To establish more generalized models that can be applied to a broader range of locations/tef varieties, future studies should focus on obtaining a greater number of samples across a greater number of location for multiple tef varieties.

Conclusions
This study investigated the replicability of PLS regression methods, including PLS with waveband selection, for predicting nutrient content in plant and grain material across multiple environments. Using Eragrostis tef (tef) as a target crop, this study compared PLS model fits and selected wavebands across two environments in the U.S. and Ethiopia to determine the extent to which the methods and finding are replicable. Three main findings emerge from this study: First, the model fits and wavebands selected as important for nutrient prediction were not replicable across the two study sites in the US and Ethiopia. Eleven of the 12 comparisons had statistically different model fits across 1000 bootstrapped iterations, and the Jaccard index for similarity indicated very low similarities in the wavebands selected.
Combining samples from both environments improved model fits, suggesting that increasing within-sample variation may improve the predictability of PLS models across study areas, though caution is reserved if great disparities in sample values are great. Our recommendation is to build a more open and transparent culture of data sharing within the remote sensing community that will permit data sharing in order to advance modeling capabilities and promote development of more generalizable predictive models.
Results using PLS regression with hyperspectral data from non-milled grains were generally positive, and wavebands for protein prediction generally agreed with other studies. While more research is needed to determine whether these consistencies are true positives or are affected by other factors, this study contributes to the gap in the literature related to non-milled grains.
Supplementary Materials: The following are available online at http://www.mdpi.com/2072-4292/12/18/2867/s1, Figure S1: Bootstrapping results for plant and grain analyses with the PLS-Full model, Figure S2: Bootstrapping results for plant and grain analyses with the PLS-Wave model, Table S1: Selected wavebands from the partial least square regression (PLS) using waveband selection (PLS-Wave) for protein in plant and grain for the combined United State and Ethiopia samples (USET).