Yield Prediction in Soybean Crop Grown under Different Levels of Water Availability Using Reflectance Spectroscopy and Partial Least Squares Regression

Abstract: Soybean grain yield has regularly been impaired by drought periods, and the future climatic scenarios for soybean production might drastically impact yields worldwide. In this context, knowledge of soybean yield is extremely important to support government and corporate decisions on technical issues. This paper aimed to predict grain yield in soybean crop grown under different levels of water availability using reflectance spectroscopy and partial least squares regression (PLSR). Field experiments were undertaken at Embrapa Soja (Brazilian Agricultural Research Corporation) in the 2016/2017, 2017/2018 and 2018/2019 cropping seasons. The data collected were analyzed following a split-plot model in a randomized complete block design with four blocks. The following water conditions were distributed in the field plots: irrigated (IRR), non-irrigated (NIRR) and water deficit induced at the vegetative (WDV) and reproductive (WDR) stages using rainout shelters. Soybean genotypes with different responses to water deficit were distributed in the subplots. Soil moisture and weather data were monitored daily. A total of 7216 leaf reflectance spectra (from 400 to 2500 nm, measured with the FieldSpec 3 Jr spectroradiometer) were collected on 24 days across the three cropping seasons. PLSR (p ≤ 0.05) was performed to predict soybean grain yield from leaf reflectance spectra. The highest accuracy in soybean grain yield prediction was obtained at the R5 phenological stage, corresponding to the period when grains are being formed (R² ranging from 0.731 to 0.924 and RMSE from 334 to 403 kg ha⁻¹, or 7.77 to 11.33%). When the three cropping seasons were combined into a single PLSR model at the R5 stage, R² values of 0.775, 0.730 and 0.688 were obtained at the calibration, cross-validation and external validation stages, respectively, with RMSE lower than 634 kg ha⁻¹ (13.34%). PLSR demonstrated higher accuracy for plants subjected to water deficit at either the vegetative or the reproductive period than for plants under natural rainfall or irrigation.


Introduction
Brazil is responsible for over one third (124 million tons) of the soybean produced worldwide (341 million tons) and plays an important role in the world's food production and financial market [1,2]. Although high yields are often obtained, Brazilian soybean production is regularly impaired by drought periods. Battisti et al. [3] demonstrated the influence of water availability on soybean yield in different fields in Brazil. According to Sentelhas et al. [4], drought periods have impaired around 30% of Brazilian soybean production, which led to financial losses of over USD 79 billion in 38 years [5]. Furthermore, the future climatic scenarios for soybean production might drastically impact yields worldwide [6].
In this context, understanding soybean production areas and their development conditions is extremely important to support government and corporate decisions on technical issues, which directly affect supply regulation, food security, the financial market and strategic planning in relation to social, environmental and economic policies [7][8][9][10]. Hence, there is a rising need for precise methods capable of predicting soybean yield before harvest, which would support better management of agronomic, logistic and economic practices.
Soybean grain yield prediction has been successfully addressed by several studies at orbital and aerial (UAV-based) levels of data acquisition [11][12][13][14][15][16]. However, according to Sakamoto [13], most yield prediction models are based on the direct relation between biomass and vegetation indices and the indirect relation between biomass and yield at a specific phenological stage. According to Braga et al. [17], soybean plants subjected to different levels of water availability present different physiological responses, and those responses are expressed differently across the spectrum. Thus, considering that yield is the result of multiple physiological iterations during the cropping season, field-based investigation of soybean grain yield prediction via hyperspectral response, covering a larger number of spectral bands at fine spectral resolution, can help to clarify the relation between reflectance and yield across multiple wavelengths and to capture a larger number of physiological iterations across the spectrum.
Recognized as a useful method when the number of predictor variables (e.g., wavelengths) is larger than the number of response variables (e.g., grain yield) [18], partial least squares regression (PLSR) has been successfully applied to yield prediction in several crop types: soybean [19], maize [20,21], winter wheat [22,23], barley [24], oilseeds [25] and grassland [26]. Moreover, Barmeier et al. [24] described the PLSR approach as having stronger predictive capacity for yield than vegetation indices in several crop systems. Developed by Wold et al. [27], PLSR is a multivariate statistical analysis method that combines principal component regression (PCR) and multiple regression, overcoming the multicollinearity among independent variables [28,29].
In Brazil, due to the regular threat of unfavorable weather events, especially drought periods, soybean yield prediction models should comprise the evaluation of plants both under water shortage and under good conditions of water availability, allowing the yield monitoring across cropping seasons with variable weather characteristics.
Based on the current progress, this paper aimed to predict grain yield in soybean crop grown under different levels of water availability using reflectance spectroscopy and partial least squares regression. Our hypothesis is that soybean plants present different spectral responses according to their level of water availability and that the spectral response might be related to grain yield. Our research questions address: (1) the detection of the best time across the soybean cropping season to predict grain yield; (2) the development of a multi-year prediction model built from hyperspectral reflectance collected in multiple cropping seasons; and (3) the evaluation of the effect of crop water status on the accuracy of soybean grain yield prediction.

Experimental Site
The experiment (Figure 1) was undertaken at Embrapa Soja (Brazilian Agricultural Research Corporation) in the 2016/2017, 2017/2018 and 2018/2019 cropping seasons. The climate of the experimental site is classified as Cfa according to the Köppen climate classification, i.e., a subtropical climate with a mean temperature in the hottest month higher than 22 °C and rainfall concentrated in the summer months, which correspond to the period of soybean production, albeit with no defined dry season [30,31]. Although no dry season is observed during the soybean crop season, periods of water deficit often cause large yield losses [4]. The soil of the experimental area is characterized as Udox Oxisol [32], with 75 mm of water holding capacity, and the results of the soil analysis (March 2016) are presented in Table 1. The data collected were analyzed following a split-plot model in a randomized complete block design with four blocks, and the experimental practices followed soybean production technologies [33]. The following water condition treatments were distributed in the field plots: irrigated (IRR, receiving rainfall and irrigation when necessary, with a soil water matric potential between −0.03 and −0.05 MPa); non-irrigated (NIRR, receiving only rainfall); water deficit induced at the vegetative stages (WDV); and water deficit induced at the reproductive stages (WDR). Soybean genotypes (commercial cultivars and genotypes with drought tolerance genes) with different responses to water deficit were distributed in the subplots.
WDV and WDR plots were established under rainout shelters to simulate water deficit. The shelters automatically covered the subplots (at the vegetative or reproductive stage) when rainfall above 0.1 mm was recorded and automatically uncovered the plants once rainfall had ceased. To prevent lateral water movement into the soil from outside, the plots were surrounded by vertical concrete barriers (buried up to 90 cm deep).
During the period in which WDV plots were deprived of rainfall, WDR plots were kept under natural conditions of water availability. From the flowering period to harvest, the WDR treatment was deprived of natural rainfall and the WDV plots, in turn, began to receive rainfall, thus simulating water deficit at both the vegetative and reproductive stages of development. In the 2018/2019 cropping season specifically, owing to the severe natural drought conditions, WDR was deprived of natural rainfall only until 14 January 2019, after which plants received natural rainfall until harvest.
Plots of the IRR and NIRR treatments were composed of 10 subplots, and plots of the WDV and WDR treatments were composed of 5 subplots. Only genotypes sown in all four experimental treatments were considered in the analysis. In the 2016/2017 and 2017/2018 cropping seasons, five genotypes (1Ea15, 2Ha11, 2Ia4, BR16 and BRS 184) were sown in all experimental treatments, while in the 2018/2019 cropping season, five different genotypes (1Ea2939, 3Ma2, BRS 283, BRT18-0089 and BRT18-0201) were sown in those plots.
In the IRR and NIRR treatments, the subplots measured 4 m wide × 5.5 m long and comprised eight rows spaced 0.5 m apart. In the WDV and WDR treatments, the subplots measured 1.5 m wide × 6 m long and comprised three rows spaced 0.5 m apart. To minimize potential external effects on soybean plants, both ends of each row (0.5 m) were excluded from data acquisition. Table 2 displays the sowing dates and the periods of induced water deficit at the vegetative and reproductive stages in the three evaluated cropping seasons. Soil moisture (0-20 and 20-40 cm depths) in IRR plots was monitored daily by tensiometers, which helped to determine the amount of irrigation needed to keep the soil water matric potential between −0.03 and −0.05 MPa. A tensiometer consists of a porous, permeable ceramic tip placed in contact with the soil and connected to a vacuum gauge by a water-filled tube. When water flows from the tube into the soil (since the soil is not always saturated), a negative pressure is created that can be measured by the vacuum gauge. Tensiometers were installed in each of the four blocks, and the irrigation schedule applied in the 2016/2017 and 2018/2019 cropping seasons is described in Table 3. In the 2017/2018 cropping season, there was no need for irrigation, and therefore, plants of the IRR and NIRR treatments were under the same water availability. Soil moisture was monitored in all plots by gravimetric analysis at two periods in each cropping season: at the transition between vegetative and reproductive stages and close to the maturity period. The growth stages of the soybean plants were monitored weekly from emergence to maturation according to Fehr and Caviness [34].
Recorded by the weather station located within the experimental area, weather data (air temperature, relative air humidity and rainfall) were monitored according to Sibaldelli and Farias [35][36][37], and the climatic water balance for each experimental treatment of each cropping season was calculated according to Thornthwaite and Mather [38].
Grain yield was calculated and corrected to 13% grain moisture according to Equation (1):
$$GY = HGW \times \frac{100 - HGM}{100 - DGM} \times \frac{10\,000}{HPA} \qquad (1)$$
in which GY is the grain yield (kg ha⁻¹), HGM the harvested grain moisture (%), DGM the desired grain moisture (%), HGW the harvested grain weight (kg) and HPA the harvested plot area (m²). Harvested grain moisture was measured using the G810 grain moisture meter (Gehaka Inc., São Paulo, Brazil).
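As a rough illustration of the moisture correction, the sketch below assumes the standard form of the correction, in which the harvested weight is rescaled by (100 − HGM)/(100 − DGM) and converted from kg per plot to kg ha⁻¹. The function name and the example plot values are hypothetical and are not taken from the experiment.

```python
def corrected_grain_yield(hgw_kg, hgm_pct, hpa_m2, dgm_pct=13.0):
    """Grain yield (kg/ha) corrected to a desired grain moisture.

    Assumed form: GY = HGW * (100 - HGM) / (100 - DGM) * 10000 / HPA
    """
    if not (0.0 <= hgm_pct < 100.0 and 0.0 <= dgm_pct < 100.0):
        raise ValueError("moisture contents must lie in [0, 100)")
    if hpa_m2 <= 0.0:
        raise ValueError("harvested plot area must be positive")
    # Ratio of dry matter fractions rescales the weight to the desired moisture.
    dry_matter_ratio = (100.0 - hgm_pct) / (100.0 - dgm_pct)
    # 10,000 m2 per hectare converts kg per plot to kg per hectare.
    return hgw_kg * dry_matter_ratio * 10_000.0 / hpa_m2


# Hypothetical plot: 2.5 kg of grain harvested at 16% moisture from 7.5 m2.
example_yield = corrected_grain_yield(2.5, 16.0, 7.5)
```

Grain harvested wetter than the desired moisture is scaled down, and grain harvested at exactly 13% moisture is only converted to a per-hectare basis.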

Spectral Data Acquisition and Processing
The FieldSpec 3 Jr spectroradiometer (Analytical Spectral Devices, Boulder, CO, USA), with a spectral resolution of 3 nm between 350 and 1400 nm and 30 nm between 1400 and 2500 nm (Figure 2), was used to collect soybean leaf reflectance. Each spectral reading was the average of 20 internal automatic readings, and the output spectra are given in single bands of 1 nm width, totaling 2151 contiguous spectral bands. The leaf reflectance plant probe, connected to the FieldSpec by a one-meter bare fiber (Figure 2c), was used to prevent illumination interference from adjacent targets as well as atmospheric scattering and attenuation, so that spectral filters for noise removal and data smoothing were not required [39][40][41][42][43]. Equipped with an internal 99% reflectance board (Spectralon®), used as the reflectance standard, and a 1% reflectance opaque black board, this device was used during the spectral assessment to ensure the collection of pure leaf reflectance spectra.
Spectral data were acquired on the central leaflet of the third fully expanded trifoliate leaf from the top. Leaf reflectance spectra were collected from four plants within each subplot and then averaged to give the values used in data processing, thus minimizing the spectral variability within each subplot.

Statistical Analysis
Once the assumptions of analysis of variance (ANOVA) had been met, soil moisture and grain yield in each cropping season were subjected to ANOVA, and means were compared by Tukey's test (p ≤ 0.05) using the software Sisvar [46].

Partial Least Squares Regression-PLSR
Partial least squares regression (p ≤ 0.05) was performed with The Unscrambler® (CAMO Software, Norway) to develop a soybean grain yield prediction model from leaf-based hyperspectral reflectance.
The PLSR method relates the spectral data (matrix "X") to the grain yield (analytical data, matrix "Y") and creates a new set of orthogonal base vectors (latent variables or PLSR factors) that account for most of the variation in the trait variable, generating a linear model consisting of waveband scaling coefficients to transform full-spectrum data [47]. The choice of the number of orthogonal base vectors is considered a key step in PLSR and deeply affects its prediction capacity [42]. The ideal PLSR model should use the number of orthogonal base vectors that yields the lowest root mean square error (RMSE) under "leave-one-out" cross-validation, the highest coefficient of determination of the multivariate regression (R²) and a systematic error (BIAS) close to zero [48].
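The selection of the number of latent variables by leave-one-out cross-validation can be sketched as follows. This is not The Unscrambler's implementation, which the text does not describe; it is a minimal single-response PLS (NIPALS) written with NumPy, run on synthetic data, and it selects the number of factors by the lowest cross-validated RMSE only. All function names and data are hypothetical.

```python
import numpy as np


def pls1_fit(X, y, n_components):
    """Fit a single-response PLS model with NIPALS; returns (coef, x_mean, y_mean)."""
    x_mean, y_mean = X.mean(axis=0), y.mean()
    Xk, yk = X - x_mean, y - y_mean           # mean-centre predictors and response
    W, P, q = [], [], []
    for _ in range(n_components):
        w = Xk.T @ yk                          # weights: covariance of each band with y
        w /= np.linalg.norm(w)
        t = Xk @ w                             # scores of the new latent variable
        tt = t @ t
        p = Xk.T @ t / tt                      # X loadings
        c = yk @ t / tt                        # y loading
        Xk = Xk - np.outer(t, p)               # deflate X and y before the next factor
        yk = yk - c * t
        W.append(w)
        P.append(p)
        q.append(c)
    W, P, q = np.array(W).T, np.array(P).T, np.array(q)
    coef = W @ np.linalg.solve(P.T @ W, q)     # one regression coefficient per band
    return coef, x_mean, y_mean


def pls1_predict(model, X):
    coef, x_mean, y_mean = model
    return (X - x_mean) @ coef + y_mean


def loo_rmse(X, y, n_components):
    """Leave-one-out cross-validated RMSE for a given number of factors."""
    n = len(y)
    errors = np.empty(n)
    for i in range(n):
        keep = np.arange(n) != i
        model = pls1_fit(X[keep], y[keep], n_components)
        errors[i] = pls1_predict(model, X[i:i + 1])[0] - y[i]
    return float(np.sqrt(np.mean(errors ** 2)))


def select_n_components(X, y, max_components):
    """Return the number of factors with the lowest RMSECV and the full curve."""
    rmsecv = [loo_rmse(X, y, k) for k in range(1, max_components + 1)]
    return int(np.argmin(rmsecv)) + 1, rmsecv


# Synthetic stand-in for spectra (30 samples x 60 bands) and yield in kg/ha.
rng = np.random.default_rng(0)
X_demo = rng.normal(size=(30, 60))
y_demo = (X_demo[:, :3] @ np.array([400.0, -250.0, 150.0])
          + 3000.0 + rng.normal(scale=50.0, size=30))
best_k, rmsecv_curve = select_n_components(X_demo, y_demo, max_components=8)
```

In practice the R² and BIAS criteria mentioned above would be inspected alongside the RMSECV curve rather than relying on the minimum alone.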
At the first stage, PLSR was applied using the hyperspectral data from each day of spectral assessment and the corresponding cropping season's grain yield.
At the second stage, the possibility of developing a multi-year soybean grain yield prediction model was evaluated. To do so, the spectral assessments at the R5 phenological stage in the three cropping seasons were combined. The hyperspectral reflectance spectra of soybean plants at the R5 stage (224 samples) were randomly split into two subsets: calibration/cross-validation (containing 75% of the data, 168 spectral samples), used to develop the PLSR model, and external validation (containing the remaining 25% of the data, 56 spectral samples), used to test the developed PLSR model.
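A random 75/25 split of the 224 R5-stage samples can be sketched as below. The text does not report how the random split was drawn, so the shuffling procedure, the seed and the function name are assumptions made only for illustration.

```python
import random


def split_calibration_validation(sample_ids, calibration_fraction=0.75, seed=0):
    """Randomly split sample identifiers into calibration and external validation sets."""
    ids = list(sample_ids)
    random.Random(seed).shuffle(ids)           # seeded shuffle keeps the split reproducible
    n_cal = round(len(ids) * calibration_fraction)
    return ids[:n_cal], ids[n_cal:]


# 224 hypothetical sample identifiers give 168 calibration and 56 validation samples.
calibration_ids, validation_ids = split_calibration_validation(range(224))
```

Fixing the seed is a design choice that lets the same external validation samples be reused when models are refitted.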
At the third stage, to investigate the individual effect of the experimental treatments on the accuracy of grain yield prediction, the spectral samples from each treatment (IRR, NIRR, WDV and WDR) at the R5 phenological stage, comprising the three cropping seasons, were fitted in separate models.
The fit quality of each PLSR model was assessed by the coefficient of determination (R²), the root mean squared error (RMSE), the RMSE expressed as a percentage of the amplitude of the observed values (RMSE%) and the systematic error (BIAS) at the calibration and cross-validation (leave-one-out) steps. Specifically, at the second stage, the predictive accuracy of the PLSR model generated using 75% of the soybean spectral data at the R5 stage and tested on external samples (the remaining 25%) was also assessed by the R², RMSE, RMSE% and BIAS obtained in the external validation (predicted vs. measured yield) step.
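The four fit statistics can be sketched as follows. RMSE% follows the definition in the text (RMSE relative to the amplitude of the observed values); R² is assumed here to be 1 − SS_res/SS_tot and BIAS the mean of predicted minus observed, since the exact formulas used by the software are not stated. The observed and predicted values are hypothetical. As a check on the RMSE% definition, the 2017/2018 calibration RMSE of 351.49 kg ha⁻¹ over the yield range reported for that season (875 to 5398 kg ha⁻¹) gives about 7.77%, matching the value reported in the results.

```python
import math


def rmse_percent(rmse, y_min, y_max):
    """RMSE as a percentage of the amplitude (max - min) of the observed values."""
    return 100.0 * rmse / (y_max - y_min)


def fit_metrics(observed, predicted):
    """Return R2, RMSE, RMSE% and BIAS for paired observed/predicted yields."""
    n = len(observed)
    residuals = [p - o for o, p in zip(observed, predicted)]
    mean_obs = sum(observed) / n
    ss_res = sum(r * r for r in residuals)
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    rmse = math.sqrt(ss_res / n)
    return {
        "r2": 1.0 - ss_res / ss_tot,           # assumed definition of R2
        "rmse": rmse,
        "rmse_pct": rmse_percent(rmse, min(observed), max(observed)),
        "bias": sum(residuals) / n,            # mean of predicted minus observed
    }


# Hypothetical yields in kg/ha.
demo_metrics = fit_metrics([1200.0, 2500.0, 3100.0, 4100.0],
                           [1500.0, 2300.0, 3300.0, 3900.0])
# Reported 2017/2018 calibration RMSE and observed yield range.
reported_pct = rmse_percent(351.49, 875.0, 5398.0)
```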
Before fitting the PLSR models, the reflectance spectra within each spectral dataset were normalized (subtraction of the mean reflectance from the actual reflectance at each wavelength), thus allowing comparison among the fitted PLSR coefficients. Outliers and homogeneity of the spectral data were assessed by the Leverage and Hotelling's T² tests.
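The normalization and a leverage-based outlier screen can be sketched as below. Two assumptions are made: the mean is taken per wavelength across the samples of a dataset, and leverage is computed from principal component scores obtained by SVD as a stand-in for the PLSR scores used by the software. No flagging threshold is implied, Hotelling's T² is not computed, and the spectra are synthetic.

```python
import numpy as np


def mean_center(spectra):
    """Subtract the mean reflectance of each wavelength (column) across samples."""
    band_means = spectra.mean(axis=0)
    return spectra - band_means, band_means


def sample_leverage(centered, n_components):
    """Leverage of each sample: 1/n plus its share of each retained score vector."""
    # Columns of u are the unit-normalised scores of the centred data.
    u, _, _ = np.linalg.svd(centered, full_matrices=False)
    n_samples = centered.shape[0]
    return 1.0 / n_samples + np.sum(u[:, :n_components] ** 2, axis=1)


# Synthetic stand-in: 40 spectra with 2151 bands (350 to 2500 nm at 1 nm steps).
rng = np.random.default_rng(42)
demo_spectra = 0.3 + 0.05 * rng.normal(size=(40, 2151))
demo_centered, demo_means = mean_center(demo_spectra)
demo_leverage = sample_leverage(demo_centered, n_components=3)
```

Samples with leverage far above the average would be candidates for closer inspection before model fitting.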

Effect of Experimental Treatments on Climatic Water Balance, Soil Moisture, Grain Yield and Leaf Reflectance
The climatic water balance, calculated according to Thornthwaite and Mather [38] for each experimental treatment in each cropping season, is presented in Figure 3. The experimental treatments were effective in promoting water deficit at the vegetative stages (first and second 10-day periods of December; Figure 3a,e,i) and at the reproductive stages (third 10-day period of December and first, second and third 10-day periods of January; Figure 3b,f,j) in the three cropping seasons. However, the water deficit induced at the reproductive stages proved more severe, most likely because of the longer period during which plants were subjected to water withholding.
In the 2016/2017 cropping season, the 69.6 mm of irrigation applied between 24 and 38 days after sowing (DAS) was enough to maintain plants of the IRR treatment under good conditions of water availability in the third 10-day period of November and the first 10-day period of December (Figure 3d), in a cropping season without severe water deficit (Figure 3c). Natural water deficit periods were observed in the 2018/2019 cropping season at both the vegetative and reproductive periods (Figure 3k), and the irrigation applied across crop development (106.1 mm) kept IRR plants under good water status in the first, second and third 10-day periods of December, the second 10-day period of January and the second 10-day period of February (Figure 3l).
The soil moisture assessed at the transition between vegetative and reproductive stages and close to the maturity period in the three evaluated cropping seasons is presented in Figure 4. In all three cropping seasons, soil moisture in WDV plots at the transition between vegetative and reproductive stages (Figure 4a) was the lowest among all treatments. In the 2017/2018 cropping season, statistical differences could not be detected, most likely because of the large amount of rainfall before the rainout shelters came into operation (late October) and the atmospheric conditions in the third 10-day period of November and the first and second 10-day periods of December; nevertheless, the average values at both depths indicate the same trend.
Soil moisture on the last day of spectral assessment in each cropping season, close to the maturation period (Figure 4b), was lower in WDR plots because of the long period during which plants were subjected to water deficit. In WDV plots, in turn, soil moisture increased, since the plants had begun to receive rainfall.
Soybean grain yield in the 2016/2017, 2017/2018 and 2018/2019 cropping seasons is presented in Figure 5. Soybean plants under water deficit during the reproductive stages had their grain yield drastically decreased. In the 2017/2018 and 2018/2019 cropping seasons, grain yield was reduced to less than half of that obtained under the other levels of water availability. In turn, soybean plants subjected to water deficit during the vegetative stages presented grain yield similar to that of plants under natural rainfall (NIRR treatment), demonstrating that water deprivation during the reproductive stages is more harmful to the soybean crop. Several authors have reported similar soybean yields for plants under water deficit during the vegetative stages (and rain-watered during the reproductive stages) and for plants under good conditions of water availability [49][50][51][52].
Although the yield of irrigated plots was similar to that of non-irrigated plots in the 2016/2017 cropping season, owing to the absence of severe natural water deficit (Figure 3c), in the 2018/2019 cropping season, when severe natural water deficit was observed, the IRR treatment produced higher grain yield than plants receiving only natural rainfall.
Figure 6 shows the average spectral response of soybean genotypes in the irrigated (IRR) and water deficit induced at reproductive stages (WDR) treatments at 107 DAS in the 2018/2019 cropping season (when severe natural water deficit was observed) and their corresponding grain yield. Plants with higher grain yield (irrigated treatment, blue dashed line) presented lower reflectance across the spectrum than plants subjected to water deficit at the reproductive stages (WDR treatment, red dashed line), which presented lower grain yield. Damm et al. [53] presented a comprehensive review highlighting that plants under water deficit present higher reflectance than those under good conditions of water availability. According to Maimaitiyiming et al. [54], because of leaf biochemical properties and structure, the differences in spectral response of soybean plants under different levels of water availability are not the same across the spectrum.
Although differences in reflectance between IRR and WDR plants can be observed across the spectrum, they were most pronounced in the visible wavelengths (around 550 nm) and shortwave infrared wavelengths (around 1400 and 2200 nm).
Leaf reflectance across the visible spectrum (between 400 and 720 nm) has been shown to be directly related to photosynthetically active radiation (PAR), influenced by the interaction between electromagnetic energy and plant tissue compounds, e.g., chlorophyll a and b [55,56], carotenoids (β-carotene, lutein, violaxanthin and neoxanthin) [57] and flavonoids (flavones, flavonols, isoflavones and anthocyanins) [58][59][60]. Hence, high rates of absorbed PAR deliver low reflectance, and low rates of absorbed PAR deliver high reflectance at those wavelengths [61][62][63][64][65]. Soybean plants under water deficit present lower photosynthetic and stomatal conductance rates [66,67], resulting in lower absorption of PAR and higher reflectance. Regarding the largest difference observed across the visible spectrum, in the green wavelengths (around 550 nm), Moriwaki et al. [68] observed larger absorption in this spectral interval with increasing chlorophyll contents, albeit with no significant increases in blue or red absorption.
Despite the absence of large differences in leaf reflectance across the near-infrared wavelengths (between 720 and 1300 nm), higher reflectance from WDR plants can be observed. The near-infrared spectrum is associated with leaf physical structure and the spatial distribution of cells and is also influenced by water content [56,69,70]. Thus, the interaction of electromagnetic energy inside the mesophyll leads to internal scattering [71][72][73] and promotes differences in leaf reflectance.
The shortwave-infrared spectrum (between 1300 and 2500 nm) has been well characterized as negatively related to vegetation water status and leaf water content. Braga et al. [17], characterizing the relative water content of soybean plants under irrigated and water deficit conditions, demonstrated higher reflectance values for plants under water shortage. Hence, plants under good conditions of water availability (IRR treatment) presented higher leaf water content and, consequently, lower reflectance [74][75][76][77].
The differences in reflectance across the spectrum play an important role in monitoring and differentiating soybean crop water status and can contribute to water management and decision making across the cropping season. In addition, the spectral behavior from the visible to the shortwave infrared can provide important information for yield prediction and for understanding which wavelengths are most correlated with yield (also influenced by the phenological iteration across the cropping season).

Predicting Soybean Grain Yield through Partial Least Squares Regression-PLSR
The results of PLSR in the prediction of soybean grain yield for the 2016/2017, 2017/2018 and 2018/2019 cropping seasons are presented in Table 5.
On all days of spectral assessment, the coefficient of determination (R²) of the calibration step was higher than the R² of the cross-validation step. Correspondingly, the RMSE of the cross-validation was larger than that of the calibration step.
In the 2016/2017 cropping season, the lowest R² and highest RMSE values were observed at the early stages of crop development (28 and 33 DAS) in both the calibration and cross-validation steps. On those dates, R² was lower than 0.168, and the RMSE was over 588 kg ha⁻¹ (19.96%). However, grain yield prediction accuracy increased, with higher R² and lower RMSE, as the soybean crop developed.
At 89 DAS, the highest R² was obtained at both calibration and cross-validation (0.731 and 0.595, respectively). On this date, the RMSE was 334 kg ha⁻¹ (11.33%) at the calibration step and 416 kg ha⁻¹ (14.14%) at cross-validation. This date corresponds to the R5 phenological stage, when soybean grains are being formed. Although the accuracy of grain yield prediction increased as the soybean crop developed, on the last assessment day of this cropping season (112 DAS), R² was lower and RMSE higher than on the previous assessment day. At this last evaluation, the soybean crop had reached the R6 phenological stage, indicating that grains had been completely filled and maturation was beginning.
Based on these results, the PLSR model was able to predict soybean grain yield (ranging from 1221 to 4169 kg ha⁻¹, as shown in Figure 7) under different levels of water availability with R² of 0.731 and 0.595 and RMSE of 334 kg ha⁻¹ (11.33%) and 416 kg ha⁻¹ (14.14%) at the calibration and cross-validation steps, respectively.
In the 2017/2018 cropping season, as observed in the previous cropping season, the lowest R² and highest RMSE values were obtained in the early stages of crop development.
As in the 2016/2017 cropping season, the highest accuracy in grain yield prediction in the 2017/2018 cropping season was found at the R5 stage (96 DAS), when the R² at calibration and cross-validation was 0.924 and 0.885, respectively, and the RMSE was 351.49 kg ha⁻¹ (7.77%) at calibration and 437.22 kg ha⁻¹ (9.67%) at cross-validation. As in the previous season, R² decreased and RMSE increased as the crop reached the maturation stages of development (113 DAS).
In this cropping season (2017/2018), PLSR was able to predict soybean grain yield under contrasting levels of water status across crop development, which resulted in yields ranging from 875 to 5398 kg ha⁻¹, as shown in Figure 7.
The PLSR results for soybean grain yield prediction in the 2018/2019 cropping season corroborate those observed in the 2016/2017 and 2017/2018 cropping seasons, with lower R² values at calibration and cross-validation on the earliest days of spectral assessment.
In the 2018/2019 cropping season, the highest R² values at both calibration (0.891) and cross-validation (0.810) were obtained at the R5 stage (94 DAS), together with the lowest RMSE values (403.59 kg ha⁻¹, 8.90%, and 538.47 kg ha⁻¹, 11.87%, respectively). Prediction accuracy decreased as the crop reached the maturation stages of development. In this cropping season, the observed yields ranged from 287 to 4823 kg ha⁻¹ (Figure 7).
In the three cropping seasons, the accuracy of soybean grain yield prediction increased as the crop developed and decreased as the crop reached the maturation stages of development. Similar results were reported by Herrmann et al. [21], who predicted corn yield using crop reflectance and PLSR. Christenson et al. [19] collected hyperspectral data from soybean canopy between the R1 and R6 reproductive stages and applied PLSR to predict grain yield but did not draw conclusions about the best stage for yield prediction. The R5 phenological stage has been suggested to be more suitable for soybean grain yield monitoring and prediction using satellite-based [11,13,78], UAV-based [15] and field-based [52,79] remotely sensed data.
The highest accuracy for soybean yield prediction at the R5 phenological stage is associated with crop phenology, the physiological responses to water availability and the timing of the induced water deficit. The R5 stage corresponds to the period when the grains are being formed; grain formation represents the crop yield and is the result of several physiological iterations during the cropping season. At this stage, most physiological iterations will be expressed in yield. Hence, the R5 stage is the closest one to yield itself, and spectral assessments during this stage are suitable for investigating the relation between leaf reflectance and yield (also comprising its driving factors). In the subsequent development stages, the grains have already been formed and plants have reached maturity. At this time, most biotic and abiotic factors that can provoke physiological responses and therefore interfere with grain yield, such as water availability, soil nutritional status, plant diseases and insect attack, can no longer impair crop production.
Regarding the physiological responses under different levels of water availability, the lowest accuracy of yield prediction at the early phenological stages is associated with the fact that soybean plants subjected to water shortage at the vegetative stages recovered their physiological responses and reached a yield similar to that of plants grown under good conditions of water availability (Figure 5). Hence, there is a trade-off between early yield prediction and accuracy: plants from the WDV treatment assessed at the vegetative stages can be exposed to a different water status (e.g., rainfall) in subsequent stages; likewise, plants from the WDR treatment evaluated at the vegetative stages might be exposed to different conditions of water availability (e.g., water deficit) in subsequent development stages, which limits the early prediction of yield with high accuracy.
The results of the multi-year PLSR model, fitted to the R5-stage spectra of the three cropping seasons, are presented in Table 6. As expected, the highest accuracy in grain yield prediction was found in the calibration step, followed by cross-validation and external validation. At calibration, the highest R² and lowest RMSE were observed (0.775 and 574.52 kg ha⁻¹, 11.38%, respectively), while at cross-validation, using the leave-one-out method, R² decreased slightly (0.730) and the RMSE increased (634 kg ha⁻¹, 12.57%). When the generated yield prediction model was applied to external samples (external validation), a positive correlation was achieved between observed and predicted soybean grain yield, as shown in Figure 8. Hence, soybean grain yield ranging from 287 to 5398 kg ha⁻¹ (Figure 7), resulting from the different levels of water availability in the three evaluated cropping seasons, could be predicted at the R5 stage, which indicates the possibility of applying the generated PLSR model to forthcoming cropping seasons.
Similar results were obtained in research using crop reflectance and PLSR to predict yield in major crops. Ancin-Murguzur et al. [26], predicting grassland yield, obtained R² between 0.37 and 0.82 in the calibration step and between 0.56 and 0.71 in the validation step. Estimating spring barley yield, Barmeier et al. [24] obtained R² equal to 0.78 and 0.80 at the calibration and validation steps, respectively. In the same context, Ferrio et al. [80] predicted durum wheat yield with R² equal to 0.81 and 0.76 at the calibration and validation steps, respectively, and R² lower than 0.6 when using external validation. In winter wheat, Sharabian et al. [23] demonstrated R² equal to 0.66 and 0.73 at cross-validation and external validation, respectively. Predicting corn yield, Herrmann et al. [21] obtained R² equal to 0.77, 0.70 and 0.73 at the calibration, cross-validation and external validation steps, respectively.
Christenson et al. [19] used hyperspectral canopy reflectance and PLSR to predict soybean grain yield. The authors obtained R 2 between 0.18 and 0.67, depending on the maturity group of the evaluated cultivars, and R 2 equal to 0.44 and RMSE equal to 841 kg ha −1 when all cultivars were included in the same model.
Figure 9 presents the regression coefficients of the PLSR model for soybean grain yield prediction at the R5 stage. Although the PLSR coefficients were well distributed across the spectrum, peaks were observed at 408, 550 and 702 nm (strongly influenced by the absorption of photosynthetically active radiation); 729 nm (corresponding to the slope in leaf reflectance between the red and near-infrared wavelengths); and 1000 and 1917 nm (associated with light scattering inside the mesophyll and with leaf water content, respectively). The strong negative correlation at 702 nm is in accordance with previous research on yield prediction through PLSR and canopy reflectance in major crops. Ferrio et al. [80] observed negative correlations at 700 and 750 nm in the yield prediction of durum wheat. Herrmann et al. [21] detected a negative correlation at 740 nm when predicting corn yield. Estimating spring wheat yield, Øvergaard et al. [81] demonstrated a negative correlation at 700 nm, and Christenson et al. [19] observed a strong negative correlation at 715 nm in the prediction of soybean grain yield.
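Locating the most influential wavelengths in a vector of PLSR regression coefficients, as done when reading the coefficient plot, can be sketched as a search for local extrema of the absolute coefficient. The function name and the coefficient values below are hypothetical and are not the coefficients of the model reported here.

```python
def coefficient_peaks(wavelengths, coefficients, top_n=3):
    """Return the top_n local extrema of |coefficient|, strongest first."""
    peaks = []
    for i in range(1, len(coefficients) - 1):
        left = abs(coefficients[i - 1])
        here = abs(coefficients[i])
        right = abs(coefficients[i + 1])
        if here > left and here >= right:
            # Keep the signed coefficient so the direction of the
            # relationship with yield is preserved.
            peaks.append((wavelengths[i], coefficients[i]))
    peaks.sort(key=lambda pair: abs(pair[1]), reverse=True)
    return peaks[:top_n]

# Illustrative values only (NOT taken from the paper).
wavelengths = [400, 500, 600, 700, 800, 900, 1000]
coefficients = [0.0, 0.2, 0.1, -0.5, -0.1, 0.3, 0.0]
for wl, coef in coefficient_peaks(wavelengths, coefficients, top_n=2):
    print(f"{wl} nm: {coef:+.2f}")
```

Ranking by absolute value keeps strongly negative coefficients, such as the one reported near 702 nm, alongside positive ones.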
To assess the individual effect of the experimental treatments on the prediction ability of the PLSR, comprising the three cropping seasons at the R5 stage, the spectral samples from each treatment (IRR, NIRR, WDV and WDR) were fitted into separate PLSR models. The performance of PLSR in the prediction of soybean grain yield at the R5 stage in each experimental treatment (water availability) is presented in Table 7. The highest accuracy in grain yield prediction, both at calibration and cross-validation, was found for soybean plants that were subjected to water shortage during the vegetative stages of crop development and then received rainfall from the flowering period onward (WDV treatment). Consistent with these results, the WDR treatment showed the second-highest accuracy in grain yield prediction, both at the calibration and cross-validation steps.
In contrast, the lowest R 2 and highest RMSE, both at calibration and cross-validation, were found for soybean plants from the irrigated (IRR) treatment. It is important to emphasize that the IRR spectral samples from the 2017/2018 cropping season were analyzed within the NIRR treatment because no irrigation was applied in that season. Although the prediction accuracy for NIRR was higher than for the IRR treatment, it was still lower than the values obtained for both the WDR and WDV treatments.
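The grouping of samples into per-treatment data sets, including the reassignment of the 2017/2018 IRR samples to NIRR, can be sketched as follows. The record layout, the field names and the yield values are hypothetical; only the treatment labels and the reassignment rule come from the text. Each resulting group would then be passed to its own PLSR fit.

```python
from collections import defaultdict

def effective_treatment(season, treatment):
    """Relabel IRR as NIRR for the season in which no irrigation was applied."""
    if season == "2017/2018" and treatment == "IRR":
        return "NIRR"
    return treatment

def group_by_treatment(samples):
    """Split sample records into one list per effective treatment."""
    groups = defaultdict(list)
    for sample in samples:
        label = effective_treatment(sample["season"], sample["treatment"])
        groups[label].append(sample)
    return dict(groups)

# Hypothetical records (values are NOT from the paper).
samples = [
    {"season": "2016/2017", "treatment": "IRR", "yield_kg_ha": 4200.0},
    {"season": "2017/2018", "treatment": "IRR", "yield_kg_ha": 3900.0},
    {"season": "2017/2018", "treatment": "NIRR", "yield_kg_ha": 3800.0},
    {"season": "2018/2019", "treatment": "WDV", "yield_kg_ha": 3100.0},
    {"season": "2018/2019", "treatment": "WDR", "yield_kg_ha": 1500.0},
]
for label, members in sorted(group_by_treatment(samples).items()):
    print(label, len(members))
```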
Much research suggests that the expression of drought tolerance genes in soybean genotypes is higher in plants subjected to water deficit, as may have occurred in the WDV and WDR treatments, leading to physiological differences among soybean genotypes [50,66,67,82,83] and resulting in spectral responses better correlated with grain yield.
Assessing the accuracy of PLSR to predict corn yield using canopy and leaf reflectance, Weber et al. [20] obtained better results for plants subjected to water deficit than for well-watered plants. Accordingly, Christenson et al. [19] reported limitations when predicting soybean grain yield on irrigated plots.
The results demonstrate that leaf reflectance combined with PLSR can predict soybean grain yield under different levels of crop water status. The higher prediction accuracy obtained under water shortage during the vegetative and reproductive stages is particularly relevant, because it provides information for better characterization and mitigation of the negative impacts of drought, supporting adjustments in management practices and decision making on food security, logistics, economic and social issues, and helping to minimize potential losses.
The use of leaf-based hyperspectral data proved feasible for soybean yield prediction. Although the sensor used is a non-imaging sensor, providing only point-based spectral acquisition, it provides a large number of bands at high spectral resolution, which allows the characterization of multiple interactions between yield (as a result of crop physiology) and reflectance across the spectrum. The methods used in the present manuscript and the results obtained might contribute to future research on yield monitoring using hyperspectral sensors at different levels of data acquisition, including satellite and UAV-based hyperspectral images. Because spectroradiometers are usually expensive, their intensive use over large areas might be limited. However, considering the most explanatory spectral bands for yield prediction, future research can focus on the evaluation of narrow-band vegetation indices centered on specific wavelengths, which can be acquired by digital cameras equipped with narrow-band-pass optical filters, allowing the monitoring of large crop areas by unmanned aerial vehicles (UAVs).
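A narrow-band index of the kind suggested above could be computed from a full spectrum as sketched below. The normalized-difference form, the 10 nm band width, the choice of 1000 and 550 nm as band centres and the synthetic spectrum are assumptions made for illustration; the text does not propose a specific index.

```python
def band_mean(wavelengths, reflectance, center_nm, width_nm):
    """Average reflectance within a band of width_nm centred on center_nm."""
    half = width_nm / 2.0
    values = [r for wl, r in zip(wavelengths, reflectance)
              if abs(wl - center_nm) <= half]
    if not values:
        raise ValueError(f"no samples within {half} nm of {center_nm} nm")
    return sum(values) / len(values)

def normalized_difference(wavelengths, reflectance,
                          center_a_nm, center_b_nm, width_nm=10.0):
    """Normalized difference (A - B) / (A + B) between two narrow bands."""
    a = band_mean(wavelengths, reflectance, center_a_nm, width_nm)
    b = band_mean(wavelengths, reflectance, center_b_nm, width_nm)
    return (a - b) / (a + b)

# Synthetic leaf spectrum at 1 nm steps (NOT measured data):
# low reflectance in the visible region, higher in the near-infrared.
wavelengths = list(range(400, 2501))
reflectance = [0.08 if wl < 700 else 0.45 for wl in wavelengths]

index = normalized_difference(wavelengths, reflectance, 1000, 550)
print(f"ND(1000, 550) = {index:.3f}")
```

Averaging over a band rather than reading a single wavelength mimics the finite pass-band of an optical filter, which is why the band width is exposed as a parameter.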

Conclusions
This paper addressed the prediction of grain yield in soybean crop grown under different levels of water availability using reflectance spectroscopy and partial least squares regression.
Plants with higher yield presented lower leaf reflectance across the spectrum. Although differences in reflectance between IRR and WDR plants could be observed across the spectrum, they were most pronounced in the visible wavelengths (around 550 nm), influenced by the interaction between electromagnetic energy and plant tissue compounds, and in the shortwave infrared wavelengths (around 1400 and 2200 nm), influenced by crop water status and leaf water content.
The PLSR models for the three evaluated cropping seasons (2016/2017, 2017/2018 and 2018/2019) showed the highest accuracy in soybean grain yield prediction at 89, 96 and 94 DAS, respectively, which correspond to the R5 phenological stage. On these dates, the R 2 at the calibration step ranged from 0.731 to 0.924 and the RMSE from 334 to 403 kg ha −1 (7.77 to 11.33%).
Analyzing the three cropping seasons in a single PLSR model, soybean yield ranging from 287 to 5398 kg ha −1 could be predicted at the R5 stage with R 2 equal to 0.775, 0.730 and 0.688 at the calibration, cross-validation and external validation steps, respectively. A strong positive correlation was achieved between the observed and predicted soybean yield, with RMSE equal to 622 kg ha −1 (13.34%), which indicates the possibility of applying the generated PLSR model in forthcoming cropping seasons.
The results demonstrate that leaf reflectance combined with PLSR can predict soybean grain yield under different levels of crop water status. The PLSR models fitted to each experimental treatment at the R5 stage showed higher accuracy for plants subjected to water deficit, at both the vegetative and reproductive periods, than for plants under natural rainfall or irrigation. The accuracy of grain yield prediction under different levels of crop water status provides valuable information for better characterization and mitigation of the negative impacts of drought, supporting adjustments in management practices and decision making on food security, logistics, economic and social issues, and helping to minimize potential losses.