How Do Methods Assimilating Sentinel-2-Derived LAI Combined with Two Di ﬀ erent Sources of Soil Input Data A ﬀ ect the Crop Model-Based Estimation of Wheat Biomass at Sub-Field Level?

: The combination of Sentinel-2 derived information about sub-ﬁeld heterogeneity of crop canopy leaf area index (LAI) and SoilGrids-derived information about local soil properties might help to improve the prediction accuracy of crop simulation models at sub-ﬁeld level without prior knowledge of detailed site characteristics. In this study, we ran a crop model using either soil texture derived from samples that were taken spatially distributed across a ﬁeld and analyzed in the lab (AS) or SoilGrids-derived soil texture (SG) as model input in combination with di ﬀ erent levels of LAI assimilation. We relied on the LINTUL5 model implemented in the SIMPLACE modeling framework to simulate winter wheat biomass development in 40 to 60 points in each ﬁeld with detailed measured soil information available, for 14 ﬁelds across France, Germany, and the Netherlands during two growing seasons. Water stress was the only growth-limiting factor considered in the model. The model performance was evaluated against total aboveground biomass measurements at harvest with regard to the average per-ﬁeld prediction and the simulated spatial variability within the ﬁeld. Our ﬁndings showed that a) per-ﬁeld average biomass predictions of SG-based modeling approaches were not inferior to those using AS-texture as input, but came with a greater prediction uncertainty, b) relying on the generation of an ensemble without LAI assimilation might produce results as accurate as simulations where LAI is assimilated, and c) sub-ﬁeld heterogeneity was not reproduced well in any of the ﬁelds, predominantly because of an inaccurate simulation of water stress in the model. We conclude that research should be devoted to the testing of di ﬀ erent approaches to simulate soil moisture dynamics and to the testing in other sites, potentially using LAI products derived from other remotely sensed imagery.


Introduction
A successful use of dynamic crop models at sub-field level could provide valuable information to precision farming applications, such as detailed yield forecasts, timing of pesticide application and estimation of potential for variable rate application of fertilizers in the field. Modern earth 1.
How does the soil texture information drawn from SoilGrids compare to detailed soil texture derived from laboratory-analyzed soil samples taken on 14 conventionally farmed fields across France, Germany, and the Netherlands? 2.
With regard to the average total biomass of winter wheat at harvest per field: How does the crop model prediction perform when using either analyzed soil texture (in the following: AS) or texture extracted from SoilGrids (in the following: SG) as model input in a standard run (i.e., no ensemble creation, no LAI assimilation), the ensemble mean (i.e., no LAI assimilation), and combined with the assimilation of S2 LAI information based on either of two different approaches (ensemble Kalman filter and weighted mean)? 3.
Which approach, in combination with AS or SG, reproduces the measured spatial variability of aboveground biomass at harvest within a field better?

Experimental Fields and Data Collection
We collected data from fourteen conventionally farmed fields across Germany, France, and the Netherlands (seven fields during the growing season 2016/2017, another seven during the growing season 2017/2018, see Figure A1 in the Appendix A for a map), where winter wheat was grown in the locally practiced crop rotation. Field sizes varied between 7 hectares and 100 hectares. In 2016/2017, one distinct commercially available cultivar was planted on at least two different dates (with field 2 being the only exception with one planting date only) in each location. In 2017/2018, two cultivars were planted per field on the same date (with field 12 being an exception, where seeds from both cultivars were planted on two dates). Plant protecting agents, growth regulators, and fertilizers were applied based on best practice guidelines in accordance with local recommendations. We did not apply any irrigation.
We drew soil samples in 40 to 60 randomly distributed points per field, and analyzed them for texture composition (divided into the particle sizes sand, silt, clay and gravel according to the German classification system [39]) as well as total organic carbon content in the laboratory. German fields were sampled up to a depth of 90 cm in 30 cm intervals, and up to 60 cm on the French and Dutch fields (see Table 1 for mean values of different soil depths per field). Because of the presence of solid gypsum in deeper layers, some points in field 5 were sampled to a depth of 30 cm only. The minimum distance between two sampling points was 15 m. Table 1. Per-field mean values for different particle sizes (gravel, sand, silt, clay) and soil depths for soil samples analyzed in the lab (=AS) and extracted from SoilGrids (=SG) for sampling locations. All values expressed in mass fraction in %. Daily weather data were collected from weather stations (Adcon Telemetry, Klosterneuburg, Austria) installed adjacent to the fields (rainfall, minimum and maximum air temperature, solar radiation, and wind speed).
Around maturity, aboveground biomass was sampled at each soil sampling point on an area of 1 m 2 , split into the component grain, stem, and leaves, oven dried at 105 • C until no further weight loss occurred, and weighed subsequently. Average total aboveground biomass yield per field varied between 969 g m −2 (Field 5, France) and 1959 g m −2 (Field 3, Germany), the harvest index (i.e., the ratio of grain yield to total aboveground biomass as a measure of reproductive efficiency) varied between 0.41 (Field 2, Germany) and 0.55 (Fields 1 and 7, Germany and Netherlands). Detailed information on the fields investigated in this study can be found in Table A1 in the Appendix A.

SoilGrids Soil Information
The SoilGrids REST (representational state transfer) application programming interface was used to request the soil properties bulk density (BLDFIE), clay content (CLYPPT), silt content (SLTPPT), sand content (SNDPPT), content of coarse fragments (>2 mm) (CRFVOL), and soil organic carbon content (ORCDRC) of every exact location where soil samples in each field were taken. SG delivered information for seven standard depths (0 cm, 5 cm, 15 cm, 30 cm, 60 cm, 100 cm, and 200 cm). We calculated the weighted average of the clay and coarse fragments content using numerical integration over the depths 0-30 cm, 30-60 cm, and 60-100 cm (as suggested by [27]) in order to compare those values to the measured ones (see Table 1 for results).

Leaf Area Index Estimations
The Sentinel Application Platform (SNAP) Sentinel-2 Toolbox Biophysical Processor was used to derive LAI maps for every field (with a pixel resolution of 10 m) based on Sentinel-2 imagery that was atmospherically corrected using the Sen2Cor algorithm [40], with all bands resampled to 10 m resolution (using the Nearest method in SNAP). The Biophysical Processor uses the Sentinel-2 top of canopy reflectance data (coming from the bands 3-8a, 11, and 12) as input into a neural network to calculate LAI for each pixel. The neural network implemented in the biophysical processor was trained using PROSAIL model output-see [35] for a detailed description of the workflow implementation.
LAI values were extracted from each pixel of each image where a soil sampling point was located, using the raster package [41] implemented in R [42], and checked for validity (magnitude of values plus temporal consistency, data not shown). Figure 1 shows the per field acquisition dates of cloud free S2 imagery that were used in this study.
Since we wanted to evaluate how suitable our approaches were as biomass yield predictions tools, we decided to only assimilate LAI observations before the estimated flowering date (i.e., the simulations ran far beyond the assimilation of the last available observation).
Data availability was better in 2018 than in 2017 (see Figure 1). In 2017, every field was covered by at least four images within the period March to June, with a maximum of eight images (field 5). In 2018, every field was covered by at least eleven images within the period March to June, with a maximum of 15 images (field 11).

Crop Model
We employed the generic LINTUL5 model implemented in the modeling framework SIMPLACE (www.simplace.net, accessed January 20, 2020) to simulate daily leaf area and biomass development in all fields. LINTUL5 is a crop growth simulation model developed for potential waterlimited, N-limited, and NPK-limited conditions [43], and has been used widely for crop response assessments [44][45][46]. SIMPLACE (Scientific Impact Assessment and Modelling Platform for Advanced Crop and Ecosystem Management) is a model framework that allows the solution of a modeling problem to be modularized into a number of discrete, replaceable, and interchangeable software units (so-called SimComponents) [47]. The solution used for this study was a combination of the SimComponents

Crop Model
We employed the generic LINTUL5 model implemented in the modeling framework SIMPLACE (www.simplace.net, accessed 20 January 2020) to simulate daily leaf area and biomass development in all fields. LINTUL5 is a crop growth simulation model developed for potential water-limited, N-limited, and NPK-limited conditions [43], and has been used widely for crop response assessments [44][45][46].

SIMPLACE (Scientific Impact Assessment and Modelling Platform for Advanced Crop and
Ecosystem Management) is a model framework that allows the solution of a modeling problem to be modularized into a number of discrete, replaceable, and interchangeable software units (so-called SimComponents) [47]. The solution used for this study was a combination of the SimComponents LINTUL5, SlimRoots, SlimWater, and STMPsim.
Crop growth in LINTUL5 is a function of intercepted radiation, temperature, and radiation use efficiency (RUE). Daily LAI is calculated as the product of the development stage-dependent specific leaf area (SLA) and the weight of the living green leaves (WLVG).
Soil water balance was simulated using SlimWater, where the daily change in soil water content in a multiple layered soil profile is based on the volumes of crop water uptake, soil evaporation, surface run-off, and seepage below the root zone [48]. Root growth was simulated using the SimComponent SlimRoots, where the daily increase in biomass of seminal and lateral roots depends on the input of assimilated biomass from the shoot (see [47] for more information). Soil hydraulic properties based on the measured texture and texture drawn from SoilGrids (see above) were predicted using the database of hydraulic properties of European soils (HYPRES) [49] pedotransfer functions (PTF) for each soil profile. The functions are based on soil hydraulic data provided by a range of institutions from 12 European countries and are applicable to studies at European scale [49]. Research suggests that the accuracy and reliability of pedotransfer functions may be appropriate for applications from regional to national scale [50].
The layered soil profile consisting of the measured data was extended to 200 cm soil depth, assuming the same in situ texture present as in the 60-90 cm layer.
We assumed no occurrence of disease stress and optimal nutrient supply at all times, as all fields were conventionally managed in accordance with local recommendations. Thus, water stress was the only growth-limiting factor considered in the model. In our SIMPLACE <LINTUL5,SLIM> solution, water stress was imposed on biomass production if the calculated actual transpiration of the crop dropped below the calculated potential transpiration because of the insufficient levels of transpirable soil water in the root layer. Evapotranspiration was estimated based on the Applied Penman approach [43].
SIMPLACE <LINTUL5, SLIM> ran in daily time steps. Daily phenology data were provided by a xarvio TM in-house developed, commercial growth stage model that estimated cultivar-specific BBCH stages of winter wheat based on the accumulated thermal temperature, vernalization, and photoperiod. The growth stage model was developed for better estimation of fungicide application timing, and has been validated with roughly 30,000 records in Germany. The BBCH stages were transformed into the LINTUL-internal development stages (DVS) based on a lookup-table, and linked to all SimComponents that required DVS information. For all winter wheat-specific variables beyond phenology, we used the generic values (i.e., no calibration of cultivar specifics). Daily meteorological data needed in the simulation model was taken from the data provided by the weather stations.

Data Assimilation
We implemented two approaches to assimilate LAI values into SIMPLACE <LINTUL5, SLIM>: The ensemble Kalman filter (EnKF) and the weighted mean (WM) approach. EnKF has gained popularity in the scientific community because of its simple conceptual formulation and ease of implementation, plus its low computational requirements [51]. It combines an ensemble forecast and the Kalman filter to calculate the prediction error covariance using the Monte Carlo method [19]. State variables are updated sequentially, taking the uncertainties of the simulation results and observations into account [52]. We refer to other publications for detailed information about EnKF [3,51]. The integration into SIMPLACE<LINTUL5, SLIM> was done in R [42] using the SIMPLACE R wrapper [53], relying on the EnKF implementation provided by Stefan Gelissen (http://blogs2.datall-analyse.nl/2016/06/08/ rcode_ensemble_kalman_filter/, accessed 9 November 2019). The workflow constituted as follows: First, the ensemble members were randomly generated based on the chosen initial values and variance.
Second, model runs using the sets of variables as input were invoked. Simulations run until the first S2 LAI observation became available. This interrupted each model run, EnKF updated the LAI value accordingly, and the runs were re-invoked until the next observation became available.
The WM approach relies on a model ensemble that runs from simulation start to simulation end without compromising the consistency and integrity of the state variables in the ensemble runs. It is based on the attribution of weights to the model ensemble members depending on their proximity between simulation and observation at the first date of LAI observation. These weights are then updated at the subsequent observation dates and used in the calculation of the ensemble mean at each model time step. The efficiency is comparable to those of updating methods. Methodology and test results are documented in detail in our publication (currently under review) [54].
Both approaches (EnKF and WM) relied on the generation of a model ensemble. We chose a set consisting of soil and crop components (SCC) that comprised the combination of three variables that scaled (1) the soil water content at simulation start (SoilWaterInit), (2) the maximal rooting depth that could be reached by the plants (MaximalRootDepth), and (3) a scaling factor for the DVS-dependent specific leaf area (ScaleFactorSLA). By perturbing the first two variables at initialization, the model induced water stress (by modification of the transpiration reduction factor TRANRF) at varying points in time during the growing season, thereby expanding the range of LAI values at any given point in time in the ensemble. We did not alter both parameters after initialization for reasons of model consistency. ScaleFactorSLA scaled all predefined DVS-dependent SLA values uniformly and showed high impact on LAI dynamics and biomass accumulation in the model, and was included as to cover influences of possible hidden factors (such as nutrient stress).
The initial soil water content SoilWaterInit in soil layer k was calculated as where VolumetricWaterContent33to1500 describes the volumetric soil water content between field capacity and wilting point (i.e., plant available water), and VolumetricWaterContent1500 the volumetric soil water content at permanent wilting point. Both values were returned by the pedotransfer function (see above), based on the measured soil texture. We chose a uniform parameter distribution for the WM approach, with defined minimum and maximum values (SoilWaterInit 0.3-1, MaximalRootDepth 0.5-2, ScalingFactorSLA 0.75-1.25). EnKF creates a model ensemble using a randomly generated Gaussian distribution of the variables (mean values for the distribution: SoilWaterInit 0.65, MaximalRootDepth 1.25, ScalingFactorSLA 1.25).
To achieve a maximum effect of the varying initial soil conditions on crop development, simulation start was set to the day preceding sowing. The LAI error was assumed 0.3.

Model Run and Evaluation
We decided to compare the two SG and AS derived soil texture fractions "clay" and "gravel," as differences generally impact the soil moisture dynamics in the simulated soil profile. We calculated the Pearson correlation coefficient (r), the root mean squared error (RMSE), and the bias for each field and soil fraction, comparing each soil layer (0-30 cm, 30-60 cm, and 60-90 (-100 for SG) cm) accordingly. Additionally, we also included the absolute average available water content (AWC, 0-90 cm) calculated by the HYPRES pedotransfer function.
The assessment of the crop model biomass predictions with regard to AS, SG, and the combination of the two different assimilation approaches was based on the comparison of measured and simulated total aboveground biomass at harvest day. For this, SIMPLACE<LINTUL5, SLIM> ran in daily time steps for every sampling point in every field that was part of this study, once using AS and once using SG as input. All available Sentinel-2 LAI estimations for every sampling point were assimilated into the model using the approaches EnKF and WM in combination with the SCC ensemble generation set. Other studies have shown that crop growth simulation model predictions improve if LAI, observed over several growth stages, is assimilated [12,14].
To evaluate the potential benefit of data assimilation, we also included the ensemble mean (EM, i.e., no data assimilation, run with measured and SoilGrids extracted soil texture data using the same ensemble generation method as the WM approach) and the model's standard runs (SR) (i.e., no data assimilation, no ensemble creation, run with single point input from AS and SG using the standard configuration) in the analysis. A workflow for all setups is provided in Figure 2. Workflow for the range of setups investigated in this study. Every setup was run for every sampling point in every field that was part of this study. SR: standard run, EM: ensemble mean, EnKF: ensemble Kalman filter, WM: weighted mean, AS: analyzed soil texture, SG: SoilGrids-derived soil texture.

Comparison of Measured Soil Texture Data and Texture Data from SoilGrids
The results of the comparison of measured soil texture data and SoilGrids texture data per field are presented in Table 2. A strong positive agreement (r > 0.8) could only be seen for the clay content in field 11, while other fields showed weak to no agreement at all (r~0) (e.g., fields 1, 5, 12), and some even showed negative agreements (r < 0, e.g., field 8). RMSE values ranged between 4.01% (field 13) and 17.61% (field 5) mass fraction for clay content, and between 4.3% (field 7) and 12.05% (field 3) mass fraction for gravel content. For the most part, SoilGrids data underestimated the clay content and overestimated the gravel content in the field (positive bias values indicate overestimation by the SoilGrids data). Altogether, it was obvious that the SoilGrids data did not predict the measured soil texture well in any of the fields.
The analysis of the PTF-derived plant-available water content (AWC, i.e., volumetric soil water content at field capacity to wilting point pF2.5) showed no strong relationship in any of the fields; some fields show moderately strong relationships (e.g., fields 3, 11, 12), others no relationship (fields 7, 14), some more even negative relationships (fields 1, 2, 9). The average calculated AWC is lower in the SG data than in the AS data in most fields (except fields 2, 7, and 12). Table 2. Per-field results of comparison of measured soil texture data and texture data extracted from SoilGrids as well as plant-available water content derived from the HYPRES pedotransfer functions. We calculated the three metric RMSE, mean absolute percentage error (MAPE), and the bias. RMSE indicated the magnitude of error in the unit of measurement with symmetry provided; MAPE showed the average absolute percent difference between measured and predicted values, the bias computed the amount by which the measured values were greater or smaller than the predicted ones. We assumed that the crop model simulated spatial heterogeneity of biomass yield well if the value of each metric was close to zero.

Comparison of Measured Soil Texture Data and Texture Data from SoilGrids
The results of the comparison of measured soil texture data and SoilGrids texture data per field are presented in Table 2. A strong positive agreement (r > 0.8) could only be seen for the clay content in field 11, while other fields showed weak to no agreement at all (r~0) (e.g., fields 1, 5, 12), and some even showed negative agreements (r < 0, e.g., field 8). RMSE values ranged between 4.01% (field 13) and 17.61% (field 5) mass fraction for clay content, and between 4.3% (field 7) and 12.05% (field 3) mass fraction for gravel content. For the most part, SoilGrids data underestimated the clay content and overestimated the gravel content in the field (positive bias values indicate overestimation by the SoilGrids data). Altogether, it was obvious that the SoilGrids data did not predict the measured soil texture well in any of the fields.
The analysis of the PTF-derived plant-available water content (AWC, i.e., volumetric soil water content at field capacity to wilting point pF2.5) showed no strong relationship in any of the fields; some fields show moderately strong relationships (e.g., fields 3, 11, 12), others no relationship (fields 7, 14), some more even negative relationships (fields 1,2,9). The average calculated AWC is lower in the SG data than in the AS data in most fields (except fields 2, 7, and 12).  Figure 3 shows the mean RMSE and standard deviation of total aboveground biomass across all fields for the tested approaches. The highest RMSE was produced by SR AS (532.39 g m −2 ) with a standard deviation of 158.12 g m −2 , the lowest by EM SG (246.63 g m −2 ) with a standard deviation of 123.84 g m −2 . However, it is noticeable that all approaches but the standard runs show, on average, similar performances with a mean value of about 250 g m −2 .

Comparison of Simulated vs. Measured Aboveground Biomass
The lowest MAPE, averaged across all fields, was produced by EnKF AS (0.14 with a standard deviation of 0.06) and WM AS (0.14 with a standard deviation of 0.07), the highest by SR AS (0.35 with a standard deviation of 0.14) and SR SG (0.25 with a standard deviation of 0.23). The SoilGrids-based approaches showed higher standard deviations than the analyzed soil texture-based approaches, with similar mean values, however (Figure 4). Figure 3 shows the mean RMSE and standard deviation of total aboveground biomass across all fields for the tested approaches. The highest RMSE was produced by SR AS (532.39 g m −2 ) with a standard deviation of 158.12 g m −2 , the lowest by EM SG (246.63 g m −2 ) with a standard deviation of 123.84 g m −2 . However, it is noticeable that all approaches but the standard runs show, on average, similar performances with a mean value of about 250 g m −2 . The lowest MAPE, averaged across all fields, was produced by EnKF AS (0.14 with a standard deviation of 0.06) and WM AS (0.14 with a standard deviation of 0.07), the highest by SR AS (0.35  On average across all fields, the greatest bias was produced by SR AS (484.52 g m −2 with a standard deviation of 168.51 g m −2 ), the lowest by EM SG (15.37 g m −2 with a standard deviation of 229.77 g m −2 ) and EnKF SG (−32.66 g m −2 with a standard deviation of 269.24 g m −2 ) ( Figure 5). Those approaches using SG as model input tended to produce a lower bias with a higher standard deviation in comparison with those approaches using measured soil texture as model input (which produced higher bias values with lower standard deviations). On average across all fields, the greatest bias was produced by SR AS (484.52 g m −2 with a standard deviation of 168.51 g m −2 ), the lowest by EM SG (15.37 g m −2 with a standard deviation of 229.77 g m −2 ) and EnKF SG (−32.66 g m −2 with a standard deviation of 269.24 g m −2 ) ( Figure 5). Those approaches using SG as model input tended to produce a lower bias with a higher standard deviation in comparison with those approaches using measured soil texture as model input (which produced higher bias values with lower standard deviations).

Comparison of Simulated vs. Measured Aboveground Biomass
derived soil texture as model input.
On average across all fields, the greatest bias was produced by SR AS (484.52 g m −2 with a standard deviation of 168.51 g m −2 ), the lowest by EM SG (15.37 g m −2 with a standard deviation of 229.77 g m −2 ) and EnKF SG (−32.66 g m −2 with a standard deviation of 269.24 g m −2 ) ( Figure 5). Those approaches using SG as model input tended to produce a lower bias with a higher standard deviation in comparison with those approaches using measured soil texture as model input (which produced higher bias values with lower standard deviations).

Assessment of Reproduction of Sub-Field Variability
Our assessment of the sub-field reproduction of our approaches is based on the analysis of the metrics RMSE, MAPE, and bias, assuming that values closer to 0 show a greater agreement with field-measured biomass at harvest day. We decided to exclude the approaches SR AS and SR SG from further analysis because of their inferior average performances (see previous section). Figures 6-8 show the per-field results of RMSE, MAPE, and bias for EM (a), EnKF (b), and WM (c) approaches, either using analyzed soil or SoilGrids-derived texture as model input.

Assessment of Reproduction of Sub-Field Variability
Our assessment of the sub-field reproduction of our approaches is based on the analysis of the metrics RMSE, MAPE, and bias, assuming that values closer to 0 show a greater agreement with fieldmeasured biomass at harvest day. We decided to exclude the approaches SR AS and SR SG from further analysis because of their inferior average performances (see previous section). Figures 6 to 8 show the per-field results of RMSE, MAPE, and bias for EM (a), EnKF (b), and WM (c) approaches, either using analyzed soil or SoilGrids-derived texture as model input.
It is obvious in all three figures that performances of both AS and SG were highly variable between fields and between approaches (EM, EnKF, and WM), as well as per field. No single approach stood out, and neither AS or SG showed a systematically better performance across all fields.   It is obvious in all three figures that performances of both AS and SG were highly variable between fields and between approaches (EM, EnKF, and WM), as well as per field. No single approach stood out, and neither AS or SG showed a systematically better performance across all fields.

Results of the SoilGrids vs. Measured Soil Texture Comparison
With regard to research question 1, the analysis of the SoilGrids vs. analyzed soil texture showed that 1) SG did not predict the field-measured soil texture well (clay content was largely underestimated and gravel content was largely overestimated) across the soil profiles down to 90 cm soil depth, and 2) the plant-available soil water volumetric content derived from the PTFs was substantially lower for SG in most of the fields than for AS. A number of studies have been reported on the limitations of the SoilGrids system [55][56][57], and the authors themselves advise on their homepage to, if national or local scales are targeted, compare the predictions with soil maps derived from national and local soil databases, since they are usually based on more detailed input soil information and therefore exhibit greater accuracy (www.soilgrids.org, accessed 19 November 2019). Combining local and global predictions could be a promising possibility, as well as the creation of SoilGrids at higher spatial resolution based on state-of-the-art remote sensing products that are relevant for soil mapping [27]. Currently, SoilGrids show the highest accuracy and resolution among the soil datasets covering the world [58], and was therefore our information source of choice.

Results of the SoilGrids vs. Measured Soil Texture Comparison
With regard to research question 1, the analysis of the SoilGrids vs. analyzed soil texture showed that (1) SG did not predict the field-measured soil texture well (clay content was largely underestimated and gravel content was largely overestimated) across the soil profiles down to 90 cm soil depth, and (2) the plant-available soil water volumetric content derived from the PTFs was substantially lower for SG in most of the fields than for AS. A number of studies have been reported on the limitations of the SoilGrids system [55][56][57], and the authors themselves advise on their homepage to, if national or local scales are targeted, compare the predictions with soil maps derived from national and local soil databases, since they are usually based on more detailed input soil information and therefore exhibit greater accuracy (www.soilgrids.org, accessed 19 November 2019). Combining local and global predictions could be a promising possibility, as well as the creation of SoilGrids at higher spatial resolution based on state-of-the-art remote sensing products that are relevant for soil mapping [27]. Currently, SoilGrids show the highest accuracy and resolution among the soil datasets covering the world [58], and was therefore our information source of choice.

Results of the Crop Modeling Excersises
The SoilGrids system generally provides worldwide soil property predictions at 250 m resolution [27]. Soil information for each field was therefore provided by few grid cells only, and thus identical estimated soil texture information was assigned to all sampling points located inside one corresponding grid cell. Spatial heterogeneity in the biomass predictions of the SG-based assimilation approaches could therefore only be induced by a) the differences in soil texture information if the field stretched over more than one grid cell (for SR), and/or b) the assimilation of S2-derived LAI information (for EM, EnKF, and WM) that was extracted from 10 m resolution pixels, because all other input variables were identical in the model for each point in the field (weather data, crop specific variables). Contrary to this, spatial heterogeneity in the biomass predictions of the AS-based approaches was a mixed effect of individual soil properties (for SR and EM) and assimilated S2-derived LAI (for EnKF and WM).
Given these limitations, our expectations with regards to the research questions 2 and 3 therefore were that 1) the performance of the crop model biomass predictions with SG-based approaches will be worse because of inaccurate soil texture information, leading to inaccurate modeling of soil moisture and root growth dynamics that ultimately affect the biomass production; 2) those approaches that rely on the assimilation of S2-derived LAI into the crop model will perform better in predicting biomass yield than those approaches without assimilation, because potential model inaccuracies will be corrected for; and 3) spatial heterogeneity of biomass yield in the SG-based approaches will not be as accurately reproduced as in the AS-based approaches because processes that influence biomass production after flowering are not accurately accounted for because of inaccurate soil information.
The analysis of the results showed that: 1.
Mean values of the calculated metrics (RMSE, MAPE, Bias) across all fields did not substantially differ between ensemble-based SG and AS approaches (i.e., EM, EnKF, WM), but standard deviations were greater for SG-based approaches. The model's standard runs (SR, i.e., no assimilation, no ensemble generation), performed worse, and biomass yield was largely overestimated as indicated by the positive bias.These findings show that the average biomass prediction performance based on SG soil information was not inferior to AS soil information prediction performance, but came with a greater prediction uncertainty.

2.
The simulated sub-field heterogeneity of biomass at harvest did not accurately match the measured one in any of the sites, as no approach showed a congruent performance for any of the metrics. AS-based approaches did not outperform SG-based approaches concerning the reproduction of heterogeneity. The performance of the assimilation approaches were largely site-specific, as we saw the approaches performing differently in the different sites, with no clear approach standing out. To our surprise, EM did not perform substantially worse.
Water stress was the only stress (i.e., biomass growth inhibiting) factor that we considered in our SIMPLACE <LINTUL5,SLIM> model solution. Differences between AS and SG approaches with the same setup could therefore only be caused by differences in soil water dynamics and/or root growth simulations that cause diverging levels of water stress. The SLIM components of the model failed to simulate soil-related processes adequately, as the setups without data assimilation using field-measured soil texture (SR AS and EM AS) overestimated, on average, biomass yield to a larger extent than the SR SG and EM AS using SoilGrids-derived soil texture (as indicated by the bias). The assimilation of LAI information, however, alleviated this effect (as the mean bias of EnKF AS and EnKF SG as well as WM AS and WM SG are similar).
The methods of simulating soil water dynamics vary between crop simulation models, with different approaches for soil water movement (e.g., tipping bucket or Richard's approach), different degrees of resolution (e.g., number of soil layers) and weather variables (e.g., approaches for the calculation of evapotranspiration) [59]. While there is probably no single approach that works best in all cases, we encourage a comparative study to investigate the effects of different soil water dynamic simulation approaches on the results of our methodology.

Limitations of Input Data
The pedotransfer functions developed from the HYPRS database allow for the assignment of hydraulic properties to soils in Europe that show similar textural compositions. Thus, those functions should not be used to derive hydraulic properties outside Europe, as the lack of data suggests a risky extrapolation [49]. This means that, currently, our model solution is limited to an application in Europe only. Scientists in other parts of the world are therefore encouraged to rely only on predictions made for soils that were inside the range of soils used to derive the pedotransfer functions.
The aboveground biomass measurements at harvest comprised one measurement per sampling point only (i.e., no repetitive measurements). The values could therefore be prone to measurement errors. This uncertainty should be considered when interpreting the results based on simulated vs. observed results. Furthermore, differences in phenology were the only cultivar-specific properties we considered in this study. The model was not calibrated for other variables (e.g., SLA, RUE) that could have influenced the results positively.

Sentinel-2 LAI Estimations
In this study, LAI estimations were obtained directly through the SNAP Sentinel-2 Toolbox Biophysical Processor. The algorithm is based on a set of strong assumptions associated with the underlying PROSAIL radiative transfer model (regarding leaf optical properties, canopy structure and background reflectance, see [35]). Violations of those assumptions may generally lead to differences with ground truth measurements, but could be corrected for with a custom correction accounting for the specificities of the given type of canopy [35]. Given this, we decided to forgo a validation exercise.
We are currently not aware of any other tool that provides automated LAI estimates at Sentinel-2 or greater spatial resolution (i.e., at resolutions that are of interest for precision farming applications). Arguably, this approach offers great opportunities for crop modelers, as LAI estimations can be derived without much effort. We encourage the implementation for other multispectral satellites (e.g., Landsat-8, PlanetScope).
Contrary to other studies, we cannot confirm that assimilating S2-derived LAI into a crop model improves the general prediction accuracy. Our results suggested that relying on an EM approach might lead to satisfactory results, comparable to those that were produced with the help of LAI assimilation. Furthermore, despite its shown inaccuracy at small scale, SoilGrids data proved not to provide information that lead to totally diverging results from those that used analyzed soil information as model input. However, the reader should keep in mind that, among all sites, none of the approaches showed an outstanding performance. This means that, if one approach works fine for one site, it might not perform well for another site.
Ref [13] assimilated Sentinel-2 derived LAI into the EPIC (Environmental Policy Integrated Climate) model to estimate winter wheat grain yield on Austrian experimental fields under four fertilization management strategies over the course of two growing seasons, using field-measured soil characteristics as model input. They found that assimilation via recalibration of parameters defining the LAI curve improved the model's performance notably during the first year (RMSE 317 kg ha −1 , RRMSE 6% compared to RMSE 572 kg ha −1 , RRMSE 11% without assimilation), but only slightly during the second year (RMSE 1961 kg ha −1 , RRMSE 55% compared to RMSE 2019 kg ha −1 , no RRMSE reported), with a strong underestimation of the observed yield. They attribute this underestimation to (a) rainfall possibly not representative of the site as weather data from a station 30 km away was used, (b) non-representative model default parameters for the specific year, and (c) erroneously modelled water stress conditions. [7] assimilated Sentinel-2-derived LAI into the WOFOST (World Food Studies) model for winter wheat grain yield prediction at field scale in the North China Plain, with soil information extracted from the 1:1,000,000 China Soil Database used as model input. The authors found that the assimilation, relying on EnKF, decreased the RMSE by 69 kg ha −1 to 404 kg ha −1 (MRE 4.32%). They furthermore found that the joint assimilation of LAI and soil moisture estimations showed superior performance in estimating crop yield, but concluded that soil moisture assimilation may not be necessary under very wet conditions [7]. [2] used LAI data derived from Landsat-8, Landsat-7, and Sentinel-2A images to correct the WARM rice model via automatic recalibration of crop parameters at sub-field resolution for 40 paddy fields in northern Italy. They found that assimilating LAI increased the overall accuracy (MAE 0.66 t ha −1 , RRMSE 13.8%), and that sub-field yield variability, in most cases, was well reproduced. Occurring inconsistencies were attributed to possibly erroneous LAI data or factors that were not considered by the model, such as effects of previous cover crop or extreme weather events [2].

Conclusions
The objectives of this study were to: (1) Compare detailed soil texture measurements from 14 fields across France, Germany, and the Netherlands to soil texture information drawn from SoilGrids for the exact same locations; (2) investigate the performance of the crop model solution SIMPLACE<LINTUL5, SLIM> in those locations when using either analyzed or SoilGrids texture as model input in a range of setups in-and excluding the assimilation of Sentinel-2 estimated LAI with regard to average total biomass of winter wheat at harvest; and (3) investigate which approach, in combination with the provided soil texture information, reproduced spatial variability of aboveground biomass at harvest within the fields better.
Despite the inaccurate soil texture information provided by SoilGrids for the locations of the sampling points in the fourteen fields (clay content was largely underestimated, and gravel content was largely overestimated), the mean results of the performance of the SG-based model runs did not differ substantially from the AS-based model runs, for all ensemble-based approaches. We therefore reasoned not to advise against using this source of soil information as crop model input; the user should, however, be aware of greater uncertainty in the prediction (we found a higher standard error with similar mean values for the SG-based approaches). We furthermore concluded that, contrary to other studies, the assimilation of S2-derived LAI (using EnKF and WM) into the crop model did not improve the general prediction accuracy (average over all field), as the results of the EM runs (i.e., without assimilation) were similar. Sub-field variability of aboveground biomass at harvest was not reproduced accurately in any of the fields, because of inaccurate simulation of water stress that influences biomass accumulation dynamic in the crop model.
Based on the findings and discussed limitations of this study, we encourage future researchers to (1) evaluate the performance of other crop models that rely on different approaches to simulate soil moisture dynamics (e.g., Richards approach) and evapotranspiration (Penman-Monteith, Priestley-Taylor, Turc, . . . ) when using either measured or SoilGrids-estimated soil texture as input in combination with assimilated LAI to see if water stress is simulated more accurately; (2) test our approaches in other cropping sites around the globe and/or for other crops to investigate if our findings can be confirmed in different environmental conditions and a variable availability of cloud free satellite imagery; (3) use different sets of remote sensing data to see the effect of variable spatial resolutions; and (4) include further stresses that might alter biomass production (e.g., nutrient stress, heat stress).

Conflicts of Interest:
The authors declare no conflict of interest. The funders had no role in the design of the study; in the collection, analyses, or interpretation of data; in the writing of the manuscript, or in the decision to publish the results.