GYMEE: A Global Field-Scale Crop Yield and ET Mapper in Google Earth Engine Based on Landsat, Weather, and Soil Data

: In this study, we used Landsat Earth observations and gridded weather data along with global soil datasets available in Google Earth Engine (GEE) to estimate crop yield at 30 m resolution. We implemented a remote sensing and evapotranspiration-based light use efﬁciency model globally and integrated abiotic environmental stressors (temperature, soil moisture, and vapor deﬁcit stressors). The operational model (Global Yield Mapper in Earth Engine (GYMEE)) was validated against actual yield data for three agricultural schemes with different climatic, soil, and management conditions located in Lebanon, Brazil, and Spain. Field-level crop yield data on wheat, potato, and corn for 2015–2020 were used for assessment. The performance of GYMEE was statistically evaluated through root-mean-square error (RMSE), mean absolute error (MAE), mean bias error (MBE), relative error (RE), and index of agreement ( d ). The results showed that the absolute difference between the modeled and predicted ﬁeld-level yield was within ± 16% for the analyzed crops in both Brazil and Lebanon study sites and within ± 15% in the Spain site (except for two ﬁelds). GYMEE performed best for wheat crop in Lebanon with a low RMSE (0.6 t/ha), MAE (0.5 t/ha), MBE ( − 0.06 t/ha), and RE (0.83%). A very good agreement was observed for all analyzed crop yields, with an index of agreement ( d ) averaging at 0.8 in all studied sites. GYMEE shows potential in providing yield estimates for potato, wheat, and corn yields at a relative error of ± 6%. We also quantiﬁed and spatialized the soil moisture stress constraint and its impact on reducing biomass production. A showcasing of moisture stress impact on two emphasized ﬁelds from the Lebanon site revealed that a 12% difference in soil moisture stress can decrease yield by 17%. A comparison between the 2017 and 2018 seasons for the potato culture of Lebanon showed that the 2017 season with lower abiotic stresses had higher light use efﬁciency, above-ground biomass, and yield by 5%, 10%, and 9%, respectively. The results show that the model is of high value for assessing global food production.


Introduction
Crop yield modeling and prediction have broad implications on global food security and food production monitoring. Decision makers depend on yield information to determine the potential reduction in crop yield, to deliberate food prices, and to make timely import and export decisions [1]. Farmers can benefit from crop yield modeling tools to monitor crop development throughout the growing season [2] and assess the need for fertilizer applications [3], irrigation [4], and disease control [5]. Many yield forecast approaches have been discussed in the literature, and each has its advantages and limitations. Examples are farmers' reports [6], crop simulation models [7], remote sensing [8], and, recently, artificial intelligence techniques [9]. Crop yield prediction has long been founded on conventional methods, where ground-based crop and yield data are collected from field reports [10]. Nevertheless, this method is costly, and the data collected are not open access. This method is limited in scale, and it can have significant sources of errors as farmers may overestimate

Basis of the Model for Above-Ground Biomass (AGB) and Crop Yield Estimation
The above-ground biomass (AGB) is calculated as a product of the absorbed photosynthetic active radiation (APAR, MJ/m 2 /day) [27] and light use efficiency (LUE) [40]. The model considers the impact of the environmental conditions affecting biomass production, notably for the impact of temperature stress (TS), vapor stress (VS), and soil moisture stress (SMS). The relationship between the factors is given as: where AGB (kg/ha) is the dry above-ground biomass production for the day of the satellite overpass; APAR is the absorbed photon flux by the canopy photosynthetic elements; LUE max is the maximum light use efficiency (g/MJ); TS, VS, and SMS are the temperature, vapor, and soil moisture stressors, respectively; and 0.864 is a unit conversion factor [41]. The actual LUE (LUE max multiplied by the stresses) is a relatively constant property of the crop, with a distinction between C3 and C4 crops [27]. There are several studies on the determination of the LUE max for crops, varying between 1.44 and 3.22 for C3 crops [42][43][44][45] and between 3.27 and 4.26 for C4 crops [46,47]. Still, there is no consensus in the scientific community on the LUE max for crops. An annual global crop map is still not available.
Here, we use a constant value of LUE max (2.5 for C3 crops-wheat and potato) and 3.5 for C4 crops (corn) [32]. Values of LUE max can be linked to a global crop map as it becomes available in GEE. The stress factors are expressed as fractions, with 0 indicating extreme stress and 1 indicating no stress. Details on the estimation of APAR and the stresses are provided in Sections 2.1 and 2.2.
To derive crop yields, the mean dry biomass production for each field at each date of the satellite overpass needs to be accumulated over the growing season of each crop by interpolating between satellite imagery dates to fill in the missing dates. The accumulated biomass is converted into yield through crop harvest indices and the percentage moisture content of the crop at harvest [48]. Crop yield (t/ha) is calculated as: where EY i is referred to as the metric tons of economic yield per hectare of crop i, AGDB is the accumulated above-ground dry biomass, and HI i is the harvest index of crop i, which is the ratio of yield to above-ground biomass.

Remote Sensing of Active Photosynthetic Radiation (APAR)
APAR is approximated directly from photosynthetic active radiation (PAR) and fraction of photosynthetic active radiation (fPAR; intercepted by the leaves and used in the carbon dioxide assimilation process). APAR = fPAR × PAR daily (3) The link between fPAR and LAI and the normalized difference vegetation index (NDVI) have been thoroughly documented in the literature as useful indicators of crop growth in yield models [49,50]. fPAR can be calculated using the NDVI [51,52] as: fPAR = −0.161 + 1.257 × NDVI for NDVI ≥ 0.125 0 Otherwise (4) where an NDVI value of 0.125 indicates bare soil [52]. The photosynthetic active radiation (PAR, MJ/m 2 /day) represents the spectral range from 400 to 700 nm used by the canopy's photosynthetic elements [53]. PAR is a fraction of the incident shortwave solar radiation (Rs) [54]. Rs can be obtained from climatic data (National Oceanic and Atmospheric Administration's (NOAA) climate forecast CFSv2 or Copernicus ECMWF models, both of which are available in GEE), or it can be estimated as a function of daily extraterrestrial radiation R a and atmospheric transmissivity τw as follows (our choice): PAR daily = 0.48 × R a × τw (5) where atmospheric transmissivity can be calculated from water vapor and extraterrestrial solar radiation (R a ) can be calculated from the solar constant, solar declination, and day of the year [55].

Vapor Stress (VS)
The vapor stress factor shows how evapotranspiration and photosynthetic processes are affected by vapor pressure [56]. The VS is estimated from a link with the vapor pressure deficit (VPD) (for relative humidity less than 100%) [57,58]. The VPD is defined as the difference between the saturated and the actual vapor pressure. It is considered a critical driver of the water and carbon demand for crops [59], where increases in the VPD can reduce the uptake of carbon and water use by the plant [60].
where e s and e a are the saturated vapor pressure and the actual vapor pressure (KPa), respectively.

Plant Temperature Constraint (TS)
The temperature stress (TS) was used according to Stewart [61] and Stewart [62]. The author proposed a relationship between the TS and the Jarvis coefficient (J c ) after Jarvis [63]. The TS is computed as a function of the daily temperature, upper and lower limit stomatal activity, and optimum conductance temperature of the crop.
where J c is the Jarvis coefficient calculated according to Equation (8). where T is the air temperature ( • C), T H is the upper limit of stomatal activity set equal to 45 ( • C), K T is the optimum conductance temperature ( • C) set equal to 24, and T L is the lower limit of stomatal activity. It is set equal to 0 • C [64]. For mean daily air temperatures higher than 40 • C, the TS is set to 0.001. Values of T H , K T , and T L are obtained from Maidment [65].

Soil Moisture Stress (SMS)
Surface energy balance retrieval of evapotranspiration (ET) has been used as an indicator of soil moisture availability and vegetation health [66][67][68]. In this study, we use the soil moisture stress factor as an indicator of agricultural drought. The SMS describes anomalies in the actual-to-potential 24 h transpiration ratio (T act24 /T pot24 ) [69] via the energy balance. An overview of the step-by-step procedure for calculating the SMS is given in Figure 1.

Daily Actual ET Modeling Scheme
The energy balance scheme incorporates key meteorological variables proven to provide early warning of declining crop moisture conditions [70][71][72]. Many models can be utilized to estimate evapotranspiration (ET) as a residual of the energy balance, such as DisAlexi and Two-Source Energy Balance (TSEB) [70], Mapping Evapotranspiration at High Resolution with Internalized Calibration (METRIC) [73], Surface Energy Balance Algorithm for Land (SEBAL) [74], Surface Energy Balance (SEB), or Simplified Surface Energy Balance (SSEB) [75]. In this work, we estimated ET based on the latest version of the single-source Surface Energy Balance for Land (pySEBAL) model described thoroughly in Jaafar and Ahmad [76]. We coded pySEBAL in Google Earth Engine, and it can be applied globally to any Landsat 7 or 8 collection and to Sentinel-2-MODIS-VIIRS (Visible Infrared Imaging Radiometer Suite) combination. Briefly, pySEBAL is an improved and automated version of SEBAL [74]. We refer to the GEE version of pySEBAL we developed as G-SEBAL. G-SEBAL performs energy balance at the satellite (Landsat 7 or 8 or Sentinel-2) overpass time to obtain the latent heat flux (λE, W.m −2 ) as the residual of the surface energy equation (λE = R n − G − H). Net radiation (R n ) is composed of net shortwave radiation and net longwave radiation. The former depends on the surface albedo and solar irradiation, while the latter depends on emissivity and surface and air temperatures. R n is computed as an average of the Food and Agriculture Organization (FAO) equation [77] and the Slob equation [78]. The soil heat flux (G) is calculated from a semi-empirical approach as a function of the NDVI, surface temperature, and albedo [79]. The sensible heat flux (H) is computed via a single-source resistance scheme using an iterative process through relating the air density (ρ air ), the specific heat of air at constant pressure (C p ), and the near-surface temperature difference (dT) to aerodynamic resistance (r ah ) [80]. dT is estimated as a linear function of the corrected surface temperature [79]. Aerodynamic resistance r ah is calculated iteratively based on Monin-Obukhov similarity theory until a stable value is reached to calculate the near-surface temperature difference dT at the hot pixel where the ET is assumed to be zero. dT at the cold pixel is then calculated, and the relationship is used to derive dT and therefore H at all pixels in the image. We sharpen the surface temperature using a combination of a T-sHARP thermal sharpening approach and a bilinear moving window interpolation approach to identify an ensemble of hot and cold pixels automatically in GEE [81]. Afterward, we convert the instantaneous ET (derived from the instantaneous latent heat) to daily observations by utilizing the constant evaporative fraction (EF) assumption (EF = LE/[R n − G]) [82], multiplied by an advection factor Ω [Ω = 1 + 0.985 × EF × (exp[(VPD) × 0.08] − 1.0)] (a function of the vapor pressure deficit (VPD)) to account for the increase in the ET during the afternoon period [83]. Daily grass-reference evapotranspiration ETref is calculated from the Penman-Monteith evapotranspiration equation [65]. The potential ET (ETP 24 ) is set equal to ET 24 if the latter is higher than ETref or to ETref otherwise. A digital elevation model (DEM) is used to calculate the slope, the aspect of terrain, and solar angles (latitude, declination, Remote Sens. 2021, 13, 773 6 of 30 solar zenith angle, and solar incidence angle), to adjust land surface temperature (LST) and to correct ETref in the sloped terrains.
Remote Sens. 2021, 13, x FOR PEER REVIEW 6 of 34 Figure 1. Schematic overview of the procedure used to compute soil moisture stress (SMS); the step-by-step calculations include (1) obtaining the energy fluxes (net radiation (Rn), soil heat (G), and evapotranspiration (ET) fluxes needed for calculating the evaporative fraction (EFi)), along with the soil water properties inferred from soil physical properties; (2) estimating the total soil moisture content (θv) using inputs from step 1; and (3) calculating the first estimate of the root zone.

Daily Actual ET Modeling Scheme
The energy balance scheme incorporates key meteorological variables proven to provide early warning of declining crop moisture conditions [71][72][73]. Many models can be Figure 1. Schematic overview of the procedure used to compute soil moisture stress (SMS); the step-by-step calculations include (1) obtaining the energy fluxes (net radiation (R n ), soil heat (G), and evapotranspiration (ET) fluxes needed for calculating the evaporative fraction (EFi)), along with the soil water properties inferred from soil physical properties; (2) estimating the total soil moisture content (θ v ) using inputs from step 1; and (3) calculating the first estimate of the root zone.

Soil Water Properties
To determine the soil moisture stress, it is essential to derive an estimate of soil moisture. Remote-sensing-based field-scale estimation of root zone soil moisture is still a challenging task. We utilize the Open Land Map Earth Engine dataset to derive soil texture and other properties. Field capacity, permanent wilting point, top, and subsoil saturated moisture content, and subsoil residual moisture content are all inferred from the soil physical properties. The soil texture and organic matter content are used to estimate the soil water properties, as explained in Saxton and Rawls [84], and the equations are coded in Google Earth Engine. We use the Open Land Map Earth Engine dataset [85] available in GEE to determine the soil texture of the different soil layers. The dataset uses a compilation of published point data coming from various national and international soil point-data providers, such as the FAOs soil profile databases, among many others for mapping soil properties and classes. The dataset is one of the products of OpenLandMap.org, which provides global gridded environmental layers based on ensemble machine learning. The soil data used herein are the organic carbon, clay, and sand content at six standard depths (0, 10, 30, 60, 100, and 200 cm) at 250 m spatial resolution. We then apply Saxton equations [84] to derive the potential water content at the field capacity and the permanent wilting point so as to calculate the plants' available water in the top (0-60) and subsoil layers . We restrict the analysis to 1 m to account for areas where the bedrock is limiting.

Total Soil Moisture Content (θ v )
We calculate the soil moisture content (θ v ) as a function of the instantaneous evaporation fraction and the saturated soil moisture level in the subsoil (θ SMssub ) (see Equation (9)): where a and b are curve-fitting parameters, 1 and 0.421, respectively [86]. The evaporative fraction describes the segregation of the net radiation into latent heat flux. It is intrinsically restrained by the soil moisture in the root zone [ The root zone soil moisture's first estimate (θ VRZ1 ) is calculated as a function of the fraction of vegetation cover (f c ), with an initial assumption that the topsoil moisture as well as the root zone soil moisture (θ VRZ ) is equal to the total soil moisture (θ v ) [65]. The mean and the standard deviation of the topsoil moisture (θ SM topmean and θ SM topstd ), as well as the maximum and mean root zone soil moisture (θ VRZmax and θ VRZmean ), are computed from the total soil moisture (see Equation (10)). The first estimate of the root zone soil moisture is constrained between the residual subsoil moisture (θ SM rsub ) and the maximum moisture content Maxθ VRZmax (i.e., if the first estimate θ VRZ1 is less than or equal to the residual subsoil moisture (θ SM rsub ), it is then set to the residual subsoil moisture, and if θ VRZ1 is greater than or equal to Maxθ VRZ , it is set equal to Maxθ VRZ ). The soil moisture stress trigger θ Vstress−trigger is the critical value under which plants get stressed [88] computed as a function of the soil moisture at field capacity (θ FC ), soil Remote Sens. 2021, 13, 773 8 of 30 moisture at permanent wilting point (θ PWP ), and the fraction of the total available water that can be depleted from the root zone before moisture stress (p-factor) [65]: where ET 24 is the daily actual evapotranspiration derived from the energy balance (Section 2.2.4) and DF is a crop-dependent depletion factor (a user-defined value). The depletion factor differs from one crop to another, and it usually varies from 0. 30 (12)) are used to calculate the normalized stress trigger (θ Vnormalized−trigger ) representing the effective fraction of available soil moisture [87] (Equation (14)). The first estimate of the moisture stress on the biomass (Ψ θ ) is estimated as a function of θ Vnormalized−trigger (Equation (15)): where Ψ θ is limited between 0 and 1.

Splitting the ET into Evaporation and Transpiration
To calculate the SMS factor, ETP 24 (Section 2.2.4) is split into its component of potential transpiration using an LAI-derived vegetation cover fraction light use extinction factor (ε) [89,90]: T pot24 = 1 − e ((ε×LAI)) × ETP 24 (16) Alternatively, the LAI can be calculated from a remotely sensed estimate of f c derived from the NDVI [91,92]: The LAI is restricted to the range of 0.001-8.
where f c is set equal to 0 when the NDVI is less or equal to 0.125 (representing bare soil) and equivalent to 0.99 for an NDVI greater than 0.8. Having calculated T pot24 , the actual evaporation (E act24 ) component can be estimated using the topsoil saturation degree (SE top ) calculated from the topsoil water properties [93]. As presented in Equation (19), E act2 is calculated from the difference between ETP 24 and T pot24 and either 1 or 1/(SE top + 0.1) −2 : where SE top is the degree of soil saturation of the topsoil, scaled between 0 and 1: where θ SM top is the topsoil moisture and θ SM rtop and θ SM stop are the residual and the saturated topsoil moisture, respectively, and are constants for a specific soil type. The product of the first estimate of moisture stress biomass (Ψ θ ) (Section 2.2.8) and T pot24 provides the first estimate of daily actual transpiration (T act24 ) [94]: where T act24 is set to zero when there is no vegetation cover. The relative value of T act24 and E act24 are used to break down the ET 24 flux into the second and final estimations of actual daily transpiration (T act24 ). The corrected T flux becomes [88,93]: Finally, the final estimate of soil moisture stress (SMS) is calculated as the ratio of actual daily transpiration to potential daily transpiration (transpiration efficiency) [69]: Figure 2 represents a summary of the overall methodological approach followed to estimate AGB and crop yield. This study presents the crop yield estimate at a field scale generated by modifying and applying light use efficiency (LUE) and APAR models via the integration of environmental stresses: vapor stress (VS), plant temperature stress (TS), and soil moisture stress (SMS). A particular focus is placed on evaluating the impact of the SMS estimated through surface energy balance retrieval of evapotranspiration (ET), where ETflux is segregated into its components of actual and potential transpiration. The estimated crop yield is evaluated against the actual crop yield observations collected specific to each study site.

Study Sites and Field Data
The proposed crop yield modeling approach was tested at four sites in three countries, namely Skaff (West Bekaa, Lebanon) for three consecutive growing seasons (winter 2017-2018, summer 2017, and summer 2018) and the Agricultural Research and Education Center (AREC, Bekaa, Lebanon) for summer 2020; São Desidério, Brazil, for 2015 and 2016; and Albacete (Spain) for 2017 and 2018. Details on the minimum and maximum crop yield for each site, the number of observations, and the average size of the monitored fields are illustrated in Table 1. A total of 89 observations of field-scale yield were used for model validation. generated by modifying and applying light use efficiency (LUE) and APAR models via the integration of environmental stresses: vapor stress (VS), plant temperature stress (TS), and soil moisture stress (SMS). A particular focus is placed on evaluating the impact of the SMS estimated through surface energy balance retrieval of evapotranspiration (ET), where ETflux is segregated into its components of actual and potential transpiration. The estimated crop yield is evaluated against the actual crop yield observations collected specific to each study site.   5) reducing the LUE max to the actual LUE by accounting for environmental stresses; (6) remote sensing of absorbed photon flux (absorbed photosynthetic active radiation (APAR)), which is a product of photosynthetic active radiation (PAR) and the fraction of photosynthetic active radiation (fPAR); (7) estimating crop yield as the ratio of the accumulated above-ground biomass (AGB) multiplied by the harvest index (HI) to the % moisture content of the crop at harvest; and (8) comparing the ground-observed and modeled crop yield.

Skaff (Bekaa, Lebanon) Site
The analyzed crops include 21 wheat (T. aestivum) and 31 potato (S. tuberosum) experimental fields located in the west of Bekaa Valley, Lebanon, with an average elevation of 862 m above sea level ( Figure 3). According to Köppen's climatic classification, the climate of the region is Mediterranean (Csa). The predominant soil of the region is fertile clay soil. All fields monitored are irrigated using sprinkler irrigation systems. Management practices follow local practices. Traditional tillage is common. Herbicides are applied once per season for wheat culture and between six and eight times for potato culture. Pesticide spray is carried out at one to two sprays per season, and the fertilizers in the form of urea (43% nitrogen) are applied once, at a rate of minimum 120 kg/ha to a maximum of 800 kg/ha or 336 kg of net N/ha/season for wheat culture, while potato uses 230 kg/ha of nitrogen fertilizer. Potato rotation is widely practiced in West Bekaa (Skaff farms) with wheat, vegetables, and legumes. Remote Sens. 2021, 13, x FOR PEER REVIEW 12 of 34 The actual yield dataset for the analyzed fields was collected via an interview with the Skaff farms' farmer-manager. This approach of data acquisition is referred to as the recall method [102]. The data records were obtained on crop maps, areas planted, crop type, and crop yields for three consecutive seasons: summer 2017, summer 2018, and winter 2017-2018. Agronomic measurements for estimating the harvest index (HI) and crop moisture content (%) parameters were conducted for Skaff fields (West Bekaa, Lebanon). These measurements were taken during the late growing season of the crops (May-September 2018) over two potato and three wheat experimental fields located in West Bekaa (Lebanon). The quadrate was randomly thrown in the field, and the vegetation within the frame of the quadrate was clipped. Three samples of the biomass were taken from each field, and the obtained results were averaged. The above-ground plant component was oven-dried at 75 °C for 48 h to a constant weight and weighed. The water content of the plants was determined by subtracting the above-ground dry mass from the fresh aboveground mass. The crop moisture content (%) was determined by dividing the water content by the fresh crop weight. The experimental verification of potato moisture content (%) showed that the percentage moisture content of potato tuber varies between 75% and 78% (Table S1). As for wheat, the moisture content of grains ranged between 12% and 15% (Table S2). In this study, we used a value of 0.75 for the potato HI and 0.4 for the wheat HI (Table 2), as verified by the literature. The actual yield dataset for the analyzed fields was collected via an interview with the Skaff farms' farmer-manager. This approach of data acquisition is referred to as the recall method [99]. The data records were obtained on crop maps, areas planted, crop type, and crop yields for three consecutive seasons: summer 2017, summer 2018, and winter 2017-2018. Agronomic measurements for estimating the harvest index (HI) and crop moisture content (%) parameters were conducted for Skaff fields (West Bekaa, Lebanon). These measurements were taken during the late growing season of the crops (May-September 2018) over two potato and three wheat experimental fields located in West Bekaa (Lebanon). The quadrate was randomly thrown in the field, and the vegetation within the frame of the quadrate was clipped. Three samples of the biomass were taken from each field, and the obtained results were averaged. The above-ground plant component was oven-dried at 75 • C for 48 h to a constant weight and weighed. The water content of the plants was determined by subtracting the above-ground dry mass from the fresh above-ground mass. The crop moisture content (%) was determined by dividing the water content by the fresh crop weight. The experimental verification of potato moisture content (%) showed that the percentage moisture content of potato tuber varies between 75% and 78% (Table S1). As for wheat, the moisture content of grains ranged between 12% and 15% (Table S2). In this study, we used a value of 0.75 for the potato HI and 0.4 for the wheat HI (Table 2), as verified by the literature. The studied experimental field is located at the American University of Beirut's Agricultural Research and Education Center (AREC) in Bekaa, Lebanon, a 990 m elevation above sea level. Potato (Spunta variety) was planted on 12 March 2020 in an area of 9468 m 2 . Fertilizers were applied before planting (50 kg of di-ammonium phosphate (DAP) and 25 kg of potassium sulfate) and during the growing season (50 kg of urea). The field is irrigated using a micro-sprinkler system. For HI and percentage crop moisture content calibration, biomass data were collected weekly from 30 April 2020 until 7 July 2020. A summary of the collected data and calculated parameters is illustrated in Table S3.

São Desidério (Brazil) Site
The field data used for the Brazil site are for 33 center pivots commercial fields located in the municipality of São Desidério in the western region of the state of Bahia, Brazil. The rectangular area boundary of the analyzed fields is defined by the coordinate pairs 12 • 28 08"S-45 • 45 12"W and 12 • 25 40"S-45 • 34 55"W, with an average altitude of 750 m above sea level. The predominant soil type of the studied fields is Yellow Latosol [110]. According to Köppen's climatic classification, the climate of the region is tropical (Aw) characterized by a rainy season in summer and a dry winter, with an annual precipitation of 1003.4 mm (Figure 4). Crop-specific parameters used for yield conversion are given in Table 2.

Albacete (Spain) Site
The studied fields are located in the province of Albacete (southeast of Spain). The study site lies at 689 m above sea level, with a prevailing climate considered as BSk according to the Köppen-Geiger climate classification. The annual temperature and precipitation of the study site are 13.6 • C and 340 mm, respectively [96]. The yield data are obtained from commercial wheat fields under irrigated (fields 26W and 27W) and rain-fed conditions for the rest of the analyzed fields ( Figure 5). All irrigated fields use center pivot systems. Crop yield data for Fields 22W through 27W were for the 2017 growing season, and 28W through 29W were for the 2018 season. Specific ground measurements of the HI for each field were used, varying from 0.38 to 0.45 [96].

Landsat Surface Reflectance Data
Georeferenced and atmospherically corrected Landsat 7 ETM+ and Landsat 8 OLI/TIRS imagery data were mosaicked, gap-filled (Landsat 7), masked for clouds, and composited within the Google Earth Engine (GEE) platform [114]. For the Skaff (Bekaa, Lebanon) study site, Landsat data with Path 174 Row 37 images for the period 2017-2018 were processed. A total of 27 and 23 nearly cloud-free scenes were used for the 2017 and 2018 growing seasons, respectively. For the AREC (Bekaa, Lebanon) study site, a total of 11 scenes were used to cover the 2020 growing season. For the Brazil site, Landsat WRS-2 Path 220, Row 069 scenes for the corn season (April-October) for 2015 and 2016 each were used. We acquired 40 images of the Brazil study site, including images from both OLI and ETM + sensors for both years. As for the Spain site, a total of 44 images were used for the 2017 growing season and 39 for the 2018 season. The images were selected to cover the full growing season for each culture, with at least four images per field, with an image close to the beginning, middle, and end of the crop cycle to amply define it. The dates with available images are shown in Table S4 (Supplementary Material). Table 3 shows the major data input fed to the tested model with their respective spatial and temporal resolu-

Landsat Surface Reflectance Data
Georeferenced and atmospherically corrected Landsat 7 ETM+ and Landsat 8 OLI/TIRS imagery data were mosaicked, gap-filled (Landsat 7), masked for clouds, and composited within the Google Earth Engine (GEE) platform [111]. For the Skaff (Bekaa, Lebanon) study site, Landsat data with Path 174 Row 37 images for the period 2017-2018 were processed. A total of 27 and 23 nearly cloud-free scenes were used for the 2017 and 2018 growing seasons, respectively. For the AREC (Bekaa, Lebanon) study site, a total of 11 scenes were used to cover the 2020 growing season. For the Brazil site, Landsat WRS-2 Path 220, Row 069 scenes for the corn season (April-October) for 2015 and 2016 each were used. We acquired 40 images of the Brazil study site, including images from both OLI and ETM+sensors for both years. As for the Spain site, a total of 44 images were used for the 2017 growing season and 39 for the 2018 season. The images were selected to cover the full growing season for each culture, with at least four images per field, with an image close to the beginning, middle, and end of the crop cycle to amply define it. The dates with available images are shown in Table S4 (Supplementary Material). Table 3 shows the major data input fed to the tested model with their respective spatial and temporal resolutions, and possible uncertainties. Table 3. Major data inputs used in this study, with the respective spatial and temporal resolutions, and uncertainties.  [95] showed that the use of specific HI values could decrease the difference between the predicted and the measured yield from ±10% (with the use of a single HI value) to ±5% (with the use of a specific HI value); the used HI in this study falls within the reported range, indicating less uncertainty in crop yield estimation.

LUE max _ _
Dong et al. [112] showed significant improvements in biomass estimation accuracy when using the derived variable LUE max (by about 15.0% for the normalized root-mean-square error (nRMSE)) compared to the fixed LUE max .

Weather Data
We use the 6 hr CFSV2 gridded (0.2 arc degrees) weather dataset available in GEE generated by the National Centers for Environmental Prediction (NCEP) of the National Oceanic and Atmospheric Administration (NOAA) [113]. CFSV2 version 2 was developed at the Environmental Modeling Center at the NCEP. CFSV2 is a fully coupled model representing the interaction between Earth's atmosphere, oceans, land, and sea ice. Currently, CFSV2 is the global weather reanalysis and forecast tool that has the highest temporal and spatial resolution to allow for real-time reference evapotranspiration calculations and forecasts in GEE. Other available datasets or systems (e.g., GLDAS, NASA-POWER, ECMWF) are either not available in near real time in GEE or have a lower resolution. Each grid of the CFSV2 file covers an approximate area of 490 km 2 . The variables used in the model to calculate ETref and consequently ETact and transpiration are presented in Table 4.

Definition of the Growing Season
The NDVI time-series images are used to define the crop phenology of each field at pixel scale. An NDVI value of 0.2 is used as an indication of the beginning and the end of the season [114].

Statistical Indicators
To assess the accuracy of the yield estimations, the statistical indicators root-meansquare error (RMSE), mean absolute error (MAE), mean bias error (MBE), relative error (RE), and index of agreement (d) between the estimated and actual yield values are calculated: Root-mean-square error (RMSE): Mean absolute error (MAE): Mean bias error (MBE): Relative error (%RE): Index of agreement (d):  Table 5). The modeled potato yield for the analyzed field in the 2020 growing season at AREC, Bekaa (Lebanon) showed a low deviation from the reported yield, quantified at 8.2%. A better agreement was observed in the wheat crop, and the RMSE (RE) between the measured and simulated values were low (0.6 t/ha (0.83%)), and d was high (0.8), with a slight underestimate of −0.06 t/ha. The accuracy of estimation of the corn grain yields was not as good as that of potato and wheat, but it was acceptable. The index of agreement value (0.5) for the 2015 and 2016 growing season was acceptable, with an RMSE greater than 1 (1.3 t/ha). The RE was −3.4%, with a slight overestimate of 0.4 t/ha, which is considered subtle, and the MAE was equal to 1 t/ha. These values are very satisfactory. The 2016 growing season had six different sown hybrids, which may have contributed to this slight variation due to the use of a single HI (HI = 0.45) for all hybrids. The 2015 growing season had lower values of the RMSE, MAE, MBE, and RE. Additionally, the obtained RMSE in the present study is considered very good in comparison to the results of Sibley [24], where the authors reported RMSE values above 2 t/ha for the verified approach to be promising. Regarding the wheat grain yield at the Albacete study site, the RMSE between the modeled and actual values was 0.28 t/ha, with a slight overestimate of 0.22 t/ha. Additionally, a good agreement existed between the modeled and actual values, with a value of 0.99. Both RMSE and d values were better than those obtained by Campos et al. [115] for their tested approach on wheat fields located in the southeast of Spain, where they showed RMSE values varying between and 0.     Average reported yield (t/ha) 40 Figure S3 shows the absolute difference in percentage between estimated and actual yield values for the analyzed crops at a field level. Both wheat and potato at the West Bekaa study site had a percentage difference range of ±16%. Still, most of the percentage differences ranged between ±10%. For corn, the percentage difference in the grain yield ranged between −22.4% and 14.2%. The 2016 growing season had the largest error. The highest difference was observed in 2016 (Pivot 13B), with a value of −22.4%, followed by −18.8% in Field 16B. The 2015growing season had only one value greater than 16%. The Brazil site had more clouds than the other sites, and interpolation between non-cloudy Landsat overpasses will bring more uncertainty for crops with a shorter growing season (corn matures in 90 days). Excluding the fields that had the highest percentage difference, the percentage difference of the remaining fields was within ±16% for both Brazil and Lebanon sites. On the contrary, the modeled yield deviated by an average of −10.9% from the reported yield in only two fields 25W and 22W, having a difference of greater than ±15% at the Albacete (Spain) study site. This can be attributed to the small size of both fields (average area of 11.5 ha) compared to the average combined field sizes of 24 ha (Table 1). The variations of the AGDB and the soil moisture stress (SMS) scalar at satellite overpasses of the two emphasized potato fields during 2018 result in prominent differences in terms of the accumulated above-ground biomass and yield (Figure 7). A deficit in soil moisture can influence the length of the crop development stages through early crop senescence and thereby reduce the yield. Compared to some studies that neglect the impact of moisture stress when irrigation water is available [116], our analysis indicates that moisture stress represents a major limiting factor for LUE act and crop yields in irrigated fields. Results show that Field 31P had low moisture stress (in the neighborhood of one) throughout the days of the satellite overpass as compared to Field 30P, which experienced some incidents of soil moisture stress (in the neighborhood of zero) during two satellite overpasses (dates 11/03 and 12/04). Some patches of high moisture stress were noticed in Field 31P. Both fields experienced high moisture stress on the day nearest the harvest date (18/08). The high moisture stress observed during the satellite overpass in Field 30P is possibly due to a difference in irrigation time ( An interpretation of the light use efficiency, accumulated AGDB, and the reported and modeled yields between the two selected fields shows significant differences. With the same vapor and temperature stresses over the two fields, we noticed that the mean value of the LUE in Field 30P (1.23 g/MJ) was 12% lower than that in Field 31P. The mean value of the combined stresses was higher in Field 30P ( Table 6). The stresses showed up better in the reported yields, with an 11% difference in stress resulting in a 17% difference in yield. The modeled yield was only 3% less, possibly due to the assumption of the linearity in the interpolation between satellite overpasses. Other factors affecting the variation in the AGB and yield estimation are the crop's susceptibility to unfavorable growth conditions, differences in planting and harvest dates, weed control, and irrigation management. For instance, the presence of abundant weeds in the monitored fields could lead to overestimating biomass production and inaccurate estimation of crop yield. An interpretation of the light use efficiency, accumulated AGDB, and the reported and modeled yields between the two selected fields shows significant differences. With the same vapor and temperature stresses over the two fields, we noticed that the mean value of the LUE in Field 30P (1.23 g/MJ) was 12% lower than that in Field 31P. The mean value of the combined stresses was higher in Field 30P ( Table 6). The stresses showed up better in the reported yields, with an 11% difference in stress resulting in a 17% difference in yield. The modeled yield was only 3% less, possibly due to the assumption of the linearity in the interpolation between satellite overpasses. Other factors affecting the variation in the AGB and yield estimation are the crop's susceptibility to unfavorable growth conditions, differences in planting and harvest dates, weed control, and irrigation management. For instance, the presence of abundant weeds in the monitored fields could lead to overestimating biomass production and inaccurate estimation of crop yield.  Figure 8 shows the spatiotemporal distribution of the AGDB and soil moisture stress (SMS) over the corn-growing season (2016). Low biomass values were found in the initial stages (date 11/05) accompanied by high moisture stress due to the large area of uncovered soil in this period. Generally, the SMS values at the satellite overpass for four emphasized center pivots were close to unity, except at the beginning of the growing season after sowing and in the late season. SMS values started to decrease on 08/09 when the irrigation intervals became longer during the physiological maturity phase of corn. Consequently, based on SMS values that remained close to unity, it is evident that cultivated fields did not suffer from water stress and thus obtained high yields. These results are consistent with previous work, showing an average value of the water stress coefficient (0.94) for four selected center pivots monitored in 2016, indicating the low impact of water stress on biomass production [95]. The visual inspection of yield maps reveals that vegetation conditions were slightly variable between the four emphasized fields over the growing season. This could be due to the use of different corn cultivars.

Wheat: Albacete (Spain) Site
The biomass production dynamically changes during the growing cycle due to factors related to meteorological conditions and crop physiology and management [115]. Grown under different management conditions, and with different lengths of the growth cycle, irrigated Field 26W produced a higher accumulated AGDB and yield than rain-fed Field 24W, possibly due to severe water-limited conditions in Field 24W. The growing cycle in irrigated varieties spans from January to the end of June and in rain-fed varieties from November to the end of May. Selected images from the growing cycle over the emphasized fields are displayed in Figure 9. An increase in the AGDB during the development phase of both Fields 24W and 26W was noted, which can be confirmed by the difference between the selected first and last images (dates 07/03 and 20/06 for Field 24W and 08/03 and 26/05 for Field 26W). During the mid-season (dates 24/03 and 24/04 for Field 24W and 18/05 for Field 26W), high values of the AGDB are observed because of the good crop establishment. High soil moisture stress is visible at the initial stages of the growth cycle for both fields (dates 07/03 for Field 24W and 08/03 for Field 26W). Figure 10 shows the possible detection of high soil moisture stress (SMS) within-field variability at Landsat overpasses, evident on 16/04 and 18/05 for Field 26W and 08/04 for Field 24W. This approach can be advantageous for precision irrigation, providing accurate information about irrigation practices, such as water application uniformity and areas with surface runoff or irrigation deficits near the outside borderline of the central pivot systems, which was confirmed in Field 26W on 20/06, suffering from high moisture stress (~0.5) near its outer boundary. Remote Sens. 2021, 13, x FOR PEER REVIEW 22 of 34 Figure 8. Spatiotemporal distribution of the corn above-ground dry biomass (AGDB, t/ha), soil moisture stress (SMS), accumulated above-ground dry biomass (sum AGDB, t/ha), and yield (t/ha) values as a function of the dates of Landsat satellite overpass (dd/mm:11/05,20/06,17/07,07/08,31/08, and 08/09) during the 2016 growing season in the São Desidério (Brazil) site; emphasis on four fields: 08A through 11A.

Wheat: Albacete (Spain) Site
The biomass production dynamically changes during the growing cycle due to factors related to meteorological conditions and crop physiology and management [119]. Grown under different management conditions, and with different lengths of the growth cycle, irrigated Field 26W produced a higher accumulated AGDB and yield than rain-fed Field 24W, possibly due to severe water-limited conditions in Field 24W. The growing cycle in irrigated varieties spans from January to the end of June and in rain-fed varieties from November to the end of May. Selected images from the growing cycle over the emphasized fields are displayed in Figure 9. An increase in the AGDB during the development phase of both Fields 24W and 26W was noted, which can be confirmed by the difference between the selected first and last images (dates 07/03 and 20/06 for Field 24W and 08/03 and 26/05 for Field 26W). During the mid-season (dates 24/03 and 24/04 for Field 24W and 18/05 for Field 26W), high values of the AGDB are observed because of the good crop establishment. High soil moisture stress is visible at the initial stages of the growth cycle for both fields (dates 07/03 for Field 24W and 08/03 for Field 26W). Figure 10 shows the possible detection of high soil moisture stress (SMS) within-field variability at Landsat overpasses, evident on 16/04 and 18/05 for Field 26W and 08/04 for Field 24W. This approach can be advantageous for precision irrigation, providing accurate information about irrigation practices, such as water application uniformity and areas with surface runoff or irrigation deficits near the outside borderline of the central pivot systems, which was confirmed in Field 26W on 20/06, suffering from high moisture stress (~0.5) near its outer boundary.

Seasonal Co-Variability in Above-Ground Dry Biomass (AGDB), Environmental Stressors, and Yield: Examples from Lebanon
Understanding patterns of the AGB is essential for guiding farming practices and helps farmers in decision making. Figure 10 shows the time development of the monthly AGDB. The variability across the months is considerable, reflecting differences in potato and wheat crop phenology. It is possible to detect very similar behaviors for both wheat and potato, with a fast increase in biomass at the start of the growing cycle, stability during the mid-season, and a decrease at the end. The AGDB started to increase in March, and it peaked in May (middle of the season) at ~0.2 t/ha. A decrease in the monthly AGDB was noticed at the end of the growing season in August. As for wheat, the monthly biomass reached its maximum in March-April, at ~0.12 t/ha, and then decreased toward the end of the growing season (June-July).
The potato crop in the Bekaa site diverged in its AGDB temporal profile in the 2017 and 2018 growing seasons. These deviations can be attributed to the variation of environmental stressors among the seasons of the same crops. To study the impact of the environmental stressors, an inter-comparison between seasons was performed. Our analysis shows that the soil moisture, temperature, and combined stresses were higher in summer 2017 as compared to 2018 (Table 7). This was notable in terms of the maximum actual LUE and the AGDB for potato in summer 2018, which were, respectively, 5% and 10% higher than those in 2017 (due to higher moisture and temperature stress). Furthermore, the average modeled and reported yields for potato in 2018 were 9% and 6% higher than those in 2017, respectively. In contrast to potato crop, wheat crop studied during winter 2017-2018 experienced lower environmental stresses (with values approaching unity, indicating low stress). This is evident in terms of a higher average combined stress factor and lower moisture stress (Table 7).

Seasonal Co-Variability in Above-Ground Dry Biomass (AGDB), Environmental Stressors, and Yield: Examples from Lebanon
Understanding patterns of the AGB is essential for guiding farming practices and helps farmers in decision making. Figure 10 shows the time development of the monthly AGDB. The variability across the months is considerable, reflecting differences in potato and wheat crop phenology. It is possible to detect very similar behaviors for both wheat and potato, with a fast increase in biomass at the start of the growing cycle, stability during the mid-season, and a decrease at the end. The AGDB started to increase in March, and it peaked in May (middle of the season) at~0.2 t/ha. A decrease in the monthly AGDB was noticed at the end of the growing season in August. As for wheat, the monthly biomass reached its maximum in March-April, at~0.12 t/ha, and then decreased toward the end of the growing season (June-July).
The potato crop in the Bekaa site diverged in its AGDB temporal profile in the 2017 and 2018 growing seasons. These deviations can be attributed to the variation of environmental stressors among the seasons of the same crops. To study the impact of the environmental stressors, an inter-comparison between seasons was performed. Our analysis shows that the soil moisture, temperature, and combined stresses were higher in summer 2017 as compared to 2018 (Table 7). This was notable in terms of the maximum actual LUE and the AGDB for potato in summer 2018, which were, respectively, 5% and 10% higher than those in 2017 (due to higher moisture and temperature stress). Furthermore, the average modeled and reported yields for potato in 2018 were 9% and 6% higher than those in 2017, respectively. In contrast to potato crop, wheat crop studied during winter 2017-2018 experienced lower environmental stresses (with values approaching unity, indicating low stress). This is evident in terms of a higher average combined stress factor and lower moisture stress (Table 7).

Assessment of the Operational Model: Strengths and Weaknesses
Our work's breakthrough is that it is the first to provide 30 m crop biomass with relatively high spatial and temporal resolution using soil and weather data along with ET data generated in the same model. GYMEE is highly adaptable, and it can ingest Sentinel-2 multispectral imagery and the VIIRS-375 m I-5 thermal band, allowing even for higher spatial (10 m) and temporal (2-3 days) resolution. It can also adapt to various single-and dual-source energy balance ET models, such SSEB, SEBS, and TSEB. A cropspecific LUE max should be utilized whenever crop data are known. Given that there is no consensus in the scientific community on the maximum LUE for even the same crop (see, for example, Chen et al. [117] and Zhu et al. [118]) and given that GYMEE performed very well with these average values, we believe that the two values (2.5 for C3 and 3.5 for C4) provide good enough estimates for modeling the biomass using the Monteith equation. The LUE max parameterization might introduce some uncertainty, but we believe that it is a strength of the model that it performed very well with the average values of LUE max . It would be straightforward to update GYMEE with a LUE max map derived from a crop map when such a map becomes available. At present, many global products for land-cover classification are available, which only provide information for broad classes, such as forest, cropland, and grassland, but not crop types. The dynamic nature of cropping patterns and within-year variations needs to be considered since this is a field-scale model. The analyses performed consider different physical soil properties by using global gridded soil data from an Open Land Map Earth Engine dataset Hengl and MacMillan [85] available in GEE. Additionally, the proposed operational model considers the variations in environmental conditions by incorporating environmental stresses (temperature, vapor, and soil moisture constraints) computed from agrometeorological data of the global 6 h CFSV2 gridded weather dataset available in GEE. The proposed model's major operational advantage is the spatial quantification of the soil moisture stress constraint and its impact on reducing biomass production. This approach can be alternatively used for areas with limited ground data availability.
There are a few uncertainties for simulating the AGB as well as crop yield, including uncertainties of satellite data and model parameters. We use multispectral VIs to determine the length of the growing cycle. However, this approach is limited under cloudy conditions. Future implementations could harness NDVI-S1 radar data relationships to fill in long gaps over cloudy regions. In addition, if a scene is missed due to clouds, the small temporal resolution (say 16 days) under water deficit conditions could lead to inaccurate biomass values [119]. In these situations, the use of different thermal times like growingdegree-days and accumulated reference evapotranspiration (ETref) with the temporal evolution of VIs to monitor phenology could be an alternative [120]. The parameterization of the LAI and the fPAR would also bring some uncertainty as there is no consensus in the literature on a physical method to derive the LAI from remote sensing [121]. Even though the percentage difference between the modeled and actual yields is considered low (Figure 7), the prediction may be improved by using crop-specific parameters (harvest index and percentage moisture content of produce) instead of single values. It is vital to use different HIs when considering different hybrids and years under various weather environments [95]. Crop yield varies with the crop's moisture content (percentage moisture) at harvest and with the HI. The HI varies with the variety of the crop and the cultural practices, and therefore a range should be provided. Careful estimation of the HI and the percentage moisture is necessary to derive accurate yields from biomass calculations. For example, a high-yielding variety of the same crop would have a higher HI. GYMEE does not predict the HI or the percentage moisture, both of which should be specified by the user to derive the yield from the biomass. Other uncertainties could arise from the constants in the heat stress equation. A sensitivity analysis of the response of the heat stress to changes in the parameters is presented in Figure S4 (Supplementary Material). With the simulated results, it can be deduced that both values of optimal conductance temperature and temperature stress scalar would have had a low uncertainty for the crop yield estimation. Of course, model performance could be better with optimized T H and K T values specific to each crop type.
Some uncertainty, which is not due to the model itself, might arise from actual crop yield data acquisition. For example, we used the recall method for the Skaff (Bekaa, Lebanon) site. Although this approach produced good results, it may include uncertainty if farmers over-or under-report their crop yields or when farmers do not account for within-field variability as they provide values on a whole-plot basis [99,122]. Other sources of error, not directly attributable to the model, are issues due to the size and shape of the fields. Overall, the poorest performance was obtained in small fields (Fields 25W and 22W from the Spain study site), with an average size less than 12 ha. The accuracy of the yield estimation is affected in small fields [123], and here Sentinel-2 can be used. The model is also sensitive to landscape evaporation. Using an ET model other than the modified pySEBAL might yield is a different transpiration component that would influence the soil moisture stress scalar of the yield. An ensemble of ET models may be a better approach. In fact, we have already started implementing GYMEE using a harmonized Landsat-Sentinel-2 product and sharpened thermal bands of MODIS and VIIRS, along with a more stable energy balance model.

Conclusions
In this study, a newly proposed GYMEE model for predicting yield at the farm level, based on the integration of ET and abiotic stressors into the light use efficiency and Monteith model, was presented and evaluated using surface reflectance images from Landsat 7 and 8 satellites. GYMEE shows promising results in running within the GEE platform. GYMEE was validated at different sites (Lebanon, Spain, and Brazil) for wheat, potato, and corn. The results obtained in the present work endorse the use of remote sensing as a helpful tool for the operational estimation of crop yield at a field scale under a wide range of ecology, management systems, and soil types. Compared to the literature, this research focuses on the following aspects: (i) spatio-temporal crop yield modeling at the small field scale under a wide range of management conditions, (ii) an operational framework for crop yield modeling, and (iii) integration of global weather data (CFSV2) and global soil data into actual evapotranspiration and abiotic stressor calculations for biomass calculations. Because it is implemented in GEE, GYMME can be used to estimate the ET at the global scale and improve the understanding of water use by crops as well as improve estimates of the ET as a major component of the hydrologic cycle. As more soil moisture products become, available either via further analysis of radar data, such as Sentinel-1, or via downscaling of existing soil moisture missions such as Soil Moisture Active Passive (SMAP), a more comprehensive sensitivity analysis becomes possible for validating the model in other areas of the world where yield data are available. We conclude that the suggested method can be a useful tool for estimating the yield of the studied crops in Mediterranean, semi-arid, and tropical climates.

Supplementary Materials:
The following are available online at https://www.mdpi.com/2072-429 2/13/4/773/s1: Table S1: Agronomic measurements for potato crop at Skaff (West Bekaa, Lebanon) (average of three samples for each date), Table S2: Agronomic measurements for wheat at Skaff (West Bekaa, Lebanon) (average of three samples for each date), Table S3: Agronomic measurements for potato at AREC (Bekaa, Lebanon), Table S4: At-site cloud-free Landsat imagery used in this study, with respective dates; L7 (Landsat 7) and L8 (Landsat 8); Figure S1: Wheat-fields NDVI time series Plots for winter 2017-2018 season, Figure S2: Potato fields NDVI time series plots for the summer 2018 season, Figure S3: Percentage deviation (%) of modeled yield from the actual yield for the analyzed crops at each of the studied sites, Figure S4: Simulations of temperature stress (TS) at a fixed daily air temperature of (28°C) as a function of: (a) the upper limit of stomatal activity (°C) (TH) and (b) the optimum conductance temperature (°C) (KT).