Within-Field Rice Yield Estimation Based on Sentinel-2 Satellite Data

: Rice is considered one of the most important crops in the world. According to the Food and Agriculture Organization of the United Nations (FAO), rice production has increased signiﬁcantly (156%) during the last 50 years, with a limited increase in cultivated area (24%). With the recent advances in remote sensing technologies, it is now possible to monitor rice crop production for a better understanding of its management at ﬁeld scale to ultimately improve rice yields. In this work, we monitor within-ﬁeld rice production of the two main rice varieties grown in Valencia (Spain) JSendra and Bomba during the 2020 season. The sowing date of both varieties was May 22–25, while the harvesting date was September 15–17 for Bomba and October 5–8 for JSendra . Rice yield data was collected over 66.03 ha (52 ﬁelds) by harvesting machines equipped with onboard sensors that determine the dry grain yield within irregular polygons of 3–7 m width. This dataset was split in two, selecting 70% of ﬁelds for training and 30% for validation purposes. Sentinel-2 surface reﬂectance spectral data acquired from May until September 2020 was considered over the test area at the two different spatial resolutions of 10 and 20 m. These two datasets were combined assessing the best combination of spectral reﬂectance bands (SR) or vegetation indices (VIs) as well as the best timing to infer ﬁnal within-ﬁeld yields. The results show that SR improves the performance of models with VIs. Furthermore, the correlation of each spectral band and VIs with the ﬁnal yield changes with the dates and varieties. Considering the training data, the best correlation with the yields is obtained on July 4, with R 2 for JSendra of 0.72 at 10 m and 0.76 at 20 m resolution, while the R 2 for Bomba is 0.87 at 10 m and 0.92 at 20 m resolution. Based on the validation dataset, the proposed models provide within-ﬁeld yield modelling Mean Absolute Error (MAE) of 0.254 t · ha − 1 (Mean Absolute Percentage Error, MAPE, of 3.73%) for JSendra at 10 m (0.240 t · ha − 1 ; 3.48% at 20 m) and 0.218 t · ha − 1 (MAPE 5.82%) for Bomba (0.223 t · ha − 1 ; 5.78% at 20 m) on July 4, that is three months before harvest. At parcel level the model’s MAE is 0.176 t · ha − 1 (MAPE 2.61%) for JSendra and 0.142 t · ha − 1 (MAPE 4.51%) for Bomba . These results conﬁrm the close correlation between the rice yield and the spectral information from satellite imagery. Additionally, these models provide a timeliness overview of underperforming areas within the ﬁeld three months before the harvest where farmers can improve their management practices. Furthermore, it highlights the importance of optimum agronomic management of rice plants during the ﬁrst weeks of rice cultivation (40–50 days after sowing) to achieve high yields.


Introduction
Rice (Oryza sativa L.) is one of the three most important crops in the world, being vital for the world's food supply. Rice is grown in Asia, America, Australia, Europe, and Africa, following diverse production practices. In 2019, the global rice production was close to 755 million tons, covering an area of 162 million hectares. In Europe, its production is mostly located in the southern countries, Spain being one of the major producers and representing approximately 28% of the total production across the European Union in 2019 [1]. The Valencian Community produces 16% (125 thousand tons) of the national rice, covering an area of approximately 15 thousand hectares in 2019 [2]. During the last 50 years, world rice production has increased by 156%, while the cultivated area has only increased by 24%, resulting in a 107% increase in yield. The evolution of world population growth and the increase in rice production have been closely connected over the last 50 years [1]. According to the latest United Nations' forecasts for the next 50 years, the world population could increase by 35% [3], which means that rice production must continue increasing. Given that the statistics of the last five decades show that the cultivated area has hardly changed, the required increase in rice production must imply an increase in yield [4].
Earth observation (EO) data allow efficient monitoring of crop growth at field scale owning to its high spatial (~10 m) and temporal (each 5 days) resolutions (e.g., [5][6][7]). The launch of the ESA optical high-resolution missions Sentinel-2A in 2015 and Sentinel-2B in 2017 and their fusion with Landsat providing free high-frequency and high-resolution data, along with the significant advances in big data analytics and high-performance computing, have revolutionized the EO uptake for agriculture applications. Recent works have shown that satellite data can be used to infer crop yields both within-field and at field scales. For instance, Skakun [8] showed that corn and soybean yield variability could be explained using Planet 3 m spatial resolution while the coarser resolution of Sentinel-2 (10 m) reduces the explained yield variability by 14%. The same study also concluded that the most important spectral bands explaining the yield variability are green, red edge and Near Infrared (NIR) but the lower correlation obtained for high yields suggested saturation of multi-spectral optical data. Kayad [9] investigated different Vegetation Indices (VI) from Sentinel-2 and machine learning techniques to assess corn yield within the field scale. Their study selected as the best VI the Green Normalized Difference Vegetation Index (GNDVI), which integrated into a Random Forest provided an R 2 of 0.60 over an independent validation test. Zhao [10] tested a set of VI from Sentinel-2 to estimate wheat yield at the field scale. Deines [11] applied the Scalable Crop Yield Mapper (SCYM) [12] to over one million corn fields yield observations spanning nine states in the US Corn Belt and based on Landsat observations. Their results showed that remote sensing technology at within-field scale show coefficients of determination of 0.40 and increase to 0.45 and 0.69 when aggregated to field level and county level scales, respectively.
Focusing on rice crops, the Asia-RiCE initiative [13] is the rice monitoring component of the Group on Earth Observations Global Agricultural Monitoring initiative (GE-OGLAM) [14]. Its main goal is to foster the use of EO, leveraging existing agricultural monitoring programs and initiatives for rice monitoring at national, regional and global scales as an input to the GEOGLAM Crop Monitor [15] and the Agricultural Market Information System (AMIS) Market Monitor [16]. EO technologies have proven to be useful in monitoring rice crops biophysical and biochemical properties pushing towards a more efficient farming management. The implementation of remote sensing technologies in rice farming has evident advantages in monitoring rice growth, soil fertility evaluation, detection of diseases, and yield estimation, among others. Previous works have shown that the growth of the rice crops, as well as the nitrogen content, can be inferred based on the spectral reflectance from EO sensors. For instance, Cai [17] showed that the Normalized Difference Vegetation Index (NDVI) or the Green NDVI (GDVI) can be used to monitor the nitrogen content of rice crops. Shi [18] studied the use of different VIs to map the distribution of rice diseases such as rice blast. In fact, several studies (e.g. [19][20][21]) show that the NIR spectral region is sensitive enough to detect the first symptoms produced by rice blast, and other studies show how remote sensing data can be used to monitor the nitrogen deficit in rice (e.g., [22,23]). Regarding rice yield monitoring, Gilardelli [24] assimilated Leaf Area Index (LAI) data from Landsat and Sentinel-2 at 30 m spatial resolution into the WARM rice model [25] to derive within-field rice yields in Italy with a Mean Absolute Error (MAE) of 0.66 t·ha −1 and Relative Root Mean Square Error (RRMSE) of 13.8%. Finally, an extensive overview on rice yield estimation or forecasting based on EO can be found in Mosleh [26]. One of the major challenges in rice monitoring over Asia is the high frequency of cloud coverage. Therefore, most of the works based on this area use synthetic aperture radar (SAR) for rice yield monitoring (e.g. [27][28][29]). Other works use optical data based on VI that are then related to forecast rice yields mostly at regional level using coarse resolution data (MODIS [30,31], SPOT-4 [32] or AVHRR [33]) or medium resolution (Landsat [34]). However, most works or rice yield monitoring are applied at regional scale or at parcel level and there are essentially no works on rice yield estimation at within-field scale. Additionally, VIs are widely used as inputs for the yield models, and Skakun [8] showed that surface reflectance-based models outperformed VI-based models highlighting the importance of incorporating surface reflectance (SR) values directly into the yield models. Therefore, in this work we aim to expand the science of rice yield monitoring at within-field scale, testing all spectral information retrieved by Sentinel-2. Particularly, the main science questions that we aim to answer in this study are: To what degree can EO data in the optical spectral region explain the rice within-field yield variability? Which spectral band or combination of bands is better correlated with the final yield? When is the most critical timing of the rice growing season when satellite imagery can explain the final yields?
In this work we present a study to monitor rice within-field yield variability. It is based on multi-spectral SR data retrieved by Sentinel-2 and rice yield maps acquired with harvesting machines over 52 fields covering a total area of 66ha in Valencia (Spain) during the 2020 season.

Study Area
The coastal wetland Albufera is located at the Mediterranean Spanish coast and south of Valencia city (39 • 20 N, 0 • 21 W). The wetland has an area of 211.2 km 2 and is bordered by the Turia (north) and Jucar (south) rivers. The Albufera is the second largest lake in the Iberian Peninsula with an area of 23.2 km 2 . It is shallow (1.2 m of mean depth) and has a mean salinity of 1-2%. The European Commission [35] considers the Albufera a special protected area in the Natura 2000 network and restricts agriculture practices in the area to only rice crops. This area has a typical Mediterranean climate, with an annual mean air temperature and humidity of 18.3 ± 0.1 • C and 65.0 ± 0.5%, respectively [36]. Annual rainfall is usually concentrated in the spring and autumn seasons. The water level of the lake is regulated by a large network of irrigation channels that connect it to the rice fields and to the Mediterranean Sea. This area can be considered a homogeneous rice planting area of approximately 10 × 20 km 2 extension.
JSendra and Bomba common japonica-type Spanish rice varieties (Oryza sativa var.japonica), were used in the experiment. The most commercial cultivar in East Spain (Valencia) is JSendra (cross between M202 and Senia in 2005 by Instituto Valenciano de Investigaciones Agrarias, IVIA), characterized by high yields (typically 7.5-8.0 t·ha −1 ) and good culinary quality. Bomba is a traditional rice variety, obtained by selection in 1929 in Valencia, and it is characterized by low yields (typically 3.5-4.0 t·ha −1 ) and excellent culinary quality. The differences in performance between the two varieties are significant: JSendra plants have a greater number of tillers, a lower height, and a longer season than Bomba plants. The planted area of JSendra in 2020 was 6716 ha (44%), while in Bomba's planted area was 1777 ha (12%). Table 1 shows the timing of the main stages of both varieties in Valencia.

Field Data Pre-Processing
Field data was collected by harvesting machines and provide dry volumetric yield in irregular polygons as shown in Figure 1A. These data were preprocessed to remove errors in the software acquisition such as turns, overlaps (keeping the first overpass) and extreme values (JSendra: >11.0 t·ha −1 ; Bomba: >7.0 t·ha −1 ) that had no biological meaning [37]. Additionally, we removed small sized polygons with extreme values compared to the neighbors [38]. Given the noise observed in the dataset, these data were aggregated to an intermediate 5 m resolution ( Figure 1B) to apply a grid moving average filter of 7 × 7 pixels ( Figure 1C). Finally, this dataset was aggregated to 10 m resolution removing the edge pixels ( Figure 1D). area was 1777 ha (12%). Table 1 shows the timing of the main stages of both varieties in Valencia.

Field Data Pre-Processing
Field data was collected by harvesting machines and provide dry volumetric yield in irregular polygons as shown in Figure 1A. These data were preprocessed to remove errors in the software acquisition such as turns, overlaps (keeping the first overpass) and extreme values (JSendra: >11.0 t⋅ha −1 ; Bomba: >7.0 t⋅ha −1 ) that had no biological meaning [37]. Additionally, we removed small sized polygons with extreme values compared to the neighbors [38]. Given the noise observed in the dataset, these data were aggregated to an intermediate 5 m resolution ( Figure 1B) to apply a grid moving average filter of 7 × 7 pixels ( Figure 1C). Finally, this dataset was aggregated to 10 m resolution removing the edge pixels ( Figure 1D).

Satellite Data
We downloaded Sentinel-2A and Sentinel-2B data covering the main rice area in Valencia, that is using the tiles 30SYJ and 31SBD, from May until September 2020 and we applied the atmospheric correction algorithm LaSRC [39]. All the Sentinel-2 spectral bands at 10 m and 20 m spatial resolution were analyzed to select the best combination to infer rice crop yield. Table 2 shows the spectral response and spatial resolution of the considered bands.

Satellite Data
We downloaded Sentinel-2A and Sentinel-2B data covering the main rice area in Valencia, that is using the tiles 30SYJ and 31SBD, from May until September 2020 and we applied the atmospheric correction algorithm LaSRC [39]. All the Sentinel-2 spectral bands at 10 m and 20 m spatial resolution were analyzed to select the best combination to infer rice crop yield. Table 2 shows the spectral response and spatial resolution of the considered bands.

Methods
The two rice varieties were treated separately. For each variety, 70% of the data was randomly selected as training data and 30% of data was kept for validation purposes (Table 3). Additionally, Figure 2 shows the spatial distribution of the datasets.

Methods
The two rice varieties were treated separately. For each variety, 70% of the data was randomly selected as training data and 30% of data was kept for validation purposes (Table 3). Additionally, Figure 2 shows the spatial distribution of the datasets.   Each date and spatial resolution were analyzed independently, looking for the best combination that provided the best performance metrics against the yield values. Additionally, multivariable linear regressions were built between yields at pixel level and their corresponding surface reflectance values of a given band, interactions between bands and vegetation indices (Appendix C). Specifically, when combining all bands at each spatial resolution, the following equation was tested to derive the yield of pixel i: y est = a 0 + a 1 ·x 1 + a 2 ·x 2 + a 3 ·x 3 + · · · + a n ·x n (1) where y est is the estimated yield; a 0 , a 1 , a 2 , a 3 , a n are the model coefficients; and x 1 , x 2 , x 3 , x n are the surface reflectance (SR) spectral bands, vegetation index (VI) or the interaction between the different SR for a given date of the growing season. The software StatGraphics Centurion XVII (v.17.2.00) [40] was implemented to determine the best combination of SR on equation 1, using a stepwise regression method, for each date, spatial resolution, and variety.
Finally, the following performance metrics were derived based on the validation data: the coefficient of determination (R 2 ); the Mean Absolute Error (MAE); and the Mean Absolute Percentage Error (MAPE).
These performance metrics of each model based on a spectral band or combination of bands. Finally, each of the variables were validated with a linear regression analysis to test the statistical significance of each factor with a probability p < 0.05, according to the Student's t-test. The linear model was validated with the Snedecor statistical F-test (p < 0.05).
The processing of the yield maps and the satellite images, as well as the validation, was carried out with the QGIS 3.10.14 software [41].

Evaluation at Within-Field Level in JSendra Rice
The JSendra fields' coefficient of determination evolution for each Sentinel-2 spectral band at 10 m ( Figure A1a) and at 20 m ( Figure A1b) spatial resolution is analyzed in Appendix A. These results include the R 2 temporal evolution of three simple linear regression models combining the different spectral bands (M1S in Figure A1a and M2S and M3S in Figure A1b). Note that the R 2 values studied represent all within-field pixels of the JSendra training data. The models can be written as: where the spatial resolution is specified in parenthesis and the model coefficients are different for each equation and for each date. Figure A1a shows that at 10 m, the band that shows the highest correlation with the yield is B8 (NIR), followed by B2 (Blue) and B3 (Green), while B4 (Red) shows the lowest correlation. At 20 m ( Figure A1b), B8A shows the highest correlation, followed by B8 and B7 (Red edge) whose evolution is very close to B6 (Red edge), but has a minor peak at the beginning of the season. Other bands, such B11 and B12 (SWIR), show some lower peaks of correlation, having only one significant peak on July 4. Visible bands show a similar tendency compared to 10 m resolution and B5 (Red edge) is very close to B4 (Red). The linear models combining spectral bands (M1S, M2S and M3S) show the best correlation, which is consistent with previous works utilizing remote sensing data to infer within-field maize and soybean yields [8]. The best performance (based on the coefficient of determination evolution) happens on July 4 for both spatial resolutions, that is when rice is in the tillering phase. Comparing the performance at 10 m and 20 m the coefficient of determination remains almost the same and shows an equivalent evolution.
Focusing on July 4, Figure 3 shows the average spectral response of each band, splitting all training data into nine yield ranges. Note that we preserved the spatial resolution of each band. B6, B7, B8A and specially B8 show the largest differences amongst yield ranges by increasing their reflectance values with the yield.
beginning of the season. Other bands, such B11 and B12 (SWIR), show some lower peaks of correlation, having only one significant peak on July 4. Visible bands show a similar tendency compared to 10 m resolution and B5 (Red edge) is very close to B4 (Red). The linear models combining spectral bands (M1S, M2S and M3S) show the best correlation, which is consistent with previous works utilizing remote sensing data to infer within-field maize and soybean yields [8]. The best performance (based on the coefficient of determination evolution) happens on July 4 for both spatial resolutions, that is when rice is in the tillering phase. Comparing the performance at 10 m and 20 m the coefficient of determination remains almost the same and shows an equivalent evolution.
Focusing on July 4, Figure 3 shows the average spectral response of each band, splitting all training data into nine yield ranges. Note that we preserved the spatial resolution of each band. B6, B7, B8A and specially B8 show the largest differences amongst yield ranges by increasing their reflectance values with the yield. Next, Tables A1 and A2 (Appendix B) show the performance metrics of each model at 10 m and 20 m spatial resolution, respectively, focusing on the tillering phase and based on the training data. This phenological stage has been determined in the previous results as critical to monitor rice yield. Three Sentinel-2 acquisition dates were selected to analyze the performance metrics of each model: the date of the R 2 peak (July 4), the previous date (June 29) and the next one (July 19). Note that in these tables, models consisting in bands are built based on a linear regression of said band or combination of band. Additionally, the tables show the performance metrics of three models (M1S+, M2S+ and M3S+) whose definition is based on a stepwise regression program (StatGraphics). These models are written as: where the spatial resolution is specified in parenthesis and the model coefficients are different for each equation and for each date. As a summary of the results, Table 4 shows the evaluation of the best performing models for each date and spatial resolution. Next, Tables A1 and A2 (Appendix B) show the performance metrics of each model at 10 m and 20 m spatial resolution, respectively, focusing on the tillering phase and based on the training data. This phenological stage has been determined in the previous results as critical to monitor rice yield. Three Sentinel-2 acquisition dates were selected to analyze the performance metrics of each model: the date of the R 2 peak (July 4), the previous date (June 29) and the next one (July 19). Note that in these tables, models consisting in bands are built based on a linear regression of said band or combination of band. Additionally, the tables show the performance metrics of three models (M1S+, M2S+ and M3S+) whose definition is based on a stepwise regression program (StatGraphics). These models are written as: where the spatial resolution is specified in parenthesis and the model coefficients are different for each equation and for each date. As a summary of the results, Table 4 shows the evaluation of the best performing models for each date and spatial resolution. The best performing models at 10 m are M1S and M1S+, which combine the four bands (B2, B3, B4 and B8). In relation to 20 m, the best performing models are M3S, M2S+ and M3S+. Comparing the dates, July 4 observations provide the best performance metrics, showing R 2 greater than 0.70, MAPEs lower than 4% (<3% at 20 m) and MAEs ranging from 0.20-0.25 t·ha −1 . In relation to the VIs (based on visible, NIR, Red edge and SWIR regions), they do not improve considerably the spectral bands or linear combinations of bands with higher R 2 (data, equations and references are shown in Appendix C). The best  Finally, Figure 4 shows the evaluation of the best performing models (M1S and M1S+ at 10 m and M3S and M3S+ at 20 m) for the JSendra variety on July 4 considering all data from the validation fields for each spatial resolution. M1S+ and M3S+ provide a yield modelling error (MAE) of 0.254 t·ha −1 (3.73%) and 0.240 t·ha −1 (3.48%), respectively, which is aligned with the statistics shown on the training data. In fact, 60% of all pixels evaluated presented a difference between modelled and referenced yield lower than the MAE value, with a maximum difference of 1.0 t·ha −1 and with only 10% of data exceeding a difference of 0.50 t·ha −1 . Additionally, Figure 4 (right) shows the performance metrics of the models without interactions at a spatial resolution of 10 m (M1S) and 20 m (M3S). The performance metrics (MAE of 0.318 t·ha −1 (4.71%) at 10 m and 0.305 t·ha −1 (4.50%) at 20 m) show worse results than on the training data modelling, which suggests that the interactions proposed in M1S+ and M3S+ models provide a better consistency and applicability. Finally, we tested a Random Forest model considering all bands from the training data as inputs and evaluating its performance with the validation data. This model provides an R 2 of 0.79 and MAE of 0.292 t·ha −1 (4.33%) at 10 m and R 2 of 0.78 and MAE of 0.302 t·ha −1 (4.46%) at 20 m. These statistics are equivalent to M1S+ and M3S+. The best performing models at 10 m are M1S and M1S+, which combine the four bands (B2, B3, B4 and B8). In relation to 20 m, the best performing models are M3S, M2S+ and M3S+. Comparing the dates, July 4 observations provide the best performance metrics, showing R 2 greater than 0.70, MAPEs lower than 4% (<3% at 20 m) and MAEs ranging from 0.20-0.25 t⋅ha −1 . In relation to the VIs (based on visible, NIR, Red edge and SWIR regions), they do not improve considerably the spectral bands or linear combinations of bands with higher R 2 (data, equations and references are shown in Appendix C). The best results for VIs are obtained with the Rice Growth Vegetation Index (RGVI) (R 2 = 0.66). Appendix D shows an example at 10 m and 20 m of a reference yield map, each spectral band, and the modeled yield.
Finally, Figure 4 shows the evaluation of the best performing models (M1S and M1S+ at 10 m and M3S and M3S+ at 20 m) for the JSendra variety on July 4 considering all data from the validation fields for each spatial resolution. M1S+ and M3S+ provide a yield modelling error (MAE) of 0.254 t⋅ha −1 (3.73%) and 0.240 t⋅ha −1 (3.48%), respectively, which is aligned with the statistics shown on the training data. In fact, 60% of all pixels evaluated presented a difference between modelled and referenced yield lower than the MAE value, with a maximum difference of 1.0 t·ha −1 and with only 10% of data exceeding a difference of 0.50 t·ha −1 . Additionally, Figure 4

Evaluation at Within-Field Level in Bomba Rice
Focusing now on the Bomba variety, Figure A2 (Appendix A) shows the coefficient of determination evolution along the rice season for each Sentinel-2 band acquisition at 10 m ( Figure A2a) and at 20 m ( Figure A2b) spatial resolution considering the training data. Additionally, the figures include the R 2 for three models which linearly combine all bands, following the same structure than the models described in JSendra (M1S, M2S and M3S) Remote Sens. 2021, 13, 4095 9 of 22 but changing the model coefficients. Thus, for Bomba these equations are named M1B, M2B and M3B (M1B in Figure A2a and M2B and M3B in Figure A2b). Note that Bomba rice was harvested by mid-September, thus latter dates are excluded in the plot.
In this case, there are two dates with the maximum correlation at 10 m and 20 m, July 4 and August 28, although the latter includes half the number of pixels than the former. Analogously to JSendra, the results are similar at both spatial resolutions. However, in Bomba rice the visible bands (B2, B3, B4) show the best correlation with the final yields on July 4. B5 shows a very similar performance to the visible bands, and even improves and anticipates the correlation peak with the final yield on June 29. Furthermore, these bands keep a high R 2 throughout July and decrease in August (when the flowering stage happens). Additionally, B3 (Green) and B5 show an acceptable value of R 2 on August 28. On the contrary, the NIR bands show poor correlation on July 4, which is improved by mid-season (end of July), and finally shows an R 2 peak on August 28. Analogously to JSendra, B8, B8A and B7 present very similar results. B6 shows an equivalent evolution to these bands but with a poor correlation during mid-season. Regarding the SWIR bands, at the beginning of the season they show a small correlation peak which decreases until reaching a minimum by the end of June and then improves again, reaching a peak by the end of July which is comparable to the visible bands' correlation. Finally, the models M1B, M2B and M3B show the best performance.
Focusing on July 4, Figure 5 shows the average spectral surface reflectance for all pixels within the training fields splitting them by yield ranges. In this variety and date, the visible bands (specially B3) have a larger separability showing lower SR values for higher yields. Otherwise, the NIR, Red edge and SWIR reflectance do not show a clear dependency with the yield.

Evaluation at Within-Field Level in Bomba Rice
Focusing now on the Bomba variety, Figure A2 (Appendix A) shows the coefficient of determination evolution along the rice season for each Sentinel-2 band acquisition at 10 m ( Figure A2a) and at 20 m ( Figure A2b) spatial resolution considering the training data. Additionally, the figures include the R 2 for three models which linearly combine all bands, following the same structure than the models described in JSendra (M1S, M2S and M3S) but changing the model coefficients. Thus, for Bomba these equations are named M1B, M2B and M3B (M1B in Figure A2a and M2B and M3B in Figure A2b). Note that Bomba rice was harvested by mid-September, thus latter dates are excluded in the plot.
In this case, there are two dates with the maximum correlation at 10 m and 20 m, July 4 and August 28, although the latter includes half the number of pixels than the former. Analogously to JSendra, the results are similar at both spatial resolutions. However, in Bomba rice the visible bands (B2, B3, B4) show the best correlation with the final yields on July 4. B5 shows a very similar performance to the visible bands, and even improves and anticipates the correlation peak with the final yield on June 29. Furthermore, these bands keep a high R 2 throughout July and decrease in August (when the flowering stage happens). Additionally, B3 (Green) and B5 show an acceptable value of R 2 on August 28. On the contrary, the NIR bands show poor correlation on July 4, which is improved by midseason (end of July), and finally shows an R 2 peak on August 28. Analogously to JSendra, B8, B8A and B7 present very similar results. B6 shows an equivalent evolution to these bands but with a poor correlation during mid-season. Regarding the SWIR bands, at the beginning of the season they show a small correlation peak which decreases until reaching a minimum by the end of June and then improves again, reaching a peak by the end of July which is comparable to the visible bands' correlation. Finally, the models M1B, M2B and M3B show the best performance.
Focusing on July 4, Figure 5 shows the average spectral surface reflectance for all pixels within the training fields splitting them by yield ranges. In this variety and date, the visible bands (specially B3) have a larger separability showing lower SR values for higher yields. Otherwise, the NIR, Red edge and SWIR reflectance do not show a clear dependency with the yield.  Next, Tables A3 and A4 (Appendix B) show the performance metrics of each model at 10 m and 20 m spatial resolution respectively, focusing on the same dates as in JSendra. These three dates are essential for a forecast, since tillering is still being defined and it is still possible to modify crop management to improve final yields. Models are built and shown based on the same principles as in the JSendra variety. The resulting models of the stepwise regression program (StatGraphics), M1B+, M2B+ and M3B+ are defined as: where the spatial resolution is specified in parenthesis and the model coefficients are different for each equation and for each date. As a summary of the results, the Table 5 shows the evaluation of the best models for each date and spatial resolution. Comparing the dates, July 4 observations provide the best performance metrics, showing an R 2 greater than 0.85, MAPEs lower than 8% (<6% at 20 m) and MAEs ranging from 0.18-0.23 t·ha −1 . The best model at 10 m is M1B+, that is the model including B2, B3, B4 and their interactions. Note that this model does not include the NIR band. Additionally, model M1B shows similar results to the simple linear model without the NIR band (a 0 + a 1 ·B2 + a 2 ·B3 + a 3 ·B4). At 20 m, M2B+ and M3B+ show the best performance metrics. Note that M2B+, which depends only on B2 and B3, shows similar performance metrics to M3B+ (that depends on B2, B3, B11 and B12) and M3B (that includes all bands). Additionally, it is important to note that B3 (Green) and B5 individually show excellent results. Regarding VIs, the best performance is obtained using the Normalized Difference Red Edge 2 (NDRE2) (R 2 = 0.59). All the performance metrics of the VIs tested, their equations and references can be found in Appendix C. Analogously to the JSendra section, Appendix D shows an example at 10 m and 20 m of a reference yield map, each spectral band, and the modeled yield. Figure 6 (left) shows the validation of the best performance models (M1B+ and M3B+) for the Bomba variety on July 4 considering the validation fields. These results show that the proposed models M1B+ and M3B+ provide a MAE of 0.218 t·ha −1 (5.82%) and 0.223 t·ha −1 (5.78%), respectively. A total of 60% of pixels presented a difference between modelled and referenced yield lower than the MAE, being the maximum difference around 0.80 t·ha −1 , with only 9% of data exceeding a difference of 0.50 t·ha −1 . Note that the differences in MAPE between varieties were due to their corresponding average yields, being 3.8 t·ha −1 for Bomba and 7.0 t·ha −1 for JSendra in the studied area. The validation results of M1B and M3B are shown in Figure 6 (right) (MAE of 0.261 t·ha −1 (6.93%) at 10 m and 0.273 t·ha −1 (7.17%) at 20 m) and are slightly worse than M1B+ and M3B+, respectively. The decrease in R 2 when compared to the training data evaluation is caused by the limited yield variability in the validation pixels (between 2.5 and 5.0 t·ha −1 ) compared to the training pixels (between 0.8 and 5.0 t·ha −1 ). Note that this is not the case in JSendra, where the training and validation pixels show similar variability (5.5-8.0 t·ha −1 ). Finally, a Random Forest algorithm based on the same dataset provides a R 2 of 0.71 and MAE of 0.287 t·ha −1 (7.67%) at 10 m and R 2 of 0.67 and MAE of 0.297 t·ha −1 (7.69%) at 20 m. These performance metrics are worse than the previous models based on linear regressions (M1B+, M3B+, M1B and M3B).

Evaluation at Parcel Level
Finally, Figure 7 shows the evaluation of the model at parcel level by aggregating at this scale the pixel level estimations at 10 m based on models M1S+ for JSendra (left) and M1B+ for Bomba (right) on July 4, and the harvester machine measurements through all validation fields. These plots show a very good agreement with a MAE of 0.176 t⋅ha −1 and 2.61% MAPE for JSendra; and a MAE of 0.142 t⋅ha −1 and 4.51% MAPE for Bomba.

Evaluation at Parcel Level
Finally, Figure 7 shows the evaluation of the model at parcel level by aggregating at this scale the pixel level estimations at 10 m based on models M1S+ for JSendra (left) and M1B+ for Bomba (right) on July 4, and the harvester machine measurements through all validation fields. These plots show a very good agreement with a MAE of 0.176 t·ha −1 and 2.61% MAPE for JSendra; and a MAE of 0.142 t·ha −1 and 4.51% MAPE for Bomba. are worse than the previous models based on linear regressions (M1B+, M3B+, M1B and M3B).

Evaluation at Parcel Level
Finally, Figure 7 shows the evaluation of the model at parcel level by aggregating at this scale the pixel level estimations at 10 m based on models M1S+ for JSendra (left) and M1B+ for Bomba (right) on July 4, and the harvester machine measurements through all validation fields. These plots show a very good agreement with a MAE of 0.176 t⋅ha −1 and 2.61% MAPE for JSendra; and a MAE of 0.142 t⋅ha −1 and 4.51% MAPE for Bomba.

Discussion
In this work we analyze the rice yield dependency on the spectral information provided by Sentinel-2 satellite image acquisitions prior to harvest. To do so, an extensive dataset of harvester machine measurements of within-field yield during the 2020 campaign is preprocessed and linked to the pixel level surface reflectance retrievals considering all Sentinel-2 spectral bands. The two major rice varieties planted in the Albufera region have been analyzed: JSendra and Bomba.
First of all, it is worth noting the high variability of within-field yields despite the limited size of the Valencian fields (areas around 1 ha). In the few examples that we show in this work, even at 10 m resolution and after applying the grid moving average filter to the harvester machine data, the variability of yields can get up to 2.5 t·ha −1 , highlighting underperforming areas within the field. Given that all areas within a field are managed in the same manner, empirical models are the only feasible solution to model such variability. Looking at the plots of the coefficient of determination's seasonal evolution of each band against the final yield, two periods with maximum values of R 2 can be identified in both varieties. There is a first period, when the crop is ending the vegetation stage and a second period, when the crop is ending the maturity stage. Both periods are preceded and followed by a decrease of R 2 . During the first month after the start of season (until end of June) there is barely correlation of the spectral information with the final yield given the reduced size of the rice plants and their signal mixture with water (we need to keep in mind that Valencian rice remains flooded during most of the season). By the beginning of July, the highest correlation is achieved in both varieties. This might be a consequence of the higher density of vegetation and the minimization of the water background signal. During this period, rice plants are in the tillering stage. Tillering is one of the fundamental rice stages defining the final yield since it determines the number of productive tillers per plant which in turn determine the number of grains and their weight. During this phase, the importance of each spectral region is very different among both varieties. JSendra shows a stronger correlation of NIR bands with the yield while Bomba shows a larger R 2 with visible bands and band 5 (red edge). This might be a consequence of the agronomic characteristics of each variety. While JSendra plants are denser but smaller in size, Bomba plants are thinner and have a lighter green color. The higher NIR importance in JSendra during this stage might be linked to the positive impact in the final yield of denser fields or areas within the field. Besides, when monitoring vegetation, visible band spectral response is linked to the leaf pigments performance. In particular, Feng [42] showed that band 5 (red edge) is correlated with the chlorophyll-a content during the rice tillering stage. Therefore, the higher importance of this spectral region in Bomba final yields might highlight the dependency of the yield with the chlorophyll-a content during tillering. From mid-July until mid-August the NDVI reaches its maximum in both varieties and remains high and stable. During this period the plant remains in the reproductive stage. By mid-August, the coefficient of determination that decreases during the reproductive stage reaches a minimum value. This could be caused by mixing different fields with different phenological stages, where some areas remain in the reproductive stage while others enter the ripening stage. It could also be a consequence of the mixing signal from the springs. By the end of August there is another R 2 peak for both varieties. The JSendra variety shows a stronger correlation of visible bands (and specially the green band) with the final yield. The importance of the visible bands at the ripening stage might differ in those fields that enter the senescence phase earlier compared to those that remain green longer, which may enhance final yields. Regarding Bomba, NIR bands are predominant during this stage. However, looking at the number of pixels covered during this second peak of correlation, which is reduced to 20% in JSendra and 50% in Bomba, and considering its proximity to harvest, this second date has not been considered to build the forecast model. Finally, comparing the performance metrics of modeled rice yield with different VIs and SR data supports the findings in Skakun [8] in corn and soybean, that is SR-based models outperform VI-based models to monitor rice yields.
We also studied different models to correlate yield with the spectral surface reflectance. The simplest model is the linear regression of all bands both at 10 m and at 20 m spatial resolution. Besides, the models proposed by a stepwise regression model (StatGraphics), linearly combine the spectral bands but add some multiplicative terms or band ratios. The results show a similar performance by both models on the training samples, but when applied to the validation samples, the StatGraphics model outperform the linear regression models. Overall, and based on the available dataset and the particular year analyzed, the best statistics can be achieved on July 4 with an error of 0.254 t·ha −1 (3.73%) for JSendra and 0.218 t·ha −1 (5.82%) for Bomba rice based on the 10 m spatial resolution data. The 20 m resolution analysis, despite adding more spectral information, shows very similar performance metrics. At parcel level the proposed models provide a MAE of 0.176 t·ha −1 (2.61% MAPE) for JSendra and a MAE of 0.142 t·ha −1 (4.51% MAPE) for Bomba. More complex relationships were also tested using a Random Forest algorithm, providing equivalent results to the linear regression models in JSendra rice and worse results in Bomba rice (given the lower number of samples). Therefore, given the simplicity of linear models, which allow a better interpretation of the inputs considered, we propose using linear models to monitor within-field rice yield.
Finally, we need to highlight that despite the good performance metrics obtained, Figures 4 and 6 (validation plots) and the examples in Appendix D, show that extreme yield values are smoothed out by the models, that is overestimation of low yields and underestimation of high yields. This might be a consequence of the uncertainties of the harvester machine. This is consistent with previous studies [8][9][10].
As a summary and addressing the science questions stated in the Introduction Section, the most critical timing when EO data is better correlated with the final rice yield for both varieties is the beginning of July during the rice tillering stage. The best bands that explain the within-field yield variability are the NIR bands for JSendra and visible and red edge (B5) for Bomba. The proposed models based uniquely on EO data provide accuracies of 0.22-0.25 t·ha −1 at within-field scale and 0.14-0.17 t·ha −1 at parcel level.
These models were developed considering as a reference yields from a particular year (2020) and a limited sized area where meteorological conditions can be considered constant so no meteorological data could be integrated. Therefore, these models are expected to work on those years with similar meteorological conditions. To solve this limitation, future works will integrate both EO and meteorological data of forthcoming seasons.

Conclusions
In this work we analyzed the use of remote sensing data to monitor rice yield. To do so we leveraged an extensive within-field rice yield dataset covering 66 ha of the two major rice varieties planted in Valencia, JSendra and Bomba. The results showed very different correlation of the spectral bands with the final yields, depending on the variety and phenology. During the vegetation stage, the NIR bands showed a stronger correlation in JSendra, whereas the visible bands showed a higher importance in Bomba rice. On the contrary, during the maturity stage, JSendra yield is better correlated with the visible bands while Bomba yields show higher R 2 with the NIR bands. These results highlight the need for discriminating between the two varieties to generate EO-based yield models. The proposed model allows a within-field yield forecast error with a MAE of 0.254 t·ha −1 (3.73%) for JSendra and 0.218 t·ha −1 (5.82%) for Bomba on July 4th and parcel level MAE of 0.176 t·ha −1 (2.61% MAPE) for JSendra and a MAE of 0.142 t·ha −1 (4.51% MAPE) for Bomba. This date coincides with the rice tillering stage and underlines the importance of achieving a good performance of the plants during this stage. Therefore, agronomic decisions in crop management before this stage are critical to improve rice yield. Additionally, these estimations are about three months prior to harvest and provide enough time for improving the rice management towards more direct interventions of those areas that already show any underperformance in plant nutrition (or diseases control) during the tillering stage, thus advancing the science of precision agriculture.  Appendix C Table A5 provides detailed information about the vegetation indices, and their mathematical formula according to [8,10,26], that have been tested in the yield models. Table A6 shows the performance metrics of these VIs on July 4 for each variety.   Appendix D Figure A3 shows the reference yield maps from the harvester machine (D1A and D1H) over a JSendra training field and the corresponding surface reflectance and DVI values from Sentinel-2 on July 4 (B-F at 10 m, I-M at 20 m). Furthermore, Figure A3G,N show the resulting yield estimations based on the M1S+ (left) and M3S+ model (right). Looking at the spatial distribution of the yield measurements, there are two areas with larger yield in the center of the field while the left and right edges show lower yield values. This pattern is very close to the NIR (B8) and DVI distribution across the parcel whereas the other bands hardly show any similarity. This provides a visual idea of how each spectral band or VI intervenes and how it affects the change of spatial resolution (10 m, left and 20 m, right). The models can identify high and low yield zones. However, extreme values are smoothed.  Figure A4 shows an example of a Bomba training field where D2A and D2H show the reference yields at 10 m and 20 m, respectively, other images show the corresponding pixel level reflectance and DVI values (D2B-F at 10 m, D2I-M at 20 m). D2G and D2N show the yield resulting maps after implementing M1B+ and M3B+ respectively. The yield maps show two clusters with high within-field yields, while the top edge shows the lowest yields. Looking at the surface reflectance and the DVI images, the green (B3) and red (B4) bands show the highest correspondence with said distribution. Moreover, the maps give a visual validation of the effect of spatial resolution. From these figures, NIR, Red edge and SWIR should have better results in the modelling. However, those bands do not explain the yield variability between fields well. Finally, the estimated yield maps present a good agreement with the reference yield map identifying high and low yield zones, although extreme values are smoothed out.