Spatial Analysis of Agronomic Data and UAV Imagery for Rice Yield Estimation

: In this study, a spatial analysis of agronomic and remote sensing data is carried out to derive accurate rice crop yield estimation. The variability of a series of vegetation indices (VIs) was calculated from remote sensing data obtained via a commercial UAS platform (e-Bee) at four dates (per stage of development), and the development of estimation models was conducted. The study area is located in the region of Chalastra (municipality of Thessaloniki, North Greece) and the primary data were obtained during the 2016 growing season. These data include ultra-high resolution remote sensing multispectral images of 18 plots totaling 58 hectares of Ronaldo and Gladio rice crop varieties, 97 sample point data related to yield, and many other pieces of information recorded in the producer’s ﬁeld log. Ten simple and compound VIs were calculated, and the evolution of their values during the growing season as well as their comparative correlation were studied. A study of the usability of each VI was conducted for the different phenological stages of the cultivation and the variance of VIs and yield; the more correlated VIs were identiﬁed. Furthermore, three types of multitemporal VI were calculated from combinations of VIs from different dates, and their contribution to improving yield prediction was studied. As Ronaldo is a Japonica type of rice variety and Gladio is Indica type, they behave differently in terms of maturation time (Gladio is approximately 20 days earlier) and the value of every VI is affected by changes in plant physiology and phenology. These differences between the two varieties are reﬂected in the multitemporal study of the single-date VIs but also in the study of the values of the multitemporal VIs. In conclusion, Ronaldo’s yield is strongly dependent on multitemporal NDVI (VI 6th July + VI 30 Aug , R 2 = 0.76), while Gladio’s yield is strongly dependent on single-date NDVI (6 July, R 2 = 0.88). The compound VIs RERDVI and MCARI1 have the highest yield prediction (R 2 = 0.77) for Ronaldo (VI 6th July + VI 30 Aug ) and Gladio (R 2 = 0.95) when calculated in the booting stage, respectively. For the Ronaldo variety, the examination of the multitemporal VIs increases yield prediction accuracy, while in the case of the Gladio variety the opposite is observed. The capabilities of multitemporal VIs in yield estimation by combining UAVs with more ﬂights during the different growth stages can improve management and the cultivation practices.


Introduction
The rice crop exceeds 500 million tons per year and was the dominant food for 2.7 billion people in 2010 [1] and 3.5 billion people in 2017 [2]. In global rice yield, rice from Indica-type varieties represented 85.4% of the yield in 2017, and Japonica rice accounted for 14.6% of the world's rice production [3]. In Greece, rice production reached 220,930 tons in a cultivation area of 29,860 ha [4] and represents 1.4% of the total cultivation area of the country [5]. The 25,000 ha are in the region of Central Macedonia, west of Thessaloniki in the deltas of three rivers. Due to the extremely great importance of rice cultivation both from a nutritional and economic point of view, from the 1990s onwards a significant effort has been made by the scientific community to develop yield estimation models in combination with remote sensing [6], or estimation of plant traits and phenology [7].
Traditionally, the estimation of crop yield derived from crop data collected in situ by agronomists. This technique is usually subjective, time-consuming, and often leads to erroneous estimations [8]; additionally, the different methods for calculating yield and extrapolating lead to different results. The term "estimation model" usually includes an algorithm that quantifies and dynamically interprets the process of plant growth, yield, and interaction with environmental factors [9].
Vegetation indices (VIs) are quantitative expressions that are calculated from the reflectance values of remote sensing data and are mainly related to the vegetation status or biomass [10]. The correlation between photosynthetic activity and VIs is important in environmental monitoring with remote sensing and has been extensively studied [10].
In addition to the common VIs that derive from one date, there are also the multitemporal VIs [11], which are calculated from the use of data obtained from more than one date. It appears from the literature that these multitemporal vegetation indices are important in the study of growth, for example, in phenological characteristics. Multitemporal VIs were initially used to improve yield estimation [12], and for wheat yield estimation in Kansas and Ukraine a multitemporal NDVI was successfully used [13]. Other researchers [14] estimated wheat yield with multitemporal VIs such as ΣNDVI and ΣRVI acquired from remote sensing data from the tillering phenological stage up to the grain-filling stage and achieved greater estimation accuracy than single-date VIs. In addition, single-date VIs are more sensitive to cultivation practices [15], while multitemporal VIs minimize the yield estimation errors that derive from the date of data acquisition, the process method being used and the established cultivation practices [16]. A multitemporal VI considered in this study is the cumulative index SUM (VI) resulting from the sum of each vegetation index between two dates.
The study focuses on two rice varieties, Ronaldo and Gladio; both varieties were cultivated with the wet paddy rice cultivation technique. The Ronaldo [17] variety is a short grain japonica type (O. sativa subsp. Japonica) with mean cultivar height of 90 cm at full growth; a life cycle of 150 days (from seeding to maturity, under northern Greece's climatic conditions); high resistance to rice blast disease (caused by the fungus Pyricularia oryzae), Brown Spot disease (caused by the fungus Helminthosporium oryzae) and stem bending (despite the elongated stems); high resistance to cold conditions; high tillering capacity; delayed growth; and very high yield capability. Its grains' mean dimensions are 6.5 × 2.9 mm and they contain 18% amylose. The Ronaldo variety is considered to have semi-dwarf grain, suitable for para-boiling, with a very low percentage of defective grains, featuring a very high production capacity and semi-precocious cycle resistant to lodging and diseases.
The Gladio [18] variety is a short grain japonica type (O. sativa subsp. Indica) with short cultivar height at full growth, short stem length, a life cycle of 130 days (from seeding to maturity), high resistance to rice blast disease, adequate resistance to Brown Spot disease, high resistance to and stem bending, high resistance to cold conditions, high tillering capacity, delayed growth and usually high yield capability. Its grains' mean dimensions are 7.5 × 2.3 mm and they contain 27% amylose. It adapts to a great variety of growing environments and conditions, and its short stem and early maturing cycle enables the reduction of risks at the time of harvesting and drying costs, respectively. It does not usually require fungicide treatments because of its high resistance to most rice-seedling pathogens and diseases.
This study's main scope is to investigate, for rice cultivation, the variability of a series of vegetation indices calculated from remote sensing data derived from a modern, commercial UAV platform and their optimal correlation with the yield. The above scope is divided into four objectives: (a) the correlation of yield with vegetation indices, (b) the identification of the best vegetation indices for estimating yield and the monitoring of vegetation indices over time with the best correlation with the yield during a growing season and for the various phenological stages of cultivation, and (c) to find the optimal combination of vegetation indices and remote sensing data acquisition time with higher accuracy in yield estimation.

Study Area
This study was conducted in the rice fields of Chalastra, in the prefecture of Thessaloniki, in northern Greece ( Figure 1). Primary data were used that are part of a larger experiment conducted in 2016 in the context of the research of the ECODEVELOPMENT company in the field of precision agriculture. Data from in situ rice weight measurements were obtained mainly from the central area consisting of 18 plots ( Figure 2) with a total area of 58 ha, of which in 11 plots the Ronaldo variety is cultivated and in the remaining 7 the Gladio variety is cultivated. The equable sowing seed density in combination with the germination capacity were verified so that the same vegetation percentage (>95%) was achieved in all fields and for each of the two varieties.

Remote Sensing and In-Situ Data
The remote sensing data were obtained with UAV ( Figure 3) and specifically with the eBee model of SenseFly [19]. The mounted camera is a modified multiSPEC 4C [20] for the needs of research in precision agriculture with four spectral bands at the following wavelengths: 550, 660, 710 and 790 nm (green, red, red edge and near infrared). The four flights were designed with eMotion software and were taken as much as possible at similar hours of the day (Figure 4), with 20 cm pixel resolution. In addition to multispectral images, 97 samples of point yield data were used in the study from 13 out of 18 rice plots. To collect the rice yield point samples, a cubic metal mesh with a side length of 50 cm was constructed ( Figure 5). From the 97 samples, a few days before the start of the harvest, a collection of taxa was carried out, while at the same time the position of each point was recorded with hand GPS. For each plot but also for each of the 97 samples, the yield was calculated by kilograms per hectare, while other information about the crop and cultivation practices was recorded by the producer, such as dates and type of fertilization, crop care, etc. From these primary data, after processing emerged the dataset used for statistical analysis.

Image Analysis and Vegetation Indices
The flowchart of the data processing methodology ( Figure 6) demonstrates all stages, from remote sensing data acquirement to the final mean value datasets of VIs, per field or sample data area.
The flight plan was designed with eMotion software for each date of UAV flight in the rice fields. With the Pix4D software, the production of the mosaics was automated, after setting some basic parameters. Pix4D radiometric corrections were applied to transform the pixel digital numbers (DNs) to reflectance values using information and data from the calibration card of eBee.
The mosaics were georeferenced, but a further correction was needed to achieve the accuracy of studying the Vis-yield correlation. The rectification step was carried out with a 3rd-degree polynomial function (RMSE < 1.2) using a sufficient number (~30) of fixed ground points (crossroads, building corners, electricity poles, trees, etc.) and with the red band as a background, on which all the other bands "matched" for each shooting date. The co-registration between the composite images of the four dates was carried out with this reference image of 6 July. Beyond that, the bands were stacked on four multispectral images to calculate the VIs.
Many VIs were investigated, incorporating the red edge spectrum. Comparison of complex vegetation indices with spectral bands at 705, 750, 670 and 800 nm in terms of their correlation with chlorophyll content in rice showed a strong correlation when the VIs included red edge and near infrared or some combination of them (at 710 nm and 800 nm) [21]. Due to the low absorption of chlorophyll in the red edge spectrum, its use in the calculation of vegetation indices reduces the saturation often observed at high values of the leaf mass index (LAI), and the reflectivity remains sensitive to the absorption of chlorophyll in medium and high values [22].
The vegetation indices (Table 1) with the multispectral images were stacked in four images with 14 bands (4 spectral bands and 10 VIs). From these four images, firstly the average value was calculated for each field with zonal statistics for the correlation with the mean yield of each plot. In addition to correlating the point yield samples with the VIs, zonal statistics from a buffer zone of 2 m around each point were made of Vis' values of an area of approximately 12.5 m 2 consisting of 315 pixels (20 cm), to deal with the GPS error due to the low accuracy (1.5 m-2.5 m) around the 97 points.
Preliminary analysis of the existing correlation between yield data and VIs was conducted at 2 "spatial levels" or scales for which data were obtained: At sub-plot scale derived from point data related to yield.
Subsequently, the yield estimation in relation to VIs and multitemporal VIs was based on multiple linear regressions on the different datasets. In this study, 10 VIs were selected for further analysis, with the first VI being NDVI [23]. It was used in this work as a benchmark, as it has been adequately studied by many researchers and has been known for many years for its correlation with many parameters of plant growth. The second VI (NNIR) is a complex index, as it includes 3 spectral bands. Due to the use of reflectance in the green band, it was found to be strongly correlated with the absorption of nitrogen, which in turn affects the nutrition and ultimately the yield [24]. To reduce NDVI's observed saturation at the growth stages of maturity and achieve a linear correlation of the biophysical parameters of VI, the REDVI (Renormalized Difference Vegetation Index) was proposed, which combines the advantages of DVI and NDVI [25] and was further improved with the addition of red edge [26]. NDRE (Normalized Difference Red Edge) [27] is essentially the index that occurs after replacing the red band with that of the red edge on the NDVI index. NDRE is a better index of the plant tissue health/vitality than NDVI for mid-to-late-stage crops, where they have accumulated high levels of chlorophyll in their leaves, because the red edge band is penetrates more so the leaves than red light and so is less absorbed. It is more suitable than NDVI for high-yield crops throughout the growing season, because NDVI often loses sensitivity after the plants have accumulated a critical level of leaf cover or chlorophyll content. The Chlorophyll red edge Index (CIre) [28] is used to estimate the total chlorophyll content of leaves. The green and red bands' reflectance values are sensitive to small changes in chlorophyll. The total chlorophyll content is linearly correlated with this index. Therefore, it is widely used, especially in large crops. MCARI1 [29] is a vegetation index that belongs to the category of solid line vegetation indices. It has a very high correlation with the leaf area index (LAI) and removes interference from the reflectivity of the soil. It also includes the red edge range, so it is considered useful for the study of crops with varying percentages of vegetation and even in the middle-late stage [30]. The RESAVI [26] vegetation index is a variation of the soil adjusted vegetation index (SAVI) and its improved version of OSAVI. RESAVI is resistant to the variability of soil reflectivity and has increased sensitivity to high LAI and vegetation cover greater than 50%. This index is best used in areas with relatively sparse vegetation where the soil is visible through the crown of the plants [31]. The red edge Re-normalized Different Vegetation Index (RERDVI) [26] is an improvement of the Normalized difference Red Edge (NDRE). It is an index for the study of the vitality of the vegetation with mid-late growth when the plant tissues have accumulated high levels of chlorophyll content. The red edge wavelength it uses penetrates more so the leaves than red and is thus less absorbed [32]. The TVI [33] index is calculated as the area of a hypothetical triangle in the spectral space that connects the maximum reflection region in green, the minimum chlorophyll absorption region, and the NIR. When chlorophyll absorption causes a decrease in reflectance in the red spectrum and the abundance of leaves causes an increase in reflectance in the NIR, the total area of the triangle and therefore the value of the index increases. It is good for estimating LAI, but its sensitivity to chlorophyll increases with increasing crown density [34]. The MTVI2 [29] VI is a variant of MTVI and in essence an improvement of TVI. It is considered to be strongly correlated with LAI and corrects soil reflectivity at very low LAI values. Its values are influenced by structural elements of plant tissues and are also related to the quality characteristics of the plant. In Figure 7, three Vis are demonstrated with RGB, where red is RERDVI, green is NDVI and blue is MCARI1.
Further to the above VIs, 3 multitemporal VIs [11] were also calculated: where x i and x j represent the values of vegetation indices at two different stages of development. For the experiment of this work, the method of calculating the multitemporal vegetation indices is described in Table 2.

Results
In the present experiment, one factor that exists is the different rice variety. The following graphs were designed to visually investigate how rice production is affected by varieties ( Figure 8). In addition, the range of yield for each variety, as shown by the yield point data, appears to be significantly larger than that obtained from the average yields for each variety. This indicates the existence of variability within the parcels that affect yield. Figure 8a shows the difference in yield between the two rice varieties, which is also proved statistically (F = 19.323, Sig = 0.000) while the descriptive statistics are shown in the Table 3. Moreover, it can be seen in Figure 8b that the point yield data, when viewed by variety, are slightly biased upwards. In other words, it seems that the average value of the yield (in kg/ha) is higher in both varieties than that which results from the total yield of the parcels. This is because the sampler chose for the yield samples areas where the cultivation did not present any problems. However, the averages of the plots also take into account areas with stressed vegetation, such as those near the boundaries of the plots or areas where the vegetation has been degraded by other factors (diseases, reduced plant density, etc.).
Additionally, the spatial variability is recognized in many cases from other similar experiments in Greece, even in small plots of land of 1 hectare. It is therefore common practice to test NDVI at the beginning of the growing season to assess this variability [35].
The assumption of existing soil variability was taken into account and effort was made in order to observe it. To investigate this hypothesis from the two data sets, Pearson's linear correlation values were initially calculated for the general population and each variety separately (Table 4). Of the ten VIs that were calculated during the phenological stages of booting, panicle heading and ripening, three VIs were selected for further analysis: NDVI (as a benchmark), RERDVI (due to the best correlation with yield for the Ronaldo variety) and MCARI1 (due to the best correlation with yield for the Gladio variety) ( Table 4). Table 4. Pearson's correlation coefficient between the 10 examined vegetation indices and the yield (mean value for each plot), for each variety separately (the average best correlation for Ronaldo and G, respectively, are presented in bold).

General (N = 18) Ronaldo (N = 11) Gladio (N = 7)
Vegetation The difference between the two varieties is demonstrated. For Ronaldo, the vegetation indices at the stage of panicle heading are not correlated at all with the yield, while both at the stage of booting and ripening are very strongly correlated. Conversely, for Gladio, a strong correlation emerges for the first two stages (booting and panicle heading), while on the ripening it seems that the vegetation indices cannot estimate the yield with accuracy.
In Figure 9 the NDVI between the three dates appears different. On the 6 July, on the phenological stage of booting the average values are in the range 0.74-0.89, but without signs of saturation (which is presented a little earlier at the stage of maximum tillering). On 21 July, the heading stage had a lower average value, while on 30 August the NDVI maintained its low values. These values are expected, as at the stage of booting has already formed and the crop is close to flowering, where the germination as reflected by other parameters (LAI, fpar, etc.) goes down, to have its lowest average value (of the three examined dates) on 30 August when rice cultivation is at the beginning of ripening.
We also notice that both MCARI1 and especially RERDVI increase their value on all plots, especially at the ripening stage. This is explained by the fact that these indices utilize the red edge spectrum and are therefore more sensitive to the ripening stage of the crop, at which time the chlorination of the plants has declined significantly.
The above observation can be explained by the physiology of the plants of each variety in combination with the theory behind the vegetation indices and the correlation of VIs with rice growth cycle [36]. The Ronaldo is a late-maturing variety, of low vivacity, especially after the stage of panicle heading, which grows very tall, has long internodes in the flowering stem and gives a very large yield [37]. The stage of heading begins around the time of taking the multispectral image (21 July) and for lasts many days after. During this stage, the plant is stressed, as it was preceded by fertilization to help the booting (on 11 July) with fast absorption fertilizer, while the next fertilization (at the peak of this stage) took place on 10 August. On the other hand, the Gladio variety is premature, about 20 days earlier and livelier, with short internodes, and on 21 July it has completed the vegetative stage of growth and booting has already formed. The yield seems to be more strongly correlated with the vegetation indices as it does not face the delay in Ronaldo's growth. In Gladio, however, due to prematurity, the multispectral image of 30 August cannot be correlated with yield, as the plants are already at the stage of fruit filling and have been fertilized a few days before (10 August). The results of the linear regression analysis using the multitemporal vegetation indices [11] as independent variables and the yield as dependent are summarized in Table 5 and the regression models on Supplementary Material. As shown, yield from both varieties is strongly correlated with the VIs already from the first acquiring date of remote sensing data, something that is not captured for the next data acquisition dates. For the Ronaldo variety the yield has a strong dependence on multitemporal NDVI from the first and third flight (on 6 July and 30 August) (R 2 = 0.76), while there is a weak dependence on NDVI from the other 2 days. However, there is also a significant dependence on the use of the multitemporal index SUM (VI) concerning the third date of any combination of dates, while the Ronaldo's yield dependence on the multitemporal index MRL is high (R 2 = 0.75). In addition, the dependence of yield from the RERDVI index (R 2 = 0.77) is achieved by utilizing data from the first and last flight (on 6 July and 30 August). RERDVI utilizes data from the red edge band, which is sensitive to parameters such as N uptake and yield [38]. However, there is a significant improvement in the dependence of yield with the multitemporal index SUM (VI) concerning data of the first and third flights, while the maximum yield dependence is achieved with the multitemporal index MRL (R 2 = 0.81).
The yield of the Gladio variety has a strong dependence on NDVI and MCARI1 (NDVI R 2 = 0.88, MCARI1 R 2 = 0.95) by using data from the first flight (on 6 July), while there is weak dependence for the last date. Contrary to what happens in the Ronaldo variety, there is no improvement with the use of the SUM (VI) index over a single date, whatever combination of dates is chosen.

Discussion
It is generally accepted that no yield estimation model is accurate locally unless developed and tested locally [39]. Some of these models are integrated into broader decision support systems that are currently in the form of software and being developed directly for farmers so that they can implement optimal management, save money and apply them to crop management information systems [40].
The results of this study showed that the yield prediction varies according to the rice variety; therefore, the R 2 values are improved with multitemporal VIs and they can give better results, as mentioned in the literature [41]. This study captured that a single-date VI can be used for yield estimation but varies considerably depending on the variety and the VI, while the estimation is focused on one field. In future research, VIs that have the green band in the calculation may be further exploited to identify management zones [42] or to assess crop characteristics of differing physiology and phenology [43].
The main problems regarding yield prediction with VIs are that the VIs lose sensitivity in the last growth stages and they saturate at dense vegetation [44]. In this study we were not faced with the saturation error mainly because of the red-edge usage on VIs which have great potential for yield estimation [45] and rice growth monitoring [46], while the used VIs remained sensitive in all acquisition dates. The strongest correlation of VIs with yield was on the booting stage of Ronaldo and booting and heading for Gladio variety, something that is also confirmed in the literature [11,47,48]. In addition, it is common to use different wavelengths' VIs for each growth stage [49], since the sensitivity of VIs changes at different stages of rice plant growth.
As already seen from the preliminary statistical and qualitative analysis of the data, all vegetation indices depend on physiology and phenology of rice plants. This is to be expected, as the reflectance values of the radiation at the wavelengths are included and determine the values of VIs. In essence, they are affected by the type of plant tissue (crop or weeds), the structure of the plant tissues and their proportion (chlorophyll content, N, etc.), the soil parameters and the vegetation density.
In conclusion, the UAV with the ability to acquire multispectral imagery according to the phonological stage of the rice plants can be used for precision agriculture applications with single-date VIs or multitemporal VIs [11]. Additionally, UAV imagery is more applicable for field-scale rice yield estimation with pixel size of 10-30 cm, bypassing cloud cover problems, in contrast to remote sensing platforms with lower resolution [50]. The yield estimation models at any growth stage are useful for farm risk management, and it can be a decision support system for rice plant cultivation practices [51]. Finally, the purpose of UAVs is to replace field measurements for rice growth monitoring, yield estimation and crop needs, which has not yet reached the level set by the research community [44].

Conclusions
The optimal correlation of VIs studied with the yield is achieved using linear regression, although in the literature some non-normalized VIs would be expected to behave differently. The variability within the parcels attributed to the soil composition and the nutrition of the plants predictably affects the most sensitive vegetation indices of the present study. This variability is highlighted even by the qualitative examination of the Vis' maps of the parcels, the soil maps and the yield maps.
NDVI is strongly correlated with the yield level for each plot regardless of variety, even when it is calculated at an early stage (6 July). Of the remaining VIs, RERDVI for Ronaldo and MCARI1 for Gladio have the highest correlation coefficients with yield, even when calculated at the booting stage.
The multitemporal VIs calculated by combinations of VIs from different stages have an increased correlation with the yield, although again the variety has a very large effect here as well. In particular, for the Ronaldo variety, the examination of the multitemporal VIs increases the estimation accuracy of the crop at all stages (R 2 increases up to 0.63), while in the Gladio variety there is an improvement only at the stage of crop maturation.
Utilization of the largest number of measurements of yield point data did not contribute positively to the improvement of the correlation of the Vis with yield. This is attributed to the accumulation of systematic errors in the calculation of VIs and the spectral outliers. These systematic errors are attributed to position errors in the field. It seems that more precision is required when receiving field data, possibly using RTK GPS.
In conclusion, the calculation of multitemporal VIs from several dates, e.g., per week, is feasible using the UAVs and may further improve yield estimation accuracy and accurately identify the crop development stages that are most important for its estimation to improve management and cultivation practices.