Grain Yield Estimation in Rice Breeding Using Phenological Data and Vegetation Indices Derived from UAV Images

: The accurate estimation of grain yield in rice breeding is crucial for breeders to screen and select qualiﬁed cultivars. In this study, a low-cost unmanned aerial vehicle (UAV) platform mounted with an RGB camera was carried out to capture high-spatial resolution images of rice canopy in rice breeding. The random forest (RF) regression techniques were used to establish yield models by using (1) only color vegetation indices (VIs), (2) only phenological data, and (3) fusion of VIs and phenological data as inputs, respectively. Then, the performances of RF models were compared with the manual observation and CERES-Rice model. The results indicated that the RF model using VIs only performed poorly for estimating yield; the optimized RF model that combined the use of phenological data and color VIs performed much better, which demonstrated that the phenological data signiﬁcantly improved the model performance. Furthermore, the yield estimation accuracy of 21 rice cultivars that were continuously planted over three years in the optimal RF model had no signiﬁcant difference ( p > 0.05) with that of the CERES-Rice model. These ﬁndings demonstrate that the RF model, by combining phenological data and color Vis, is a potential and cost-effective way to estimate yield in rice breeding.


Introduction
Rice (Oryza sativa L.), one of the most important food crops in the world, is consumed by more than half of the global population [1], especially in East Asia and Southeast Asia. Food price fluctuations in the market can adversely affect rice cost and production; hence, a convenient and reliable technology for predicting rice yield is necessary for farmers and governments to make appropriate decisions in regard to rice production [2].
Traditionally, agronomists rely on field survey to obtain approximate yield predictions. However, estimation based on empirical and subjective knowledge resulted in inaccurate crop assessment and were limited in capacity for yield estimation in regional scale [3,4]. Satellite imagery-based remote sensing (RS) data have been widely used for monitoring crop growth status and nutritional conditions [5,6]. Yield prediction models were established based on meteorological factors and an RS vegetation index (VI) using the multiple linear regression technique [7], and it was found that the inclusion of remote sensing data could significantly improve the yield prediction accuracy. The enhanced vegetation index (EVI) and normalized difference vegetation index (NDVI) data derived from the moderate resolution imaging spectroradiometer (MODIS) were compared for estimating rice yield, and the result indicated that the EVI-based models were slightly more accurate than the NDVI-based models [1]. However, typically, satellite imagery had conflict between spatial and temporal resolutions, and the quality of RS data was considerably affected by atmospheric interference [8,9]. Although synthetic aperture radar (SAR) techniques that could penetrate vegetation canopies were not influenced by clouds [10,11], it was difficult to exactly extract crop information due to the small size of farmlands in the south of China.
For understanding the effect of site-specific crop management and environmental stress on crop production, a highly accurate and reliable model for crop yield estimation in the plot-scale was essential [12].
As for plot-scale yield estimation, crop growth models have been successfully proposed to simulate yield such as DSSAT [13], AquaCrop [14], EPIC [15], and APSIM [16]. These models become efficient and useful tools for field management and precision agriculture [17]. The CERES-Rice model involved in the DSSAT model has been widely used as a technological tool to support the decision-making for rice field management activities [18,19]. The capability of the CERES-Rice model for rice yield estimation has been reported in previous studies [20][21][22]. However, the implementation of crop growth models (e.g., CERES-Rice) in commercial application remains limited because of their high data acquisition costs. For example, the soil properties need to be measured in the laboratory and the cultivar parameters need to be tuned costly. Moreover, the proper use of these models requires skillful manipulators with agronomic knowledge to obtain satisfactory results.
In recent years, unmanned aerial vehicle (UAV) with a high spatial resolution of imagery has attracted considerable interest for crop yield estimation in plot scale [23][24][25]. Feng et al. [26] extracted eight image features from a UAV system and found that the fusion of two image features could improve the accuracy of cotton yield estimation. Yang et al. [27] constructed the convolutional neural network (CNN) model with the dataset of RGB and multispectral images derived from UAV to estimate rice yield, and they found that the CNN's accuracy in RGB image dataset was better than that in the multispectral image dataset. The VIs and crop height data extracted from UAV-RGB images with crop surface models (CSMs) were combined to estimate corn yield by three linear models [28]. Agueera Vega et al. [29] calculated NDVI from multi-temporal images obtained from a UAS and found that NDVI was highly correlated with grain yield. Furthermore, a good correlation also exists between the sugarcane yield and G-R vegetation index obtained by the UAV-RGB images [30].
Although most scholars have proven the potential of image features for crop yield estimation under the same crop cultivar, the yield estimation in breeding programs was still challenged by the variation of genotypes. In breeding trials, thousands of genotypes from two parents were planted without duplication. The great variation of phenotypes in yield could be caused by the combination of environmental and genetic variations [31]. Accordingly, only a few studies have been reported for yield estimation in crop breeding [32]. For example, Zhou et al. [32] used CNN to estimate the yield of 972 soybean breeding lines with R 2 of 0.78 and RMSE of 391 kg ha −1 based on the UAV imagery. Ashapure et al. [33] demonstrated that a simple Artificial Neural Network (ANN) model could perform well for yield estimation in a set of tomato varieties by using UAV-RGB imagery.
To the best of the authors' knowledge, previous studies focused on estimating the plotscale crop yield by using image features from UAV-based imagery, and they neglected the growth duration that determined the amount of incident solar radiation closely associated with biomass and grain yield as an important phenological factor [34]. In rice breeding, the growth duration varied across cultivars; thus, the estimation of grain yield in rice breeding, only based on VIs from UAV images, could result in considerable uncertainty. The random forest (RF) regression technique has been successfully used to estimate crop biomass [35], grain yield [36], and nitrogen status [37]. Thus, in this study, RF was used to construct the yield estimation models. Meanwhile, the yield estimation of the same cultivars that were continuously planted over three years in the RF model was compared to the CERES-Rice model. Accordingly, the objectives of this study are: (1) to establish the RF model for yield estimation in rice breeding using UAV-based RGB images only; (2) to improve the capability of the RF model combining RGB images and phenological data; (3) to validate the performance of the optimal RF model, comparing it with that of the CERES-Rice model.

Field Trial Design
The experimental station is located at the Guanshan rice breeding base (28 • 19 N, 112 • 40 E), Jinzhou town, Ningxiang City, Hunan Province, China ( Figure 1). Its annual average temperature, rainfall, and sunshine duration are 16.8 • C, 1358 mm, and 1739.2 h, respectively. The average annual sunshine hour duration per day is about 4.8 h. The soil type is fine-loam with pH, organic carbon, and total nitrogen concentrations of 6.7, 17.1 g kg −1 , and 1.76 g kg −1 , respectively. In this study area, the breeding strategies include the photoperiod/thermo-sensitive genic male sterile line-based two-line system and the cytoplasmic male sterile line-based three-line system. A total of 122, 127, and 105 one-season-late indica hybrid rice combinations were investigated, respectively, in 2017, 2018, and 2019. These hybrids were developed by different breeding institutes and companies and were applied for commercial release in Hunan province, China. All hybrids were grown in a randomized complete block with three replicates. Seedlings were transplanted in each plot, which was comprised of 450 plants at spacing of 17 cm × 20 cm. All the experimental plots applied the same fertilizer treatment with N (195 kg ha −1 ), P 2 O 5 (112 kg ha −1 ), and K 2 O (112 kg ha −1 ), using urea as the N fertilizer. The N fertilizer was split into three applications, with 40% being basal fertilizer, 40% being tillering fertilizer, and 20% being ear fertilizer. All phosphorus and potassium fertilizer were used as basal fertilizer. Water, insects, weeds, and disease were controlled when needed. Sowing was conducted in early June; then, transplanting was performed in late June or early July, depending on the climate conditions and seedling growth status (Table 1).

Field Data Collection
In field survey, rice breeders observed and recorded the growth stages of different cultivars every day; the phenological traits have been specifically described by Fageria et al. [38]. The basic phenological information of different rice cultivars were shown in Tables 1 and A1. The rice grain yield of the investigated plots was harvested, depending on the maturity date of each rice cultivar. After the harvest, the grains were put into an oven at 75 • C until their weights did not change (after 72 h) and were then weighed by an electronic balance (±0.1 g). To analyze the phenological differences among the rice cultivars from these plots, the duration and effective accumulated temperatures between the ST (sowing to transplant), TI (transplant to initial heading), IH (initial heading to full heading), HM (full heading to maturity), and SM (sowing to maturity) were calculated (Table A2). The effective accumulated temperature was calculated as follows: where GDD is the effective accumulated temperature, • C; where i = 1, 2, 3, . . . m days during the growing season. T avg is the daily average temperature, • C; T base is the minimum temperature required for crop growth, • C. T max is the daily maximum temperature, • C; T min is the daily minimum temperature, • C. The biological upper limit temperature of rice (T upper ) is 40 • C, and the lower limit temperature (T base ) is 10 • C [39].

UAV Data Acquisition and Image Processing
The UAV used in this study was a four-rotor consumer UAV (Phantom 4 Pro, SZ DJI Technology Co., Ltd., Shenzhen, China) with a maximum payload capacity of 1.375 kg. This UAV had a 1-in CMOS 20-megapixel camera mounted on its platform that was employed to capture the rice canopy images during the flights. Autonomous flights were carried out by Pix4Dcapture software to have overlap (70% forward and 80% side). The angular field of view is 60 • horizontal × 27 • vertical, resulting in a nominal resolution of 3.68 cm ground sampling distance (GSD) at 100 m above ground level. The focal length of the camera lens was 35 mm, and the RGB sensor could acquire 5456 × 3632 pixels over a brief period (faster than 1/2000 s). The captured camera images were in JPEG format. The images were collected on 6 September 2017; 7 September 2018; 29 August 2019, when practically all the rice cultivars were at the heading stage. The flights were carried out between 11:00 a.m. and 1:00 p.m. in stable ambient light conditions [9].
After the flights, the RGB images were downloaded from the memory card. The Pix4Dmapper (version 1.1.38, Pix4D SA, Lausanne, Switzerland) was used for image stitching. First, the overlapping images were automatically aligned by a feature point matching algorithm. Second, fifteen GCPs were used to georeference each image. Finally, the orthomosaic map with a TIFF image format was exported. In this study, VIs or bands were extracted from the orthomosaic maps by using the "GDAL" module in Python 3.6. Region of interests (ROIs) in individual plots were determined based on the orthomosaic maps by dividing the study area into regular plot polygons ( Figure 1). In this study, Esri ArcGIS 10.2 software was used to draw ROIs and establish a buffer between boundary plots to reduce plot edge effects [40]. Then, the mean VI (Table 2) of ROI represented the value of each plot.

RF Model
RF is an ensemble learning method that combined the independent predictor results with bagging and random feature selection; the final result is obtained by voting or taking the mean value [48]. RF is insensitive to collinearity among multiple variables, obtaining high estimation accuracy and minimizing overfitting phenomenon [49].
In this study, fourteen VIs were calculated as different mathematical combinations of the three visible bands (i.e., red, green, and blue) from the UAV-based images ( Table 2). Most of the VIs selected in this study were widely used to estimate the vegetation's diverse biophysical parameters, such as aboveground biomass [50,51], chlorophyll content [52], nitrogen accumulation [9], and yield [41]. The phenological data involved five duration length variables (e.g., ST, TI, IH, HM, and SM) and five effective accumulated temperature variables (e.g., GDDST, GDDTI, GDDIH, GDDHM, and GDDSM) under specific growth stages (Table A2). The RF regression technique was carried out to establish the estimation model for grain yield in the breeding trial by using VIs only, phenological data only, and combination of VIs and phenological data. We used the dataset in 2017 and 2018 to train RF models. The remaining dataset in 2019 was used to test the RF models. Three hyperparameters that need to be tuned in an RF model are the maximum depth of trees (max_depth), minimum number of samples to split an internal node (min_samples_split), and minimum number of samples at a leaf node (min_samples_leaf). A grid search algorithm with 10-fold cross validation was used to split the set cultivar-wise in the calibration dataset to obtain optimal parameters values ( Table 3). RF models were implemented by using the function "RandomForestRegressor" in the package of scikit-learn (https://scikit-learn.org/stable/, accessed on 28 November 2021) in Python 3.6. The input data for the CERES-Rice model included meteorological data, soil characteristics, field management information, and cultivar specific parameters. The meteorological data were collected from the local experiment meteorological station. The minimum requirement of weather data included daily maximum air temperature ( • C), minimum air temperature ( • C), solar radiation (MJ m −2 d −1 ), and precipitation (mm). The soil parameters and characteristics are shown in Table 4. The physical and chemical properties of each horizon of collected soil samples (Table 4) were analyzed by using the Soil Science Society of America (SSSA) and American Society of Agronomy (ASA) methods for soil analysis [53]. Field management information, such as planting date, transplanting date, harvest date, and fertilization amount, were shown in Table 1, Table A1, and Section 2.1. There are eight cultivar parameters in the CERES-Rice model: four phenology-related parameters (P1, P2R, P5, and P2O) and four yield-related parameters (G1, G2, G3, and G4). The cultivar parameters were determined by the generalized likelihood uncertainty estimation (GLUE) method. The specific procedures of parameter estimations were described in the following section.

Cultivar Parameter Estimations
In this study, we screened out 21 rice cultivars that were planted continuously over three years. The datasets in 2017 and 2018 were used to calibrate cultivar parameters. Then, the dataset in 2019 was used to evaluate the performance of the CERES-Rice model. The GLUE method exploited prior distributions of the parameters and variables to obtain the optimal value. First, Monte Carlo sampling of distributions were used to generate 6000 random parameter sets in each cultivar. Second, the likelihood value was calculated based on the mean observation data such as flowering date, maturity date, and grain yield. Then, we ran the CERES-Rice model 6000 times, and the results were evaluated by GLUE. After applying this approach to parameter sets, a final set of cultivar parameters was determined (Table A3).

Statistical Methods
The Pearson correlation analysis between studied variables and measured yield was carried out by a function called "pearsonr" in the Python module "scipy.stats". A pairedsamples t-test at p ≤ 0.05 was used for paired mean difference comparisons between the estimated values of the optimal RF model and the CERES-Rice model for 21 replicated cultivars across the years. The paired-samples t-test method was used to test the null hypothesis "there were no significant differences in the predictions of the optimal RF model and CERES-Rice model for 21 replicated cultivars in 2019". The normality of the distribution of sample datasets was tested by using the Kolmogorov-Smirnov test, and homogeneity of variances was tested by using the Levene test. All sampled datasets followed a normal distribution (p-values > 0.05) and similar variances (p-values > 0.05). Therefore, a pairedsamples t-test could be used in this research. The Python script named "ttest_rel" function from Scipy Python module was used to perform the paired-samples t-tests.
For yield estimation, the performance of models was evaluated in the calibration dataset (in 2017 and 2018) and validation dataset (in 2019), respectively. The estimation accuracy of the yield model was quantified using the coefficient of determination (R 2 , Equation (2)), root mean square error (RMSE, Equation (3)), and mean absolute error (MAE, Equation (4)). These statistical indices were calculated as follows: where n is the number of samples in the calibration or validation data set; S i is the estimated yield; M i is the measured yield; M i is the mean value of the measured yield. Mathematically, the model with higher values of R 2 , a smaller RMSE and MAE corresponds to more accurate results.  Figure 3 also shows negative correlations between yield and GDL for 21 replicated cultivars across the years. The variability of yield was larger than that of GDL for all cultivars, as well as for 21 replicated cultivars across the years ( Table 5). Note that the GDL of cultivars in 2018 tended to be smaller than that in 2017 and 2019. The GDL difference was largely attributed to the variations in climate conditions across the years. The mean daily temperature between sowing and maturity in 2018 was 1.4 • C and 0.7 • C higher than that in 2017 and 2019, respectively (Table A4). Additionally, the mean values of yield for 21 replicated cultivars were relatively consistent across the years.     Figure 4c shows the relationship between VIs and GDL. All VI variables were significantly correlated with GDL (p < 0.05) in 2017. However, some VI variables such as G, B, G/R, and INT had no significant correlation with GDL (p > 0.05) in 2018 and 2019. Accordingly, the relationship between VIs and GDL was inconsistent across the years.

RF Method for Yield Estimation
Three different RF models were established for yield estimation in rice breeding, and the results were evaluated using R 2 , RMSE, and MAE. Figure 5 shows the distributions of the simulated yield by those models in calibration and validation sets. When using VIs only as inputs, the simulated yield based on the RF model was mainly around 9 t ha −1 . The variation of simulated yield was small, especially in the validation dataset ( Figure 5). When using phenology data as inputs, the distribution of simulated yield was similar to that of the RF (VIs + phenoloy) model. The range of simulated yield in the RF (VIs + phenoloy) model was closer to the range of the measured yield. However, all the RF models underestimated yield at the high yield level, to a certain extent. In comparison with approaches that solely use VIs or phenology to estimate rice yield, the RF (VIs) model exhibited unsatisfactory performance for yield estimation in the validation set with R 2 = 0.06, RMSE = 0.65 t ha −1 , and MAE = 0.51 t ha −1 , respectively. In contrast, the RF model that used only phenological data exhibited good yield estimations with R 2 values of 0.62 and 0.46 in the calibration and validation sets, respectively, which indicated that the phenological information across rice cultivars was an important factor in determining yield in rice breeding. When combined RGB-VIs and phenological data as inputs, the RF model (VIs + phenology) achieved the highest performance in rice yield estimation in the calibration and validation sets; R 2 , RMSE, and MAE were 0.70 and 0.53, 0.48 and 0.43 t ha −1 , and 0.38 and 0.34 t ha −1 , respectively ( Table 6). These results indicated that more variables in the RF model provided more accurate yield estimation. However, the RF (VIs + phenology) model in the all datasets underestimated the yield when the measured yields were greater than 10 t ha −1 (Figure 6). Additionally, the optimal RF model overestimated yield when the measured yields were less than 7 t ha −1 (Figure 6).

CERES-Rice Model for Yield Estimation
Unlike the RF regression technique for yield estimation proposed in Section 2.4, the CERES-Rice model is a crop growth dynamic model that can simulate yield, aboveground biomass, crop phenology, etc. Based on the dataset of 21 rice cultivars planted over three years in the experimental site, the GLUE method was used to obtain 21 sets of rice cultivar parameters (Table A3). The relationship between the simulated and measured yield was shown in Figure 7. The result showed the good accuracy with the R 2 , RMSE, and MAE values of 0.75 and 0.80, 0.37 and 0.48 t ha −1 , and 0.31 and 0.42 t ha −1 in calibration set and validation set, respectively. The accuracy of CERES-Rice model for yield estimation was slightly better to that of deep convolutional neural networks for rice yield estimation with R 2 and RMSE values of 0.59 and 0.66 t ha −1 , respectively [27]. These results indicated that the yield from the CERES-Rice model provided relatively accurate estimations.  2019). The horizontal error bar refers to ± one standard deviation associated with each mean in the measured yield distributions. The error bar in the CERES-Rice model was not presented here. Due to the same input data and cultivar parameters in the same cultivar with three replicates, the estimated yields obtained from the CERES-Rice model for three replicates were consistent.

Performance Comparison between CERES-Rice Model and the Optimal RF Model
In order to evaluate the robustness and capacity of the optimal RF model that combined VIs and phenological data as inputs, the simulated yield of the 21 rice cultivars planted over three years based on the RF (VIs + phenology) model was compared with that of the CERES-Rice model. Figure 8 shows the relationship between the measured yield and simulated yield obtained from these two models. Compared to the RF (VIs + phenology) model, the slope and R 2 values of the linear regression between simulated yield, based on the CERES-Rice and measured yield, were closer to 1. However, the values of RMSE and MAE, based on the optimal RF model, were comparable to those based on the CERES-Rice model ( Table 7). In addition, the paired-sample t-tests results showed that there were no significant differences between the CERES-Rice model and the optimal RF model for simulated yields in 2019 (p > 0.05, Table 7).

Response of Phenology to Yield
Phenology determines the amount of interception of solar radiation by the rice canopy that is closely associated with grain yield [34]. In this study, 10 phenological indicators were selected to analyze their correlation with yield. It was found that the GDDHM, TI, and SM were significantly correlated (|r| > = 0.41, p < 0.01) with the yield for each year and combined years (Figure 4b). Among the three phenological indicators, GDDHM was positively related to yield across the years (r > = 0.45, p < 0.01). This result is supported by the previous study, which found that GDDmat (from 50% heading to maturity) was positively associated with yield (r = 0.46) in Italian rice germplasm [54]. In our study, the mean daily temperatures during the reproductive period were in the optimum temperature range (20-35 • C) for rice development (Table A4) [55]. The increase in mean daily temperature during this phase could increase GDDHM significantly and then increase the photosynthesis rate, which contributes to rice yield increase [56]. The strong negative relationship was found between TI and yield across the years (r from −0.45 to −0.70, p < 0.01), suggesting that the cultivars with short durations before heading presented a high-level yield. One possible explanation is that cultivars with short durations during the vegetative phase could reduce N uptake to ensure N supply at the reproductive phase, which is beneficial to increase the yield. However, due to the limitations of the observed data, we could not provide any qualitative or quantitative explanation; the relationship among N uptake, duration length, and yield needs to be further studied in rice breeding. The short GDL (i.e., SM) tended to afford high grain yield in the Guanshan rice breeding base (Figure 2), which was consistent with the results by using short growth durations in central China [57]. They found that the grain yield improved by way of desirable traits, e.g., the plants were tall, and the panicles were heavy. Chen et al. [58] also demonstrated that short-duration cultivars had higher spikelet filling rate and higher harvest index than long-duration cultivars in Hunan Province, China; thus, the higher grain yield could be a target for breeding high-yielding rice cultivars with short durations. Climate warming has reduced the rice growth duration in China over the past three decades [59]; shortduration cultivars were proposed for German oats without the stress caused by climate warming [60]. It needs to be pointed out that, despite certain associated yield penalties, the use of short-duration cultivars could be a viable option for adaptation to climate change. Cultivating short-duration cultivars may be beneficial to rice yields by avoiding extreme heat stress at the heading stage (i.e., flowering) [61] and reducing drought exposure [62]. In the long run, the yield benefits brought by the positive shifts of cultivar traits may offset the losses caused by the adverse phenological changes to a certain extent [59]. However, in this study, we only analyzed the relationship between phenology (i.e., GDL and effective accumulated temperature variables) and yield, and we did not try to quantify the decline in yield due to the increase of GDL because pest and disease pressure, a variety of genetic properties such as spikelet fertility and the harvest index are equally important for determining the yields. Since rice breeding depends on both consumer demands and climate in China, further research is necessary to be carried out to determine the reasons for the improved performance of short-duration cultivars by combining measured data or obtaining information that could reflect rice breeding objectives.

Importance of Phenology to RF Model Formulation
At present, the ability to estimate crop yields based on UAV images is an area of active research [63]. The applications of VIs obtained from UAV images, such as NDVI, E × G, and VIgreen, have been reported in rice yield predictions [36,63]. However, there are few studies describing yield estimation based on the UAV-VIs in rice breeding. In this study, the relationships between yield and VIs from UAV-RGB images were analyzed from 2017 to 2019. In terms of variable importance, R/B was the most important for yield estimation in the optimal RF model among the 14 VIs (Figure 9). This was mainly attributed to the consistent performance of R/B on yield across the years (Figure 4a). However, the RF model based on only VIs had an insufficient accuracy and over-fitting problems for yield estimation (Table 6 and Figure 5). There were three main reasons for these results: (1) the variation in Pearson correlation between VIs and yield was relatively large and inconsistent across the three years ( Figure 4a); (2) due to the high vegetation coverage at the heading stage, VIs (e.g., E × G and NRI) became obviously saturated ( Figure A1), and this result was consistent with the findings of Zhou et al. [63] and Gitelson et al. [64]; (3) the effect of VI values on yield was barely quantified because of the variation in the phenology across cultivars in rice breeding. Compared with the RF model using VIs only, adding phenology significantly improved the accuracy of yield estimation (Table 6). Significant benefits were associated with the use of phenological information and VIs in the remote sensing-based soybean and maize yield models [65], which were similar to the results of this study. Yang et al. [27] also pointed out that the combination with the observations of phenological dates could improve model accuracy for yield estimation. In this study, the phenological variables, such as SM and GDDHM, had the determined effect on model estimation accuracy in rice breeding (Figure 9). We demonstrated that phenological data dominated the yield variance, and the data set of VIs provided the secondary contribution to yield estimation. However, the phenological stages of plots were based on manual observation in the experimental site. Recently, the VI data have been widely used to extract phenology. The methodologies based on VI data are mainly shown as two types: (1) curve fitting methods that used Savitzky-Golay filtering, logistic function, and Gaussian function to smooth the time-series VI curves, then extracted phenological information based on its inflection points [66,67]; (2) shape model fitting methods that could smooth VI curves, and then extracted phenology based on optimum scaling coefficients and visual observations [68]. These methodologies based on time-series VI data often relied on high temporal resolutions. Therefore, it was necessary to detect phenology-based on time-series images from UAV to improve the applicability of the RF models for yield estimation in the future.

Comparison between RF and CERES-Rice Models
There were no significant differences in the estimated yield (p > 0.05) between the optimal RF model and the CERES-Rice model based on the paired-samples t-tests (Table 7). The accuracy of estimated yield in these two models was comparable in terms of RMSE and MAE. In fact, the statistical assumptions of these two kinds of models were totally different. The CERES-Rice model was a process-based model that required advanced data in higher costs. In this study, 21 replicated rice cultivars planted for three years were selected to simulate yield by using the CERES-Rice mode. Although, the crop growth model could obtain reliable and satisfactory results in agriculture [19][20][21]. Using crop growth models to simulate yield for breeding programs is still challenged by the variances in phenotypes and genotypes. It is difficult for crop growth models to calculate cultivar parameters in hundreds and even more breeding materials. In contrast, the RF model proposed in this paper, with the use of UAV-based RGB images, was performed at the decreased cost. The simulation results based on the optimal RF model coincide with the measured yield ( Figure 10). In this study, the optimal RF model could be used as a case to prove that the fusion of VIs and phenological data in an RF model were useful and promising for yield estimation in breeding. Similar conclusions could be found in the study of Wan et al. [36], where they found that the fusion of VIs and crop traits in an RF model could improve model accuracy for yield estimation. However, yield was affected by many factors such as crop traits, environmental conditions (heat stress and cold stress), and field management [51,54,55], which made the RF model difficult to apply for large areas. Thus, the feasibility of estimating yield in the field by using phenological data and VIs from high-resolution UAV images should be additionally determined by the more field validation. Future studies are also required to verify the use of phenology and VIs for different ecological areas and crops.
There were some limitations for the RF models to further improve the yield estimation accuracy. For example, the images of multi-growth stages needed to be examined for yield estimation. There were only three bands (R, G, and B) of images used to calculate VIs. Thus, the spectral information of multispectral and hyperspectral sensors mounted on UAS should be captured and further analyzed. In addition, other methodological methods such as machine learning (ML) and deep learning might be applied to improve yield estimation accuracy in rice breeding.

Conclusions
In this study, 14 VIs derived from UAV-based images and phenological data were used to estimate grain yield in rice breeding from 2017 to 2019. Three RF models based on VIs and phenological variables were constructed for yield estimation. The results demonstrated that these models explained 52-70% and 6-53% of grain yield variability, respectively, in calibration and validation sets. VIs from the UAV-RGB imagery had inadequate performances in yield estimation because of differences in phenology across rice cultivars. The RF model that combined VIs and phenological data obtained the best accuracy for yield estimation. Further analysis indicated that phenology dominated the yield variance, and VIs provided a secondary contribution to the estimation of yield in rice breeding. Additionally, there were no significant differences between the simulated yields of the optimal RF model and the CERES-Rice model for 21 replicated cultivars. These results further indicated that the foregoing RF model demonstrated the considerable potential for yield estimation in rice breeding. Future work should be carried out to investigate the RF model with spectral information derived from the multispectral and hyperspectral sensors onboard UAS in multiple growth stages to improve the model robustness and applicability.  Acknowledgments: Thanks to Jingyou Guo and Youdong Hu for providing the details of the experimental design in this study.