Open Access Article

Establishment of Plot-Yield Prediction Models in Soybean Breeding Programs Using UAV-Based Hyperspectral Remote Sensing

1
Soybean Research Institute/MARA National Center for Soybean Improvement/MARA Key Laboratory of Biology and Genetic Improvement of Soybean/National Key Laboratory for Crop Genetics and Germplasm Enhancement/Jiangsu Collaborative Innovation Center for Modern Crop Production, Nanjing Agricultural University, Nanjing 210095, China
2
Shandong Shofine Seed Technology Co. Ltd., Jiaxiang 272400, China
3
National Engineering Research Center for Information Technology in Agriculture, Beijing 100097, China
*
Author to whom correspondence should be addressed.
Both authors contributed equally to this work and should be considered co-first authors.
Remote Sens. 2019, 11(23), 2752; https://doi.org/10.3390/rs11232752
Received: 16 October 2019 / Revised: 12 November 2019 / Accepted: 19 November 2019 / Published: 22 November 2019

Abstract

Yield evaluation of breeding lines is the key to the successful release of cultivars, and it is becoming a serious issue because of soil heterogeneity in enlarged field tests. This study aimed to establish plot-yield prediction models using unmanned aerial vehicle (UAV)-based hyperspectral remote sensing for yield selection in large-scale soybean breeding programs. Three sets of soybean breeding lines (1103 in total) were tested in blocks-in-replication experiments for plot yield and for canopy spectral reflectance in the 454~950 nm bands at different growth stages using a UAV-based hyperspectral spectrometer (Cubert UHD185 Firefly). The four elements of plot-yield prediction model construction were studied in turn, with the following conclusions: the suitable reflectance-sampling unit-size in a plot was its central 20%–80%; the normalized difference vegetation index (NDVI) and the ratio vegetation index (RVI) were the best combination of vegetation indices; the initial seed-filling stage (R5) was the best for a single-stage prediction, while R5 combined with another stage was the best for a two-growth-stage prediction; and multivariate linear regression was suitable for plot-yield prediction. For each material set, a random half of the lines was used for modelling and the other half for verification. Twenty-one two-growth-stage, two-vegetation-index prediction models were established and compared by their modelling coefficient of determination (RM2) and root mean square error (RMSEM), their verification counterparts (RV2 and RMSEV), and the sums RS2 and RMSES. Taking into account the coincidence rate between the model-predicted and the actual yield-selection results, the models MA1-2, MA4-2 and MA6-2, with coincidence rates of 56.8%, 58.5% and 52.4%, respectively, were chosen for yield prediction in yield-test nurseries.
The established model construction elements and methods can be used as local models for pre-harvest yield-selection and post-harvest integrated yield-selection in advanced breeding nurseries as well as yield potential prediction in plant-derived-line nurseries. Furthermore, multiple models can be used jointly for plot-yield prediction in soybean breeding programs.
Keywords: soybean breeding; plot-yield prediction; UAV-based hyperspectral remote sensing; vegetation index; multiple linear regression; determination coefficient (R2); root mean square error (RMSE)

1. Introduction

In plant-breeding programs, accurate yield evaluation of breeding lines is the key to the release of novel cultivars, since yield is always the most important of the targeted traits [1,2]. Conventional breeding involves two major steps: the derivation of breeding populations with targeted variants through hybridization and/or mutagenesis, and the selection of the targeted variants through successive field tests integrated with the necessary lab evaluations. In the selection step, precise yield evaluation of the variants or breeding lines depends mainly on precise field experiments, which are often influenced by complicated environmental conditions (mainly a lack of soil uniformity) acting on the field plots, especially as the number of tested lines increases. In fact, yield evaluation is becoming a serious issue because of soil heterogeneity in enlarged field tests. To raise experimental precision, a series of experimental designs and corresponding statistical methods were developed in the last century, including various incomplete block designs, such as the blocks-in-replication design and the lattice design, which can test hundreds of breeding lines in the same experiment [3]. Even so, soil heterogeneity is still difficult to overcome in a yield test comprising thousands of breeding lines. New techniques are therefore required to improve yield-test precision and the efficiency and effectiveness of yield selection.
Remote-sensing images have been widely used to measure crop traits such as plant height, chlorophyll content, leaf area index (LAI), disease susceptibility, drought stress sensitivity, nitrogen content and yield [4,5,6,7,8,9]. These measurements rely on differences among varieties in canopy spectral reflectance for the above traits [10]. Most field-based research on yield estimation models using canopy reflectance and canopy temperature measurements has focused on two- or three-band indices, which can be highly variable and inconsistent among breeding lines or varieties [11]. The biophysical/biochemical components of the field population, such as the canopy and leaf structures, may not allow the plant to fully reach its genetic yield potential [4]. Some researchers have suggested that yield gains observed in field crops can be attributed to more efficient photosynthetic parameters in addition to genetic reasons [12,13,14,15]. Studies that use full-spectrum instruments in prediction models have been reported in recent years for wheat [16,17,18], corn [19], rice [20], cotton [21] and soybean [22,23] in optimal and/or drought environments.
Hyperspectral remote sensing provides a continuous spectrum with plentiful band information and high-resolution images, and hyperspectral imaging has become a common method for predicting crop traits and yields [24,25]. Hyperspectral remote-sensing data acquired from the ground [26,27,28], unmanned aerial vehicles [10,29,30,31], airborne platforms [32] and satellite platforms [33] can capture crop canopy spectra in narrow bands and thereby provide information on the biophysical/biochemical composition of the canopy. Low-altitude, flexible unmanned aerial vehicles (UAVs) provide an important and affordable approach to quantifying the components of crop phenotyping [34,35] and precision agriculture [36,37]. UAVs are therefore becoming critical for high-throughput phenotyping of large numbers of plots and field trials in a near-real-time and dynamic way.
When a UAV is equipped with an imaging spectrometer, finding the band regions that contribute most significantly to yield estimation is the key to prediction accuracy. Early research focused on finding new wavelengths and spectral regions that correlate with plant function. Tucker [38] proposed five primary and two transition regions of the visible and near-infrared spectrum to characterize plant functions. The signal-to-noise ratio is always a concern in remote-sensing research, and sensing the reflectance of plant canopies increases this ratio [19]. In addition, not all of the hyperspectral reflectance data in a plot should be used: restricting the data to a certain central portion of the plot avoids the border influence between plots. The optimal sampling area of hyperspectral reflectance within a plot should therefore be identified to minimize the prediction error.
In studies of hyperspectral remote-sensing technology, strategies for high-throughput field-based phenotyping have been investigated with different methods, which differed clearly in estimation accuracy. Vegetation indices are usually used to maximize the relationship between certain reflectance wavelengths and plant function when the effect of background noise is well controlled [39,40]. Most vegetation indices correlate with plant parameters such as pigment status and grain yield. OSAVI (optimized soil-adjusted vegetation index), EVI (enhanced vegetation index), RVI (ratio vegetation index), PVI (perpendicular vegetation index) and DVI (difference vegetation index) can be used to estimate leaf nitrogen content [8,41], while NDVI (normalized difference vegetation index), RVI and GNDVI (green and near infrared difference vegetation index) have been used extensively to predict yield and other plant functions in many crops using hyperspectral and satellite imagery [42,43,44,45,46,47,48,49,50,51,52,53]. The 10 vegetation indices often used for yield prediction are listed with their full names, formulae and references in Table 1. In the literature, the best yield-prediction model was constructed from NDVI at the flowering, podding and seed-filling stages of all breeding lines, with a coefficient of determination (R2) of 0.66 [28]. Ma et al. [45] and Qi [54] suggested sensitive vegetation indices in the form of NDVI and RVI, based on canopy spectral reflectance, for predicting the grain yield of soybean, and found that NDVI had the highest correlation with soybean yield.
In our breeding programs in north China, more than 1000 breeding lines are usually yield-tested each year for cultivar release. To make yield evaluation and selection precise, we adopted an incomplete block design (blocks-in-replication design), additional check plots, precise field management and careful plot-yield harvesting. Even so, the plot yields of the lines still fluctuated markedly. We therefore considered using UAV-based hyperspectral remote sensing to predict the plot yield of lines as an auxiliary yield-selection tool in addition to the above experimental procedures. The present study thus aimed to explore how to establish plot-yield prediction models in soybean breeding programs using UAV-based hyperspectral remote sensing; to establish, validate and select optimal plot-yield prediction models; and to demonstrate their efficiency and effectiveness in real breeding programs. To fulfill these objectives, the major elements of model construction were studied using plot-yield and UAV remote-sensing data on four large sets of breeding lines in a real soybean breeding program: the optimal plot sizes for a representative reflectance data set, suitable vegetation indices with their optimal spectral bands, the appropriate growth stage for hyperspectral data collection, and appropriate regression models corresponding to the vegetation indices and spectral bands. The selected prediction models were then examined for use in a real breeding program.

2. Materials and Methods

The study comprised five linked steps: (i) four sets of breeding lines in the real breeding programs were yield-tested and remotely sensed by UAV; (ii) the optimal plot sizes for a representative reflectance data set were determined; (iii) based on three of the four data sets (the yield-test breeding lines), the optimal vegetation indices and their optimal spectral bands were analyzed and selected (the remaining set of plant-derived lines was kept for validation); (iv) a series of regression models using different vegetation indices (and their combinations) extracted from single- or multiple-stage hyperspectral remote-sensing data were analyzed for their precision; (v) the selected prediction models were examined using the verification root mean square error (RMSEV) and real breeding selection results. Finally, the three best models were selected for yield prediction in the 1st- and 2nd-year yield tests as well as for plant-derived-line evaluation, where they may be used in comprehensive yield selection integrated with the harvested yield records. A flowchart of the UAV data processing is given in Supplementary Material Figure S1.

2.1. Plant Materials and Field Experiments

The study was carried out alongside a real soybean breeding program at Shandong Shofine Seed Technology Co. Ltd. The first-year yield-test in 2015 (1stYYT 2015) with 532 breeding lines, the second-year yield-test in 2015 (2ndYYT 2015) with 274 breeding lines, and the second-year yield-test in 2016 (2ndYYT 2016) with 297 breeding lines, 1103 breeding lines in total, were used to establish and verify the yield prediction models. In addition, a recombinant inbred line population derived from NN1138-2×KF-1 (named NJRIKY), with 441 lines, was used to verify the prediction models, imitating selection among plant-derived lines at the early breeding stage (Supplementary Material Table S1).
The experiments were designed and conducted at Shofine Seed Technology Co. Ltd. in Jining, Shandong, China (E 116°22′10~20″, N 35°25′50″~26′10″) in 2015–2016, as indicated in Table S1 and Figure 1A,B. Each set of lines was tested in a blocks-in-replication design experiment with three replications, analyzed as a randomized complete blocks design as an approximation [3]. The detailed allocations are listed in Table S1. The breeding lines varied markedly in yield, plant height, growth period and other agronomic traits. Of the lines in 2ndYYT 2015, 48 were retained in 2ndYYT 2016, and 165 breeding lines of 1stYYT 2015 were promoted to 2ndYYT 2016. These two groups of lines therefore have two years of spectral reflectance and corresponding yield data and offer more information for establishing and validating the prediction models. The planting density was approximately 190,000 plants ha−1. The plot seed yield was measured by harvesting the plots, adjusted to a seed moisture of 13% and recorded in t ha−1.
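The text does not spell out how plot yield was adjusted to 13% seed moisture. A minimal sketch, assuming the usual wet-basis conversion in which the dry matter is kept constant; the function and parameter names are hypothetical and not taken from the study:

```python
def adjust_to_standard_moisture(yield_t_ha, measured_moisture_pct,
                                standard_moisture_pct=13.0):
    """Express a plot yield at a standard seed moisture content (wet basis).

    Assumption: the dry matter is the same before and after adjustment,
    so only the water fraction changes.
    """
    if not 0.0 <= measured_moisture_pct < 100.0:
        raise ValueError("moisture must be in [0, 100)")
    dry_matter = yield_t_ha * (100.0 - measured_moisture_pct) / 100.0
    return dry_matter * 100.0 / (100.0 - standard_moisture_pct)
```

A plot harvested at exactly 13% moisture is returned unchanged, while a wetter sample is scaled down.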

2.2. Assembly of the Unmanned Aerial Vehicle (UAV)-Based Hyperspectral Remote-Sensing System

A UAV with eight rotors, flown at a height of around 50 m and equipped with a Cubert UHD185, a Sony digital camera and a position-orientation system, was assembled to acquire the hyperspectral reflectance (Figure 2). The total weight of the attached equipment was approximately 470 g and its housing measured approximately 28 × 6.5 × 7 cm. The instrument had a spectral range of 454 nm to 950 nm, a 4-nm spectral sampling interval, an 8-nm spectral resolution at 532 nm, and a total of 125 spectral channels. For each band, a 50 × 50 pixel image with a 12-bit dynamic range (4096 digital numbers, DN) was created. Inside the camera, the different bands were projected onto different parts of a charge-coupled device (CCD). While the hyperspectral (HS) image was being recorded, a grayscale image with a resolution of 990 × 1000 pixels was captured at the same time.
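The stated range, sampling interval and channel count are mutually consistent (454 + 4 × 124 = 950). A small sketch of the resulting band grid, assuming evenly spaced channel centres starting at 454 nm; the helper names are hypothetical:

```python
FIRST_NM = 454      # first channel centre stated in the text
STEP_NM = 4         # spectral sampling interval stated in the text
N_CHANNELS = 125    # number of channels stated in the text


def band_wavelengths():
    """Return the assumed channel centres in nm (evenly spaced grid)."""
    return [FIRST_NM + STEP_NM * i for i in range(N_CHANNELS)]


def nearest_band_index(wavelength_nm):
    """Return the 0-based index of the channel closest to a wavelength."""
    last_nm = FIRST_NM + STEP_NM * (N_CHANNELS - 1)
    if not FIRST_NM <= wavelength_nm <= last_nm:
        raise ValueError("wavelength outside the sensor range")
    return round((wavelength_nm - FIRST_NM) / STEP_NM)
```

Such a lookup is convenient when an index is quoted by wavelength, for example a band pair at 618 nm and 674 nm.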
Before the UAV flight, de-noising and lens distortion correction were completed. A black and white board was used for radiometric calibration of the UHD185 (Table S2). To allow the hyperspectral images to be stitched, adjacent images were acquired with overlap (heading overlap >70%, side overlap >30%). To obtain stable data, the reflectance was recorded on calm, cloudless days.

2.3. Processing of the UAV Hyperspectral Reflectance and Determination of the Reflectance-Sampling Unit-Size in Plots

Cubert-Pilot (Version 1.1, Cubert GmbH, Ulm, Germany) and Agisoft Photoscan Pro (Version 1.4, Agisoft LLC, Russia) were used to build the image mosaic [55,56]. Vector polygons of the maximum area of each plot were fitted to the mosaicked hyperspectral image in ArcGIS (Version 10.0, ESRI, Redlands, USA). The POS (position and orientation system) data, digital images and hyperspectral data were aligned and fused. DOM (digital orthophoto map) data were obtained after four pre-processing steps: (i) aligning the photos, (ii) building the dense cloud, (iii) building the mesh and (iv) building the texture. The process of obtaining the UAV hyperspectral reflectance data is shown in Figure S1.
Based on the image mosaic, the reflectance-sampling unit-size (the sampled area within a test plot) was studied to avoid the border influence. Each sampling unit was centered on the geometric center of its plot so that the vector region did not extend beyond the plot boundary. The bandmath module of ENVI (Version 4.8, HARRIS geospatial, Wokingham, UK), combined with IDL (Interactive Data Language), was then used to scale the length and width of the maximum area of each plot in 20 steps (the unit-sizes are listed in Table S3). This defined 21 reflectance-sampling unit areas, which were used as vector images to obtain 21 reflectance-sampling unit data sets. The coefficients of variation (CV) of the top three vegetation indices (NDVI, RVI and VOG1) were calculated for all 21 reflectance-sampling units using the analysis of variance (ANOVA) procedure (SAS Institute Inc., NC, USA). Based on the relationship between the CV and the reflectance-sampling unit-size, the best unit-size was chosen for further analysis.
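The idea of a centered sampling unit and of comparing unit-sizes by CV can be sketched as follows. This is only an illustration: the scale factors actually used are in Table S3 and are not reproduced here, the CV below is the plain sample standard deviation over the mean rather than the ANOVA-based value from SAS, and all names are hypothetical.

```python
from statistics import mean, stdev


def centered_unit(cx, cy, length, width, scale):
    """Return (xmin, ymin, xmax, ymax) of a rectangle centred on the plot
    centre (cx, cy) whose sides are `scale` times the plot's sides."""
    if not 0.0 < scale <= 1.0:
        raise ValueError("scale must be in (0, 1]")
    half_l = length * scale / 2.0
    half_w = width * scale / 2.0
    return (cx - half_l, cy - half_w, cx + half_l, cy + half_w)


def area_fraction(scale):
    """Share of the plot area covered when both sides are scaled by `scale`."""
    return scale * scale


def coefficient_of_variation(values):
    """Sample standard deviation divided by the mean."""
    return stdev(values) / mean(values)
```

Because both sides are scaled, a linear factor of 0.5 samples only a quarter of the plot area; the CV of a vegetation index across plots can then be compared between unit-sizes.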

2.4. Optimization of the Vegetation Indices along with Corresponding Hyperspectral Bands

All two-band combinations (R(x1) and R(x2)) within the spectral range of 454~950 nm were screened for the 10 vegetation indices most commonly related to yield prediction in the literature, including NDVI, RVI, VOG1 (Vogelmann red edge index 1) and the others in Table 1, and the indices were constructed according to their formulas.
A contour map of the determination coefficients (R2, Equation (1)) was plotted from the R2 values computed with the “plsregress” function in MATLAB R2010b (MathWorks, Inc., Natick, MA, USA). From the contour map, the sensitive wavebands and the corresponding indices were identified according to their largest R2 values. R2 is calculated as follows:
$$R^2 = 1 - \frac{\sum_{i=1}^{n} (x_i - y_i)^2}{\sum_{i=1}^{n} (x_i - \bar{x})^2} \qquad (1)$$
where $x_i$ and $y_i$ are the measured and the predicted yield values, respectively, $\bar{x}$ is the average of the $x_i$, and $i$ runs from 1 to $n$, where $n$ is the number of tested breeding lines.
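The two-band screening described in this section can be sketched as follows. The sketch assumes the standard two-band forms NDVI = (R(x1) − R(x2)) / (R(x1) + R(x2)) and RVI = R(x1) / R(x2), which are taken to match Table 1, and it swaps in an ordinary least-squares line on one predictor for MATLAB's “plsregress”; all function names are hypothetical.

```python
from itertools import permutations
from statistics import mean


def ndvi(r1, r2):
    """Two-band normalized difference index (assumed standard form)."""
    return (r1 - r2) / (r1 + r2)


def rvi(r1, r2):
    """Two-band ratio index (assumed standard form)."""
    return r1 / r2


def r_squared_simple(x, y):
    """R^2 of the least-squares line y = a + b*x.

    For a single predictor this equals the squared Pearson correlation.
    """
    mx, my = mean(x), mean(y)
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    if sxx == 0 or syy == 0:
        return 0.0
    return sxy * sxy / (sxx * syy)


def screen_band_pairs(spectra, yields, wavelengths, index_fn):
    """Return ((x1, x2), R^2) for the band pair with the largest R^2.

    spectra: one reflectance list per plot, aligned with `wavelengths`.
    """
    best_pair, best_r2 = None, -1.0
    for i, j in permutations(range(len(wavelengths)), 2):
        values = [index_fn(s[i], s[j]) for s in spectra]
        r2 = r_squared_simple(values, yields)
        if r2 > best_r2:
            best_pair, best_r2 = (wavelengths[i], wavelengths[j]), r2
    return best_pair, best_r2
```

Evaluating `r_squared_simple` for every pair, rather than keeping only the maximum, would give the grid of values from which a contour map can be drawn.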

2.5. Establishment and Verification of the Yield Prediction Models

In this study, three sets of breeding lines, 1103 in total (1stYYT 2015, 2ndYYT 2015 and 2ndYYT 2016), were tested for their plot yield and canopy spectral reflectance at the full flowering stage (R2), the full podding stage (R4), the initial seed-filling stage (R5) and the full seed-filling stage (R6) using a UAV-based hyperspectral spectrometer. With the “plsregress” function in MATLAB R2010b, a random half of the lines in each test was taken to establish the yield prediction model and the other half to validate the established model. The soybean yield prediction models were established from the different material sets based on linear and non-linear (curvilinear) regressions. The root mean square error (RMSE) was used to evaluate the precision of the established models:
$$\mathrm{RMSE} = \sqrt{\frac{\sum_{i=1}^{n} (x_i - y_i)^2}{n}} \qquad (2)$$
where $x_i$ and $y_i$ are the measured and the predicted yield values, respectively, and $i$ runs from 1 to $n$, where $n$ is the number of tested breeding lines.
The coefficient of determination in model construction is designated RM2 and the corresponding root mean square error RMSEM; the coefficient of determination calculated from the other half of the lines, used for validation, is designated RV2 and the corresponding root mean square error RMSEV. Both pairs of statistics are used to assess a yield prediction model. For a comprehensive evaluation that balances modelling and verification, the two parts were summed as RS2 (= RM2 + RV2) and RMSES (= RMSEM + RMSEV). Yield predictions with higher R2 and lower RMSE are deemed better.
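The evaluation statistics of Sections 2.4 and 2.5 can be written out directly. A minimal sketch with the sums taken over the n lines of a subset; the function names are hypothetical:

```python
from math import sqrt


def r_squared(measured, predicted):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    mean_x = sum(measured) / len(measured)
    ss_res = sum((x - y) ** 2 for x, y in zip(measured, predicted))
    ss_tot = sum((x - mean_x) ** 2 for x in measured)
    return 1.0 - ss_res / ss_tot


def rmse(measured, predicted):
    """Root mean square error between measured and predicted yields."""
    ss_res = sum((x - y) ** 2 for x, y in zip(measured, predicted))
    return sqrt(ss_res / len(measured))


def summed_scores(model_measured, model_predicted,
                  verif_measured, verif_predicted):
    """Return (RS2, RMSES): modelling plus verification statistics."""
    rs2 = (r_squared(model_measured, model_predicted)
           + r_squared(verif_measured, verif_predicted))
    rmses = (rmse(model_measured, model_predicted)
             + rmse(verif_measured, verif_predicted))
    return rs2, rmses
```

Since RS2 adds two determination coefficients, it can exceed 1, with a maximum of 2 for a model that fits both halves perfectly.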
All of these statistics (RM2, RMSEM, RV2, RMSEV) were computed with MATLAB R2010b (MathWorks, Inc., Natick, MA, USA). The calibration and validation of the established yield models were calculated using Microsoft Excel 2007 (Microsoft Corporation, Redmond, Washington, USA).

2.6. Superior Plot-Yield Prediction Models Selected for Breeding Programs

Two methods were used to verify all of the established yield prediction models. In the first method, all the models were evaluated using the RMSEV summed over the three sets of yield-tested breeding lines. The second method follows the breeders' actual yield selection, in which a breeding line with an average yield in a single year of less than 3.00 t ha−1 is treated as a low-yielding line to be eliminated (Eli), a line with more than 3.75 t ha−1 as a high-yielding line to be promoted (Pro), and a line between 3.00 and 3.75 t ha−1 as an intermediate line to be reserved for further observation (Res). According to this classification, the predicted values of the lines in each of the three tests (1stYYT 2015, 2ndYYT 2015 and 2ndYYT 2016) were grouped into the respective categories and compared with the actual selection results. The coincidence rate between the predicted classification and the actual breeding classification was calculated for each of the three yield-tests as well as for the three tests combined.
Based on the results of the two methods, the superior plot-yield prediction models were determined and were also checked for their use in plant-derived-line selection.
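The second verification method reduces to a three-way classification and a match count. A minimal sketch using the thresholds given above; treating yields of exactly 3.00 or 3.75 t ha−1 as "Res" is an assumption, and the function names are hypothetical:

```python
def classify(yield_t_ha, low=3.00, high=3.75):
    """Map an average yield (t/ha) to Eli, Res or Pro.

    Assumption: the boundary values themselves fall into "Res".
    """
    if yield_t_ha < low:
        return "Eli"
    if yield_t_ha > high:
        return "Pro"
    return "Res"


def coincidence_rate(predicted_yields, actual_yields):
    """Share of lines whose predicted and actual classes agree."""
    if len(predicted_yields) != len(actual_yields):
        raise ValueError("inputs must have the same length")
    matches = sum(classify(p) == classify(a)
                  for p, a in zip(predicted_yields, actual_yields))
    return matches / len(actual_yields)
```

For example, four lines with predicted yields 2.5, 3.2, 4.0 and 3.9 and actual yields 2.8, 3.8, 4.1 and 3.1 agree in two classes, giving a coincidence rate of 0.5.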

3. Results

3.1. Field Experiment Precision and Variation among the Tested Breeding Lines

The yield distribution, variation and coefficients of variation (CV) of the four sets of breeding lines are summarized in Table 2. The average plot seed yield of the breeding lines ranged from 1.83 to 4.99 t ha−1 in 1stYYT 2015, from 1.65 to 4.91 t ha−1 in 2ndYYT 2015 and from 1.72 to 4.41 t ha−1 in 2ndYYT 2016, and the average plot yield of the plant-derived lines of NJRIKY ranged from 1.08 to 3.39 t ha−1. The genotypic coefficients of variation were 34.85%, 29.35%, 26.90% and 33.15%, and the error coefficients of variation were 19.18%, 15.89%, 12.81% and 33.31%, respectively. These results indicate that the three sets of breeding lines showed large yield variation with small experimental errors, whereas the NJRIKY plant-derived-line population showed larger yield variation and experimental error but a lower mean yield. Therefore, the data of the three sets of breeding lines were used to establish and validate the yield-prediction models, so that the established models fit a relatively wide range of situations, while the NJRIKY data were used for calibration of the established prediction models to imitate plant-derived-line prediction and selection.

3.2. Analysis for Sensitive Wavebands and Optimal Vegetation Indices for Breeding Line Yield-Prediction

To identify the hyperspectral reflectance wavebands sensitive to yield, the yields of 2ndYYT 2015 (Figure 3A), 1stYYT 2015 (Figure 3B) and the NJRIKY test 2015 (Figure 3C) and their corresponding average hyperspectral data at R2, R4, R5 and R6 were analyzed. For these tests, the wavelengths with the maximum and minimum correlation coefficients between spectral reflectance and seed yield lay in 750~950 nm and 454~710 nm, respectively (Figure 3).
Based on the 2ndYYT 2015 (Figure 4A) and 1stYYT 2015 (Figure 4B) data, contour maps of the determination coefficients of the linear regression between yield and the two-band NDVI and RVI at the R5 stage were produced using the “plsregress” function in MATLAB. The dark red area represents the zone of highest correlation, and the most sensitive bands for yield prediction were concentrated in the range of 550 nm to 750 nm.
The relationships between vegetation indices and yield at the different single growth stages, analyzed in MATLAB, are listed in Table 3. The sensitive bands of 1stYYT 2015 at the R2, R4, R5 and R6 growth stages were 750 nm and 770 nm, 750 nm and 770 nm, 634 nm and 674 nm, and 550 nm and 710 nm, respectively, while those of 2ndYYT 2015 were 482 nm and 590 nm, 514 nm and 606 nm, 514 nm and 606 nm, and 550 nm and 710 nm, respectively. Thus, the sensitive bands varied greatly between the two breeding-line tests at the same growth stage, and they also varied among growth stages within the same yield-test.
Table 3 also shows that the yields of the two tests were both highly correlated with canopy reflectance at the R5 stage, with maximum R2 values of up to 0.68 and 0.50, respectively. The best growth stage at which to collect UAV hyperspectral reflectance data for yield prediction using vegetation indices was therefore R5, while the spectrally sensitive bands for soybean yield prediction were in 454~850 nm. The other growth stages were not as good as R5, in the order R2, R6 and R4. When the 10 vegetation indices were ranked for each growth stage in the two yield-tests according to their determination coefficients, NDVI and RVI always ranked in the top two (Table 3). Since NDVI and RVI based on the screened optimal bands were the two most sensitive indices, they were selected for the establishment of yield-prediction models in this research.

3.3. Optimized Reflectance-Sampling Unit-Size for Organizing the UAV Hyperspectral Reflectance Data

From the UAV reflectance data set of the breeding lines, the hyperspectral data of each plot were obtained using the vector image georeferenced with the hyperspectral image. Twenty-one reflectance-sampling unit-sizes were designed using ENVI combined with IDL (Table S3); each plot image and vector map at each spatial scale were read, and the 21 data sets of average spectral reflectance in each plot were extracted (Figure S2). The canopy spectral reflectance of the different sampling unit areas did not differ significantly in the 550~750 nm visible bands, but differed significantly in the 750~850 nm near-infrared region. To select the best sampling unit-size for hyperspectral reflectance and eliminate plot marginal effects, the hyperspectral reflectance plot data of 2ndYYT 2015, 1stYYT 2015 and NJRIKY at the R5 growth stage were used. The CVs of the red and near-infrared bands, NDVI, RVI and VOG1 of the three tests were calculated from the spectral information extracted from the 21 reflectance-sampling unit-sizes. The smaller the coefficient of variation, the better the reflectance-sampling unit-size. Figure 5 shows that the CVs of the red band, near-infrared band, NDVI, RVI and VOG1 ranged over 0.15~0.18, 0.16~0.18, 0.13~0.14, 0.01~0.02 and 0.05~0.06 for 2ndYYT 2015; 0.12~0.15, 0.11~0.15, 0.15~0.20, 0.03~0.04 and 0.04~0.05 for 1stYYT 2015; and 0.83~0.98, 1.10~1.19, 0.37~0.48, 0.05~0.07 and 0.05~0.05 for NJRIKY. The CV was larger when the sampling unit area was very small or very large, probably because too small a unit causes fluctuations while too large a unit includes the marginal effect of the plot. Overall, however, the CVs of the band values and vegetation indices differed only slightly among the 21 sampling areas.
The reflectance-sampling unit areas with stable CVs were approximately 2.1~8.1 m2, 1.2~5.2 m2 and 1.0~2.7 m2 for 2ndYYT 2015, 1stYYT 2015 and NJRIKY, respectively (Figure 5). Thus, when the sampling unit-size was about 20% to 80% of the total plot, the canopy reflectance data obtained could be used for plot-yield prediction. In the establishment of the prediction models below, the upper end of the optimal sampling unit-area range was preferred, since all the hyperspectral data can be obtained from one flight at no additional expense.

3.4. Identification of Major Factors for the Establishment of Plot-Yield Prediction Models

In the establishment of plot-yield prediction models, each material set was separated into two subsets for mutual checks, which was done automatically in MATLAB. The materials, 1103 lines in total, were organized and coded as 1stYYT 2015 (A1 + B1), 2ndYYT 2015 (A2 + B2) and 2ndYYT 2016 (A3 + B3) (Table S1). The total of the three sets was coded as A4 + B4 (= A1 + B1 + A2 + B2 + A3 + B3), with A4 (= A1 + A2 + A3) including 551 lines and B4 (= B1 + B2 + B3) including 552 lines. The 165 lines of 1stYYT 2015 that were promoted to the second-year yield-test in 2016 were designated A5, while the 48 lines of the second-year yield-test in 2015 that were retained in the second-year yield-test in 2016 were designated B5. The 165 + 48 = 213 lines in 2015 were designated A6 and the 213 lines in 2016 were designated B6; therefore, A5 + B5 = A6 + B6 = 426 lines. The prediction models were constructed based on A1 + B1, A1, B1, A2 + B2, A2, B2, A3 + B3, A3, B3, A4 + B4, A4, B4, A5, B5, A6 + B6, A6 and B6, a total of 17 material groups (Tables S1, S4 and S5).
The 17 material sets were used to screen for the major factors to be included in the yield-prediction models. Exponential, linear and logarithmic regressions with one vegetation index (RVI or NDVI) at R5 were established using Excel 2007 (Tables S4 and S5). The R2 values of RVI and NDVI did not differ significantly, and the R2 of the linear function was somewhat larger and more stable across all material sets. Among the models in Table S4, for A1 + B1 (1stYYT 2015) the linear regressions y = 3E-05x + 0.6526 (x = RVI (618, 674)) and y = −2E-05x + 0.2055 (x = NDVI (618, 674)) both had an R2 of 0.61; for A2 + B2 (2ndYYT 2015) the two linear regressions based on NDVI or RVI both had an R2 of 0.49. A similar situation was observed for the other material groups, such as A1, B1, A2 and B2, which indicates that both NDVI and RVI were relevant to the construction of plot-yield prediction models. On this basis, a linear function with two vegetation indices (NDVI and RVI) at the R5 stage was established for the second round of yield-prediction model assessment (Table S6).
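The comparison of linear, logarithmic and exponential forms with a single vegetation index can be sketched as follows. This is an illustration, not the Excel procedure: the three forms are fitted by least squares (the exponential form via a log transform of yield), R2 is computed on the original yield scale for all three, and the names are hypothetical. The logarithmic form requires a positive index, so it suits RVI but not a negative NDVI.

```python
from math import exp, log


def fit_line(x, y):
    """Least-squares intercept and slope of y = a + b*x."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


def r_squared(measured, predicted):
    """Determination coefficient: 1 - SS_residual / SS_total."""
    mean_x = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_x) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot


def compare_forms(index_values, yields):
    """Return R^2 of linear, logarithmic and exponential fits.

    Assumes index_values > 0 (for the log form) and yields > 0
    (for the log-transformed exponential fit).
    """
    a, b = fit_line(index_values, yields)
    linear = [a + b * v for v in index_values]

    a, b = fit_line([log(v) for v in index_values], yields)
    logarithmic = [a + b * log(v) for v in index_values]

    a, b = fit_line(index_values, [log(y) for y in yields])
    exponential = [exp(a + b * v) for v in index_values]

    return {
        "linear": r_squared(yields, linear),
        "logarithmic": r_squared(yields, logarithmic),
        "exponential": r_squared(yields, exponential),
    }
```

Comparing the three values per material set shows which functional form is the most stable choice.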

3.5. Establishment and Evaluation of Yield-Prediction Models Using Normalized Difference Vegetation Index (NDVI) and Ratio Vegetation Index (RVI) at R5

The second-round yield-prediction models were established from the 17 material groups and are listed in Table 4 (the model equations are listed in Table S7). As indicated above, the program took a random half of the lines to establish each yield-prediction model and the other half to validate it. Linear models composed of NDVI and RVI at R5 were established for each of the 17 material groups. In Table 4, the established models are evaluated by their modelling precision, comprising the modelling determination coefficient RM2 and the modelling root mean square error (RMSEM), and by their verification precision, comprising the verification determination coefficient RV2 and the verification root mean square error (RMSEV). For a comprehensive evaluation that balances modelling and verification, the two parts were summed as RS2 and RMSES, respectively. The model MA1 had the largest RS2 (1.30), followed in turn by MA1+B1, MB1, MA5, MA2, MA6 and MB5 with RS2 values of 1.21, 1.19, 1.19, 1.13, 1.12 and 1.06. The corresponding RMSES values of these seven models were 0.541, 0.651, 0.740, 0.580, 0.503, 0.519 and 0.674, respectively. These models were established with modelling sample sizes of 48 to 266 lines from a single yield-test. For the models MA4, MB4 and MA4+B4, based on modelling sample sizes of 275~551 lines drawn from the three sets of yield-tests, the RS2 values were all 0.91 and the RMSES values were 0.724, 0.802 and 0.819, respectively. The other models were inferior to the above in precision.
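A two-index linear model of the form yield = b0 + b1·NDVI + b2·RVI, fitted on a random half of the lines, can be sketched as follows. This is a plain least-squares illustration via the normal equations, not the MATLAB routine used in the study; the seed, function names and data layout are hypothetical.

```python
import random


def solve_linear_system(a, b):
    """Solve a small dense system by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        if abs(m[pivot][col]) < 1e-12:
            raise ValueError("singular system: predictors are collinear")
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def fit_mlr(predictors, yields):
    """Least-squares coefficients [b0, b1, ...] from the normal equations.

    predictors: one row per line, e.g. [ndvi, rvi].
    """
    rows = [[1.0] + list(p) for p in predictors]
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           for i in range(k)]
    xty = [sum(r[i] * y for r, y in zip(rows, yields)) for i in range(k)]
    return solve_linear_system(xtx, xty)


def predict(coefs, predictors):
    """Apply fitted coefficients to predictor rows."""
    return [coefs[0] + sum(c * v for c, v in zip(coefs[1:], p))
            for p in predictors]


def split_half(n_lines, seed=0):
    """Return (modelling_indices, verification_indices), a random halving."""
    idx = list(range(n_lines))
    random.Random(seed).shuffle(idx)
    half = n_lines // 2
    return idx[:half], idx[half:]
```

The coefficients fitted on the modelling half are applied unchanged to the verification half, from which RV2 and RMSEV are computed.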

3.6. Establishment and Evaluation of Yield-Prediction Models Using NDVI and RVI at Multiple Stages

The yield-prediction models for the 17 material sets in Table 4 involved only two vegetation indices at a single growth stage (R5). Using more vegetation indices at multiple growth stages might improve model precision, and this was examined using the MATLAB procedure. From the 1stYYT 2015 and 2ndYYT 2015 data, all 10 vegetation indices and the growth stages were screened for the best plot-yield prediction models; the maximum coefficient of determination for models with the 10 vegetation indices reached 0.69 and 0.59. Since 1stYYT 2015 (A1 and B1) was the material set from which the best model in Table 4 came, its major results are presented here. Among the yield-prediction models based on combinations of two and three growth stages, the maximum model R2 was 0.73 when 9 vegetation indices were involved, and the best combination of three growth stages was R2, R5 and R6; when 10 vegetation-index variables were introduced, the maximum model R2 was 0.74. As the number of growth stages and vegetation indices increased, the model RM2 increased, but not significantly. Table S6 shows that two-growth-stage models are better than single-stage models, and that among the two-stage models the combinations R2 and R5, then R6 and R5, then R4 and R5 are in turn better than the others. However, the number of vegetation indices involved made little difference, so the smaller number (2 vegetation indices) was preferred for simplicity of the models.
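A screening of growth-stage combinations of the kind described above can be sketched as below. This is a hypothetical illustration, not the MATLAB procedure of the study: the function names and the synthetic data are assumptions, and each candidate is scored by the in-sample R2 of an ordinary least-squares fit.

```python
from itertools import combinations

import numpy as np

def model_r2(X, y):
    """In-sample R2 of a least-squares linear model with intercept."""
    A = np.column_stack([np.ones(len(y)), X])
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    resid = y - A @ coef
    return float(1.0 - resid @ resid / np.sum((y - y.mean()) ** 2))

def screen_stage_combinations(vi_by_stage, y, n_stages=2):
    """Rank every combination of n_stages growth stages by model R2.

    vi_by_stage maps a stage name to an array of shape (n_lines, n_indices).
    """
    scores = {}
    for stages in combinations(sorted(vi_by_stage), n_stages):
        X = np.hstack([vi_by_stage[s] for s in stages])
        scores[stages] = model_r2(X, y)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Synthetic example: two indices at four stages for 200 lines (made-up values).
rng = np.random.default_rng(0)
n_lines = 200
vi_by_stage = {s: rng.normal(size=(n_lines, 2)) for s in ("R2", "R4", "R5", "R6")}
plot_yield = (3.0 + 0.8 * vi_by_stage["R5"][:, 0] + 0.4 * vi_by_stage["R4"][:, 1]
              + rng.normal(0, 0.5, n_lines))
for stages, r2 in screen_stage_combinations(vi_by_stage, plot_yield):
    print(stages, round(r2, 3))
```

The in-sample R2 of a least-squares model cannot decrease when predictors are added, so a small gain from extra stages or indices is weak evidence on its own; this is one reason to prefer the simpler model and to check it on held-out lines.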
Based on the above results, third-round yield-prediction models with two growth stages (R5 + R4 for each material set, and R5 + R2, R5 + R6 and R5 + R4 for the A1 and A6 material sets) and two vegetation indices (NDVI and RVI) were established for the 17 material sets using the MATLAB procedure, 21 models in total, and then evaluated further. As before, half of each set of breeding lines was used for modelling and the other half for validation. The results are summarized in Table 5 (the model equations are listed in Table S8).
For all 17 material sets, the yield-prediction models composed of two vegetation indices at two growth stages were more precise than those composed of two vegetation indices at the single R5 stage, in terms of both the determination coefficients (RM2, RV2 and RS2) and the root mean square errors (RMSEM, RMSEV and RMSES). Among the different material sets, the best models were those obtained from 1stYYT 2015, i.e., MA1+B1-2, MA1-1, MA1-2, MA1-3 and MB1-2; the second were those from 2ndYYT 2015, i.e., MA2+B2-2, MA2-2 and MB2-2; the third were MA6+B6-2, MA6-1, MA6-2 and MA6-3 (but not MB6-2), together with MA5-2 and MB5-2; the fourth were those from the total of the three sets of breeding lines, i.e., MA4+B4-2, MA4-2 and MB4-2. This roughly coincides with the situation of the R5 single-growth-stage models, in that model precision depends on the source materials. Models from a single test were usually better than those from different tests even when the sample size (number of total lines) increased; for example, MA1+B1-2 and MA2+B2-2, but not MA3+B3-2, are better than MA4+B4-2.
Among the 1stYYT 2015 models, the RS2 of MA1-1, MA1-2, MA1-3, MB1-2 and MA1+B1-2 (-1 means R5 and R2, -2 means R5 and R6, -3 means R5 and R4) were 1.41, 1.34, 1.32, 1.24 and 1.22, with RMSES of 0.457, 0.540, 0.541, 0.640 and 0.631, respectively. Among the 2ndYYT 2015 models, the RS2 of MA2+B2-2, MA2-2 and MB2-2 were 1.28, 1.17 and 1.00, with RMSES of 0.703, 0.606 and 0.603, respectively. Among the material sets with two years of data, the RS2 of MA6+B6-2, MA6-1, MA6-2, MA6-3 and MB6-2 were 1.03, 1.17, 1.15, 1.17 and 0.44, with RMSES of 0.680, 0.517, 0.550, 0.550 and 0.709, respectively. The RS2 of MA5-2 and MB5-2 were 1.26 and 1.09, with RMSES of 0.622 and 0.615, respectively. Among the combined material sets, the RS2 of MA4+B4-2, MA4-2 and MB4-2 were 0.94, 0.94 and 0.93, with RMSES of 0.761, 0.653 and 0.814, respectively. From the above, the superior models were those constructed from the A1, A1+B1, B1, A2+B2, A5 and A6 material sets, and the superior growth-stage combination was R5+R4, given that the best vegetation-index combination was NDVI and RVI. All the models were potentially useful for breeding-line yield selection except MA3+B3-2, MA3-2, MB3-2 and MB6-2, while MA4+B4-2, MA4-2 and MB4-2 were kept for further checking.

3.7. Further Comparison and Selection of Best-Fitted Plot-Yield Prediction Models for Yield Breeding Programs

The verification of the models in Table 5 was limited to the other half of the breeding lines in the same material set, whereas a recognized yield-prediction model is to be used for a broad range of breeding materials, so these models should be further validated with more breeding materials. Our method was twofold: one part evaluated the verification root mean square errors (RMSEV) for all the breeding line sets tested (1103 lines in total); the other evaluated the coincidence between the model-predicted and the breeders’ actual yield-selection results.
Table 6 shows the results of the evaluation of the verification root mean square errors (RMSEV). All the models were evaluated with the three sets of yield-tested breeding lines, 1stYYT 2015 (A1 + B1), 2ndYYT 2015 (A2 + B2) and 2ndYYT 2016 (A3 + B3), and with their total set (A4 + B4). The models MA1-2, MA2+B2-2, MA2-2 and MA6-2 had smaller RMSEV for all four breeding line sets in addition to higher determination coefficients in Table 5, while the models MA4+B4-2, MA4-2 and MB4-2 had small RMSEV for all four material sets but medium determination coefficients in Table 5.
The results of the evaluation of the coincidence between the model-predicted and the breeders’ actual yield selection are shown in Table 7. The coincidence was good in the four material sets for the four models selected from Table 6 (MA1-2, MA2+B2-2, MA2-2 and MA6-2). After a further comprehensive comparison, the models MA1-2, MA6-2 and MA4-2 showed good coincidence rates for all the selection categories (eliminated, reserved and promoted) in all the populations and were chosen for plot-yield prediction in yield breeding programs (see Table 7 and its notes for details).
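A coincidence rate of this kind can be computed as the percentage of lines for which the model-based category agrees with the breeders’ category. The sketch below is only an illustration: the rule used to turn predicted yields into categories (ranking the lines and filling the promoted and reserved quotas from the top) is a hypothetical assumption, not the rule stated in the notes to Table 7, and the data are made up.

```python
from collections import Counter

CATEGORIES = ("eliminated", "reserved", "promoted")

def categorize_by_rank(pred_yield, n_promoted, n_reserved):
    """Hypothetical rule: the top n_promoted lines by predicted yield are
    promoted, the next n_reserved are reserved, and the rest are eliminated."""
    order = sorted(range(len(pred_yield)), key=lambda i: pred_yield[i], reverse=True)
    labels = ["eliminated"] * len(pred_yield)
    for rank, i in enumerate(order):
        if rank < n_promoted:
            labels[i] = "promoted"
        elif rank < n_promoted + n_reserved:
            labels[i] = "reserved"
    return labels

def coincidence_rates(actual, predicted):
    """Percentage agreement per actual category and over all lines."""
    total = Counter(actual)
    agree = Counter(a for a, p in zip(actual, predicted) if a == p)
    rates = {c: 100.0 * agree[c] / total[c] for c in CATEGORIES if total[c]}
    rates["overall"] = 100.0 * sum(agree.values()) / len(actual)
    return rates

# Made-up example: four lines, breeders' decisions and predicted yields (t/ha).
actual = ["promoted", "reserved", "eliminated", "eliminated"]
predicted = categorize_by_rank([4.0, 3.5, 3.6, 2.0], n_promoted=1, n_reserved=1)
print(coincidence_rates(actual, predicted))
```

Reporting the rate per category as well as overall matters because the eliminated group is usually the largest, so an overall rate alone can hide poor agreement on the promoted lines.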
MA1-2 is a linear model derived from the first part of 1stYYT 2015, a material set of 133 breeding lines with yields ranging between 1.836 and 4.680 t ha−1 and growth periods between 99 d and 112 d. MA4-2 is a linear model derived from the first part of the three sets of tests, a material set of 275 breeding lines with yields ranging between 1.656 and 4.757 t ha−1 and growth periods between 96 d and 116 d. MA6-2 is a linear model derived from the group of breeding lines selected and retained from 1stYYT 2015 and 2ndYYT 2015, with two years’ data for 106 breeding lines, yields ranging between 2.380 and 4.925 t ha−1 and growth periods between 101 d and 116 d. The formulae of the three recommended models and of the other prediction models are listed in Table S8 with their corresponding hyperspectral reflectance bands. The three plot-yield prediction models can be used for breeding lines in yield-test nurseries within the corresponding ranges of yield and growth period; either a single model or all three models together can be used in the same yield-test nursery.
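The applicability ranges above can be turned into a simple lookup. The sketch below is a hypothetical helper, not part of the study: the range limits are copied from the text, while the function name and the idea of screening by an expected nursery yield level are assumptions made for illustration.

```python
# Yield (t/ha) and growth-period (d) ranges of the three recommended models,
# as reported in the text.
MODEL_RANGES = {
    "MA1-2": {"yield_t_ha": (1.836, 4.680), "growth_days": (99, 112)},
    "MA4-2": {"yield_t_ha": (1.656, 4.757), "growth_days": (96, 116)},
    "MA6-2": {"yield_t_ha": (2.380, 4.925), "growth_days": (101, 116)},
}

def applicable_models(expected_yield, growth_days):
    """Names of the models whose source ranges cover both values."""
    names = []
    for name, ranges in MODEL_RANGES.items():
        y_lo, y_hi = ranges["yield_t_ha"]
        d_lo, d_hi = ranges["growth_days"]
        if y_lo <= expected_yield <= y_hi and d_lo <= growth_days <= d_hi:
            names.append(name)
    return names

print(applicable_models(3.0, 105))
print(applicable_models(2.0, 98))
```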
In addition, the 21 models in Table 5 were also validated with the NJRIKY (A + B) population to imitate the plant-to-line selection precision. Tables S9 and S10 showed that the above models of MA1-2 and MA4-2 (but not MA6-2) were also suitable for yield-prediction of the plant-derived-line selection.

4. Discussion

In summary, to establish prediction models for plot yields in soybean breeding programs using UAV-based hyperspectral remote sensing, the optimal techniques for the four major elements of model construction were explored, and the plot-yield prediction models were then established through five linked steps. The optimal models selected were MA1-2, MA4-2 and MA6-2 for yield-test nurseries, and the former two for plant-derived-line nurseries.
Compared with the literature, the present approach, in which the results on four elements were used in five linked steps to obtain three optimal prediction models based on UAV hyperspectral reflectance, is more systematic and advanced. Zhang et al. [28] used the active remote sensor GreenSeeker to measure the canopy NDVI at the seedling, flowering, podding and seed-filling stages in a total of 1272 soybean lines, including breeding lines and recombinant inbred lines. Among the single-stage yield-prediction models, that for the seed-filling stage was the best, having the highest coefficient of determination and lower standard errors. The yield-prediction model constructed from NDVI at the flowering, podding and seed-filling stages of all breeding lines was the best, with an R2 value of 0.66. Wu et al. [27] obtained the canopy spectral reflectance of 30 soybean cultivars using the FieldSpec Pro FR2500 Analytical Spectral Device (ASD) and constructed a large number of spectral parameters. The multiple regression of yield on NPH1280 at the flowering stage (R2), V_Area1190 at the full podding stage (R4) and NPH560 at the initial seed-filling stage (R5) was found to provide the best yield prediction, with an R2 value of 0.68. Qi [54] systematically studied a method to monitor soybean yields based on the FieldSpec Pro FR2500 hyperspectral spectrometer, but the application of the method is limited by its low accuracy, low efficiency and inability to obtain data in real time and for a large area. Sankaran et al. [67] found that a vehicle-mounted platform achieved rapid and non-destructive acquisition of plant phenotype information under field conditions. However, this method is limited by the crop-planting scheme and has low operational efficiency in large areas.
Our results on vegetation-index selection (NDVI and RVI) and on R5 as the growth stage for remote sensing are basically in accordance with previous results, but the regression type in our results was consistently a multiple linear regression, whereas curvilinear regressions were involved in other reports. An especially meaningful result of the present study is that on the sampling-unit size for hyperspectral reflectance, which is a specific requirement because UAV-based remote sensing covers a whole plot that is influenced by neighboring plots.

4.1. The Major Elements and Potential Utilization of the Established Plot-Yield Prediction Models

Remote sensing (RS) measurements can provide timely information on plant growth and development, responses to dynamic weather conditions and management practices and, therefore, the final crop yield potentials [33]. Based on crop-specific spectral reflectance features, crop yields can be predicted by constructing remote-sensing models that incorporate multiple vegetation indices [68,69,70]. In the present study for establishing plot-yield prediction model based on the capture of UAV-based hyperspectral reflectance data, the major elements were considered as the reflectance-sampling unit-size in a plot, selection of vegetation indices along with their corresponding reflectance spectrum, selection of growth period for capture of hyperspectral reflectance data (these three elements involving hyperspectral reflectance data capture) and selection of regression pattern, combination of vegetation indices and combination of growth periods (these three elements involving hyperspectral reflectance data analysis).
As for the potential utilization of the established plot-yield prediction models, reliable and early assessment of the breeding lines’ yield is of paramount importance. Plant breeders have to rapidly obtain plot yields of a large number of lines under field conditions [71,72,73]. Soybean breeders, both private and public, have to decide before the winter nursery starts which breeding lines are to be promoted into the higher-rank nursery (promoted), eliminated, or retained as repeaters, especially since a large number of breeding lines have to be handled in modern plant-breeding programs. The plot-yield prediction models can help breeders handle their breeding lines in a short time, both before and after harvesting. For pre-harvest utilization in the Shandong Shofine Seed Technology Co. Ltd., the plot-yield prediction models recommended from the present study, MA1-2, MA6-2 and MA4-2, all involve the two growth stages R5 and R4 for remote sensing, so there is enough time (about one month from R5 to harvesting) for calculating the predicted plot-yield and for field checking. Based on the prediction, the selection plan for breeding lines can be prepared, and some of the inferior lines can be eliminated in advance to save labor for harvesting.
After harvesting, when the plot-yield results are available, breeders can directly select the elite breeding lines according to the harvested plot-yield with reference to the predicted plot-yields. This is especially helpful if the field experiment was damaged for some reason and could not provide an exact yield measurement. As indicated above, the models MA1-2, MA6-2 and MA4-2 can be used for model-assisted selection for yield-tests in higher ranks of nurseries, while MA1-2 and MA6-2 can be used for plant-derived line selection in the early nursery. At this stage, the plant-derived-line experiment is usually without replication; therefore, real field selection with reference to model-based selection should be more efficient and effective than the ordinary procedure.
However, since the environment and breeding lines vary from program to program, the best models may be different from each other, but the established model construction elements and methods can be used to establish local models for pre-harvest yield-selection and post-harvest integrated yield-selection in advanced breeding nurseries as well as yield potential prediction in plant-derived-line nurseries in soybean breeding programs.

4.2. Potential Improvement of Plot-Yield Prediction Models in Soybean Breeding Program

From the present results, different material tests may give different model precision; for example, MA1+B1-2 is better than MA3+B3-2, and MA1-2 and MB1-2 are better than MA2-2 and MB2-2. Different years (environments) may also give different model precision even for the same material set; for example, MA6-2 is better than MB6-2. Thus, model precision depends on the source breeding lines: models from a single test were usually better than those from different tests even when the sample size (number of total lines) increased, as MA1+B1 and MA2+B2 (but not MA3+B3) were better than MA4+B4. The same dependence on material test and on year (environment) was seen in the single-stage models, where MA1+B1 was better than MA3+B3, MA1 and MB1 were better than MA2 and MB2, and MA6 was better than MB6. Based on these points, the candidates were first narrowed to MA1-2, MA2+B2-2, MA2-2 and MA6-2, with smaller RMSEV and higher RV2, and MA4+B4-2, MA4-2 and MB4-2, with small RMSEV and medium RV2; finally, in combination with the real breeding decisions, MA1-2, MA6-2 and MA4-2 were chosen for use in practical breeding programs in Shandong Shofine Seed Technology Co. Ltd.
In choosing the best models, the modelling RM2 and RMSEM (calculated from a random half of the population), the verification RV2 and RMSEV (calculated from the other half) and their sums RS2 and RMSES were compared. For MA1-2, MA6-2 and MA4-2, these indicators were 0.71, 0.63 and 0.55 (RM2) and 0.308, 0.290 and 0.381 (RMSEM); 0.63, 0.52 and 0.39 (RV2) and 0.232, 0.260 and 0.272 (RMSEV); and 1.34, 1.15 and 0.94 (RS2) and 0.540, 0.550 and 0.653 (RMSES), respectively (Table 5). The determination coefficients are evidently not very high, even though the RMSEs are relatively low. Therefore, we have to consider how to improve the models for more precise prediction. With regard to the optimal combination of the model construction elements, we noticed in the present study that increasing the number of vegetation indices in a model did not increase RM2 very much (Table S6), and that increasing the number of growth stages with hyperspectral reflectance did not increase RM2 very much but clearly increased RV2 (Table 4 and Table 5). Two additional elements might therefore have potential for improving model precision: one is the precision of the experiment, the other is the representativeness of the breeding lines used for model establishment. In the present study, the error-term CVs were 19.18% and 15.89% for 1stYYT 2015 and 2ndYYT 2015, respectively; this somewhat large experimental error may have kept the determination coefficients from being higher. However, the error-term CV of 2ndYYT 2016 was 12.81%, which is less than that of the other two yield-tests. The models established from 1stYYT 2015 (A1 + B1) are all better in RM2, but the models constructed from 2ndYYT 2016 (A3 + B3) are all poorer in RM2, while the models based on A4 + B4, which combined the three sets of tested breeding lines, are all good in RM2. Therefore, experimental precision is not the only reason; the model precision must also be related to the representativeness of the breeding lines.
Thus, for the establishment of a precise prediction model, both experiment precision and the representativeness of the breeding lines should be well-controlled.
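As a worked check of the summed indicators quoted in this section, and assuming the sums are plain additions of the modelling and verification parts as described in Section 3.5, the values for MA1-2 follow as:

```latex
\begin{aligned}
R_S^2 &= R_M^2 + R_V^2 = 0.71 + 0.63 = 1.34,\\
\mathrm{RMSE}_S &= \mathrm{RMSE}_M + \mathrm{RMSE}_V = 0.308 + 0.232 = 0.540 .
\end{aligned}
```

The same additions reproduce the values for MA6-2 (0.63 + 0.52 = 1.15; 0.290 + 0.260 = 0.550) and MA4-2 (0.55 + 0.39 = 0.94; 0.381 + 0.272 = 0.653).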

4.3. Innovation Potential of Plant Breeding Nursery System Using UAV-Based Hyperspectral Reflectance Techniques

From the above, it is commonly understood that plant breeding efficiency can be improved by using UAV-based remote-sensing platforms, which have large potential to provide yield prediction even before harvesting so that the next breeding plan can be arranged in advance [67,74,75]. In the present study, an eight-rotor UAV equipped with a digital camera and a hyperspectral camera was used for field-based phenotyping in an experiment with thousands of breeding plots. In the results on sampling unit-size, Figure 5 shows that the coefficients of variation (CVs) of the red band, near-infrared band, NDVI, RVI and VOG1 varied with the reflectance-sampling unit-size like a concave basin, with very high CVs at very small and very large sampling areas. This means that too small a sampling unit caused large fluctuation due to the heterogeneity of the canopy, and too large a sampling unit caused large fluctuation due to the influence of the border area between neighboring plots. When the sampling unit was located in the central part, with a size of about 20% to 80% of the plot (2.1~8.1 m2, 1.2~5.2 m2 and 1.0~2.7 m2 for the three different tests), the CVs were similar, without very large differences, indicating that the hyperspectral reflectance is homogeneous across the central 20% to 80% of a plot if the plants in the plot grow normally and uniformly. This means that a sampling unit of even 20% of the plot size can provide hyperspectral reflectance data as precise as one of 80% of the plot size. To ensure data precision and make full use of the data, we used the larger sampling-unit data in our model establishment.
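The border effect on the sampling unit can be illustrated with a small sketch. It assumes, hypothetically, that a plot is available as a grid of per-pixel reflectance values and that the CV is taken over the pixels inside a centred rectangle covering a given fraction of the plot area; the study may have computed its CVs differently, and the grid values are made up.

```python
import math
import statistics

def central_window(plot, area_fraction):
    """Pixels of the centred rectangle covering about area_fraction of the plot."""
    rows, cols = len(plot), len(plot[0])
    scale = math.sqrt(area_fraction)      # same shrink factor along both sides
    height = max(1, round(rows * scale))
    width = max(1, round(cols * scale))
    top, left = (rows - height) // 2, (cols - width) // 2
    return [v for row in plot[top:top + height] for v in row[left:left + width]]

def cv_percent(values):
    """Coefficient of variation in percent (population standard deviation / mean)."""
    return 100.0 * statistics.pstdev(values) / statistics.fmean(values)

# Synthetic 20 x 20 pixel plot: uniform canopy with a darker one-pixel border.
plot = [[0.2 if r in (0, 19) or c in (0, 19) else 0.5 for c in range(20)]
        for r in range(20)]
for fraction in (0.2, 0.5, 0.8, 1.0):
    print(fraction, round(cv_percent(central_window(plot, fraction)), 2))
```

In this toy grid only the border effect is represented; the rise in CV at very small sampling units reported in Figure 5 would additionally require within-canopy noise in the synthetic data.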
However, the flat or near-flat CV distribution in the central 20%–80% of a plot (Figure 5) implies that, in using UAV-based hyperspectral reflectance for plot-yield prediction, the plot size can be reduced to a certain size provided that the border-area influence is excluded, and that even the number of replications can be reduced if a single plot is representative. If so, there might be potential to increase the number of breeding lines tested or the breeding scope; in particular, the breeding test scope could be enlarged without worrying about the challenge that soil heterogeneity poses to breeding programs. In addition, the yield-testing ability at the plant-derived-line stage can be raised, and even the prediction models for first- and second-year yield-tests can be used for plant-derived-line selection, as in the present results, where the prediction models MA1-2 and MA6-2 fit plant-derived-line yield prediction. The reason for this is the high density of hyperspectral reflectance points and the canopy homogeneity in a small area.

5. Conclusions

In the establishment of plot-yield prediction models for soybean breeding programs using UAV-based hyperspectral remote sensing, four model construction elements were studied individually, with the following results: (i) the suitable sampling unit-size in a plot was the central part covering 20%–80% of the plot size (the high end was used in model construction to make full use of the information); (ii) NDVI and RVI and their combination, along with their best spectral combinations of near-infrared and red bands, were the best vegetation indices for yield prediction; (iii) R5 was the best growth stage for a single-period model, while R5 and R4 were the best combination for a two-period prediction model; (iv) linear regression was suitable for plot-yield prediction model construction in comparison to exponential and logarithmic regression. Seventeen prediction models composed of the NDVI and RVI vegetation indices at the R5 growth stage, and then 21 prediction models composed of the two vegetation indices at two growth stages (R5 plus another one), were established. In choosing the best models, the modelling RM2 and RMSEM, the verification RV2 and RMSEV, and their sums RS2 and RMSES were evaluated and compared. Integrated with the coincidence rate between the model-predicted results and the real selection results, the models MA1-2, MA6-2 and MA4-2 were chosen for utilization in real breeding programs. Here MA1-2 is a linear model appropriate for local yields of 1.836~4.680 t ha−1 and growth periods of 99 d~112 d; MA4-2 is a linear model appropriate for local yields of 1.656~4.757 t ha−1 and growth periods of 96 d~116 d; MA6-2 is a linear model appropriate for local yields of 2.380~4.925 t ha−1 and growth periods of 101 d~116 d.
The established model construction elements and methods could be used to establish local models for pre-harvest yield-selection and post-harvest integrated yield-selection in advanced breeding nurseries, as well as for yield-potential prediction in plant-derived-line nurseries. Furthermore, these models can be used jointly for plot-yield prediction in soybean breeding programs.

Supplementary Materials

The following are available online at https://www.mdpi.com/2072-4292/11/23/2752/s1: Table S1: The experiment design of four sets of lines tested in 2015–2016; Table S2: Main parameters of the digital camera and two kinds of hyperspectral reflectance measurement instrument; Table S3: The reflectance-sampling unit-sizes for measuring the UAV hyperspectral reflectance in three yield-test experiments; Table S4: The regression models of soybean yield on hyperspectral reflectance in terms of NDVI and RVI at the R5 growth stage; Table S5: Regression model codes and data sets included; Table S6: The correlation between yield and different vegetation-index combinations at different growth-stage combinations in the 1stYYT 2015 experiment; Table S7: The established regression models of yield on R5 single-period UAV hyperspectral reflectance data for various sets of breeding lines; Table S8: The established major plot-yield prediction models using NDVI and RVI constructed from two growth-period UAV hyperspectral reflectance data; Table S9: Comparisons of the verification RMSE in NJRIKY among the models listed in Table 5; Table S10: Comparisons of coincidence between the breeders’ actual yield selection results and the model-predicted selection results among the 21 models listed in Table 5 for the NJRIKY yield test (coincidence rate expressed in %, actual selection results expressed in number of lines); Figure S1: Flowchart showing the UAV data processing; Figure S2: The canopy spectral reflectance from 21 different reflectance-sampling unit sizes in 2ndYYT 2015 (A), 1stYYT 2015 (B) and NJRIKY test 2015 (C).

Author Contributions

Conceptualization, J.Z. and J.G.; methodology, X.Z. (Xiaoyan Zhang) and J.G.; software, X.Z. (Xiaoqing Zhao) and G.Y.; validation, X.Z. (Xiaoyan Zhang), J.L. and J.G.; formal analysis, J.G.; investigation, X.Z. (Xiaoyan Zhang); resources, J.C. and C.L.; data curation, X.Z. (Xiaoyan Zhang) and J.G.; writing—original draft preparation, X.Z. (Xiaoyan Zhang); writing—review and editing, J.G.; visualization, J.G.; supervision, J.Z.; project administration, J.Z.; funding acquisition, J.G.

Funding

This research was funded by the National Key R & D Program for Crop Breeding in China (grant number 2018YFD0100800, 2017YFD0101500, 2017YFD0102002), the Natural Science Foundation of China (grant number 31671718, 31571695), the MOE 111 Project (grant number B08025), Special Fund for Agro-scientific Research in the Public Interest (grant number 201203026), Cyrus Tang Innovation Center for Seed Industry, the MOE Program for Changjiang Scholars and Innovative Research Team in University (grant number PCSIRT_17R55). This work was also supported through the grants from the MARA CARS-04 program, the Jiangsu Higher Education PAPD Program, the Fundamental Research Funds for the Central Universities and the Jiangsu JCIC-MCP. The funders had no role in work design, data collection and analysis, and decision and preparation of the manuscript.

Acknowledgments

The authors are grateful to X. Yao for valuable comments on the manuscript.

Conflicts of Interest

The authors declare no conflict of interest.

References

  1. Zaman-Allah, M.; Vergara, O.; Araus, J.; Tarekegne, A.; Magorokosho, C.; Zarco-Tejada, P.; Hornero, A.; Alba, A.; Das, B.; Craufurd, P.; et al. Unmanned aerial platform-based multi-spectral imaging for field phenotyping of maize. Plant Methods 2015, 11, 35. [Google Scholar] [CrossRef] [PubMed]
  2. Yu, N.; Li, L.; Schmitz, N.; Tian, L.; Greenberg, J.; Diers, B. Development of methods to improve soybean yield estimation and predict plant maturity with an unmanned aerial vehicle based platform. Remote Sens. Environ. 2016, 187, 91–101. [Google Scholar] [CrossRef]
  3. Gai, J. Experiment Statistics; China Agriculture Press: Beijing, China, 2014. [Google Scholar]
  4. Clevers, J. A simplified approach for yield prediction of sugar beet based on optical remote sensing data. Remote Sens. Environ. 1997, 61, 221–228. [Google Scholar] [CrossRef]
  5. Wei, X.; Xu, J.; Guo, H.; Jiang, L.; Chen, S.; Yu, C.; Zhou, Z.; Hu, P.; Zhai, H.; Wan, J. DTH8 suppresses flowering in rice, influencing plant height and yield potential simultaneously. Plant Physiol. 2010, 153, 1747–1758. [Google Scholar] [CrossRef] [PubMed]
  6. Ilker, E.; Tonk, F.A.; Tosun, M.; Tatar, O. Effects of direct selection process for plant height on some yield components in common wheat (Triticum aestivum) genotypes. Int. J. Agric. Biol. 2013, 15, 795–797. [Google Scholar]
  7. Alheit, K.; Busemeyer, L.; Liu, W.; Maurer, H.; Gowda, M.; Hahn, V.; Weissmann, S.; Ruckelshausen, A.; Reif, J.; Würschum, T. Multiple-line cross QTL mapping for biomass yield and plant height in triticale (×Triticosecale Wittmack). Theor. Appl. Genet. 2014, 127, 251–260. [Google Scholar] [CrossRef]
  8. Nigon, T.; Mulla, D.; Rosen, C.; Cohen, Y.; Alchanatis, V.; Knight, J.; Rud, R. Hyperspectral aerial imagery for detecting nitrogen stress in two potato cultivars. Comput. Electron. Agric. 2015, 112, 36–46. [Google Scholar] [CrossRef]
  9. Jay, S.; Maupas, F.; Bendoula, R.; Gorretta, N. Retrieving LAI, chlorophyll and nitrogen contents in sugar beet crops from multi-angular optical remote sensing: Comparison of vegetation indices and PROSAIL inversion for field phenotyping. Field Crops Res. 2017, 210, 33–46. [Google Scholar] [CrossRef]
  10. Yang, G.; Liu, J.; Zhao, C. Unmanned Aerial Vehicle Remote Sensing for Field-Based Crop Phenotyping: Current Status and Perspectives.Front. Plant Sci. 2017, 8, 1111. [Google Scholar] [CrossRef]
  11. Babar, M.; Reynolds, M.; Ginkel, M.; Klatt, A.; Raun, W.; Stone, M. Spectral Reflectance to Estimate Genetic Variation for In-Season Biomass, Leaf Chlorophyll, and Canopy Temperature in Wheat. Crop Sci. 2006, 46, 1046–1057. [Google Scholar] [CrossRef]
  12. Waddington, S.; Ransom, J.; Osmanzai, M.; Saunders, D. Improvement in the yield potential of bread wheat adapted to Northwest Mexico. Crop Sci. 1986, 26, 698–703. [Google Scholar] [CrossRef]
  13. Calderini, D.; Dreccer, M.; Slafer, G. Genetic improvement in wheat yield and associated traits. A re-examination of previous results and the latest trends. Plant Breed. 1995, 114, 108–112. [Google Scholar] [CrossRef]
  14. Sayre, K.; Rajaram, S.; Fischer, R. Yield potential progress in short bread wheat in Northern Mexico. Crop Sci. 1997, 37, 36–42. [Google Scholar] [CrossRef]
  15. Reynolds, M.; Rajaram, S.; Sayre, K. Physiological and genetic changes of irrigated wheat in the post-green revolution period and approaches for meeting projected global demand. Crop Sci. 1999, 39, 1611–1621. [Google Scholar] [CrossRef]
  16. Hansen, P.; Schjoering, J. Reflectance measurement of canopy biomass and nitrogen status in wheat crops using normalized difference vegetation indices and partial least squares regression. Remote Sens. Environ. 2003, 86, 542–553. [Google Scholar] [CrossRef]
  17. Pimstein, A.; Karnieli, A.; Bansal, S.; Bonfil, D. Exploring remotely sensed technologies for monitoring wheat potassium and phosphorus using field spectroscopy. Field Crops Res. 2011, 121, 125–135. [Google Scholar] [CrossRef]
  18. Lobos, G.; Matus, I.; Rodriguez, A.; Romero-Bravo, S.; Araus, J.; Pozo, D. Wheat genotypic variability in grain yield and carbon isotope discrimination under Mediterranean conditions assessed by spectral reflectance. J. Integr. Plant Biol. 2014, 56, 470–479. [Google Scholar] [CrossRef]
  19. Weber, V.; Araus, J.; Cairns, J.; Sanchez, C.; Melchinger, A.; Orsini, E. Prediction of grain yield using reflectance spectra of canopy and leaves in maize plants grown under different water regimes. Field Crops Res. 2012, 128, 82–90. [Google Scholar] [CrossRef]
  20. Lin, W.; Yang, C.; Kuo, B. Classifying cultivars of rice (Oryza sativa L.) based on corrected canopy reflectance spectra data using the orthogonal projections of latent structures (O- PLS) method. Chemom. Intell. Lab. Syst. 2012, 115, 25–36. [Google Scholar] [CrossRef]
  21. Zhao, D.; Reddy, K.; Kakani, V.; Read, J.; Koti, S. Canopy reflectance in cotton for growth assessment and lint yield prediction. Europ. J. Agronomy. 2007, 26, 335–344. [Google Scholar] [CrossRef]
  22. Kaul, M.; Hill, R.L.; Walthall, C. Artificial neural networks for corn and soybean yield prediction. Agric. Syst. 2005, 85, 1–18. [Google Scholar] [CrossRef]
  23. Christenson, B.; Schapaugh, W.; An, N.; Price, K.; Fritz, A. Characterizing changes in soybean spectral response curves with breeding advancements. Crop Sci. 2014, 54, 1585–1597. [Google Scholar] [CrossRef]
  24. Liu, J.; Zhao, C.; Yang, G.; Yu, H.; Zhao, X.; Xu, B.; Niu, Q. Review of field-based phenotyping by unmanned aerial vehicle remote sensing platform. Trans. Chin. Soc. Agric. Eng. 2016, 32, 98–106. [Google Scholar] [CrossRef]
  25. Li, D.; Cheng, T.; Jia, M.; Zhou, K.; Lu, N.; Yao, X.; Tian, Y.; Zhu, Y.; Cao, W. PROCWT: Coupling PROSPECT with continuous wavelet transform to improve the retrieval of foliar chemistry from leaf bidirectional reflectance spectra. Remote Sens. Environ. 2018, 3, 1–14. [Google Scholar] [CrossRef]
  26. Miller, J.; Schepers, J.; Shapiro, C.; Arneson, N.; Eskridge, K.; Oliveira, M.; Giesler, L. Characterizing soybean vigor and productivity using multiple crop canopy sensor readings. Field Crops Res. 2018, 216, 22–31. [Google Scholar] [CrossRef]
  27. Wu, Q.; Qi, B.; Gai, J. A tentative study on utilization of canopy hyperspectral reflectance to estimate canopy growth and seed yield in soybean. Acta Agron. Sin. 2013, 39, 309–318. [Google Scholar] [CrossRef]
  28. Zhang, N.; Qi, B.; Zhao, J. Prediction for soybean grain yield using active sensor greenseeker. Acta Agron. Sin. 2014, 40, 657–666. [Google Scholar] [CrossRef]
  29. Duan, T.; Chapman, S.; Guo, Y.; Zheng, B. Dynamic monitoring of NDVI in wheat agronomy and breeding trials using an unmanned aerial vehicle. Field Crops Res. 2017, 210, 71–80. [Google Scholar] [CrossRef]
  30. Walter, J.; Edwards, J.; McDonald, G.; Kuchel, H. Photogrammetry for the estimation of wheat biomass and harvest index. Field Crops Res. 2018, 216, 165–174. [Google Scholar] [CrossRef]
  31. Zheng, H.; Cheng, T.; Yao, X.; Deng, X.; Tian, Y.; Cao, W.; Zhu, Y. Detection of rice phenology through time series analysis of ground-based spectral index data. Field Crops Res. 2016, 198, 131–139. [Google Scholar] [CrossRef]
  32. Atzberger, C.; Darvishzadeh, R.; Immitzer, M.; Schlerf, M.; Skidmore, A.; Maire, G. Comparative analysis of different retrieval methods for mapping grassland leaf area index using airborne imaging spectroscopy. Int. J. Appl. Earth Obs. Geoinf. 2015, 43, 19–31. [Google Scholar] [CrossRef]
  33. Campos, I.; González-Gómez, L.; Villodre, J.; González-Piqueras, J.; Suyker, A.; Calera, A. Remote sensing-based crop biomass with water or light-driven crop growth models in wheat commercial fields. Field Crops Res. 2018, 216, 175–188. [Google Scholar] [CrossRef]
  34. Chapman, S.; Merz, T.; Chan, A.; Jackway, P.; Hrabar, S.; Dreccer, M.; Holland, E.; Zheng, B.; Ling, T.; Jimenez-Berni, J. Pheno-Copter: A Low-Altitude, Autonomous Remote-Sensing Robotic Helicopter for High-Throughput Field-Based Phenotyping. Agronomy 2014, 4, 279–301. [Google Scholar] [CrossRef]
  35. Sankaran, S.; Khot, L.; Espinoza, C.; Jarolmasjed, S.; Sathuvalli, V.; Vandemark, G.; Miklas, P.; Carter, A.; Pumphrey, M.; Knowles, N.; et al. Low-altitude, high-resolution aerial imaging systems for row and field crop phenotyping: A review. Eur. J. Agron. 2015, 70, 112–123. [Google Scholar] [CrossRef]
  36. Ballesteros, R.; Ortega, J.; Hernández, D.; Moreno, M. Applications of georeferenced high-resolution images obtained with unmanned aerial vehicles. Part I: Description of image acquisition and processing. Precis. Agric. 2014, 15, 579–592. [Google Scholar] [CrossRef]
  37. Candiago, S.; Remondino, F.; Giglio, D.; Dubbini, M.; Gattelli, M. Evaluating Multispectral Images and Vegetation Indices for Precision Farming Applications from UAV Images. Remote Sens. 2015, 7, 4026–4047. [Google Scholar] [CrossRef]
  38. Tucker, C. A comparison of satellite sensors for monitoring vegetation. Photogramm. Eng. Remote Sens. 1978, 44, 1369–1380. [Google Scholar]
  39. Huete, A.; Didan, K.; Miura, T.; Rodriguez, E.; Gao, X.; Ferreira, L. Overview of the radiometric and biophysical performance of the MODIS vegetation indices. Remote Sens. Environ. 2002, 83, 195–213. [Google Scholar] [CrossRef]
  40. Hatfield, J.; Prueger, J. Value of using different vegetative indices to quantify agricultural crop characteristics at different growth stages under varying management practices. Remote Sens. 2010, 2, 562–578. [Google Scholar] [CrossRef]
  41. Samseemoung, G.; Soni, P.; Jayasuriya, H.; Salokhe, V. Application of low altitude remote sensing (LARS) platform for monitoring crop growth and weed infestation in a soybean plantation. Precis. Agric. 2012, 13, 611–627. [Google Scholar] [CrossRef]
  42. Wiegand, C.; Richardson, A.; Escobar, D.; Gerbermann, A. Vegetation indexes in crop assessment. Remote Sens. Environ. 1991, 35, 105–119. [Google Scholar] [CrossRef]
  43. Peñuelas, J.; Isla, R.; Filella, I.; Araus, J. Visible and near infrared reflectance assessment of salinity effects on barley. Crop Sci. 1997, 37, 198–202. [Google Scholar] [CrossRef]
  44. Lewis, J.; Rowland, J.; Nadeau, A. Estimating maize production in Kenya using NDVI: Some statistical considerations. Int. J. Remote Sens. 1998, 19, 2609–2617. [Google Scholar] [CrossRef]
  45. Aparicio, N.; Villegas, D.; Casadesus, J.; Araus, J.; Royo, C. Spectral vegetation indices as nondestructive tools for determining durum wheat yield. Agron. J. 2000, 92, 83–91. [Google Scholar] [CrossRef]
  46. Ma, B.; Dwyer, L.; Costa, C.; Cober, E.; Morrison, M. Early prediction of soybean yield from canopy reflectance measurements. Agron. J. 2001, 93, 1227–1234. [Google Scholar] [CrossRef]
  47. Shanahan, J.; Schepers, J.; Francis, D.; Varvel, G.; Wilhelm, W.; Tringe, J.S.; Schlemmer, M.; Major, D. Use of remote-sensing imagery to estimate corn grain yield. Agron. J. 2001, 93, 583–589. [Google Scholar] [CrossRef]
  48. Royo, C.; Villegas, D.; Garcia, D.; Moral, L.; Elhani, S.; Aparicio, N.; Rharrabti, Y.; Araus, J. Comparative performance of carbon isotope discrimination and canopy temperature depression as predictors of genotype differences in durum wheat yield in Spain. Aust. J. Agric. Res. 2002, 53, 561–569. [Google Scholar] [CrossRef]
  49. Royo, C.; Aparicio, N.; Villegas, D.; Casadesus, J.; Monneveux, P.; Araus, J. Usefulness of spectral reflectance indices as durum wheat yield predictors under contrasting Mediterranean conditions. Int. J. Remote Sens. 2003, 24, 4403–4419. [Google Scholar] [CrossRef]
  50. Prasad, B.; Carver, B.; Stone, M.; Babar, M.; Raun, W.; Klatt, A. Genetic analysis of indirect selection for winter wheat grain yield using spectral reflectance indices. Crop Sci. 2007, 47, 1416–1425. [Google Scholar] [CrossRef]
  51. Prasad, B.; Carver, B.; Stone, M.; Babar, M.; Raun, W.; Klatt, A. Potential use of spectral reflectance indices as a selection tool for grain yield in winter wheat under great plains conditions. Crop Sci. 2007, 47, 1426–1440. [Google Scholar] [CrossRef]
  52. Marti, J.; Bort, J.; Slafer, G.; Araus, J. Can wheat yield be assessed by early measurements of normalized difference vegetation index? Ann. Appl. Biol. 2007, 150, 253–257. [Google Scholar] [CrossRef]
  53. Koester, R.; Skoneczka, J.; Cary, T.; Diers, B.; Ainsworth, E. Historical gains in soybean (Glycine max Merr.) seed yield are driven by linear increases in light interception, energy conversion, and partitioning efficiencies. J. Exp. Bot. 2014, 65, 3311–3321. [Google Scholar] [CrossRef] [PubMed]
  54. Qi, B. A Study on Prediction Technology of Yield and Vegetative Growth Using Hyperspectral Remote Sensing in Soybean Breeding. Ph.D. Thesis, Nanjing Agricultural University, Nanjing, China, 2014. (In Chinese with English Abstract). [Google Scholar]
  55. Turner, D.; Lucieer, A.; Watson, C. An automated technique for generating georectified mosaics from ultra-high resolution Unmanned Aerial Vehicle (UAV) imagery, based on Structure from Motion (SFM) point clouds. Remote Sens. 2012, 4, 1392–1410. [Google Scholar] [CrossRef]
  56. Zhao, X.; Yang, G.; Liu, J.; Zhang, X. Estimation of soybean breeding yield based on optimization of spatial scale of UAV hyperspectral image. Trans. Chin. Soc. Agric. Eng. 2017, 33, 110–116. [Google Scholar] [CrossRef]
  57. Rouse, J., Jr.; Haas, R.; Schell, J.; Deering, D. Monitoring vegetation systems in the Great Plains with ERTS. NASA 1974, 351, 309–317. [Google Scholar]
  58. Pearson, R.L.; Miller, L.D. Remote mapping of standing crop biomass for estimation of the productivity of the short-grass Prairie, Pawnee National Grasslands, Colorado. In Proceedings of the Eighth International Symposium on Remote Sensing of Environment, Ann Arbor, MI, USA, 2–6 October 1972; Willow Run Laboratories, Environmental Research Institute of Michigan. pp. 1357–1381. [Google Scholar]
  59. Vogelmann, J.; Rock, B.; Moss, D. Red edge spectral measurements from sugar maple leaves. Int. J. Remote Sens. 1993, 14, 1563–1575. [Google Scholar] [CrossRef]
  60. Gitelson, A.; Merzlyak, M.; Lichtenthaler, H. Detection of red edge position and chlorophyll content by reflectance measurements near 700 nm. J. Plant Physiol. 1996, 148, 501–508. [Google Scholar] [CrossRef]
  61. Gitelson, A.; Merzlyak, M. Spectral reflectance changes associated with autumn senescence of Aesculus hippocastanum L. and Acer platanoides L. leaves. Spectral features and relation to chlorophyll estimation. J. Plant Physiol. 1994, 143, 286–292. [Google Scholar] [CrossRef]
  62. Richardson, A.; Wiegand, C. Distinguishing vegetation from soil background information. Photogramm. Eng. Remote Sens. 1977, 43, 1541–1552. [Google Scholar]
  63. Roujean, J.; Breon, F. Estimating PAR absorbed by vegetation from bidirectional reflectance measurements. Remote Sens. Environ. 1995, 51, 375–384. [Google Scholar] [CrossRef]
  64. Rondeaux, G.; Steven, M.; Baret, F. Optimization of soil-adjusted vegetation indices. Remote Sens. Environ. 1996, 55, 95–107. [Google Scholar] [CrossRef]
  65. Huete, A.; Justice, C.; Liu, H. Development of vegetation and soil indices for MODIS-EOS. Remote Sens. Environ. 1994, 49, 224–234. [Google Scholar] [CrossRef]
  66. Broge, N.; Mortensen, J. Deriving green crop area index and canopy chlorophyll density of winter wheat from spectral reflectance data. Remote Sens. Environ. 2002, 81, 45–57. [Google Scholar] [CrossRef]
  67. Sankaran, S.; Khot, L.R.; Carter, A. Field-based crop phenotyping: Multispectral aerial imaging for evaluation of winter wheat emergence and spring stand. Comput. Electron. Agric. 2015, 118, 372–379. [Google Scholar] [CrossRef]
  68. Gonzalez-Dugo, V.; Hernandez, P.; Solis, I.; Zarco-Tejada, P. Using High-Resolution Hyperspectral and Thermal Airborne Imagery to Assess Physiological Condition in the Context of Wheat Phenotyping. Remote Sens. 2015, 7, 13586–13605. [Google Scholar] [CrossRef]
  69. Overgaard, S.; Isaksson, T.; Kvaal, K.; Korsaeth, A. Comparisons of two hand-held, multispectral field radiometers and a hyperspectral airborne imager in terms of predicting spring wheat grain yield and quality by means of powered partial least squares regression. J. Near Infrared Spectrosc. 2010, 18, 247–261. [Google Scholar] [CrossRef]
  70. Yu, K.; Kirchgessner, N.; Grieder, C.; Walter, A.; Hund, A. An image analysis pipeline for automated classification of imaging light conditions and for quantification of wheat canopy cover time series in field phenotyping. Plant Methods 2017, 13. [Google Scholar] [CrossRef]
  71. Araus, J.; Cairns, J. Field high-throughput phenotyping: The new crop breeding frontier. Trends Plant Sci. 2014, 19, 52–61. [Google Scholar] [CrossRef]
  72. White, J.; Andrade-Sanchez, P.; Gore, M.; Bronson, K.; Coffelt, T.; Conley, M.; Feldmann, K.; French, A.; Heun, J.; Hunsaker, D. Field-based phenomics for plant genetics research. Field Crops Res. 2012, 133, 101–112. [Google Scholar] [CrossRef]
  73. Deery, D.; Jimenez-Berni, J.; Jones, H.; Sirault, X.; Furbank, R. Proximal remote sensing buggies and potential applications for field-based phenotyping. Agronomy 2014, 4, 349–379. [Google Scholar] [CrossRef]
  74. Pinter, P.J., Jr.; Hatfield, J.; Schepers, J.; Barnes, E.; Moran, M.; Daughtry, C.; Upchurch, D. Remote sensing for crop management. Photogramm. Eng. Remote Sens. 2003, 69, 647–664. [Google Scholar] [CrossRef]
  75. Zhao, C. Advances of Research and Application in Remote Sensing for Agriculture. Trans. Chin. Soc. Agric. Mach. 2014, 45, 277–293. [Google Scholar] [CrossRef]
Figure 1. The field experiment and canopy hyperspectral reflectance measurement using an unmanned aerial vehicle (UAV) equipped with a remote-sensing monitoring system. (A) A map showing Jiaxiang district in Jining City, Shandong Province. (B) A UAV image of the 894 soybean plots of the second-year yield-test (2ndYYT 2016) field (acquired on 2 August 2016). The ground resolution of the UAV imagery is 0.01 m at a flight altitude of 50 m. The extraction area of each plot is 2.1~8.1 m² across the yield-tests, so the number of spectral points collected per plot was 21,000~81,000 (2.1~8.1 m²/(0.01 m × 0.01 m)).
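The point count quoted in the Figure 1 caption follows directly from the plot area and the 0.01 m ground resolution. A minimal arithmetic check, assuming square pixels (the function name is hypothetical, not from the paper):

```python
def spectral_points_per_plot(area_m2: float, pixel_size_m: float = 0.01) -> int:
    """Number of square pixels of side `pixel_size_m` needed to cover `area_m2`."""
    # round() absorbs the small floating-point error of the division
    return round(area_m2 / (pixel_size_m * pixel_size_m))


# Smallest and largest extraction areas given in the caption
print(spectral_points_per_plot(2.1))
print(spectral_points_per_plot(8.1))
```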
Figure 2. A DJI Spreading Wings S1000+ equipped with a Cubert UHD185 (for obtaining stable soybean canopy hyperspectral reflectance data) and a Sony DSC-QX100 (for hyperspectral image stitching correction).
Figure 3. Correlation between canopy spectral reflectance and soybean plot yield in 2ndYYT 2015 (A), 1stYYT 2015 (B), NJRIKY test 2015 (C).
Figure 4. Contour maps of the determination coefficients (R²) in linear regression of plot yield on any two-band NDVI and RVI at the R5 stage in the 2ndYYT 2015 (A) and 1stYYT 2015 (B). Zone a and Zone b (dark red) are the high-correlation zones, showing that the sensitive bands are located between 550 nm and 750 nm.
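The band-pair search behind a contour map like Figure 4 can be sketched as follows: compute a two-band NDVI for every pair of bands, regress plot yield on it, and store R². This is a minimal illustration on synthetic data, not the authors' code; the function name, array shapes and the simulated yield are assumptions.

```python
import numpy as np


def ndvi_r2_map(reflectance, plot_yield):
    """R² of simple linear regression of yield on NDVI(band i, band j).

    reflectance: array of shape (n_plots, n_bands)
    plot_yield:  array of shape (n_plots,)
    Returns an (n_bands, n_bands) matrix with NaN on the diagonal.
    """
    n_bands = reflectance.shape[1]
    r2 = np.full((n_bands, n_bands), np.nan)
    for i in range(n_bands):
        for j in range(n_bands):
            if i == j:
                continue  # NDVI of a band with itself is identically zero
            b1, b2 = reflectance[:, i], reflectance[:, j]
            index = (b1 - b2) / (b1 + b2)
            r = np.corrcoef(index, plot_yield)[0, 1]
            r2[i, j] = r * r  # for one predictor, R² equals squared Pearson r
    return r2


# Synthetic demonstration data (not from the study): 40 plots, 6 bands,
# with yield driven by the NDVI of bands 4 and 2 plus a little noise.
rng = np.random.default_rng(0)
reflectance = rng.uniform(0.05, 0.60, size=(40, 6))
true_index = (reflectance[:, 4] - reflectance[:, 2]) / (reflectance[:, 4] + reflectance[:, 2])
plot_yield = 2.0 + 3.0 * true_index + rng.normal(0.0, 0.01, size=40)

r2 = ndvi_r2_map(reflectance, plot_yield)
best_pair = np.unravel_index(np.nanargmax(r2), r2.shape)
print("most sensitive band pair (indices):", best_pair)
```

The NDVI map is symmetric because swapping the two bands only flips the sign of the index. An RVI map would be built the same way with `b1 / b2` in place of the normalized difference, and is not symmetric.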
Figure 5. Coefficient of error variation (CV) of the hyperspectral reflectance values at the red and near-infrared bands, and CV of the three vegetation-index values, as they vary with sampling area at the R5 stage. (A2 + B2 = 2ndYYT 2015, A1 + B1 = 1stYYT 2015 and A + B = NJRIKY).
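To illustrate why the sampling area matters, the sketch below crops the central part of a plot and computes a simple within-window CV. It rests on several assumptions: the "central part" is read as a centred rectangle covering a given fraction of the plot area, the CV is the plain standard deviation over mean (the paper's error CV comes from its experimental analysis and may be defined differently), and the plot raster is synthetic.

```python
import numpy as np


def central_window(pixels, fraction):
    """Return the centred sub-array covering `fraction` of the plot area."""
    if not 0 < fraction <= 1:
        raise ValueError("fraction must be in (0, 1]")
    rows, cols = pixels.shape
    scale = fraction ** 0.5  # area fraction -> side-length fraction
    height = max(1, round(rows * scale))
    width = max(1, round(cols * scale))
    top = (rows - height) // 2
    left = (cols - width) // 2
    return pixels[top:top + height, left:left + width]


def cv_percent(values):
    """Coefficient of variation (sample standard deviation / mean) in percent."""
    return float(np.std(values, ddof=1) / np.mean(values) * 100)


# Synthetic 100 x 100 pixel plot (not study data): a uniform canopy with a
# 10-pixel border of lower index values imitating soil and neighbour effects.
rng = np.random.default_rng(1)
plot = rng.normal(0.80, 0.02, size=(100, 100))
border = np.ones((100, 100), dtype=bool)
border[10:90, 10:90] = False
plot[border] = rng.normal(0.30, 0.05, size=int(border.sum()))

for fraction in (1.0, 0.8, 0.5, 0.2):
    window = central_window(plot, fraction)
    print(fraction, window.shape, round(cv_percent(window), 2))
```

In this toy setting the CV drops once the window excludes the border, which is the kind of behaviour a sampling-area comparison is meant to reveal.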
Table 1. The spectral vegetation indices used in the present study.

| Vegetation Index | Full Name of Index | Algorithm Formula | Reference |
|---|---|---|---|
| NDVI | Normalized Difference Vegetation Index | (Rx1 − Rx2)/(Rx1 + Rx2) | [57] |
| RVI | Ratio Vegetation Index | Rx1/Rx2 | [58] |
| VOG1 | Vogelmann Red Edge Index 1 | R740/R720 | [59] |
| GNDVI | Green Normalized Difference Vegetation Index | (R780 − R550)/(R780 + R550) | [60] |
| NDVI705 | Normalized Difference Vegetation Index 705 | (R750 − R705)/(R750 + R705) | [61] |
| PVI | Perpendicular Vegetation Index | (RNIR − a·RRed − b)/√(1 + a²) | [62] |
| RDVI | Renormalized Difference Vegetation Index | (R800 − R670)/√(R800 + R670) | [63] |
| OSAVI | Optimized Soil-Adjusted Vegetation Index | (1 + 0.16)(R800 − R670)/(R800 + R670 + 0.16) | [64] |
| EVI | Enhanced Vegetation Index | 2.5(RNIR − R680)/(1 + RNIR + 6R680 − 7.5R460) | [65] |
| DVI | Difference Vegetation Index | RNIR − RRed | [66] |

Note: Rx1 and Rx2 represent hyperspectral reflectance bands in the near-infrared and visible red regions, respectively. R740, R720, R780, etc. represent hyperspectral reflectance of the bands at 740, 720 and 780 nm, etc.
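As a hedged illustration of how indices of this kind are computed from band reflectance, the sketch below implements a few of the simpler formulas in plain Python. The function names and the example reflectance values are assumptions for demonstration only, not data from the study.

```python
def ndvi(r_x1, r_x2):
    """Normalized difference of a near-infrared band (x1) and a red band (x2)."""
    return (r_x1 - r_x2) / (r_x1 + r_x2)


def rvi(r_x1, r_x2):
    """Simple ratio of a near-infrared band (x1) to a red band (x2)."""
    return r_x1 / r_x2


def gndvi(r780, r550):
    """Green NDVI from reflectance at 780 nm and 550 nm."""
    return (r780 - r550) / (r780 + r550)


def osavi(r800, r670):
    """Optimized soil-adjusted index with the fixed 0.16 adjustment term."""
    return (1 + 0.16) * (r800 - r670) / (r800 + r670 + 0.16)


def dvi(r_nir, r_red):
    """Plain difference of near-infrared and red reflectance."""
    return r_nir - r_red


# Hypothetical canopy reflectance values (fractions between 0 and 1)
nir, red = 0.50, 0.10
print("NDVI:", round(ndvi(nir, red), 4))
print("RVI:", round(rvi(nir, red), 4))
print("OSAVI:", round(osavi(nir, red), 4))
```

NDVI and RVI built from the same two bands carry the same ranking information, since NDVI = (RVI − 1)/(RVI + 1); they differ in how they scale it, which is one reason a regression can still use both.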
Table 2. The frequency distribution of plot yield means averaging over replications in three sets of soybean breeding lines and one set of plant-to-lines tested in 2015–2016.

| Material data set | <2.0 | 2.0–2.3 | 2.3–2.6 | 2.6–2.9 | 2.9–3.2 | 3.2–3.5 | 3.5–3.8 | 3.8–4.1 | >4.1 | Σ | Range (t ha⁻¹) | Mean (t ha⁻¹) | GCV (%) | CV (%) | F-value |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 1stYYT 2015 | 6 | 28 | 53 | 59 | 83 | 80 | 86 | 72 | 65 | 532 | 1.83–4.99 | 3.32 | 34.85 | 19.18 | 3.30 ** |
| 2ndYYT 2015 | 1 | 2 | 17 | 25 | 42 | 44 | 53 | 51 | 39 | 274 | 1.65–4.91 | 3.50 | 29.35 | 15.89 | 3.41 ** |
| 2ndYYT 2016 | 6 | 9 | 31 | 58 | 80 | 59 | 35 | 16 | 3 | 297 | 1.72–4.41 | 3.06 | 26.90 | 12.81 | 4.87 ** |
| NJRIKY 2015 | 166 | 121 | 101 | 37 | 11 | 5 | 0 | 0 | 0 | 441 | 1.08–3.39 | 2.14 | 33.15 | 33.31 | 0.99 ** |

Note: The class-limit columns (t ha⁻¹) give the number of lines in each plot-yield class. Σ: sum; CV: coefficient of variation; GCV: genotypic coefficient of variation; ** indicates significance at the 0.01 level. 1stYYT 2015, 2ndYYT 2015, 2ndYYT 2016 and NJRIKY 2015: the first-year yield-test in 2015, the second-year yield-test in 2015, the second-year yield-test in 2016, and the NJRIKY (plant-to-lines population) yield-test in 2015, respectively.
Table 3. The sensitive bands and determination coefficient ranks of the 10 vegetation indices calculated from hyperspectral reflectance and plot yield in two sets of breeding lines yield-tested in 2015.

| Item | R2, 1stYYT | R2, 2ndYYT | R4, 1stYYT | R4, 2ndYYT | R5, 1stYYT | R5, 2ndYYT | R6, 1stYYT | R6, 2ndYYT |
|---|---|---|---|---|---|---|---|---|
| Sensitive band λ1 (nm) | 750 | 482 | 750 | 514 | 634 | 514 | 550 | 550 |
| Sensitive band λ2 (nm) | 770 | 590 | 770 | 606 | 674 | 606 | 710 | 710 |
| NDVI (rank) | 1 | 1 | 2 | 2 | 2 | 1 | 2 | 2 |
| RVI (rank) | 2 | 2 | 1 | 1 | 1 | 2 | 1 | 1 |
| GNDVI (rank) | 4 | 4 | 4 | 4 | 9 | 9 | 3 | 9 |
| PVI (rank) | 5 | 9 | 9 | 10 | 10 | 10 | 10 | 3 |
| OSAVI (rank) | 3 | 7 | 3 | 5 | 4 | 4 | 5 | 4 |
| EVI (rank) | 9 | 10 | 5 | 6 | 7 | 5 | 4 | 6 |
| RDVI (rank) | 6 | 3 | 6 | 9 | 3 | 8 | 6 | 5 |
| VOG1 (rank) | 8 | 8 | 8 | 8 | 8 | 7 | 7 | 8 |
| DVI (rank) | 10 | 6 | 10 | 3 | 5 | 3 | 9 | 10 |
| NDVI705 (rank) | 7 | 5 | 7 | 7 | 6 | 6 | 8 | 7 |
| Maximum R² | 0.58 | 0.08 | 0.36 | 0.19 | 0.68 | 0.50 | 0.54 | 0.33 |

Note: λ1 and λ2: two sensitive bands. R2, R4, R5 and R6: growth stages of soybean at the full flowering stage (R2), the full podding stage (R4), the initial seed filling stage (R5), and the full seed filling stage (R6). 1stYYT and 2ndYYT: the first-year yield-test in 2015 and the second-year yield-test in 2015, respectively.
Table 4. Comparisons among the regression models of yield on R5 single-period UAV hyperspectral reflectance data for various sets of breeding lines.

| Model Code | λ1 (nm) | λ2 (nm) | Material No. (model) | Material No. (verification) | R²_M | RMSE_M (t ha⁻¹) | R²_V | RMSE_V (t ha⁻¹) | R²_S | RMSE_S (t ha⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|
| MA1+B1 | 618 | 674 | 266 | 266 | 0.68 | 0.410 | 0.53 | 0.241 | 1.21 | 0.651 |
| MA1 | 638 | 674 | 133 | 133 | 0.72 | 0.300 | 0.58 | 0.241 | 1.30 | 0.541 |
| MB1 | 634 | 678 | 133 | 133 | 0.70 | 0.387 | 0.49 | 0.353 | 1.19 | 0.740 |
| MA2+B2 | 514 | 606 | 137 | 137 | 0.60 | 0.382 | 0.42 | 0.261 | 1.02 | 0.643 |
| MA2 | 514 | 614 | 68 | 69 | 0.70 | 0.331 | 0.43 | 0.172 | 1.13 | 0.503 |
| MB2 | 514 | 582 | 68 | 69 | 0.45 | 0.420 | 0.25 | 0.411 | 0.70 | 0.831 |
| MA3+B3 | 534 | 570 | 148 | 149 | 0.25 | 0.405 | 0.13 | 0.407 | 0.38 | 0.812 |
| MA3 | 538 | 570 | 74 | 74 | 0.33 | 0.373 | 0.22 | 0.407 | 0.55 | 0.780 |
| MB3 | 490 | 754 | 74 | 75 | 0.35 | 0.382 | 0.05 | 0.391 | 0.40 | 0.773 |
| MA4+B4 | 486 | 618 | 551 | 552 | 0.46 | 0.454 | 0.45 | 0.355 | 0.91 | 0.809 |
| MA4 | 570 | 730 | 275 | 276 | 0.52 | 0.377 | 0.39 | 0.347 | 0.91 | 0.724 |
| MB4 | 494 | 618 | 276 | 276 | 0.51 | 0.465 | 0.40 | 0.348 | 0.91 | 0.812 |
| MA5 | 486 | 586 | 165 ¹ | 165 ¹ | 0.70 | 0.356 | 0.49 | 0.224 | 1.19 | 0.580 |
| MB5 | 478 | 738 | 48 ¹ | 48 ¹ | 0.68 | 0.378 | 0.38 | 0.296 | 1.06 | 0.674 |
| MA6+B6 | 554 | 730 | 213 | 213 | 0.50 | 0.429 | 0.39 | 0.338 | 0.89 | 0.767 |
| MA6 | 638 | 666 | 106 | 107 | 0.61 | 0.301 | 0.51 | 0.218 | 1.12 | 0.519 |
| MB6 | 694 | 722 | 106 | 107 | 0.30 | 0.362 | 0.11 | 0.370 | 0.41 | 0.732 |

Note: The established model equations are listed in Table S7. λ1 and λ2 are the two sensitive bands. RMSE is root mean square error. Under model precision, R²_M is the model determination coefficient and RMSE_M is the model root mean square error, calculated from the difference between the predicted value and the observed value in the set of lines from which the model is developed. Under verification precision, R²_V is the verification determination coefficient and RMSE_V is the verification root mean square error, calculated from the difference between the value predicted from the established model and the observed value in the set of lines used for verification. Under sum precision, R²_S and RMSE_S are the sums of the model and verification determination coefficients (R²_M + R²_V) and root mean square errors (RMSE_M + RMSE_V), respectively. The same is true for later tables. ¹ These two material sets were tested for two years; therefore, the number of observations for modelling and verification is two times the number of lines.
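The precision measures defined in the note can be sketched for one random split of a material set, with a multi-variate linear regression of yield on NDVI and RVI. This is an illustration under stated assumptions, not the authors' procedure: the data are synthetic, the helper names are hypothetical, and R² is taken as the squared correlation between observed and predicted yield, which the source does not specify.

```python
import numpy as np


def fit_linear(X, y):
    """Least-squares fit of y = b0 + b1*x1 + ... ; returns [b0, b1, ...]."""
    design = np.column_stack([np.ones(len(X)), X])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


def predict(coef, X):
    return coef[0] + X @ coef[1:]


def r2_and_rmse(observed, predicted):
    """Squared observed-predicted correlation and root mean square error."""
    r2 = float(np.corrcoef(observed, predicted)[0, 1] ** 2)
    rmse = float(np.sqrt(np.mean((observed - predicted) ** 2)))
    return r2, rmse


def split_half_precision(X, y, seed=0):
    """Fit on a random half, verify on the other half, and sum both precisions."""
    order = np.random.default_rng(seed).permutation(len(y))
    half = len(y) // 2
    model_idx, verify_idx = order[:half], order[half:]
    coef = fit_linear(X[model_idx], y[model_idx])
    r2_m, rmse_m = r2_and_rmse(y[model_idx], predict(coef, X[model_idx]))
    r2_v, rmse_v = r2_and_rmse(y[verify_idx], predict(coef, X[verify_idx]))
    return {"R2_M": r2_m, "RMSE_M": rmse_m, "R2_V": r2_v, "RMSE_V": rmse_v,
            "R2_S": r2_m + r2_v, "RMSE_S": rmse_m + rmse_v}


# Synthetic material set (not study data): 200 lines, yield in t/ha
rng = np.random.default_rng(42)
ndvi = rng.uniform(0.3, 0.9, size=200)
rvi = (1 + ndvi) / (1 - ndvi)  # RVI implied by NDVI for the same band pair
X = np.column_stack([ndvi, rvi])
y = 1.0 + 3.0 * ndvi + 0.02 * rvi + rng.normal(0.0, 0.1, size=200)

precision = split_half_precision(X, y)
for name, value in precision.items():
    print(name, round(value, 3))
```

Because the sum precision adds two values that each lie between 0 and 1, R²_S ranges up to 2, which is why the tables report values above 1.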
Table 5. The major comprehensive yield prediction models using NDVI and RVI constructed from two growth-period UAV hyperspectral reflectance data.

| Model | R5 λ1 (nm) | R5 λ2 (nm) | R4 λ1 (nm) | R4 λ2 (nm) | Material No. (model) | Material No. (verification) | R²_M | RMSE_M (t ha⁻¹) | P (model) | R²_V | RMSE_V (t ha⁻¹) | P (verification) | R²_S | RMSE_S (t ha⁻¹) |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| MA1+B1-2 (R5 + R4) | 618 | 674 | 750 | 770 | 266 | 266 | 0.71 | 0.364 | 2.68E-63 | 0.51 | 0.267 | 1.84E-47 | 1.22 | 0.631 |
| MA1-1 (R5 + R2) | 638 | 674 | 722 | 730 | 133 | 133 | 0.74 | 0.315 | 2.36E-35 | 0.67 | 0.142 | 8.98E-33 | 1.41 | 0.457 |
| MA1-2 (R5 + R4) | 638 | 674 | 554 | 850 | 133 | 133 | 0.71 | 0.308 | 1.57E-34 | 0.63 | 0.232 | 6.98E-28 | 1.34 | 0.540 |
| MA1-3 (R5 + R6) | 638 | 674 | 586 | 698 | 133 | 133 | 0.73 | 0.333 | 1.53E-34 | 0.59 | 0.208 | 5.62E-28 | 1.32 | 0.541 |
| MB1-2 (R5 + R4) | 634 | 678 | 754 | 770 | 133 | 133 | 0.71 | 0.385 | 1.94E-33 | 0.53 | 0.255 | 4.44E-23 | 1.24 | 0.640 |
| MA2+B2-2 (R5 + R4) | 514 | 606 | 618 | 670 | 137 | 137 | 0.65 | 0.348 | 9.88E-28 | 0.63 | 0.355 | 2.91E-15 | 1.28 | 0.703 |
| MA2-2 (R5 + R4) | 514 | 614 | 518 | 570 | 68 | 69 | 0.68 | 0.293 | 2.73E-15 | 0.49 | 0.313 | 2.40E-09 | 1.17 | 0.606 |
| MB2-2 (R5 + R4) | 514 | 582 | 786 | 850 | 68 | 69 | 0.61 | 0.374 | 9.93E-13 | 0.39 | 0.229 | 8.72E-09 | 1.00 | 0.603 |
| MA3+B3-2 (R5 + R4) | 534 | 570 | 706 | 714 | 148 | 149 | 0.29 | 0.431 | 1.62E-10 | 0.12 | 0.337 | 0.0001 | 0.41 | 0.768 |
| MA3-2 (R5 + R4) | 538 | 570 | 634 | 730 | 74 | 74 | 0.42 | 0.425 | 9.76E-08 | 0.31 | 0.113 | 0.003 | 0.73 | 0.538 |
| MB3-2 (R5 + R4) | 490 | 754 | 702 | 714 | 74 | 75 | 0.29 | 0.411 | 3.22E-05 | 0.19 | 0.325 | 0.0001 | 0.48 | 0.736 |
| MA4+B4-2 (R5 + R4) | 486 | 618 | 554 | 742 | 551 | 552 | 0.52 | 0.445 | 1.41E-85 | 0.42 | 0.316 | 8.75E-65 | 0.94 | 0.761 |
| MA4-2 (R5 + R4) | 570 | 730 | 554 | 742 | 275 | 276 | 0.55 | 0.381 | 1.24E-40 | 0.39 | 0.272 | 2.76E-34 | 0.94 | 0.653 |
| MB4-2 (R5 + R4) | 494 | 618 | 642 | 678 | 276 | 276 | 0.50 | 0.475 | 3.08E-40 | 0.43 | 0.339 | 5.58E-34 | 0.93 | 0.814 |
| MA5-2 (R5 + R4) | 486 | 586 | 622 | 742 | 165 ¹ | 165 ¹ | 0.67 | 0.359 | 1.42E-37 | 0.53 | 0.263 | 1.75E-27 | 1.26 | 0.622 |
| MB5-2 (R5 + R4) | 478 | 738 | 634 | 738 | 48 ¹ | 48 ¹ | 0.68 | 0.345 | 2.93E-10 | 0.41 | 0.270 | 7.49E-07 | 1.09 | 0.615 |
| MA6+B6-2 (R5 + R4) | 554 | 730 | 622 | 738 | 213 | 213 | 0.57 | 0.402 | 2.90E-37 | 0.46 | 0.278 | 1.09E-30 | 1.03 | 0.680 |
| MA6-1 (R5 + R2) | 638 | 666 | 754 | 770 | 106 | 107 | 0.63 | 0.303 | 1.88E-21 | 0.54 | 0.214 | 1.86E-19 | 1.17 | 0.517 |
| MA6-2 (R5 + R4) | 638 | 666 | 754 | 774 | 106 | 107 | 0.63 | 0.290 | 4.71E-21 | 0.52 | 0.260 | 4.35E-17 | 1.15 | 0.550 |
| MA6-3 (R5 + R6) | 638 | 666 | 554 | 710 | 106 | 107 | 0.64 | 0.301 | 5.77E-22 | 0.53 | 0.249 | 1.89E-19 | 1.17 | 0.550 |
| MB6-2 (R5 + R4) | 694 | 722 | 706 | 774 | 106 | 107 | 0.33 | 0.397 | 1.67E-08 | 0.11 | 0.312 | 0.0004 | 0.44 | 0.709 |

Note: The established model equations are listed in Table S8. λ1 and λ2: two sensitive bands. MA1-1, MA1-2 and MA1-3: the models based on the yield of the A1 material set and the corresponding hyperspectral reflectance data of R5 and R2, R5 and R4, and R5 and R6, respectively; MA6-1, MA6-2 and MA6-3: the models based on the yield of the A6 material set and the corresponding hyperspectral reflectance data of R5 and R2, R5 and R4, and R5 and R6, respectively; MA4+B4-2, MA4-2, MB4-2, MA1+B1-2, MA6+B6-2, MA2+B2-2, MA3+B3-2, etc.: the models based on the yield of the A4+B4, A4, B4, A1+B1, A6+B6, A2+B2, A3+B3, etc. material sets and the corresponding hyperspectral reflectance data of R5 and R4. The determination coefficients and RMSE values for model, verification and sum precision are the same as in Table 4. P: P values of the model significance test, expressed in exponential notation; for example, 2.68E-63 is 2.68 multiplied by 10⁻⁶³. ¹ These two material sets were tested for two years; therefore, the number of observations for modelling and verification is two times the number of lines.
Table 6. Comparisons of the verification RMSE in A1 + B1, A2 + B2, A3 + B3 and A4 + B4 among models in Table 5.

| Model | Growth Period Range (d) | Yield Range (t ha⁻¹) | RMSE_V of (A1 + B1) (t ha⁻¹) | RMSE_V of (A2 + B2) (t ha⁻¹) | RMSE_V of (A3 + B3) (t ha⁻¹) | RMSE_V of (A4 + B4) (t ha⁻¹) |
|---|---|---|---|---|---|---|
| MA1+B1-2 | 99~113 | 1.831~4.995 | 0.440 | 0.536 | 0.932 | 0.632 |
| MA1-1 | 99~112 | 1.836~4.680 | 0.473 | 1.037 | - | - |
| MA1-2 | 99~112 | 1.836~4.680 | 0.433 | 0.486 | 0.663 | 0.517 |
| MA1-3 | 99~112 | 1.836~4.680 | 0.463 | 0.509 | - | - |
| MB1-2 | 99.7~113 | 1.831~4.995 | 0.460 | 0.547 | 1.620 | 0.940 |
| MA2+B2-2 | 103~116 | 1.656~4.917 | 0.587 | 0.428 | 0.561 | 0.545 |
| MA2-2 | 106~116 | 1.656~4.757 | 0.545 | 0.421 | 1.137 | 0.732 |
| MB2-2 | 103~116 | 2.043~4.917 | 1.555 | 1.635 | 1.655 | 1.604 |
| MA3+B3-2 | 96~116 | 1.724~4.410 | 6.651 | 6.940 | 5.260 | 6.390 |
| MA3-2 | 96~116 | 1.724~4.304 | 1.881 | 2.029 | 1.694 | 1.873 |
| MB3-2 | 99~115 | 1.820~4.410 | 0.843 | 3.795 | 0.437 | 1.996 |
| MA4+B4-2 | 96~116 | 1.656~4.995 | 0.442 | 0.462 | 0.475 | 0.457 |
| MA4-2 | 96~116 | 1.656~4.757 | 0.475 | 0.456 | 0.454 | 0.465 |
| MB4-2 | 99~116 | 1.820~4.995 | 0.456 | 0.471 | 0.471 | 0.464 |
| MA5-2 | 99~114 | 2.380~4.925 | 0.956 | 1.346 | 2.214 | 1.488 |
| MB5-2 | 96~116 | 3.283~4.558 | 0.708 | 1.385 | 0.501 | 0.888 |
| MA6+B6-2 | 96~116 | 2.380~4.925 | 0.581 | 0.533 | 0.444 | 0.536 |
| MA6-1 | 101~116 | 2.380~4.925 | 0.522 | 0.553 | - | - |
| MA6-2 | 101~116 | 2.380~4.925 | 0.501 | 0.547 | 1.022 | 0.690 |
| MA6-3 | 101~116 | 2.380~4.925 | 0.568 | 0.702 | - | - |
| MB6-2 | 96~116 | 2.380~4.925 | 0.862 | 2.071 | 0.428 | 1.215 |

Note: RMSE_V: the verification RMSE value. Model: all models are listed in Table 5.
Table 7. Comparisons of coincidence between the breeders' actual yield selection results and the model-predicted selection results among the models listed in Table 5 for the three yield-tests in 2015–2016 (coincidence rate expressed in %, actual selection results expressed in number of lines).

| Model | A1+B1 Eli | A1+B1 Res | A1+B1 Pro | A1+B1 Sum | A2+B2 Eli | A2+B2 Res | A2+B2 Pro | A2+B2 Sum | A3+B3 Eli | A3+B3 Res | A3+B3 Pro | A3+B3 Sum | A4+B4 Eli | A4+B4 Res | A4+B4 Pro | A4+B4 Sum |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| Actual selection | 177 | 203 | 152 | 532 | 60 | 118 | 96 | 274 | 142 | 131 | 24 | 297 | 379 | 452 | 272 | 1103 |
| MA1+B1-2 | 69.5 | 56.7 | 63.2 | 62.8 | 31.7 | 49.2 | 65.6 | 51.1 | 20.4 | 9.1 | 0 | 13.8 | 45.1 | 40.9 | 58.5 | 46.7 |
| MA1-1 | 81.4 | 58.6 | 15.1 | 53.8 | 40.0 | 48.3 | 38.5 | 43.1 | - | - | - | - | 44.3 | 38.9 | 22.1 | 36.6 |
| MA1-2 | 66.7 | 56.2 | 71.7 | 64.1 | 33.3 | 53.47 | 34.0 | 56.2 | 59.2 | 33.6 | 16.7 | 44.4 | 58.6 | 48.9 | 67.75 | 56.8 |
| MA1-3 | 100.0 | 0 | 0 | 33.3 | 100.0 | 0 | 0 | 21.9 | - | - | - | - | 100.0 | 0 | 0 | 55.2 |
| MB1-2 | 84.2 | 54.2 | 52.0 | 63.5 | 33.3 | 44.9 | 80.2 | 54.7 | 99.3 | 0 | 0 | 47.5 | 81.8 | 36.1 | 57.4 | 57.0 |
| MA2+B2-2 | 23.2 | 45.8 | 71.7 | 45.7 | 35.0 | 60.2 | 78.1 | 61.0 | 11.3 | 84.9 | 34.8 | 45.8 | 20.6 | 61.1 | 70.6 | 49.5 |
| MA2-2 | 29.4 | 68.0 | 48.7 | 49.6 | 40.0 | 73.7 | 55.2 | 59.9 | 1.4 | 18.9 | 95.7 | 16.5 | 20.6 | 55.3 | 54.8 | 43.3 |
| MB2-2 | 1.1 | 0 | 99.3 | 28.8 | 0 | 0 | 100.0 | 35.0 | 0 | 0 | 100.0 | 7.7 | 0.5 | 0 | 99.3 | 24.7 |
| MA3+B3-2 | 0 | 0 | 100.0 | 28.6 | 0 | 0 | 100.0 | 35.0 | 0 | 0 | 100.0 | 7.7 | 0 | 0 | 99.6 | 24.6 |
| MA3-2 | 9.0 | 60.1 | 24.3 | 32.9 | 38.3 | 27.1 | 66.7 | 43.4 | 44.4 | 62.9 | 30.4 | 51.5 | 26.9 | 52.4 | 39.7 | 40.5 |
| MB3-2 | 91.0 | 8.4 | 0 | 33.5 | 56.7 | 59.3 | 7.3 | 40.5 | 73.2 | 41.7 | 17.4 | 54.9 | 78.9 | 31.4 | 4.0 | 41.0 |
| MA4+B4-2 | 64.4 | 60.6 | 62.5 | 62.4 | 71.7 | 55.9 | 38.5 | 53.3 | 41.6 | 70.5 | 8.7 | 51.9 | 57.0 | 62.4 | 49.3 | 57.3 |
| MA4-2 | 61.6 | 68.0 | 39.5 | 57.7 | 50.0 | 55.9 | 83.3 | 64.2 | 33.1 | 87.8 | 0 | 54.6 | 49.1 | 70.6 | 51.5 | 58.5 |
| MB4-2 | 63.3 | 60.1 | 61.8 | 61.7 | 70.0 | 54.2 | 50.0 | 56.2 | 47.9 | 60.6 | 8.7 | 50.5 | 58.6 | 58.9 | 52.9 | 57.3 |
| MA5-2 | 94.4 | 33.5 | 27.6 | 52.1 | 96.7 | 14.4 | 7.3 | 29.9 | 97.2 | 0.8 | 0 | 46.8 | 95.8 | 19.0 | 18.0 | 45.2 |
| MB5-2 | 0 | 81.8 | 4.0 | 32.3 | 0 | 0 | 100 | 35.0 | 33.8 | 73.5 | 13.0 | 49.8 | 12.7 | 58.2 | 38.6 | 37.7 |
| MA6+B6-2 | 17.5 | 47.3 | 76.3 | 45.7 | 16.7 | 43.2 | 94.8 | 55.5 | 47.2 | 65.9 | 0 | 51.9 | 28.5 | 51.8 | 76.1 | 49.8 |
| MA6-1 | 2.3 | 35.5 | 94.1 | 41.2 | 0 | 38.1 | 91.7 | 48.5 | - | - | - | - | 1.1 | 25.9 | 84.9 | 31.9 |
| MA6-2 | 42.9 | 47.3 | 82.9 | 56.0 | 16.7 | 44.9 | 82.3 | 51.8 | 95.8 | 1.5 | 0 | 46.5 | 58.6 | 33.4 | 75.4 | 52.4 |
| MA6-3 | 31.1 | 51.7 | 84.9 | 54.3 | 10.0 | 42.4 | 87.5 | 51.1 | - | - | - | - | 16.1 | 34.3 | 78.3 | 38.9 |
| MB6-2 | 94.4 | 19.2 | 0 | 38.7 | 73.3 | 39.8 | 8.3 | 36.1 | 66.2 | 59.9 | 30.4 | 60.6 | 80.5 | 36.5 | 5.5 | 44.0 |

Note: This table compares the consistency between the breeders' actual yield selection results and the model-predicted yield selection results for the 21 models. In A1 + B1, A2 + B2 and A3 + B3, the breeding lines were classed as to be eliminated (Eli, yields lower than 3.00 t ha⁻¹), to be reserved (Res, yields between 3.00 t ha⁻¹ and 3.75 t ha⁻¹) or to be promoted (Pro, yields above 3.75 t ha⁻¹). The models used in model-predicted yield selection are those listed in Table 5 and Table S8.
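A coincidence rate of this kind can be sketched by classifying both the observed and the model-predicted plot yield with the thresholds in the note and counting agreements per class. The definition used here (share of lines actually in a class that the model places in the same class) is an assumption, as are the function names and the example yields.

```python
CLASSES = ("Eli", "Res", "Pro")


def selection_class(yield_t_ha, eliminate_below=3.00, promote_above=3.75):
    """Classify a plot yield (t/ha) with the thresholds given in the table note."""
    if yield_t_ha < eliminate_below:
        return "Eli"
    if yield_t_ha > promote_above:
        return "Pro"
    return "Res"


def coincidence_rates(observed, predicted):
    """Percentage of lines per actual class that the prediction classes identically."""
    counts = {c: 0 for c in CLASSES}
    hits = {c: 0 for c in CLASSES}
    for obs, pred in zip(observed, predicted):
        actual = selection_class(obs)
        counts[actual] += 1
        if selection_class(pred) == actual:
            hits[actual] += 1
    # A class with no actual members has no defined rate
    rates = {c: (100 * hits[c] / counts[c] if counts[c] else None) for c in CLASSES}
    rates["Sum"] = 100 * sum(hits.values()) / sum(counts.values())
    return rates


# Hypothetical observed and predicted yields for six lines (t/ha)
observed = [2.5, 2.8, 3.2, 3.5, 3.9, 4.2]
predicted = [2.9, 2.7, 3.3, 3.8, 4.0, 3.6]
rates = coincidence_rates(observed, predicted)
print(rates)
```

Under this definition the Sum rate is the count-weighted average of the class rates. The first model row of the table is consistent with that reading: (69.5 × 177 + 56.7 × 203 + 63.2 × 152)/532 ≈ 62.8.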