UAV Remote Sensing Estimation of Rice Yield Based on Adaptive Spectral Endmembers and Bilinear Mixing Model

The accurate estimation of rice yield using remote sensing (RS) technology is crucially important for agricultural decision-making. The rice yield estimation model based on the vegetation index (VI) is commonly used when working with RS methods, however, it is affected by irrelevant organs and background especially at heading stage. The spectral mixture analysis (SMA) can quantitatively obtain the abundance information and mitigate the impacts. Furthermore, according to the spectral variability and information complexity caused by the rice cropping system and canopy characteristics of reflection and scattering, in this study, the multi-endmember extraction by the pure pixel index (PPI) and the nonlinear unmixing method based on the bandwise generalized bilinear mixing model (NU-BGBM) were applied for SMA, and the VIE (VIs recalculated from endmember spectra) was integrated with abundance data to establish the yield estimation model at heading stage. In two paddy fields of different cultivation settings, multispectral images were collected by an unmanned aerial vehicle (UAV) at booting and heading stage. The correlation of several widely-used VIs and rice yield was tested and weaker at heading stage. In order to improve the yield estimation accuracy of rice at heading stage, the VIE and foreground abundances from SMA were combined to develop a linear yield estimation model. The results showed that VIE incorporated with abundances exhibited a better estimation ability than VI alone or the product of VI and abundances. In addition, when the structural difference of plants was obvious, the addition of the product of VIF (VIs recalculated from bilinear endmember spectra) and the corresponding bilinear abundances to the original product of VIE and abundances, enhanced model reliability. VIs using the near-infrared bands improved more significantly with the estimation error below 8.1%. This study verified the validation of the targeted SMA strategy while estimating crop yield by remotely sensed VI, especially for objects with obvious different spectra and complex structures.


Introduction
Rice (Oryza sativa L.) is one of the largest staple food crops in the world, and it feeds approximately half of the global population [1,2]. Accurate yield estimation can give references to the adjustment of the pattern of farming of the rice producing areas. Resorting to the remote sensing (RS) technique, such as unmanned aerial vehicle (UAV) images, the acquisition of yield in advance, non-destructively and cost-efficiently, benefits for coping with fluctuations of the grain trade market and ensuring state food security [3][4][5]. Vegetation indices (VIs), calculated from the remote sensing spectrum with mathematical methods, can reflect the vegetation growth situation as simple and empirical metrics [6,7]. Additionally, the changes of crops captured by spectral measures, directly determine their ultimate yield. Therefore, VIs have also exhibited good potential in remote estimation of crop yield especially at large scales [8,9]. Parametric regression models based on VIs are by far the oldest and largest group of variable estimation approaches [10], including simple linear functions and complex non-linear functions in general [11]. Linear regression between VIs and yield has been proposed to accurately estimate the yield of many cash crops, such as wheat [12], cotton [13], maize [14], rapeseed [15] etc. Among them, taking rice as the research objective, a series of studies have been developed to relate the rice spectra to its final yield. Swain [16] developed a linear regression model of normalized difference vegetation index (NDVI) of UAV images and rice yield with a 0.728 coefficient of determination (R 2 ) and a 0.458 ton/ha root mean square error (RMSE) at panicle initiation stage. Siyal et al. [17] observed that there was a positive and strong relationship with a R 2 of 0.940 between rice crop yield (2006 to 2013) and NDVI calculated from Landsat imagery at the peak of the growing season. Zhou et al. [18] found that the optimal vegetation index NDVI based on multispectral images showed a linear relationship with the grain yield and gained a higher R 2 value (0.750). Indeed, developing a linear function based on VIs is a common and widely used approach to estimate rice yield.
However, the utilization of VIs, derived from individual pixels, is affected by background clutter and irrelevant organs to photosynthesis and production [15,19]. There may be a considerable discrepancy between pixel sizes of remote sensing images and much smaller sizes of crop plants, and pixels of interest are frequently a combination of numerous disparate ground objects [20]. In plant production systems, sensors can capture the entire canopy, grasses and soil, and VIs, calculated from the spectra of such mixed pixels, may encompass some unexpected information of the components not relevant to yield [21,22]. This negative influence is more remarkable for rice at heading stage. Paddy rice is the only crop that needs abundant water in the planting environment [23], and water body, as an extra ground object, still exists at heading stage and constitutes the background together with soil. Moreover, panicles of rice plants gradually distribute in the closed rice canopy at this stage. A range of previous studies have found that the uneven emergence of panicle is associated with poor estimation ability at heading stage [18,24,25]. As noted that spectral mixture analysis (SMA) can effectively provide complementary information up to the subpixel level and mitigate the impact of other ground objects, the need to quantitatively decompose or unmix, has been gradually recognized while establishing a VI-based rice yield estimation model [22,26].
Specifically, the vegetal state of rice in field cannot be neglected from estimating the yield, but there are rarely SMA approaches fitting rice characteristics at heading stage. In the actual production, there is not a uniform rice variety or farming practice as usual. Multi-variety cultivars and different managements indicate the great diversity of growth rate, morphological structure, physicochemical parameter, phenology, etc. in the same period [23,27,28], especially at the late growth stage, as shown in Figure 1. Inevitably, this makes identified differences in the spectra of rice plants in remote sensing images over wide areas. Consequently, the spectral variability of the foreground is enhanced and obvious in remote sensing images of paddy fields. In addition, rice plants inherently have high spectral complexity, owing to the complicated processes of electromagnetic propagation and their parameterization, which will finally bias the application of RS [29]. When rice steps into the heading stage, crisscrossed leaves cover water and soil partially or completely, however, due to the high transmission of rice leaves, mainly in the nearinfrared (NIR) range, multiple reflection and scattering of light rays occur in the scene. Ray and Murray [30] and Prasad et al. [31] have confirmed that even near 100% plant cover, the background influences the observed spectra due to NIR light penetrating the canopy and interacting with the background. The data of sensor incoming light interacting with all the objects within inescapably contain complex information. Therefore, the unique reflectance In view of the above-mentioned problem of paddy rice, we successively considered the two important steps of SMA. In the first step, a mixed pixel was decomposed into a collection of constituent spectra or endmembers, and there was a need to find endmembers which could effectively cope with the spectral variability and represent rice components. Normally, endmember extraction algorithms (EEAs) are employed in this step, which are more adaptive and dynamic than field measurement. Representative EEAs include the pure pixel index (PPI) [32], N-finder algorithm (N-FINDR) [33], vertex component analysis (VCA) [34], etc. Endmember extraction plays an important role in the two-step strategy, but the spectral-based EEAs without considering spectral variability usually result in unmixing errors [35]. To minimize the effect of spectral variability of the same objects, the multiple-endmember extraction strategy has been adopted in several fields, for example, the discrimination of tree species [36] and mineral detection [37]. In the second step, an appropriate unmixing model, developed for multiple reflection and scattering in paddy fields, was demanded to derive a set of corresponding fractions or abundances, that indicate the proportion of each endmember present in the pixel. There are two typical unmixing models, one is the linear mixing model (LMM), and the other is the nonlinear mixing model (NLMM). The LMM is a simple and widely used model, which assumes Remote Sens. 2021, 13, 2190 4 of 25 that the light reaching the pixel interacts with just one material [38]. Based on the LMM, the bilinear mixing model (BMM), one of NLMMs, further considers second interactions between components and is more applicable to multiple-level and hybrid scenes such as vegetated terrain, with low model complexity and scene parameter dependence [21]. A range of novel BMMs have been proposed and applied in SMA, and experiments on synthetic datasets or real spectral images have been conducted to demonstrate the efficiency and advantages of BMMs in some scenes, including forest/grassland ecotone [39], mining site [40], city [41], etc. Current researches have proved the combination of abundance data and VIs can improve the estimation of crop yield, but for rice, there is still room for improving the application effect of SMA [15,22]. So in this study, we integrated a multi-endmember extraction strategy and a BMM to investigate a novel method of rice yield estimation, according to the properties of paddy rice.
The motivation of this research was to assess what a complementary role subpixel information played on the spectral characteristics VIs reflected in yield estimation, according to the characteristics of rice at heading stage. After obtaining multi-band images by an UAV system at heading stage, this study explored to improve a VI-based approach for rice yield estimation by combining with adaptive endmember and abundance information acquired from the multi-endmember extraction strategy and a BMM. The feasibility of the approach was verified in two paddy fields of various settings, which represented the most probable situations of foreground variability in practice.

Study Area
The first study area was located in the Multi-Variety Hybrid Rice Experiment and Research Base of Wuhan University, in the southeast of Lingshui City, Hainan Province, China (18 • 31 47.1 N, 110 • 3 34.9 E), as shown in Figure 2a. Forty two varieties of hybrid rice were planted in the 42 plots and the size of each plot was about 70 m 2 . The varieties were typical cultivars in southern China, such as Luoyou-8, Hongyou-3348, Zhenyou-6, etc. In order to ensure that cultivars were the only variable in this study area, the field management for these plots was similar, including fertilizer supply (12 kg/ha) and planting density (around 20 plants/m 2 ). In our study, the seedlings were transplanted on 5 January 2018, and UAV flights were conducted at booting stage (18 March 2018) and heading stage (1 April 2018) of rice growth. Each UAV flight was arranged to obtain the images of all rice plots between 10:00 and 14:00.
The second study area was located in the Rice Experiment and Research Base of Huazhong Agricultural University, Wuxue City, Hubei Province, China (30 • 6 42 N, 115 • 35 21 E), as shown in Figure 2c. At this paddy field, a single variety hybrid rice, Shenliangyou-584, was cultivated in 24 plots and the size of each plot was about 20 m 2 . The field management for these plots were similar except for the fertilizer supply. Eight levels of nitrogen fertilizer (0, 3, 5.5, 8.5, 11, 14, 16.5, and 19.5 kg/ha) were utilized, and the planting density was about 15 plants/m 2 . At booting stage (13 August 2015) and heading stage (29 August 2015), UAV flights were conducted to collect panoramic images of the study area between 10:00 and 14:00.

Yield Data Collection
At maturity, all the rice plants in each plot were harvested and determined the grain yield manually. After a series of treatments, including cleaning and drying, the seeds were dry enough whose weight did not change. Then, all the concerning data were measured to calculate the observed yield y: where I denotes the impurity content, W and D represent the total weight and plant density in kg, respectively, and N is the number of hills of weighted rice.

Surface Reflectance and Vegetation Index Derived from UAV Data
The multispectral images of the study area in Study Area 1 (Lingshui) were acquired with a Mini-MCA system (Mini-MCA 12, Tetracam, Inc., Chatsworth, CA, USA) on 18 March and 1 April 2018, which was displayed in Figure 3a. The light multispectral camera Mini-MCA consists of an array of twelve independent miniature image sensors. Each camera imager was equipped with a user-configurable bandpass filter centered at a wavelength of 490, 520, 550, 570, 670, 680, 700, 720, 800, 850, 900 or 950 nm, respectively, with a 10 nm bandwidth for the first 10 bands, 20 nm bandwidth for the 900 nm band and 40 nm bandwidth for the 950 nm band. Additionally, the multispectral images of Study Area 2 (Wuxue) were acquired with a Mini-MCA system (Mini-MCA 6, Tetracam, Inc., Chatsworth, CA, USA) on 13 and 19 August 2015, as shown in Figure 3b. The light multispectral camera Mini-MCA consists of an array of six independent miniature image sensors. The bandpass filter was centered at a wavelength of 490, 550, 670, 720, 800 or 900 nm, respectively, and the bandwidth was 10 nm. The images of different bands were co-registered to ensure the objects in the images of twelve or six channels were in a unified coordinate system and corresponded to the same pixel using the PixelWrench2 software (Tetracam Inc., Chatsworth, CA, USA). The Mini-MCA was mounted on a UAV (S1000, SZ DJI Technology, Co., Ltd., Shenzhen, China) which flew between 10:00 and 14:00 when the changes in the solar zenith angle were minimal (Figure 3c). The flight was conducted at 200 m above the ground to capture images with a spatial resolution of 108.33 mm in Study Area 1 (Lingshui), while in Study Area 2 (Wuxue), the pixel size was around 30 mm for images taken at 60 m approximately. Images were taken under stable weather conditions during cloud-free periods.
Radiometric calibration was performed using multiple calibration panels with the relatively constant reflectance in visible to NIR wavelength range, which was 0.03, 0.12, 0.24, 0.36, 0.56 or 0.80, respectively in Study Area 1 (Lingshui) and 0.06, 0.24, 0.48 or 1 in Study Area 2 (Wuxue). The ground targets as a reference for calibration were set near the rice plots prior to the flight for simultaneous imaging with objects. By referring to the panels, an empirical linear correction method was applied to transform image digital numbers (DNs) into surface reflectance ( ). The object reflectance can be calculated as: where λ and DN λ denote the surface reflectance and digital number, respectively, at wavelength λ; Gain λ and Offset λ represent gain and bias coefficients of the camera at wavelength λ, respectively. Referring to the DNs in the UAV imagery and actual reflectance of calibration panels, Gain λ and Offset λ can be calculated applying the least-square method and used in radiometric calibration at wavelength λ.
For each rice plot, we respectively defined a maximum rectangle same in shape and size as the region of interest (ROI) in the UAV image, as shown in Figure 2b,d. Each ROI included the same number of pixels, and the plot-level VI was retrieved by averaging all the per-pixel reflectance values within the rectangle. A total of 12 vegetation indices were selected from the literature as being the most used to characterize the vegetation status, as displayed in Table 1.

The Pure Pixel Index Endmember Extraction Method
In paddy fields, leaves, stems, and panicles constitute the canopy of rice as the foreground of the scene, and the background is mainly comprised of soil and water. Due to differences in the physicochemical property and spatial complexity, the pure components of the foreground and background show up as spectral variability and information complexity, even for the same object. In addition, multiple scattering inevitably occurs in rice fields. The propagation of light in paddy fields was briefly illustrated in Figure 4. Red-edge Chlorophyll Index (CIrededge) ρ800/ρ720 − 1 [42] Green Chlorophyll Index (CIgreen)

The Pure Pixel Index Endmember Extraction Method
In paddy fields, leaves, stems, and panicles constitute the canopy of rice as the foreground of the scene, and the background is mainly comprised of soil and water. Due to differences in the physicochemical property and spatial complexity, the pure components of the foreground and background show up as spectral variability and information complexity, even for the same object. In addition, multiple scattering inevitably occurs in rice fields. The propagation of light in paddy fields was briefly illustrated in Figure 4. In this study, the PPI method was employed to implement the semi-automatic extraction of endmembers in ENVI 5.3 (Exelis Visual Information Solutions Inc., Boulder, Colorado, USA). It assumed that all mixed pixels were located inside a simplex while the endmembers were located at the vertex of the simplex [53]. With the intention of easing computational complexity, masking ( Figure 5a) and dimension reduction were carried out successively. All the pixels in the data space were projected onto randomly generated unit In this study, the PPI method was employed to implement the semi-automatic extraction of endmembers in ENVI 5.3 (Exelis Visual Information Solutions Inc., Boulder, CO, USA). It assumed that all mixed pixels were located inside a simplex while the endmembers were located at the vertex of the simplex [53]. With the intention of easing computational complexity, masking ( Figure 5a) and dimension reduction were carried out successively. All the pixels in the data space were projected onto randomly generated unit vectors, and the endmembers were likely to be projected on both ends of the unit vectors. The number of times a data value resulted as an extremum point when projected onto both ends of the vector was recorded as the purity index. The higher the purity index of a pixel was, the greater probability the pixel had to be an endmember. The high PPI count pixels within the threshold were screened ( Figure 5b) and enumerated in n-Dimensional Visualizer ( Figure  5c). According to the clusters, the sample areas were delineated manually and the average spectrum of the pixels in a sample area was the spectrum of an endmember (Figure 5d).

NU-BGBM Bilinear Spectral Mixture Analysis
In order to estimate the abundances of endmembers extracted in Section 2.4 in an image pixel, we used MATLAB (MATLAB 2016a, MathWorks, Inc., Natick, MA, USA) to derive a novel bilinear unmixing method, the nonlinear unmixing method based on the bandwise generalized bilinear model (NU-BGBM) [54].
The LMM assumes that the reflectance y of one pixel in the imagery is a linear combination of endmembers E with their relative proportions a. On this basis, the BMM takes second-order scattering between different endmembers into consideration, which means adding an additional second-order interaction term to the LMM as follows: where a is the abundance vector, b i,j is the number of nonlinearities between the endmember e i and e j , is the Hadamard product operation, M is the number of endmembers, and n denotes the noise in imagery. The generalized bilinear model (GBM) sets the nonlinear coefficient b i,j = γ i,j a i a j , and the constraints imposed on the GBM can be written as: The real remote sensing image usually has strong signature variability [55], and the abundances may not meet the abundance sum-to-one constraint (ASC) in practice, so Li et al. [54] do not explicitly impose the condition for unmixing in NU-BGBM. Considering the fact that the remote sensing images in the real world are usually degraded by mixed noise [56], the NU-BGBM further subdivides the noise of imagery into the sparse noise S and dense Gaussian noise N, and solves with the alternative direction method of multipliers (ADMM). Mathematically, the NU-BGBM can be expressed as follows: where Y denotes the pixel matrix with a total of P column vectors (the number of pixels in the image), E and F are the endmember matrix and the bilinear endmember matrix respectively, and correspondingly, A and B are the abundance matrix and the bilinear abundance matrix, C (i,j),k = A i,k A j,k (k∈{1, . . . ,P}), and S and N denote the two types of noise matrix.
In addition, to solve the matrix equation, the unmixing model limits of the number of endmembers must be less than or equal to the number of bands.
We applied the NU-BGBM to obtain abundance images of foreground and background, as shown in Figure 6, taking the endmembers extracted by the PPI method as input (Figure 5d). The average value of each ROI in the foreground abundance image (Figure 6a) represented the plot-level foreground abundance.
where a is the abundance vector, bi,j is the number of nonlinearities between the endmember ei and ej,  is the Hadamard product operation, M is the number of endmembers, and n denotes the noise in imagery. The generalized bilinear model (GBM) sets the nonlinear coefficient bi,j = γi,jaiaj, and the constraints imposed on the GBM can be written as: The real remote sensing image usually has strong signature variability [55], and the abundances may not meet the abundance sum-to-one constraint (ASC) in practice, so Li et al. [54] do not explicitly impose the condition for unmixing in NU-BGBM. Considering the fact that the remote sensing images in the real world are usually degraded by mixed noise [56], the NU-BGBM further subdivides the noise of imagery into the sparse noise S and dense Gaussian noise N, and solves with the alternative direction method of multipliers (ADMM). Mathematically, the NU-BGBM can be expressed as follows: where Y denotes the pixel matrix with a total of P column vectors (the number of pixels in the image), E and F are the endmember matrix and the bilinear endmember matrix respectively, and correspondingly, A and B are the abundance matrix and the bilinear abundance matrix, C(i,j),k = Ai,kAj,k(k∈{1,…,P}), and S and N denote the two types of noise matrix.
In addition, to solve the matrix equation, the unmixing model limits of the number of endmembers must be less than or equal to the number of bands.
We applied the NU-BGBM to obtain abundance images of foreground and background, as shown in Figure 6, taking the endmembers extracted by the PPI method as input (Figure 5d). The average value of each ROI in the foreground abundance image (Figure 6a) represented the plot-level foreground abundance.

Data Analysis Between UAV Data and Rice Yield
In this study, the correlations between the image data and the final yield were assessed and analyzed. We employed the Pearson correlation coefficient (r) to statistically analyze correlation and built linear models to compare R 2 and RMSE. Firstly, a normal

Data Analysis between UAV Data and Rice Yield
In this study, the correlations between the image data and the final yield were assessed and analyzed. We employed the Pearson correlation coefficient (r) to statistically analyze correlation and built linear models to compare R 2 and RMSE. Firstly, a normal correlation analysis on the rice yield data and VIs at booting and heading stage was conducted. The difference between these two rice growth stages was discussed and analyzed. For improving the correlation of yield and VIs effectively, the relationship between (1) yield vs. VI, (2) yield vs. VI × A, (3) yield vs. VI E × A, (4) yield vs. (VI E × A + VI F × B) were evaluated successively, where VI E and VI F , respectively represented the VIs which were recalculated from endmember spectra and bilinear endmember spectra (the Hadamard product of endmember reflectance) obtained in Section 2.4 (Figure 5d), with the subscripts E and F specially marked. VI × A is a common approach to combine abundances and VIs [15,22]. In this paper, from the point of view of physics, we considered that the spectral properties of a mixed pixel were determined by each component (VI E ) and corresponding abundance fraction (A). Therefore, the spectral characteristic of the foreground in an image (VI foreground ) could be described by VI Eforeground and A foreground , which respectively denoted the VI E and A of the foreground endmembers. The equation was as follows: From the point of view of mathematics, we regarded the BMM as a simple math calculation equation whose independent variable was e and dependent variable was y. Taking three endmembers as an example, Equation (5) could be rewritten as: y i = a i1 e 1 +a i2 e 2 +a i3 e 3 +b i1 e 1 e 2 +b i2 e 1 e 3 +b i3 e 2 e 3 +s i +n i where y i denoted the ith pixel and e 1 , e 2 , and e 3 were the three endmember spectra within the instantaneous field of view (IFOV). We considered that the equation was also reasonable when the endmembers were replaced by the VIs calculated from it. Thus, VI yi could be calculated as follows: VI y i = a i1 VI e 1 +a i2 VI e 2 +a i3 VI e 3 +b i1 VI e 1 e 2 +b i2 VI e 1 e 3 +b i3 VI e 2 e 3 +s i +n i The equation represented the complex relationship between the spectral properties of mixed pixels and ground objects. In this case, the terms related to foreground were summed up as VI foreground : where VI Fforeground and B foreground , respectively denoted the VI F and B of the foreground endmembers; α = 1, when F foreground was the product of two foreground endmembers; and α = 0.5, when F foreground was the product of one foreground endmember and one background endmember. In order to keep the simplicity of symbols, the A, VI E , and VI F in the text all represented data related to foreground endmembers.

Algorithm Establishment Using Leave One Out Cross-Validation
In view of the small number of experimental samples in this paper, we used a leave one out cross-validation method to establish the final yield estimation model. Leave one out cross-validation is one of the most widely applied cross validation methods in model establishment and validation for the full use of experimental data [57]. It selects one of the N samples as the verification set, and the remaining N-1 samples as the training set. The training and validating process is repeated N times, and the coefficients and accuracy of the final algorithm are produced as: where Coef denotes the coefficients of the algorithm; R 2 and RMSE are the average of coefficients of determination and estimation error (E i ), respectively; N = 42 in Lingshui City and N = 23 in Wuxue City (one of the rice yield data was obviously wrong).

Correlations of Vegetation Index with Yield
In Table 2, the Pearson correlation coefficients of VIs with yield at heading stage were generally lower than that at booting stage in both of the two study areas. Among the tested indices, NDVI and GNDVI had relatively stable and high Pearson correlation coefficients. In Study Area 1 (Lingshui), all r values were less than 0.340 at heading stage, while at booting stage, most r values were more than 0.400. EVI and EVI2 showed an extremely weak correlation, and PRI had a negative correlation with yield. In Study Area 2 (Wuxue), PRI was not calculated owing to the lack of reflectance in 520 and 570 nm, and the Pearson correlation coefficient of VARI with yield at heading stage was abnormal. Except the two, most indices showed strong correlations with rice yield with r exceeding 0.700 at booting stage and below 0.700 at heading stage. On the whole, most VIs showed a weaker correlation with rice yield at heading stage than that at booting stage.

Relationship between Foreground Abundance Data and Rice Yield
As shown in Figure 5c, for the image of rice at heading stage, the high PPI pixels enumerated in the n-Dimensional Visualizer mainly belonged to two clusters-one was elongated and on a large scale, and the other was on the contrary. The larger cluster consisted of pixels with a higher probability of being foreground endmembers. There were significant spectral differences of pixels at both ends of the larger cluster. Thus, to alleviate the adverse effects of the differences, for data from Study Area 1 (Lingshui), we tried to manually delineate this scene component into one to six sample sections, as shown in Figure 7, clustering pixels with a more similar spectrum together. Correspondingly, we output one to six foreground endmembers and one background endmember, and the spectra were displayed in Figure 8.
The six results in Figure 8 were inputted in the NU-BGBM, respectively, and the sum of each foreground abundance fraction was regarded as the foreground abundance of the delineating sample strategy, designated as A 1 , A 2 , A 3 , A 4 , A 5 , and A 6 . We compared the Pearson correlation coefficients of the six abundance values with yield. Moreover, in the processing, a RMSE parameter calculated in NU-BGBM, which denoted the accuracy of unmixing, was also referenced. The two types of parameters were listed in Table 3. In Study Area 1 (Lingshui), at heading stage, while extracting five foreground endmembers, the unmixing accuracy reached the highest (RMSE was below 0.009 kg/m 2 ) and the correlation of A 5 and yield was the strongest (r was 0.474).  For Study Area 2 (Wuxue), we manually delineated the scene component into one to five sample sections, owing to the restrictive condition of the NU-BGBM, and then input the five results into the unmixing model. Table 3 also showed the Pearson correlation coefficients of yield and the foreground abundance and the accuracy of NU-BGBM under the five strategies. While extracting four foreground endmembers, the unmixing accuracy was the highest (RMSE = 0.027 kg/m 2 ) and the correlation of A 4 and yield was the strongest (r = 0.760).

Spectral Mixture Analysis in Rice Field
According to Table 3, we chose the optimal endmember combination and output the final abundance images of both study areas using NU-BGBM, respectively in Figures 9 and 10. There were evident spectral differences between foreground and background, and the input foreground endmembers had various reflectance, especially in Green, Red edge, and NIR range, as shown in Figure 8. Correspondingly, there were significant differences that existed in the abundance images of foreground and background. Each rice plot was bright to varying degrees in the foreground abundance images. Additionally, in the background abundance image, generally, pixels in rice plots had visibly lower values than that in the other region. However, pixels near the ridges in Figure 9a and in some plots in Figure 10e had a non-zero value, which indicated that the model might not unmix with an extremely ideal effect. In addition, the maximum value of background abundance was too large (about 35), while that of foreground abundance was near 1.5.

Rice Yield Estimation Using Vegetation Index and Abundance Data
To take full advantage of the endmember spectral variability and describe the contribution of different foreground components, we calculated VI E × A and (VI E × A + VI F × B) as new indices, further highlighting the foreground information at heading stage. The correlations with yield were compared to VI and VI × A, measured by the Pearson correlation coefficients, as shown in Table 4. During processing, in view of the non-singleness of foreground endmembers, the VI × A was the product sum of the VI and abundances of each foreground endmember, so were VI E × A and (VI E × A + VI F × B). As results, generally, after combined with abundances, twelve VIs produced relatively higher Pearson correlation coefficients than VIs alone. Of particular note was the more obvious improvement while using VI E than directly multiplying A by VI. For the rice yield of Study Area 1 (Lingshui), the (VI E × A + VI F × B) showed the strongest correlation among all four products, and the r value of (NDRE E × A + NDRE F × B) reached the highest (0.558), Remote Sens. 2021, 13, 2190 14 of 25 however, the r values of (VI E × A + VI F × B) and yield of Study Area 2 (Wuxue) were generally lower than that of VI E × A, and the maximum r value was 0.760 (NDVI E × A).   For further analysis, regression analysis had been used between yield and the four VI products and the results were shown in Figure 11. We developed four linear relationships using 42 and 23 samples, respectively and gained R 2 and RMSE. Among the four independent variables, VI E × A showed more satisfactory fitting results in both two study areas, in stark contrast to VI. For all tested indices, using the product of NDRE E , MTCI E , and OSAVI E , and abundances to regress with rice yield was more accurate with higher R 2 values and lower RMSE.
The statistical results in Study Area 1 (Figure 11a) showed that it helped estimate the yield through taking the bilinear term into consideration. In addition, (NDRE E × A + NDRE F × B) had the best goodness of fit with yield (R 2 was 0.312), and the biggest increase was on the R 2 of OSAVI (from 0.067 to 0.300). The optimal RMSE decreased from 0.073 kg/m 2 (VI) to 0.064 kg/m 2 (VI E × A + VI F × B), which represented a reduction of 12%.
For the result in Study Area 2 (Figure 11b), the addition of VI F × B reduced the estimation ability of VI E × A. NDVI E × A, NDRE E × A, and OSAVI E × A had the best fitting result with yield (R 2 was 0.577), and R 2 of VARI improved from 0.149 to 0.446 with a most extent. The RMSE decreased from 0.030 kg/m 2 (VI) to 0.027 kg/m 2 (VI E × A), with a reduction of 10%.
Using the optimal products of the two indices NDRE and OSAVI, the leave one out cross-validation was utilized to build the final rice yield estimation model. For the two study areas, the specific estimation formulas and the goodness of fit between the estimated yield and measured yield were obtained and respectively shown in Figure 12a,b.

Discussion
This study was carried out to improve the estimation ability of VIs for rice yield at heading stage based on remote sensing images. Using the quantity abundance information and qualitative spectral information of each mixed composition obtained from SMA, our results showed the development of the accuracy of rice yield estimated at heading stage.
In this study, there were 42 varieties of hybrid rice cultivated in the 42 plots with a similar field management in Study Area 1 (Lingshui), and in Study Area 2 (Wuxue), there were eight levels of nitrogen fertilizer of one single variety of rice. These two settings represented possible planting situations in daily life and productions, and multiple varieties might generate more significant spectral diversity, affecting the linear grain yield model [58,59]. We obtained the UAV data both at booting stage and heading stage. Results in Table 2 revealed the correlation between 12 representative VIs and the yield was weaker at heading stage than that at booting stage. The cause of this was probably that the emergence of panicle could lead to changes in canopy reflectance, and the VIs were also affected [18]. Reliably, the predictive ability for yield decreased during the heading stage based on UAV imagery data. Consequently, SMA was utilized to improve the predictive ability of VI in rice yield estimation at heading stage.
The spectral variability for rice plants was intuitively presented on one UAV imagery and direct-viewing as a large cluster in the PPI processing. It was noted that extracting only one single endmember spectrum for the scene component resulted in a poor correlation of foreground abundances with yield, as shown in Table 3. Additionally, highly correlated endmembers could cause error in spectral mixture models [60]. Therefore, the multiple endmember extraction based on the PPI approach was tested. We extracted multiple

Discussion
This study was carried out to improve the estimation ability of VIs for rice yield at heading stage based on remote sensing images. Using the quantity abundance information and qualitative spectral information of each mixed composition obtained from SMA, our results showed the development of the accuracy of rice yield estimated at heading stage.
In this study, there were 42 varieties of hybrid rice cultivated in the 42 plots with a similar field management in Study Area 1 (Lingshui), and in Study Area 2 (Wuxue), there were eight levels of nitrogen fertilizer of one single variety of rice. These two settings represented possible planting situations in daily life and productions, and multiple varieties might generate more significant spectral diversity, affecting the linear grain yield model [58,59]. We obtained the UAV data both at booting stage and heading stage. Results in Table 2 revealed the correlation between 12 representative VIs and the yield was weaker at heading stage than that at booting stage. The cause of this was probably that the emergence of panicle could lead to changes in canopy reflectance, and the VIs were also affected [18]. Reliably, the predictive ability for yield decreased during the heading stage based on UAV imagery data. Consequently, SMA was utilized to improve the predictive ability of VI in rice yield estimation at heading stage.
The spectral variability for rice plants was intuitively presented on one UAV imagery and direct-viewing as a large cluster in the PPI processing. It was noted that extracting only one single endmember spectrum for the scene component resulted in a poor correlation of foreground abundances with yield, as shown in Table 3. Additionally, highly correlated endmembers could cause error in spectral mixture models [60]. Therefore, the multiple endmember extraction based on the PPI approach was tested. We extracted multiple foreground endmembers depending on the value of RMSE in NU-BGBM. When the RMSE of unmixing model reached the minimum, the input endmembers and correspondingly output abundance images were selected and applied in our follow-up experiment.
In the process of unmixing, we used a bilinear mixing model NU-BGBM to account for the interaction of second scattering of photons over the multi-level and hybrid scene. Aimed at rice plants, with a complex three-dimensional structure and relatively high transmission [61,62], it was more suitable for unmixing by the BMM than the LMM, especially at heading stage, when the differences of rice characters were enlarged and the canopy closure caused the similar abundance values of foreground. The Pearson correlation coefficients had been also calculated from yield and the VI products derived from the results of the fully constrained least squares (FCLS) linear SMA [63], and were listed in Table 5. By comparison with Table 4, for multi-variety hybrid rice plots in Study Area 1 (Lingshui), significantly, the abundances obtained by the LMM did not play a supplementary role in the relationship between VI and yield, most likely due to the abundance saturation phenomenon under the ASC (the values were closed to 1). In contrast, the NU-BGBM relatively weakened the saturation and obtained higher Pearson correlation coefficients. Similarly, the NU-BGBM performed better for single plantation rice plots in Study Area 2 (Wuxue), while the FCLS also had a slight improvement. To have an integrative consideration, the NU-BGBM had a more profound physical meaning and higher precision, and owned obvious superiority in a multi-variety hybrid scene. In addition to model errors, influence factors on the unmixing effect mainly include observation noise, environmental conditions, and input endmember spectra [38]. At heading stage, rice plants were in the utmost luxuriance and almost entirely covered the paddy field. With the restriction of spatial resolution, pixels near ridges were universally composed of soil and water and might be shaded by weeds and rice leaves, causing the unideal performance of unmixing in this region in Figure 9a. The situation was visibly alleviated in Study Area 2 (Wuxue) with a higher spatial resolution (Figure 10a). Moreover, areal fractions of bare soil could be overestimated in all unmixing models due to the increased radiance of bare soil resulting from side scattering of NIR radiation by adjacent plants [64]. In addition, there were suspected unmixing errors occurring in Figure 10e. To exclude the applicability of NU-BGBM, we output the background abundance image by FCLS and found that the image was similar to Figure 10e. Therefore, the reason for this was that the input endmembers were limited by the purity of pixels in the image and influenced the unmixing results, and the relatively high RMSEs of the NU-BGBM in Study Area 2 (Wuxue) in Table 3 also proved this. Without the limitation of ASC, it was normal for the NU-BGBM to output abundance values larger than 1. The exception values near 35 in background abundance images occurred near ridges. We compared the input background endmember spectrum and the image reflectance near ridges, and found that the spectral shapes of the two were similar but the values of the latter were higher.
Considering the previous analysis, we thought it was the input background endmember that caused the unideal result of the background abundance images. Nevertheless, we paid more attention to the unmixing results of the rice plots in the foreground abundance images, which directly participated in subsequent processing. As shown in Figures 9 and 10, the abundance images of the foreground were relatively reasonable. Different plots and different levels of canopies displayed obvious brightness heterogeneity, respectively. All the foreground components together constituted rice plants and contributed to the final rice spectrum.
Each pixel is composed of the reflectance of foreground and background in the paddy fields. VIs, directly derived from image spectra, always mix background information. In the purpose of eliminating the interference of background and making good use of the spectral difference of several foreground endmembers, VI E and VI F were proposed to combine with corresponding abundance data. VI E represented the spectral characteristics of every endmember, and VI F further reflected the information of bilinear interactions about foreground endmembers on the basis of VI E. In addition, each foreground component fraction, A and B, denoted the contribution of every endmember to the whole pixel reflectance. The product sums of VI E (VI F ) and A (B) of all the foreground endmembers were the final synthetical parameter which depicted the pure spectral features and subpixel spatial information on rice plants, as Equations (6) and (9). Compared with the traditional combination method VI × A, the novel VI E × A and (VI E × A + VI F × B) provided better estimation results of yield apparently in Figure 11. Additionally, the utilization of VI E and VI F enhanced the stability of VI-based models as a safeguard against the abnormal spectrum from UAV, which was a conclusion drawn from VARI of Study Area 2 (Wuxue) at heading stage in Figure 13.  35 in background abundance images occurred near ridges. We compared the input background endmember spectrum and the image reflectance near ridges, and found that the spectral shapes of the two were similar but the values of the latter were higher. Considering the previous analysis, we thought it was the input background endmember that caused the unideal result of the background abundance images. Nevertheless, we paid more attention to the unmixing results of the rice plots in the foreground abundance images, which directly participated in subsequent processing. As shown in Figures 9 and 10, the abundance images of the foreground were relatively reasonable. Different plots and different levels of canopies displayed obvious brightness heterogeneity, respectively. All the foreground components together constituted rice plants and contributed to the final rice spectrum. Each pixel is composed of the reflectance of foreground and background in the paddy fields. VIs, directly derived from image spectra, always mix background information. In the purpose of eliminating the interference of background and making good use of the spectral difference of several foreground endmembers, VIE and VIF were proposed to combine with corresponding abundance data. VIE represented the spectral characteristics of every endmember, and VIF further reflected the information of bilinear interactions about foreground endmembers on the basis of VIE. In addition, each foreground component fraction, A and B, denoted the contribution of every endmember to the whole pixel reflectance. The product sums of VIE (VIF) and A (B) of all the foreground endmembers were the final synthetical parameter which depicted the pure spectral features and subpixel spatial information on rice plants, as Equations (6) and (9). Compared with the traditional combination method VI × A, the novel VIE × A and (VIE × A + VIF × B) provided better estimation results of yield apparently in Figure 11. Additionally, the utilization of VIE and VIF enhanced the stability of VI-based models as a safeguard against the abnormal spectrum from UAV, which was a conclusion drawn from VARI of Study Area 2 (Wuxue) at heading stage in Figure 13.  In the visible spectrum (400-700 nm), leaf reflectance is mainly affected by photosynthetic pigments, and in the NIR domain (700-1300 nm), the magnitude of reflectance is governed by the cell arrangement mode and vegetation structure [65]. Plants differently managed in terms of fertilization will change chlorophyll contents and reflectances more in red and green wavelengths [66], while the diversity of spatial structure and morphological character among rice cultivars is relatively more significant, as shown in Figure 1. Therefore, the spectral difference of canopy was mainly in the visible range for Study Area 2 (Wuxue), and in the NIR range for Study Area 1 (Lingshui), with the UAV spectral data as a reference. Primarily, second-scattering occurs in the NIR, with high transmittance of leaves, while nonlinear influences are minimal in the visible region [31,64]. The BMM, further taking second-order scattering into consideration, has the probability of achieving better results in multi-variety paddy plots. Additionally, the greater difference of spectral signature among rice varieties than that among rice at different fertile levels leads to a more significant spectral variability. Thus, the effects of our method, mainly based on the multi-endmember strategy and the BMM, varied in different experimental settings. The NU-BGBM further considered the spatial structure of rice plants and then gained the additional bilinear endmember vegetation index matrix VI F and abundance matrix B. What function of VI F × B had depended on the hierarchy and stereospecificity in the scene? It could be observed from Figure 11a that the addition of VI F × B helped estimate the rice yield in the complex space environment. In addition, for rice plots with a small structural difference in Study Area 2 (Figure 11b), the consideration of secondary reflections played an opposite role in the correlation of VI E × A and yield, inducing the model distortion. Therefore, it was considered that the more obvious the heterogeneity of rice plants was, which meant the more significant structural difference of plants was, the better effectiveness the NU-BGBM had.
The vast majority of VIs performed better goodness of fit in the relationship of the final VI product and yield. The obvious improvement could verify the effectiveness of the proposed method, with the increase of around 0.2 of R 2 in the two study areas. Among them, the performance of VIs using spectral bands in the NIR enhanced most significantly. The products of NDRE and OSAVI produced stable and optimal estimation both in the two study areas, and then the one out cross-validation approach was utilized, as shown in Figure 12a,b. The combination of NDRE E (OSAVI E ) and abundances could more accurately estimate the rice yield at heading stage, increasing the R 2 values to 0.3 from 0.1 in Study Area 1 and to 0.6 from 0.4 in Study Area 2. For the same rice plots in Study Area 2 (Wuxue), Duan et al. [22] used the product of VI and abundances calculated from field measured spectra to estimate the yield for rice at heading stage, and the R 2 also reached 0.6.
For in-season rice grain assessment, the booting stage might be the optimal time using remote sensing data [18,61,67], and Table 2 also confirmed the opinion. To validate the effectiveness of our method, we processed and analyzed data at booting stage and gained the results. The optimal R 2 of Study Area 1 (Lingshui) and Study Area 2 (Wuxue), respectively were 0.292 (NDRE E × A + NDRE F × B) and 0.558 (MTCI E × A), and the RMSEs of the two areas did not decline evidently (about 2%), which indicated that the method still improved the estimation ability of VIs at booting stage but was less effective than that at heading stage. With the insignificant difference of rice plants, the slight enhancement at this stage further confirmed the pertinence of our method, which aimed at the dynamic growth situation of various rice organs and the spectral variability and structural complexity of rice canopy. Apparently, after our processing of UAV images, the accuracy of yield estimation at the two stages was at the same level, and the discrimination of heading stage of paddy rice was easier on account of the relatively significant morphological features.
To be concluded, we extracted multiple foreground endmembers by PPI and then input endmember spectra in the NU-BGBM, directed against the spectral variability and complicated spatial structures of rice plants. In the process, the number of output foreground endmember was dependent on the minimal RMSE of the NU-BGBM. VIs were recalculated from endmember spectra, denoted by VI E , and multiplied with corresponding abundances (VI E × A) to further highlight the foreground. Similarly, VI F × B in (VI E × A + VI F × B) was calculated from bilinear endmember and abundances, and was selectively added when rice plants had a sophisticated spatial structure. The products finally used at heading stage could achieve a result for yield estimation as good as or better than that at booting stage. Among all the tested VI E , the products of NDRE and OSAVI had the highest R 2 and the most consistent performances in the two tests. The whole processing was based on one UAV imagery acquired at heading stage, adaptively and effectively.

Conclusions
In this study, we developed a full UAV-based approach to improve the estimation of rice yield at heading stage using the vegetation index and abundance of multi-endmembers, with consideration of second-order scattering in paddy plots. Many factors, such as rice varieties, growth status, etc., result in the spectral variability. In addition, the potential of VIs for remote estimating yield is interfered obviously by the existence of background and irrelevant organs, and weakened more for rice with characteristics of strong transmission and complicated structure. Thus, the PPI was calculated to extract multiple endmembers and the NU-BGBM was applied to acquire the abundances of the foreground which were more approximate to the true value. In order to distinguish and amplify the spectral difference, we proposed a novel parameter VI E , which indicated VIs recalculated from an endmember spectrum, and then multiplied VI E by the corresponding abundance A. The simple linear function was established by the sum of VI E × A of every foreground spectrum and yield. Moreover, aimed at several cultivars, (VI E × A + VI F × B) would perform better in yield estimation in the complicated scene, where VI F and B denoted VIs recalculated from a bilinear endmember spectrum and the corresponding bilinear abundances. The integration of plot-level VI E (VI F ) and abundance information could estimate rice yield more accurately than using VI alone or VI × A at heading stage. Moreover, the final estimating accuracy was as reliable as that at booting stage. Among all the test VIs, NDRE E and OSAVI E combined with abundances were the most accurate for yield estimation of rice under multiple varieties scene or different nitrogen fertilizer treatments with estimation errors below 8.1%. The strategy of SMA, using multiple endmembers and the BMM, can improve the accuracy of VI-based rice yield estimation models especially when fields are under non-homogenous management, and provide a reference for the development of precision agriculture. In the follow-up study, the automation of the algorithms is worth considering and the potential of the SMA strategy proposed in this paper for improving the accuracy of the multi-stage rice yield estimation model is worth exploring.