Estimation of Canopy Biomass Components in Paddy Rice from Combined Optical and SAR Data Using Multi-Target Gaussian Regressor Stacking

: Crop biomass is a critical variable to make sound decisions about ﬁeld crop monitoring activities (fertilizers and irrigation) and for estimating rice crop biomass. The presented workﬂow will improve the estimation of crops biomass components from satellite data for e ﬀ ective crop growth monitoring.


Introduction
Rice is an important global food crop, and its biophysical parameters need to be collected on time to obtain timely information on crop health and yield prediction [1][2][3][4][5][6]. Accurate estimation of biophysical parameters is important as they are the key inputs for photosynthesis processes and the primary indicators of crop health [7]. Among the biophysical parameters, aboveground biomass increases continuously with plant production, a key indicator of growth changes in crop canopies, and it is also the best predictor of future crop yield [8]. Crop biomass can be accurately and reliably calculated to improve crop fertilizer applications, disease, and weed management [9][10][11].
Traditionally, crop biomass is measured using disruptive, time-consuming in situ methods. However, these methods are not only labor intensive, but they are also prone to errors and conducted within small regions. More recent estimations are based on remotely sensed data, such as from optical vegetation indices (VIs), synthetic aperture radar (SAR) imagery, and light detection and ranging (LiDAR) point cloud data [12][13][14]. Spectroscopic and terrestrial laser scanning data have also been used for the estimation of biomass components [15,16]. These ground-based techniques are not effective for monitoring large-scale biomass components in crops. For the estimation of crop biomass, optical imagery data have been widely used [17][18][19]. The estimation purpose has been achieved by means of regression models, which are constructed based on direct relationships between aboveground biomass and the spectral reflectance values [14]. However, a more accurate estimation of crop biomass from satellite imagery remains challenging due to spectral saturation under high biomass conditions [14]. The problem of spectral saturation is a challenge for both data types derived from SAR and optical remote sensing platforms [14,20].
SAR-based vegetation indices have been constructed from full-polarization radar imagery for the estimation of crop growth parameters [21]. Some recent studies also focused on the development of vegetation indices from dual-polarization SAR imagery for estimating biophysical parameters such as soil moisture content [22], crop water content [23], and crop yield [24]. However, little research has been devoted to the development of hybrid indices by fusing optical and SAR indices. The combination of optical and SAR image data have been used to estimate leaf area index (LAI), crop biomass, and forest biomass with good agreement between estimated values and the actual values [25]. The best performing indices from Landsat-8 imagery were combined with LiDAR matrices for improving forest aboveground biomass estimation [26]. HJ optical indices have been also combined with Radarsat-2 derived indices for better estimation of LAI and aboveground biomass [21]. More recently, the combination of Sentinel-1 and Sentinel-2 imagery data were used for biomass estimation with good performance [27][28][29]. Since the optical and SAR image data respond differently to crop characteristics, their complementary information can help better estimate crop conditions [21,30]. Nevertheless, studies on integrating Sentinel-1 and Sentinel-2 to estimate biomass components have not yet been conducted. The complementary information from optical and SAR image data has to be investigated using different machine learning techniques and combination scenarios. A vast amount of studies have been conducted on the estimation of total biomass using data derived from SAR and optical imagery [8,14,20,[30][31][32][33][34]. However, limited studies focused on the estimation of biomass by component except the attempts done by Kross et al. [14] using RapidEye imagery and Inoue et al. [20] from C-band SAR imagery data [20]. Biomass estimations by component (leaf, stem, stem and leaf, and total biomass) play an essential role in crop productivity forecasts, as crop yields result from the accumulation and transport of substances between various organs [35][36][37][38][39].
Estimation of crop biomass has been achieved from SAR imagery by using different canopy descriptors combined with backscatter coefficients or using different physical models [40,41]. Among the estimation methods, parametric water cloud models and a more sophisticated Michigan Microwave canopy scattering (MIMICS) model have been widely used in many other studies [42][43][44][45][46]. However, the applications of such models were limited due to the intensive input requirements, high complexities of inversion, and inability to resolve all biomass components. Recent studies also applied non-parametric machine learning algorithms, such as support vector machines (SVM), artificial neural networks (ANN), random forests (RF), and k-nearest neighbor (KNN) to estimate crop biomass [47]. Non-parametric regression algorithms have drawbacks on balancing trade-off between fitting the data and smoothing, requiring a large amount of data for training. The hyperparameters of machine learning regression algorithms are hard to tune, and they are also computationally intensive [48]. Gaussian-based regression algorithms present several advantages over existing physical, empirical, semi-empirical, and other non-parametric machine learning algorithms. As non-parametric methods, the multi-target Gaussian process regressor stacking (MGPRS) model can be adjusted to the required level of linear, quadratic, or polynomial degrees as those relationships exhibited in the given data [49]. The conventional multi-target Gaussian process regression (GPR) model builds the spatial covariance matrix over the features data, then the kernel hyperparameters are learned by maximizing the likelihood of observations. However, in the MGPRS the covariance matrix is extended to cover the correlation between both features and targets. It involves a larger number of hyperparameters to be optimized.
Various machine learning techniques have been used for the estimation of aboveground biomass of field crops and particularly in rice [50]. The estimation methods are selected based on the trade-off between the performance in terms of a given biophysical parameters, interpretability of results, and computational time. The most extensively used methods for rice aboveground biomass estimation include multiple linear regression, support vector machine, and random forest [47,50]. Previous studies have shown that GPR outperforms support vector regression, kernel ridge regression, and neural networks for vegetation parameter estimation [51].
Gaussian process-based regression methods have been used for the estimation of vegetation parameters [48,52]. GPR can tackle the problem of overfitting or underfitting as they do not have a finite number of parameters and the complexities of their models are not fixed [52]. MGPRS is particularly useful when the amount of in situ data is limited. The MGPRS model is capable of automatically discovering the quality (noise and uncertainty) of each dataset and includes this information in the regression model to balance the trustworthiness. MGPRS exploits the correlation between all the targets and outputs and provides a robust framework for incorporating substantial knowledge in the Gaussian process regression. The proposed method MGPRS has been used for multivariate system reliability analysis and has yielded convincing results [53]. This study proposes to combine dual-polarimetric Sentinel-1 and Sentinel-2 data with MGPRS algorithm for the estimation of crop biomass components. MGPRS can be scaled using limited amount data and does not require a large size of data for training. To the best of our knowledge, MGPRS has not been used in agricultural remote sensing applications so far.
Some studies have demonstrated the utility of polarimetric SAR data with different polarimetric decomposition techniques for the estimation of vegetation parameters [21]. From Freeman-Durden decomposition, the pixels were classified into three scattering components: volume scattering, double bounce scattering, single bounce or surface scattering. Among these components, volume scattering was highly correlated with LAI and aboveground biomass, since LAI and biomass were highly affected by volume changes [21]. However, polarimetric decomposition is impossible for dual-polarization SAR missions such as Sentinel-1A. By exploiting different interaction mechanisms of SAR signals with the target, new indices can be constructed and tested for the estimation of biomass components using dual-polarization SAR imagery. The aim of this study was to construct and test new SAR indices, which were used independently and fused with selected optical indices for the estimation of rice crop biomass components.
Thus, the objectives of this study were 1) to evaluate the potential of the new combined SAR and optical indices for the estimation of crop biomass components, 2) to determine if the hybrid SAR and optical indices will improve the estimation of biomass components, and 3) to validate the estimation performance of MGPRS against ground measurements and compare it with the widely used single-target Gaussian regression (SGPR) method.

Study Area and Ground Data
The study was conducted in Xinghua County, Jiangsu province (114 • 38 00"-122 • 00 00"E), (30 • 00 00"-35 • 30 00"N), China, as shown in Figure 1. The study area is characterized by hot and rainy summers, cold and rainy winters, and dry springs. Rice and wheat are the major cereal crops grown in the study area. Rice cultivation begins around the end of May to the end of October. The ground-based rice crop biophysical parameters were collected during the field campaign of 2018 to 2019. Over 60 points in 2018 and over 120 points in 2019 were investigated in detail. At each sampling field, over two transects were selected for vegetation sampling, which involves a collection of plant height, LAI, and dry biomass components through manual methods.

Preprocessing of Sentinel-1 and 2 Data
The Sentinel-1A data from European radar C-band imaging system were acquired from June to September of 2019. The images were collected by single look up complex (SLC) in interferometry width (IW) mode with a 250-km swath width. After range Doppler terrain correction, a 10-m square pixel size was achieved. Sentinel-1A imagery has VV and VH polarizations and a 12-day repeat cycle. The Sentinel-2 multispectral instrument (MSI) collects imagery in the spectral range of 0.4 to 2.4 µm Ground biophysical parameter measurements were conducted over the reference plots selected over Xinghua county ( Figure 1). Field measurements were conducted three times over the full growth cycle, in parallel with Sentinel-1/2 acquisition dates over the reference plots. The rice growth stages for each specific field trip and the image acquisition dates are indicated in Table 1. Sampling points were established over Xinghua county. The geographic coordinates of each plot were traced using a Trimble Geoxh 6000 receiver. Two independent measurements were conducted over each sampling point by taking two reference readings underneath the canopy and two readings over the canopy. Sample plants were taken by a destructive way of measurements over 0.5 × 0.5 m 2 sample plots for collecting aboveground biomass. The crop samples were collected by component and each was dried separately in the oven. The collected sample plants were dried at 70 • C using an oven with a mounted digital temperature controller. After we dried the samples in the oven for 24 h, a constant weight was achieved for each crop sample. The dried crop samples were weighted to obtain the biomass per component. The final collected biomass was converted to kg/m 2 .

Preprocessing of Sentinel-1 and 2 Data
The Sentinel-1A data from European radar C-band imaging system were acquired from June to September of 2019. The images were collected by single look up complex (SLC) in interferometry width (IW) mode with a 250-km swath width. After range Doppler terrain correction, a 10-m square pixel size was achieved. Sentinel-1A imagery has VV and VH polarizations and a 12-day repeat cycle. The Sentinel-2 multispectral instrument (MSI) collects imagery in the spectral range of 0.4 to 2.4 µm with 13 bands. Sentinel-2 imagery has spatial resolutions of 10 m (four bands), 20 m (six bands), and 60 m (three bands). All Sentinel images were acquired on similar dates and downloaded from the Copernicus Open Access Hub (https://scihub.copernicus/dhus/#/home).
Orbit file application, geometric correction, thermal noise removal, multi-look operation, speckle filtering, radiation calibration, speckle noise removal, geocoding, and stacking were applied to each of the downloaded raw SAR images. Preprocessing of each image was conducted using the ESA's SNAP software (https://step.esa.int/downloads/). Sentinel-2 images were acquired in the form of fixed cartographic geometry at L1C with a projected coordinate of WGS84 with UTM projection. The main preprocessing procedures done on Sentinel-2 raw data included atmospheric correction, radiometric calibration, and thermal noise removal. After preprocessing, RGB composites were created to increase separation among the land cover types for image interpretation. The RGB composite with field measured GPS points were used to extract rice fields at the region of interest (ROI) points. The extraction of Rice fields for the whole study area was done using our field visit in combination with the RGB composite from Sentinel-1 and Sentinel-2 imagery and one-class support vector classification (OCSVC) [54].

Gaussian Process Regression
The Gaussian process (GP) is a collection of random variables, of which any finite number has a joint distribution. The mean vector of this joint distribution is generally assumed to be a zero vector, and the covariance matrix is obtained using a covariance function defined over a pair of input values. While Gaussian process regression (GPR) is usually used with an index set of time, the index set for the GPR used in this study is the possible input values of the training sets. Although it is continuous, sampling a function from the GP is generally done by selecting a set of input points. Theoretically, a function can be represented as a vector of finite size. However, as we only have to make predictions for finitely many points, we can simply draw outputs for these points by using a multivariate normal distribution with a covariance matrix generated by a kernel. GPR techniques have been used for biophysical parameter estimations from optical imagery [55]. The general introduction of GPR used here can be found [56,57]. In the following subsections, we briefly describe the standard GP and multivariate GP formulations adopted for this study.

Single-Target Gaussian Process Regression
Gaussian process regression models are non-parametric kernel-based probabilistic models. A single-target Gaussian process (SGPR) attempts to approximate the target output f (x) where X ∈ R d by interpreting it as a probability distribution function. It addresses the question of predicting the values of response variables using the training set drawn from the unknown distribution and given the new input vector. Suppose that we have a set of training points X = (x 1 . . . . . . , x n ) T and the associated output observation Y = (y(x 1 ) . . . . . . , y(x n )) T . Since GP is a stochastic process where any finite subset of random variables follows a joint Gaussian distribution, the prior joint distribution of the observations y together with f (x * ) at a test point x * [57].
The prediction meanf (x * ) and prediction variance σ 2 (x * ) are respectively given as: where K * = K X, x * ∈ R nx1 denotes the covariance between the training points and the test point x * . The prediction variance of y(x * ) is given by To usef (x * ) and σ 2 (x * ) for prediction, we need to infer the hyperparameters θ in the covariance function K by minimizing the negative long marginal likelihood (NLML) as This approach is to fit these parameters by maximizing the long marginal likelihood of the training data.

Multi-Target Gaussian Process Regressor Stacking
The MGPRS method is inspired by using stacked generalization to deal with multi-label regression and classifications [58]. In this model, a single Gaussian process was learned as a single target. Instead of directly using this model for predictions, MGPRS includes an additional training stage where a second set of d target variables are learned, with one for each target Y i (i (1, . . . , d)).
A transformed training set can be learned on each meta-model ).

Model Setup for the Estimation of Biomass Components with MGPRS
For each stack of Sentinel-1A and Sentinel-2 images, different combinations of vegetation indexes were calculated. In supervised learning, it is expected that the points with a similar predictor x i naturally have close response (target) values y i . In Gaussian processes, the covariance function expresses this similarity. The covariance between two latent variables is specified by f (x i ) and f (x j ) where both x i and x j are d-by-1 vectors. Otherwise speaking, it determines how the response at each point x i is affected by responses at the other points x j (i j, i = 1, 2, . . . . . . . . . , n).
The SAR and optical vegetation indices of Sentinel platforms are used to predict vegetation parameters with the MGPRS technique. As stated in the previous studies, the vegetation parameter obeys Gaussian distributions, with the mean and variance given by the following [59].
where k * is a vector used for checking the similarities between the test and training samples, and k * * is the prior covariance. In Equations (8) and (9), k is the kernel function that evaluates the similarities between the test vegetation index and all the training vegetation index values. The automatic relevance determination vector kernel was calculated by: where v is a scaling factor, and B is the dimension of the input (in this study, B = 6). σ b controls the spread of the relations for each dimension of the input, and high values of σ b mean lower informative content of the corresponding dimension for the regression and vice versa. σ n is the noise standard deviation and σ ij is Kronecker's symbol.
We implement a single output Gaussian process regression method using a MATLAB function, which was accessed online https://uk.mathworks.com/help/stats/fitrgp.html. The Gaussian Multi-Target Regressor stacking was implanted using MATLAB functions developed based on [58], and which is also freely available online https://github.com/mksadoughi/Multi-output-Gaussian-Process.

Selection of Active (SAR) and Passive (Optical) Vegetation Descriptors
The volume scan resulting from polarimetric decomposition of full polarized RADARSAT-2 imagery was highly correlated with LAI and aboveground biomass [21]. However, the Sentinel-1A SAR imagery used here is not full-polarimetric, and decomposition of polarimetry cannot be exploited. A new technique is introduced here to boost the dual-polarimetric SAR indices as of volume scattering and to be used for the estimation of biomass components. During the construction of the SAR polarization index, the average backscatter coefficient of each region of interest (ROI) was calculated. According to the previous studies, the highest backscatter values over rice fields were observed at the vegetative stage, while the lowest backscatter values were observed at the flooding stage [60].
To find representative vegetative indices over the main growth stages, we applied the optimal temporal backscatter change threshold (σ T • ).  Table 2 and vegetation parameters were calculated. Then, the best performing vegetation indices with high correlation coefficients were selected.
While optical imagery can provide vegetation spectral information, SAR imagery can provide vegetation structure information. Thus, the synergistic use of SAR and optical remote sensing data is highly valuable for the estimation of vegetation parameters. After choosing the best performing SAR and optical vegetation indices, we established the SAR and optical multiplication vegetation index (SOMVI) and SAR and optical difference vegetation index (SODVI) as follows: where SAR_VI is the best performing SAR index and Optical_VI the best performing optical index.

Statistical Metrics for Evaluating Model Performance
As described in Section 2.1, the field surveys for collecting vegetation parameters of sample points were done three times per year for 2018 and 2019 in Xinghua County (Table 1). A total of 120 samples in 2018 were chosen for training, and 120 samples in 2019 were for validation. We conducted all statistical analysis in the software MATLAB R2018b (Matrix Laboratory, USA). The results of the selected machine learning methods (SGPR and MGPRS) for the retrieval of LAI and biomass were compared with ground truth data through the root mean square error (RMSE) and correlation coefficient (r 2 ), which were calculated as where y i and y i are the ground truth measurements and estimated variables, respectively, for point i.
x and y are the arithmetic means of ground truth measurements and estimated variables, respectively, and n is the number of sample points. A graphical representation of model development, training and validation is displayed in Figure 2.

Correlations of Optical and SAR Indices with Biomass Components
SAR and optical vegetation indices were selected and used for correlation analysis with ground collected biomass components. RDVI1 and SND VH_VV performed best among all the indices. The best performing optical index, RDVI1, resulted in r 2 = 0.52 for leaf biomass, r 2 = 0.42 for stem biomass, r 2 = 0.39 for stem and leaf biomass, and r 2 = 0.46 for aboveground biomass. The enhanced radar index (SND VH_VV ) resulted in r 2 = 0.32 for leaf biomass, r 2 = 0.37 for stem biomass, r 2 = 0.47 for stem and leaf biomass, and r 2 = 0.41 for aboveground biomass. A relatively higher correlation was observed from optical indices with r 2 = 0.52 for leaf biomass than the highest from SAR indices SND VH_VV with r 2 = 0.47 for stem and leaf biomass. After selecting the best performing vegetation indices from each sensor, different (SAR and optical) combinations were created. Among these combinations, SOMVI resulted in an r 2 = 0.55 for leaf biomass, r 2 = 0.44 for stem biomass, r 2 = 0.52 for stem and leaf biomass, and r 2 = 0.47 for aboveground biomass. SODVI resulted in an r 2 = 0.58 for leaf biomass, r 2 = 0.46 for stem biomass, r 2 = 0.49 for stem and leaf biomass, and r 2 = 0.52 for aboveground biomass. Overall, the combined vegetation indices (SOMVI and SODVI) from the two sensors performed better than the indices used independently from each sensor ( Table 3).

SGPR and MGPRS Predictions Using SAR or Optical Indices
In this section, the estimation of biomass components was done using selected optical and SAR indices as input for the single and multi-target Gaussian regression models. The highest r 2 and lowest RMSE was found using enhanced SAR index (SND VH_VV ) with r 2 = 0.60 and RMSE = 0.22 kg/m 2 for leaf biomass, r 2 = 0.56 and RMSE = 0.34 kg/m 2 for stem biomass, r 2 = 0.59 and RMSE = 0.14 kg/m 2 for stem and leaf biomass, and r 2 = 0.6 and RMSE = 0.24 kg/m 2 for AGB. From the selected optical indices, better statistical results were achieved using RDVI1 with r 2 = 0.52 and RMSE = 0.60 kg/m 2 for leaf biomass, r 2 = 0.62 and RMSE = 0.51 kg/m 2 for stem biomass, r 2 = 0.54 and RMSE = 0.55 kg/m 2 for stem and leaf biomass, and r 2 = 45 and RMSE = 1.08 kg/m 2 for AGB ( Figure 3). Both RDVI1 and SND VH_VV were used as input for the estimation of biomass components using MGPRS regression (Figure 4). Figure 4 shows that high accuracies were obtained from MGPRS predictions than from SGPR predictions. Using RDVI1 and SND VH_VV as input for MGPRS, relatively better performance was obtained for leaf biomass with r 2 = 0.70 and RMSE = 0.14 kg/m 2 than other biomass components (stem, stem and leaf, and AGB).

SGPR Prediction Using Hybrid Indices
In this section, selected hybrid vegetation indices (SOMVI and SODVI) were used as input for SGPR ( Figure 5). The best performance was found by using SODVI as an input variable ( = 0. 76 for AGB. The differences in and RMSE between SOMVI and SODVI variables used as input for SGPR were 0.02 and 0.03 / for stem biomass, 0.08 and 0.54 / for stem and leaf biomass, respectively. The difference in r 2 using the two variables were relatively less for leaf biomass than that for leaf and stem biomass (0.05 vs. 0.1).

Figure 5. Comparisons between measured and estimated biomass components from single-target
Gaussian process regression (SGPR) for (A) leaf biomass using SOMVI, (B) stem biomass using SOMVI, (C) leaf biomass using SODVI, (D) stem biomass using SODVI, (E) leaf and stem biomass using SOMVI, (F) AGB using SOMVI, (G) stem and Leaf biomass using SODVI, and (H) AGB using SODVI. The green line represents a linear regression between the observed and estimated values, and the dashed line represents the 1:1 line.

MGPRS Prediction Using Hybrid Indices
The results of cross-validation using SOMVI and SODVI as input for MGPRS are shown in Figure 6. In overall comparison, higher statistical results were achieved by using SOMVI and SODVI as input for MGPRS than using SOMVI and SODVI as input for SGPR for the estimation of biomass Figure 5. Comparisons between measured and estimated biomass components from single-target Gaussian process regression (SGPR) for (A) leaf biomass using SOMVI, (B) stem biomass using SOMVI, (C) leaf biomass using SODVI, (D) stem biomass using SODVI, (E) leaf and stem biomass using SOMVI, (F) AGB using SOMVI, (G) stem and Leaf biomass using SODVI, and (H) AGB using SODVI. The green line represents a linear regression between the observed and estimated values, and the dashed line represents the 1:1 line.
The difference in r 2 and RMSE between SOMVI and SODVI variables used as input for MGPRS model were 0.10 and 0.25 kg/m 2 for leaf biomass, 0.13 and 0.72 kg/m 2 for stem biomass, 0.12 and 0.14 kg/m 2 for stem and leaf biomass, and 0.16 and 0.4 kg/m 2 for AGB, respectively. The difference in r 2 and RMSE between MGPRS and SGPR by using the best performing hybrid vegetation index (SODVI) as input were 0.03 and 0.12 kg/m 2 for leaf biomass, 0.21 and 0.20 kg/m 2 for stem biomass, 0.10 and 0.13 kg/m 2 for stem and leaf biomass, and 0.16 and 0.32 kg/m 2 for AGB, respectively. The highest difference in statistical matrices of measured and predicted variables using the two predictive models observed for AGB with r 2 = 0.17 and RMSE = 0.23 kg/m 2 .
Remote Sens. 2020, 12, x FOR PEER REVIEW 16 of 24 Figure 6. Comparisons between measured and estimated biomass components using multi-target Gaussian process regressor stacking (MGPRS) for (A) leaf biomass using SOMVI, (B) stem biomass using SOMVI, (C) leaf biomass using SODVI, (D) stem biomass using SODVI, (E) stem and leaf biomass using SOMVI, (F) AGB using SOMVI, (G) leaf and stem using SODVI, and (H) AGB using SODVI. The green line represents a linear regression between the observed and estimated values and the dashed line represents the 1:1 line. Figure 6. Comparisons between measured and estimated biomass components using multi-target Gaussian process regressor stacking (MGPRS) for (A) leaf biomass using SOMVI, (B) stem biomass using SOMVI, (C) leaf biomass using SODVI, (D) stem biomass using SODVI, (E) stem and leaf biomass using SOMVI, (F) AGB using SOMVI, (G) leaf and stem using SODVI, and (H) AGB using SODVI.
The green line represents a linear regression between the observed and estimated values and the dashed line represents the 1:1 line. As shown in Figures 5 and 6, MGPRS showed higher accuracies than SGPRS. The validated MGPRS model was used to map the total biomass predicted using the field measured variables and SODVI as input (Figure 7). The validated values for the entire study site vary from 0.28 to 3.50 kg/m 2 , with a mean value of 2.12 kg/m 2 . The map in Figure 7 displays the spatial variability in AGB between rice fields. Remote Sens. 2020, 12, x FOR PEER REVIEW 17 of 24 Figure 7. Spatial distribution of AGB estimated by MGPRS using SODVI as input. This map was created for August 10 using blended high-resolution Sentinel images. On this date, the growth stages of rice fields were generally between the beginning and end of panicle emergence.

Interpretation of the Used Vegetation Indices
In this study, different types of SAR and optical indices were identified and regressed with ground sampled biomass components. Among the selected optical and SAR indices, RDVI1 and SND _ showed high correlations with biomass components. RDVI1, RDVI3, and EVI were highly correlated with biomass components than other selected optical indices ( Table 3). The better estimation accuracy of these vegetation indices was mainly due to the visible and NIR bands which are highly sensitive to canopy biomass components. The effects of PAI and aboveground biomass on the visible and NIR bands were thoroughly investigated and reported by previous studies. In addition to the red and NIR bands, the EVI includes the blue band, which can be used for the Figure 7. Spatial distribution of AGB estimated by MGPRS using SODVI as input. This map was created for August 10 using blended high-resolution Sentinel images. On this date, the growth stages of rice fields were generally between the beginning and end of panicle emergence.

Interpretation of the Used Vegetation Indices
In this study, different types of SAR and optical indices were identified and regressed with ground sampled biomass components. Among the selected optical and SAR indices, RDVI1 and SND VH_VV showed high correlations with biomass components. RDVI1, RDVI3, and EVI were highly correlated with biomass components than other selected optical indices ( Table 3). The better estimation accuracy of these vegetation indices was mainly due to the visible and NIR bands which are highly sensitive to canopy biomass components. The effects of PAI and aboveground biomass on the visible and NIR bands were thoroughly investigated and reported by previous studies. In addition to the red and NIR bands, the EVI includes the blue band, which can be used for the reduction of atmospheric and soil background noise. Among the best performing optical indices, RDVI1 performs relatively better than EVI, which is mainly due to the red band reflectance expressed in RDVI1. The red band in RDVI1 is more important for biomass estimation than the blue and NIR indices expressed in EVI. Our results confirmed previous findings and showed that the red and NIR bands are more sensitive to crop biomass components than the blue and SWIR bands. Jin et al. [21] reported that, the volume span of polarimetric variables were highly correlated with aboveground biomass, as volume change is highly influenced by ground biophysical parameters [69]. The used image here is Sentinel-1A dual polarimetric, where the volume span cannot be derived with polarimetric decomposition. The volume scattering can be enhanced from the change in the magnitude of collected signals at different dates and growth stages of crop. The magnitude of volume scattering is high at the booting stage due to the volume diffusion, while at the flooding or transplanting stage, surface scattering was dominant. The wide range of radar signals across growth stages can be explained by the multiple scattering double bounce of radar waves with rice stems (booting stage). At the flooding or transplanting stage, the physical structure of crop canopies was not different from other surface materials. This resulted in less proportion of volume diffusion. In this study, enhanced radar polarization indices were constructed from the difference of backscatter values between these two growth stages. Among the constructed radar polarization indices, SND VH_VV and σ T_VH • σ T_VV • resulted in high correlation with biomass components. The VH backscatter travels farther on the ground than VV. On the other hand, VV has higher wave attenuation than VH. Our finding showed that ratio-based enhanced radar vegetation indices (SND VH_VV and σ T_VH • σ T_VV • ) outperformed other selected radar vegetation indices. This result is in parallel with previous findings [21].
The combined optical and SAR indices were developed by integrating the best performing SAR and optical indices. The hybrid vegetation indices, SOMVI and SODVI, were strongly correlated to biomass parameters than each SAR and optical indices used independently ( Table 2). The better results obtained from the hybrid vegetation indices can be explained by the advantages of optical imagery in providing spectral information and of SAR imagery in providing structural or morphological information.

Comparison of Biomass Estimation Using Hybrid Indices and Regression Techniques
Gaussian-based regression algorithms have already shown better performance than other commonly used machine learning models [51,70,71]. Accordingly, single-target Gaussian regression models have been increasingly used for the estimation of vegetation parameters in the literature. In this study, we introduced MGPRS as the best option for the joint estimation of crop biomass components and compared its efficiency with frequently used SGPR [48,55,72]. The estimation accuracies using independently developed optical and SAR indices used as input for SGPRS were better than those achieved by direct correlation with biomass components (Table 4). Crop biomass estimation was considerably improved when the independently developed vegetation indices (RDVI1 and SNDVH_VV) were used as input for MGPRS (r 2 = 0.70, RMSE = 0.14 kg/m 2 for leaf biomass; r 2 = 0.62, RMSE = 0.37 kg/m 2 for stem biomass; r 2 = 0.65, RMSE = 0.30 kg/m 2 for stem and leaf biomass; r 2 = 0.67, RMSE = 0.19 kg/m 2 for AGB). The retrieval accuracy of leaf biomass is quite promising using SODVI as input for MGPRS. Similar to leaf biomass, the retrieval accuracies of stem biomass, stem and leaf biomass, and AGB were improved using SODVI as input for MGPRS. The margin of estimation for stem biomass was below the 1:1 line, showing serious underestimations using SODVI as input for SGPR. Nevertheless, using SODVI as input for MGPRS, the margin of estimates spread across the 1:1 line with a higher accuracy (r 2 = 0.84 and RMSE = 0.40 kg/m 2 ). A similar trend was observed for stem and leaf biomass and AGB (Figures 5 and 6). Among the estimated biomass components, high variation between SGPR and MGPRS using SODVI as input was observed for AGB as statistical indicators improved from r 2 = 0.67 and RMSE = 0.48 kg/m 2 (SGPR) to r 2 = 0.87 and RMSE = 0.16 kg/m 2 (MGPRS). The spread of estimates from 1:1 was also higher for SGPR than for the estimates of MGPRS. The narrow margin of estimation from the 1:1 line for AGB ( Figure 6) indicates the advantage of using MGPRS to tackle the problem of saturation for high biomass. Combining SAR with optical imagery can decouple the soil and background noises for dense vegetation [25]. Most of the indices saturated when there was around 400 g/m 2 of leaf biomass, and about 800 g/m 2 of total biomass [14]. The combination of different optical indices from satellite imagery can be effectively used for the estimation of biomass only at the pre-heading stages. For the biomass estimation of the whole season, Cheng et al. (2017) proposed a spectroscopy-based estimation of biomass from hyperspectral measurements [16]. Field data measured from handheld instruments have limitations in large scale multi-temporal monitoring of biomass components. The combined usage of optical and SAR imagery with a MGPRS model is an available option for multi-temporal mapping of biomass components using Sentinel-1 and Sentinel-2 images.
The overall better performance of MGPRS over SGPR can be explained by the use of predicted output as input to correct the next predictions in MGPRS. Previously, studies have pointed out that considerations of the previous outputs to transform the given training data help multi-target regression methods to reduce model bias at the cost of increasing model variance [49]. Meanwhile, in the case of SGPR, it is difficult to gain an advantage on the previous outputs to improve the overall prediction performance of the given prediction model. Both parametric and non-parametric machine learning algorithms have been used to estimate crop biophysical parameters [43,73]. Among the parametric machine learning algorithms, physically-based models have been used in most previous studies [74][75][76]. These physically-based models are complicated and have many assumptions. For these limitations of physical models, non-parametric machine learning methods become the best option. Among the non-parametric methods, GP models showed higher performance in exploiting multi-sensor information and retrieving biophysical parameters [55]. Since GP models can deduce function information in the prior form, they can allow a fine and accurate trade-off between fitting the data and smoothing.

The Overall Performance of the Used Predictive Models
The biomass estimation results with SODVI and SOMVI variables as input for MGPRS (r 2 = 0.83, RMSE = 0.43 kg/m 2 for leaf biomass; r 2 = 0.84, RMSE = 0.40 kg/m 2 for stem biomass; r 2 = 0.81, RMSE = 0.41 kg/m 2 for stem and leaf biomass; r 2 = 0.87, RMSE = 0.16 kg/m 2 for AGB) showed r 2 and RMSE values comparable to or better than other related studies. The good results achieved in this study can be justified by the high predictive capability of combined indices, which exploit the spectral information from the optical sensor and structural information from the SAR sensor. The semi-empirical water cloud model regularized by different machine learning methods (SVM, MOSVM, ANN and RF) represents the common way to predict the biophysical parameters using spaceborne SAR imagery [40,42,77]. One important problem of using the empirical retrieval method is its limited transferability in the sense of different types of vegetation and sites. Parametrization and simulation of the training data is time-consuming and inadequate parametrization can lead to poor accuracy of estimation. Parametric and non-parametric regression algorithms have been directly used with different SAR and optical vegetation indices, and acceptable results have been reported [21,47,78]. These models have drawbacks in making trade-offs between data fitting and smoothing and involve large quantities of data for training. The non-parametric regression prediction model used here (MGPRS) is advantageous over the semi-empirical water cloud model and other parametric and non-parametric regression models used so far. MGPRS does not have fixed model complexities like other non-parametric models and can be modified to the appropriate level of function according to the relationship exhibited by the given data. MGPRS is not prone to underestimation and overfitting problems. The empirical and semi-empirical water cloud models have been extensively used for crop biophysical parameter estimation from SAR imagery [42]. However, the soil and vegetation background noise, which occurs at later growth stages, hampers the retrieval accuracy [7]. Previous findings pointed out that modifying the empirical models using optical indices for the estimation of LAI can decouple soil and vegetation background noise and can improve estimation accuracy [7]. The estimation of biophysical parameters using the empirical and semi-empirical models are complicated and not efficient for in-field crop monitoring activities. The main reasons include poor retrieval accuracy at later growth stages due to soil and background noise, extensive input requirements for the parametrization of the models and risk of being prone to errors, and inability of being parametrized for all biomass components.
In this study, the biomass components predicted with MGPRS using SODVI as input are comparable or higher than existing studies. The joint estimation using MGPRS from SAR-optical hybrid indices is a good option for biophysical parameter estimations over the growth stages and can improve in-season crop growth monitoring.

Conclusions
This study demonstrated the utility of satellite imagery from Sentinel constellations for obtaining crop biomass components at a large scale using hybrid vegetation indices and MGPRS. Correlation analysis was done between constructed vegetation indices and field-measured biomass components. The best performing SAR and optical indices were used to create hybrid vegetation indices (SOMVI and SODVI). They were regressed with field sampled biomass components and were also used as input for regression techniques (MGPRS and SGPR). For all scenarios, the hybrid vegetation indices showed better estimation performance than each index from optical and SAR imagery used independently. The validation accuracy of the used regression model with MGPRS resulted in lower RMSE and higher r 2 values (r 2 = 0.83, RMSE = 0.43 kg/m 2 for leaf biomass; r 2 = 0.84, RMSE = 0.40 kg/m 2 for stem biomass; r 2 = 0.81, RMSE = 0.41 kg/m 2 for stem and leaf biomass; r 2 = 0.87, RMSE = 0.16 kg/m 2 for AGB). The combined indices from optical and SAR imagery can decouple soil and background noises for SAR imagery data and tackle the optical saturation problem. The methodology proposed here can be used for biomass estimation over multiple growth stages using Sentinel missions. It can be concluded that MGPRS with combined vegetation indices can be of particular interest for operational crop growth monitoring, as it can estimate multiple biomass components simultaneously.