Improving the Retrieval of Crop Canopy Chlorophyll Content Using Vegetation Index Combinations

Estimates of crop canopy chlorophyll content (CCC) can be used to monitor vegetation productivity, manage crop resources, and control diseases and pests. However, obtaining these estimates with conventional ground-based methods is time-consuming and resource-intensive over large areas. Although vegetation indices (VIs) derived from satellite sensor data have been used to estimate CCC, they suffer from problems related to spectral saturation, soil background, and canopy structure. A new method was therefore proposed that combines the Medium Resolution Imaging Spectrometer (MERIS) terrestrial chlorophyll index (MTCI) with LAI-related vegetation indices (LAI-VIs) to increase the accuracy of CCC estimates for wheat and soybeans. The PROSAIL-D canopy reflectance model was used to simulate canopy spectra, which were resampled to match the spectral response functions of MERIS on board the ENVISAT satellite. Combinations of the MTCI and LAI-VIs were then used to estimate CCC via univariate linear regression, binary linear regression, and random forest regression. The accuracy of the estimates obtained from field spectra and MERIS data was assessed against field CCC measurements. For all the selected regression techniques, every MTCI and LAI-VI combination estimated CCC more accurately than the MTCI alone (field spectra data for soybeans and wheat: R² = 0.62 and RMSE = 77.10 µg cm⁻²; MERIS satellite data for soybeans: R² = 0.24 and RMSE = 136.54 µg cm⁻²). Random forest regression was more accurate than the two linear regression models. The most accurate combination was the MTCI and MTVI2 with random forest regression, with R² = 0.65 and RMSE = 37.76 µg cm⁻² (field spectra data) and R² = 0.78 and RMSE = 47.96 µg cm⁻² (MERIS satellite data). Combining the MTCI and an LAI-VI represents a further step towards improving the accuracy of CCC estimation from multispectral satellite sensor data.


Introduction
Chlorophyll is the main photosynthetic leaf pigment and plays a critical role in converting solar radiation into stored chemical energy [1]. Canopy chlorophyll content (CCC) is calculated from the leaf area index (LAI) and leaf chlorophyll content (LCC) and is expressed per unit ground area. This measure is useful for monitoring the productivity and growth status of vegetation [2,3]. Over the past few decades, extensive research has found that CCC is a primary driver of gross primary productivity (GPP) and a key variable for estimating it [1,4], so its accurate determination is extremely important for agricultural applications.
The methods currently used to determine CCC fall into two approaches: (1) laboratory-based measurement and (2) non-destructive remote sensing. The first is highly accurate but time-consuming, resource-intensive, and destructive, which limits its large-scale application [5]. The development of remote sensing technology has enabled CCC to be estimated from satellite data with various temporal and spatial resolutions [6][7][8].
The vegetation index (VI) approach is widely used to estimate CCC because of its simplicity, convenience, and high computational efficiency [9][10][11]. Retrieving CCC requires a vegetation index that is sensitive to both LCC and the LAI, since both influence CCC. A variety of VIs have been published based on field measurements and simulated datasets obtained from radiative transfer models [8,9,12]. The MERIS terrestrial chlorophyll index (MTCI) was developed using red and red-edge bands and is commonly used to estimate CCC from both MERIS and hyperspectral data [13].
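For reference, the standard MTCI formulation divides the reflectance difference between MERIS bands 10 (753.75 nm) and 9 (708.75 nm) by the difference between bands 9 and 8 (681.25 nm). The minimal Python sketch below computes it from band reflectances; the function name and the reflectance values are illustrative only and are not taken from this study.

```python
def mtci(r_b8: float, r_b9: float, r_b10: float) -> float:
    """MERIS terrestrial chlorophyll index.

    r_b8, r_b9, r_b10: surface reflectance in MERIS bands 8 (681.25 nm),
    9 (708.75 nm) and 10 (753.75 nm).
    MTCI = (R753.75 - R708.75) / (R708.75 - R681.25)
    """
    denominator = r_b9 - r_b8
    if denominator == 0.0:
        # The index is undefined when the red-edge and red reflectances are equal.
        raise ValueError("R708.75 equals R681.25; MTCI is undefined")
    return (r_b10 - r_b9) / denominator


if __name__ == "__main__":
    # Hypothetical reflectances for a green canopy.
    print(round(mtci(0.04, 0.10, 0.40), 2))  # prints 5.0
```

Larger red-edge slopes (band 10 well above band 9 relative to the red trough) give larger MTCI values, which is why the index tracks chlorophyll until the red-edge region itself saturates.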
Leaf structure, canopy architecture, and soil background can significantly affect the optical properties of leaves and canopies [14][15][16][17]. Therefore, problems can arise when CCC is estimated using VIs calculated from canopy reflectance. Several studies have shown that the MTCI is strongly correlated with CCC, but that this correlation gradually weakens as the LAI increases [18,19]. Spectral saturation at high LAI values has long been a problem when estimating canopy population parameters and when using VIs to retrieve CCC. Various studies have shown that vegetation spectra and VIs can saturate at high LAI values for different types of vegetation [20,21]. Vegetation indices that are sensitive to the LAI, termed LAI-related vegetation indices (LAI-VIs), are derived from spectral reflectance in the red and near-infrared bands. LAI-VIs also saturate when used to estimate the LAI, and several modified LAI-VIs have been developed to reduce the saturation produced by high LAI values [16,22,23]. However, little is known about CCC retrieval in the presence of high LAI values. Additionally, the MTCI has a weaker relationship with CCC at low vegetation coverage [24], because the soil background strongly affects red-edge reflectance when the canopy is sparse [25].
Using a combination of multiple VIs, rather than a single VI, provides more information on leaf chlorophyll and canopy structure [26,27]. Combinations of a chlorophyll-related VI (CHL-VI) and an LAI-VI have previously been used to estimate LCC and are resistant to LAI variations [28][29][30]. The ratio of the transformed chlorophyll absorption in reflectance index (TCARI) to the optimized soil-adjusted vegetation index (OSAVI), known as TCARI/OSAVI, has been used to accurately retrieve crop chlorophyll from hyperspectral airborne imagery [28]. Several other ratio-based VIs have been developed and applied to the ground- and satellite-based remote sensing of crop and forest LCC [29,30]. Combining a CHL-VI and an LAI-VI through multiple regression, or within the cost function of a radiative transfer model inversion, has also improved LCC estimation [31]. However, these VI combinations are insensitive to the vegetation population and therefore not useful for CCC retrieval. Most studies to date have used VI combinations for LCC estimation and focused on eliminating the influence of the LAI on the CHL-VI; only a few have proposed VI combinations for estimating CCC. One such study estimated grassland CCC using a combined VI based on two single VIs calculated from Landsat data, but it did not include a red-edge index such as the MTCI [32].
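The TCARI/OSAVI ratio [28] can be written out explicitly. The sketch below uses the widely cited hyperspectral definitions (reflectance at 550, 670, 700, and 800 nm, with an OSAVI soil factor of 0.16); these band positions are not MERIS bands, and the example reflectances are hypothetical. Dividing by OSAVI is what suppresses the LAI and soil signal, which is why such ratios suit LCC rather than CCC.

```python
def tcari(r550: float, r670: float, r700: float) -> float:
    """Transformed chlorophyll absorption in reflectance index."""
    return 3.0 * ((r700 - r670) - 0.2 * (r700 - r550) * (r700 / r670))


def osavi(r670: float, r800: float) -> float:
    """Optimized soil-adjusted vegetation index with soil factor 0.16."""
    return (1.0 + 0.16) * (r800 - r670) / (r800 + r670 + 0.16)


def tcari_osavi(r550: float, r670: float, r700: float, r800: float) -> float:
    """Ratio index used to estimate leaf (not canopy) chlorophyll content."""
    return tcari(r550, r670, r700) / osavi(r670, r800)


if __name__ == "__main__":
    # Hypothetical canopy reflectances at 550, 670, 700 and 800 nm.
    print(round(tcari_osavi(0.08, 0.04, 0.10, 0.45), 3))  # prints 0.205
```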
The most common statistics-based retrieval algorithm is linear regression (LR), which describes empirical relationships between CCC and VIs [33]. However, such empirical formulae cannot always represent nonlinear relationships under complex environmental conditions [34,35]. Machine learning has been widely applied to retrieve vegetation parameters by training models on simulated or field-measured spectral reflectance data, and has shown robustness and improved prediction accuracy [36,37]. Random forest is a machine learning method that combines multiple decision trees and can be used for both classification and regression [38]. Past research has demonstrated that random forest regression (RFR) is a strong predictor of the biochemical components of vegetation, and it is widely used because of its accuracy, ease of use, and stability [39,40]. Shah et al. [41] trained a random forest regression model with several VIs to retrieve the leaf chlorophyll content of wheat and obtained good performance (RMSE = 3.62–3.91 µg cm⁻²). Although RFR has clear advantages for estimating biochemical components, it has seldom been used to predict CCC. The purpose of this research was to introduce a VI combination approach for retrieving crop CCC with fewer uncertainties than single VIs. In this study, the MTCI and each LAI-VI were multiplied to form a single (univariate) predictor, and the two corresponding single VIs were also used together as two (binary) predictors. LR and RFR were used to determine the relationships between CCC and these predictors. The feasibility of the VI combination approach was verified through simulations with the PROSAIL-D model, and its reliability was validated using field canopy spectra and MERIS satellite data. Figure 1 illustrates the spatial distribution of the two study sites.
The first was located at the National Station for Precision Agriculture, Xiaotangshan (XTS), Beijing, China (40° …). The canopy reflectance at the XTS site was measured using an ASD FieldSpec Pro spectrometer (Analytical Spectral Devices, Boulder, CO, USA) with a spectral range of 350–2500 nm and spectral resolutions of 3 nm (350–1050 nm) and 10 nm (1050–2500 nm). The measurements were taken between 10:00 and 14:00 local time under clear, cloudless conditions, at a height of 1.3 m above the wheat canopy with a 25° field of view. The average canopy reflectance for each plot was obtained from 20 individual measurements [30]. The canopy spectral reflectance at the US-Ne2 and US-Ne3 sites was obtained using two inter-calibrated Ocean Optics USB2000 radiometers (Ocean Optics Inc., Dunedin, FL, USA) with a spectral range of 400–1100 nm and a spectral resolution of 1.5 nm [44]. One radiometer, pointed downward, measured the upwelling radiance of the crop from about 5.5 m above the canopy with a 25° field of view; the other, pointed upward, measured the incident irradiance with a hemispherical field of view. The measurements were taken under clear-sky conditions between 11:00 and 14:00 local time, and reflectance was then calculated using the methods described by Gitelson [1].
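As a rough illustration of how reflectance can be derived from a dual-radiometer system and averaged per plot, the Python sketch below assumes a simultaneous measurement over a white reference panel of known reflectance to inter-calibrate the two radiometers. This is a generic sketch, not necessarily the exact procedure of [1]; the function names and the panel reflectance of 0.99 are hypothetical.

```python
import numpy as np


def intercalibrated_reflectance(target_radiance, incident_irradiance,
                                panel_radiance, panel_irradiance,
                                panel_reflectance=0.99):
    """Canopy reflectance from a downward (radiance) and an upward (irradiance) radiometer.

    All spectral inputs are arrays on the same wavelength grid. The panel
    measurements (assumed simultaneous) give the factor that converts the
    target radiance/irradiance ratio into reflectance.
    """
    calibration = panel_reflectance * (np.asarray(panel_irradiance, dtype=float)
                                       / np.asarray(panel_radiance, dtype=float))
    ratio = (np.asarray(target_radiance, dtype=float)
             / np.asarray(incident_irradiance, dtype=float))
    return ratio * calibration


def plot_mean_reflectance(scans):
    """Average repeated scans (e.g. 20 per plot) into one plot-level spectrum."""
    return np.mean(np.asarray(scans, dtype=float), axis=0)
```

Measuring the panel itself returns the panel reflectance at every wavelength, which is a quick sanity check of the calibration chain.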

Measurement of Canopy Chlorophyll Content
At the XTS site, fresh wheat leaf samples were taken from the top of the canopy within a 1 m² area in each plot [30] and quickly placed in a plastic box containing ice for transport to the laboratory. The chlorophyll concentration was determined using a spectrophotometer [45], and the LAI of the sampled leaves was measured using the dry-weight method [46]. At the US-Ne2 and US-Ne3 sites, fresh soybean leaves were collected from six small plots (20 m × 20 m) within each site. The leaf pigments were extracted with 80% acetone, and LCC was determined using a spectrophotometer [47]. The LAI of the sampled leaves was measured using an area meter (Model LI-3100, Li-Cor Inc., Lincoln, NE, USA) [43]. The LCC and LAI of the six plots were then averaged to obtain site-level values. CCC was calculated by multiplying the LAI by the LCC. Statistics for the measured wheat and soybean CCC are shown in Table 2.
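Because CCC is the product of LCC (µg cm⁻² of leaf) and LAI (m² of leaf per m² of ground), its unit is µg cm⁻² of ground. The sketch below follows the order described above, averaging the plot-level LCC and LAI first and then multiplying; the plot values are hypothetical. Note that the product of the means generally differs from the mean of the plot-level products when LCC and LAI covary across plots.

```python
import numpy as np


def canopy_chlorophyll(lcc_ug_cm2, lai):
    """CCC (ug cm^-2 ground) = LCC (ug cm^-2 leaf) x LAI (m^2 leaf / m^2 ground)."""
    return np.asarray(lcc_ug_cm2, dtype=float) * np.asarray(lai, dtype=float)


def site_level_ccc(plot_lcc, plot_lai):
    """Average LCC and LAI over the plots first, then multiply (site-level CCC)."""
    return float(canopy_chlorophyll(np.mean(plot_lcc), np.mean(plot_lai)))


if __name__ == "__main__":
    # Hypothetical values for six 20 m x 20 m plots at one site.
    lcc = [40.0, 42.0, 45.0, 38.0, 50.0, 44.0]
    lai = [3.0, 3.2, 2.8, 3.5, 3.1, 2.9]
    print("site CCC (product of means):", round(site_level_ccc(lcc, lai), 1))
    print("mean of plot CCCs:", round(float(np.mean(canopy_chlorophyll(lcc, lai))), 1))
```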

ENVISAT MERIS Data
MERIS is a medium-spectral-resolution imaging spectrometer on board the European Space Agency (ESA) ENVISAT platform. The instrument samples surface reflectance in 15 spectral bands between 415 and 900 nm, covering the visible and near-infrared regions, with a spatial resolution of 300 m and a revisit time of 2–3 days. Detailed specifications of the MERIS sensor are shown in Table 3. In this study, full-resolution surface reflectance products (25 June–24 September 2004) were produced by seven-day temporal compositing of data collected at the original 2–3-day revisit frequency. The MERIS surface reflectance product provides 13 bands; bands 11 and 15, which lie in the atmospheric oxygen and water vapour absorption features, respectively, are excluded.
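Field and PROSAIL-D spectra must be reduced to MERIS bands before any index is computed. A minimal sketch, assuming Gaussian spectral response functions whose full width at half maximum equals the nominal MERIS bandwidth, is shown below; the tabulated ESA response functions should be preferred where available. Only the bands needed for the indices discussed here are listed, and the centres and widths are nominal values that should be cross-checked against Table 3.

```python
import numpy as np

# Nominal MERIS band centre and width (nm) for the bands used by the VIs here.
MERIS_BANDS = {
    5: (560.0, 10.0),
    7: (665.0, 10.0),
    8: (681.25, 7.5),
    9: (708.75, 10.0),
    10: (753.75, 7.5),
    12: (778.75, 15.0),
    13: (865.0, 20.0),
}


def gaussian_srf(wavelengths, centre, fwhm):
    """Gaussian approximation of a spectral response function (peak = 1)."""
    sigma = fwhm / (2.0 * np.sqrt(2.0 * np.log(2.0)))
    return np.exp(-0.5 * ((wavelengths - centre) / sigma) ** 2)


def resample_to_meris(wavelengths, reflectance, bands=MERIS_BANDS):
    """SRF-weighted mean reflectance per band.

    Assumes a uniformly sampled spectrum (e.g. 1 nm steps) covering each band.
    """
    wavelengths = np.asarray(wavelengths, dtype=float)
    reflectance = np.asarray(reflectance, dtype=float)
    resampled = {}
    for band, (centre, fwhm) in bands.items():
        weights = gaussian_srf(wavelengths, centre, fwhm)
        resampled[band] = float(np.sum(weights * reflectance) / np.sum(weights))
    return resampled


if __name__ == "__main__":
    wl = np.arange(400.0, 1001.0)                 # 1 nm grid, 400-1000 nm
    spectrum = np.where(wl < 700.0, 0.05, 0.45)   # crude red-edge step (hypothetical)
    for band, value in resample_to_meris(wl, spectrum).items():
        print(band, round(value, 3))
```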

Vegetation Indices
Several LAI-VIs were selected to introduce LAI information into the MTCI. The normalized difference vegetation index (NDVI), which saturates for different crops when the LAI exceeds 2 [48][49][50], was used for comparison. Several researchers have modified the NDVI to mitigate saturation when estimating the LAI. A linearized NDVI (LNDVI) was derived by introducing a linearity-adjustment factor, β, into the NDVI equation. The LNDVI is more sensitive to spectral angles (reflectances) and has denser isolines as the spectral angle (VI value) increases in red–near-infrared (NIR) space. It therefore has improved linearity and maintains higher sensitivity to the vegetation fraction in densely vegetated areas [22]. Liu et al. [51] presented a stretched NDVI (S-NDVI), constructed using a scaling transformation function, to eliminate saturation when the vegetation fraction becomes large. Unlike the NDVI, the S-NDVI did not saturate for LAIs of 2.5–5.0. In addition to these modified NDVIs, other spectral indices also maintain better relationships with the LAI in densely vegetated regions. The renormalized difference vegetation index (RDVI) [52] was proposed to combine the advantages of the difference vegetation index (DVI) [53] at low LAIs with those of the NDVI at high LAIs. Tan et al. [54] compared 56 hyperspectral vegetation indices and concluded that the RDVI had the strongest positive relationship with the LAI and remained far from saturation at large LAIs. Haboudane et al. [16] designed a modified triangular vegetation index (MTVI2) that proved to be the best predictor of the LAI in their study; the MTVI2 was insensitive to changes in chlorophyll and did not saturate at high LAIs. The vegetation indices used in this study are shown in Table 4.
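To make the index definitions concrete, the sketch below implements three of the LAI-VIs with their commonly published formulas (NDVI, RDVI, and MTVI2 after Haboudane et al. [16]) together with the multiplicative MTCI × LAI-VI combination used for univariate regression. The arguments are generic green (~550 nm), red (~670 nm), and NIR (~800 nm) reflectances; which MERIS bands fill each role is given in Table 4 and is not assumed here. LNDVI and S-NDVI are omitted because their exact parameterizations (such as the β factor) are not reproduced in this section, and all numeric inputs are hypothetical.

```python
import math


def ndvi(red: float, nir: float) -> float:
    """Normalized difference vegetation index."""
    return (nir - red) / (nir + red)


def rdvi(red: float, nir: float) -> float:
    """Renormalized difference vegetation index."""
    return (nir - red) / math.sqrt(nir + red)


def mtvi2(green: float, red: float, nir: float) -> float:
    """Modified triangular vegetation index 2 (green ~550, red ~670, NIR ~800 nm)."""
    numerator = 1.5 * (1.2 * (nir - green) - 2.5 * (red - green))
    denominator = math.sqrt((2.0 * nir + 1.0) ** 2 - (6.0 * nir - 5.0 * math.sqrt(red)) - 0.5)
    return numerator / denominator


def combined_index(mtci_value: float, lai_vi_value: float) -> float:
    """Multiplicative combination (e.g. MTCI x MTVI2) used as a single ULR predictor."""
    return mtci_value * lai_vi_value


if __name__ == "__main__":
    green, red, nir = 0.08, 0.04, 0.45   # hypothetical canopy reflectances
    mtci_value = 3.0                     # hypothetical MTCI
    lai_vis = {"NDVI": ndvi(red, nir), "RDVI": rdvi(red, nir),
               "MTVI2": mtvi2(green, red, nir)}
    for name, value in lai_vis.items():
        print(f"MTCI x {name} = {combined_index(mtci_value, value):.3f}")
```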

Simulation of Canopy Reflectance Using the PROSAIL-D Model
The PROSAIL-D model couples the PROSPECT-D leaf optical properties model [56] with the 4SAIL canopy bidirectional reflectance model [57]. It was used here to simulate MERIS observations and to model CCC from VIs. PROSPECT-D simulates upward and downward hemispherical radiation fluxes between 400 and 2500 nm using seven input parameters and outputs leaf spectral reflectance and transmittance; 4SAIL then uses these leaf properties, together with a series of canopy input parameters, to simulate canopy reflectance. The input parameters of the PROSAIL-D model are shown in Table 5. The LCC values were set between 10 and 80 µg cm⁻² at 10 µg cm⁻² intervals. The leaf carotenoid content was set to 25% of the LCC because red and red-edge reflectance is insensitive to carotenoids. The LAI values ranged from 0.25 to …, and the average leaf angle and soil reflectance (Figure 2) were used to represent different canopy structures and soil backgrounds, respectively. The five soil reflectance spectra were obtained by multiplying the field-measured spectrum of bare, dry soil by different brightness coefficients. The fraction of diffuse incoming solar radiation (skyl) was calculated in the PROSAIL-D model from the solar zenith angle, which was set between 0° and 60° at 10° intervals. The remaining parameters were fixed according to field measurements or the scientific literature [58].
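The simulation design can be expressed as a full-factorial grid of PROSAIL-D inputs. The sketch below encodes the settings stated in the text (LCC of 10–80 µg cm⁻² in 10 µg cm⁻² steps, carotenoids at 25% of LCC, five soil brightness levels, and solar zenith angles of 0–60° in 10° steps) and attaches the corresponding CCC (LCC × LAI) to each case. The LAI upper bound and step, the average leaf angles, and the soil brightness coefficients are not given in this section, so the values below are hypothetical placeholders; each row would then be passed to a PROSAIL-D implementation, which is not shown.

```python
import itertools

import numpy as np

# Values stated in the text.
LCC_VALUES = np.arange(10.0, 81.0, 10.0)   # ug cm^-2: 10, 20, ..., 80
SZA_VALUES = np.arange(0.0, 61.0, 10.0)    # degrees: 0, 10, ..., 60
CAROTENOID_FRACTION = 0.25                 # Car = 25% of LCC

# HYPOTHETICAL placeholders: not specified in this section.
LAI_VALUES = np.arange(0.25, 6.01, 0.25)       # upper bound and step assumed
ALA_VALUES = (30.0, 45.0, 60.0, 75.0)          # average leaf angles (degrees), assumed
SOIL_BRIGHTNESS = (0.5, 0.75, 1.0, 1.25, 1.5)  # five coefficients, assumed


def build_simulation_grid():
    """Return one dict of inputs per PROSAIL-D run, with the target CCC attached."""
    grid = []
    for lcc, lai, ala, soil, sza in itertools.product(
            LCC_VALUES, LAI_VALUES, ALA_VALUES, SOIL_BRIGHTNESS, SZA_VALUES):
        grid.append({
            "lcc": float(lcc),
            "car": CAROTENOID_FRACTION * float(lcc),
            "lai": float(lai),
            "ala": ala,
            # Soil spectrum = brightness coefficient x measured dry bare-soil spectrum.
            "soil_brightness": soil,
            "sza": float(sza),
            "ccc": float(lcc) * float(lai),  # target variable, ug cm^-2
        })
    return grid


if __name__ == "__main__":
    runs = build_simulation_grid()
    print(len(runs), "simulations; first:", runs[0])
```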

CCC Retrieval Model
Three methods were used to retrieve CCC from VIs calculated from the canopy spectra simulated with the PROSAIL-D model. Figure 3 summarizes the steps in retrieving canopy chlorophyll. First, linear regression was used to assess the relationships between CCC and the published VIs calculated from the simulated canopy reflectance. A random forest regression approach trained with the VIs was then employed to estimate CCC. Finally, field measurements and MERIS satellite data were used to validate the three types of models.

Linear Regression Analysis
The performance of the combined VIs was assessed using linear regression. First, univariate linear regression (ULR) was used to model the relationship between CCC and each of the MTCI, MTCI × NDVI, MTCI × LNDVI, MTCI × S-NDVI, MTCI × RDVI, and MTCI × MTVI2. Binary linear regression (BLR) models were then constructed in Matlab between CCC and the MTCI paired with each LAI-VI as two predictors: MTCI and NDVI, MTCI and LNDVI, MTCI and S-NDVI, MTCI and RDVI, and MTCI and MTVI2. The coefficient of determination (R²), root mean square error (RMSE), bias (Bias), and normalized RMSE (NRMSE) were used to evaluate the goodness of fit and predictive power of the models. They were calculated as follows:

$$R^2 = \frac{\left[\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})(y_i-\bar{y})\right]^2}{\sum_{i=1}^{n}(\hat{y}_i-\bar{\hat{y}})^2\,\sum_{i=1}^{n}(y_i-\bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}(\hat{y}_i-y_i)^2}$$

$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}(y_i-\hat{y}_i)$$

$$\mathrm{NRMSE} = \frac{\mathrm{RMSE}}{y_{\max}-y_{\min}}\times 100\%$$

where $\hat{y}_i$ is the predicted CCC, $\bar{\hat{y}}$ is the average predicted CCC, $y_i$ is the measured CCC, $\bar{y}$ is the average measured CCC, $y_{\max}$ and $y_{\min}$ are the maximum and minimum measured CCC, and $n$ is the number of measurements. Although linear regression is simple to implement, particularly in uncomplicated variable spaces, complex vegetated areas are likely to require more advanced methods.

Implementing the Random Forest Regression Approach
Random forest is an ensemble learning method that combines multiple decision trees for classification or regression [60]. It runs efficiently on large datasets with high accuracy [38,61]. In this study, the forest consisted of 100 regression trees, each grown by recursively partitioning a bootstrap sample of the simulated dataset, and the predictions of all trees were averaged. The RFR was implemented in Matlab with the MTCI and LAI-VIs as input features for estimating CCC. The CCC models were validated using ground spectral measurements and MERIS satellite data.
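The three regression set-ups can be sketched in Python, with scikit-learn's LinearRegression and RandomForestRegressor standing in for the Matlab implementation used in the study (this swap is ours): ULR on the product MTCI × LAI-VI, BLR on the MTCI and the LAI-VI as two predictors, and RFR with 100 trees on the same two predictors. The training data below are synthetic toy relationships (a saturating MTCI and an LAI-driven MTVI2), not PROSAIL-D output, so the resulting scores only illustrate the workflow. R² here is the training-set score; in the study, the models were validated on independent field and MERIS data.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(0)
n = 500

# Synthetic stand-ins for the simulated dataset (NOT PROSAIL-D output).
lcc = rng.uniform(10.0, 80.0, n)   # ug cm^-2
lai = rng.uniform(0.25, 6.0, n)
ccc = lcc * lai                    # target, ug cm^-2
mtci = 1.0 + 4.0 * (1.0 - np.exp(-ccc / 150.0)) + rng.normal(0.0, 0.05, n)  # saturates
mtvi2 = 0.9 * (1.0 - np.exp(-0.6 * lai)) + rng.normal(0.0, 0.01, n)

# ULR: one predictor, the multiplicative index MTCI x MTVI2.
x_ulr = (mtci * mtvi2).reshape(-1, 1)
# BLR and RFR: the two single indices as separate predictors.
x_two = np.column_stack([mtci, mtvi2])

models = {
    "ULR (MTCI x MTVI2)": (LinearRegression(), x_ulr),
    "BLR (MTCI, MTVI2)": (LinearRegression(), x_two),
    "RFR (MTCI, MTVI2)": (RandomForestRegressor(n_estimators=100, random_state=0), x_two),
}

scores = {}
for name, (model, features) in models.items():
    model.fit(features, ccc)
    scores[name] = model.score(features, ccc)  # R^2 on the training data
    print(f"{name}: training R^2 = {scores[name]:.3f}")
```

The same feature matrices would be rebuilt from field or MERIS reflectances for validation, with the fitted models applied unchanged.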

Figure 4 shows that all of the MERIS bands saturated as the LAI increased, particularly in the visible region, where saturation was most severe in bands 7 and 8; the effect was slightly weaker in the near-infrared region. The relationship between the MTCI and CCC suggests that the saturation during CCC retrieval was largely caused by the LAI (Figure 5a), which gives an overview of the saturation of the MTCI when the LAI is >2 at different LCCs. Taken together, these results suggest that high LAIs were largely responsible for the saturation of the spectra and the MTCI. The soil background and average leaf angle also affected the retrieval of CCC from the MTCI. Figure 5b suggests that the soil background had little impact on the MTCI and CCC estimation, especially at high LAIs, because canopy reflectance contains less background reflectance as vegetation coverage increases. By contrast, the effect of the average leaf angle increased with the LAI during CCC estimation, owing to the influence of the complex canopy structure on the transmission of solar radiation (Figure 5c). Figure 5b,c also show that the LAI was the main cause of saturation for the MTCI.

CCC Estimation Using the Simulated Dataset
The univariate linear regressions between CCC and the candidate indices are shown in Figure 6. R² was used to assess how well each model mitigated the ill-posed retrieval problem. Comparing Figure 6b–f with Figure 6a shows that the scatter plots for MTCI × LAI-VIs are more compact than that for the MTCI alone and follow more linear trends. MTCI × MTVI2 performed best, with a higher R² (0.90) than the MTCI (0.69). The binary linear regressions were implemented in Matlab, and the results are compared in Table 6. The R² values were used to evaluate the predictive ability of each model and showed strong correlations between CCC and the MTCI and LAI-VIs. The binary regression models based on the MTCI and LAI-VIs performed better than the univariate regression model using the MTCI alone; MTCI and RDVI (R² = 0.82) and MTCI and MTVI2 (R² = 0.80) gave similar, excellent results. However, the binary regression models (Table 6) performed worse overall than the univariate regression models with combined VIs (Figure 6). The random forest approach was also tested for retrieving CCC, using the MTCI and LAI-VIs as inputs. Table 7 shows that, with the RFR approach, CCC is strongly correlated with the MTCI and LAI-VIs, and the R² values are higher than those of the two types of linear regression.

Validation of CCC Estimation Using Field Canopy Spectral Measurements
Each model was evaluated by comparing the CCC values predicted from the field canopy reflectance data with the field measurements. All of the selected VIs were calculated from field canopy spectra resampled to the MERIS band settings. Figures 7–9 show scatter plots of the predicted versus measured values for each model. As shown in Figure 7, the MTCI × LAI-VIs were more accurate than the MTCI alone, and MTCI × MTVI2 performed best (R² = 0.72, RMSE = 51.68 µg cm⁻², Bias = −51.57 µg cm⁻², and NRMSE = 17.60%). The BLR results (Figure 8) suggest that the MTCI and LAI-VIs performed better than the MTCI alone, but not as well as the combined VIs with ULR. Both the ULR and BLR approaches consistently overestimated CCC, most markedly with BLR, which suggests that these methods cannot compensate for ill-posed VIs. Figure 9 indicates that the RFR approach performed best, with the scatter points much closer to the 1:1 line, especially for the soybeans. Of all the models, the random forest regression model trained with the MTCI and MTVI2 as two input variables performed best (R² = 0.65, RMSE = 37.76 µg cm⁻², Bias = −4.11 µg cm⁻², and NRMSE = 12.89%). Therefore, random forest regression trained on the MTCI and an LAI-VI as two input variables can effectively alleviate the influence of ill-posed VIs on the accuracy of CCC estimation.
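The four accuracy statistics can be computed as in the sketch below. Two conventions are inferred rather than stated explicitly: Bias is taken as the mean of (measured − predicted), so that the overestimation reported above gives a negative Bias, and R² is taken as the squared Pearson correlation between predicted and measured CCC, because under 1 − SSE/SST the ULR (R² = 0.72, RMSE = 51.68 µg cm⁻²) and RFR (R² = 0.65, RMSE = 37.76 µg cm⁻²) results, presumably on the same measured samples, would be mutually inconsistent (a higher R² with a larger RMSE). NRMSE is normalized by the range of the measured CCC; both reported pairs (51.68 and 17.60%, 37.76 and 12.89%) imply a range of about 293 µg cm⁻², which is consistent with this definition.

```python
import numpy as np


def accuracy_metrics(measured, predicted):
    """R^2, RMSE, Bias and NRMSE for predicted vs measured CCC."""
    y = np.asarray(measured, dtype=float)
    y_hat = np.asarray(predicted, dtype=float)
    r = np.corrcoef(y_hat, y)[0, 1]                  # Pearson correlation
    rmse = float(np.sqrt(np.mean((y_hat - y) ** 2)))
    bias = float(np.mean(y - y_hat))                 # negative => overestimation
    nrmse = 100.0 * rmse / float(y.max() - y.min())  # percent of measured range
    return {"R2": float(r ** 2), "RMSE": rmse, "Bias": bias, "NRMSE": nrmse}
```

A model that overestimates every sample by a constant offset keeps R² = 1 under this definition while its RMSE and Bias reflect the offset, which mirrors the ULR behaviour described above.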

Validation of CCC Estimation from MERIS Satellite Data
The MERIS satellite data obtained from the US-Ne2 site in 2004 were used to validate the reliability and accuracy of the models. The measurements from the XTS site were not used because sufficient time-series data were lacking. Figure 10 shows Landsat NDVI time-series data at different resolutions for the US-Ne2 site in 2004. The 30 m Landsat NDVIs are consistent with the mean NDVIs derived from the 300 m MERIS pixels, which indicates that the soybeans grew uniformly across the site. Therefore, the ground-measured CCC was used to assess the usefulness of the MERIS spectral data for retrieving CCC. Table 8 shows the CCC estimation results for the three approaches. The VI combination methods were more accurate than the MTCI alone (R² = 0.24; RMSE = 136.54 µg cm⁻²). However, the ULR and BLR approaches improved accuracy only slightly and were less effective at alleviating the negative influence of the VIs on CCC estimation, whereas the RFR approach achieved good prediction accuracy. The relationship between the predicted and measured CCC for the RFR approach is shown in Figure 11. The validation results for the MTCI and LAI-VIs were better than those for the MTCI alone, and the MTCI and MTVI2 performed best, with R² = 0.78 and RMSE = 47.96 µg cm⁻². It should be noted that the above analysis and conclusions were based on ground and satellite-based measurements.
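The homogeneity argument above can be checked numerically by aggregating 30 m Landsat NDVI to the 300 m MERIS grid (10 × 10 Landsat pixels per MERIS pixel) and inspecting the spread within each block. The sketch below is a hypothetical illustration: it ignores co-registration, the sensors' point spread functions, and band differences between Landsat and MERIS, and the coefficient-of-variation criterion is an assumption rather than the study's procedure.

```python
import numpy as np


def _blocks(ndvi_30m, factor):
    """Reshape to (row_block, row_in_block, col_block, col_in_block)."""
    ndvi_30m = np.asarray(ndvi_30m, dtype=float)
    rows, cols = ndvi_30m.shape
    if rows % factor or cols % factor:
        raise ValueError("array shape must be a multiple of the aggregation factor")
    return ndvi_30m.reshape(rows // factor, factor, cols // factor, factor)


def block_mean(ndvi_30m, factor=10):
    """Aggregate 30 m NDVI to 300 m by averaging factor x factor blocks."""
    return _blocks(ndvi_30m, factor).mean(axis=(1, 3))


def block_cv(ndvi_30m, factor=10):
    """Coefficient of variation of the 30 m pixels inside each 300 m block."""
    blocks = _blocks(ndvi_30m, factor)
    return blocks.std(axis=(1, 3)) / blocks.mean(axis=(1, 3))


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Hypothetical, fairly uniform soybean field: NDVI around 0.85 with small noise.
    field = 0.85 + rng.normal(0.0, 0.01, size=(30, 30))
    print("300 m means:\n", block_mean(field).round(3))
    print("max within-block CV:", float(block_cv(field).max()))
```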

Role and Form of VI Combinations in CCC Estimation Models
CCC depends on both the LAI and LCC, so estimating it by remote sensing requires information on both variables [62]. Chlorophyll indices mainly use red-edge bands that are sensitive to LCC [10,13], whereas LAI-VIs use near-infrared bands that are sensitive to the LAI [16,52,55]. In this study, the MTCI and LAI-VIs were combined to improve the remote sensing of crop CCC. The results indicate that, with all three regression models, these VI combinations estimated CCC better than the MTCI alone, which suggests that the combinations effectively fused LAI and chlorophyll information.
The appropriate type of VI combination depends on whether the estimated vegetation parameter is at the canopy or the leaf level. In this study, a combined multiplicative VI was used for the univariate regression model. Similarly, a SAR and optical multiplication vegetation index (SOMVI) has been proposed to improve the estimation of aboveground biomass [63]. Several combined VIs based on CHL-VIs and LAI-VIs have also been proposed for LCC estimation using univariate regression models [28][29][30]. Canopy biomass, like CCC, is a canopy-level parameter, whereas LCC is a leaf-level chlorophyll parameter that is independent of the LAI. The combined VIs used for LCC estimation therefore often take the form of ratio indices, such as TCARI/OSAVI [28], which reduce the influence of the LAI on the VI. Multiplicative forms, such as the one used in this study, are not typically used for LCC estimation.
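A minimal sketch of the multiplicative combination (e.g. MTCI × MTVI2) used as the single predictor in a univariate linear regression (ULR), assuming an ordinary least-squares fit. The helper names and index values are hypothetical, and CCC is constructed to be exactly linear in the product so that the fit is easy to check. A ratio form such as TCARI/OSAVI would divide instead, suppressing the LAI signal rather than retaining it.

```python
import numpy as np


def combined_vi(mtci, lai_vi):
    """Multiplicative VI combination, e.g. MTCI x MTVI2."""
    return np.asarray(mtci, dtype=float) * np.asarray(lai_vi, dtype=float)


def fit_ulr(vi, ccc):
    """Univariate linear regression CCC = slope * VI + intercept (least squares)."""
    slope, intercept = np.polyfit(vi, ccc, deg=1)
    return float(slope), float(intercept)


# Synthetic index values; CCC is built to be exactly linear in the product.
mtci = np.array([1.5, 2.0, 2.5, 3.0, 3.5])
mtvi2 = np.array([0.3, 0.4, 0.5, 0.6, 0.7])
vi = combined_vi(mtci, mtvi2)
ccc = 100.0 * vi + 5.0
slope, intercept = fit_ulr(vi, ccc)
print(round(slope, 6), round(intercept, 6))  # 100.0 5.0
```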
The best VI combination can also differ according to the type of regression model. Multiple regression or advanced machine-learning techniques can take more input variables to describe complex scenarios. For example, this study used a CHL-VI and a LAI-VI as the two input variables for both the binary linear regression and the random forest model for CCC estimation. Multiple vegetation indices can also be used in the cost function to estimate vegetation parameters by directly inverting a vegetation model [31]. For estimating crop LCC, a simple look-up table (LUT) indexed by a CHL-VI and a LAI-VI has been proposed [31]. This matrix-based VI combination is a special form of physical model inversion and has been shown to outperform linear regression models that use a single ratio VI. Clevers [12] used the MTCI to estimate the CCC of soybeans based on linear regression and field spectra, achieving an RMSE of 86 µg cm⁻², which is similar to the result obtained in this study using the MTCI alone (RMSE = 77.10 µg cm⁻²). However, when the MTCI and LAI-VIs were combined, our results were considerably better: the best results for the three regression approaches were RMSE = 51.68 µg cm⁻² (ULR), RMSE = 63.47 µg cm⁻² (BLR), and RMSE = 37.76 µg cm⁻² (RFR). These results confirm that regression models integrating CHL-VIs and LAI-VIs, especially the random forest model, generalize better than models using a single CHL-VI.
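Using a CHL-VI and a LAI-VI as two separate predictors can be sketched as a binary linear regression (BLR), CCC = b0 + b1·CHL-VI + b2·LAI-VI. This is a minimal illustration assuming an ordinary least-squares fit via NumPy; the helper names and synthetic values are hypothetical, and the two predictors are chosen so that they are not collinear.

```python
import numpy as np


def fit_blr(chl_vi, lai_vi, ccc):
    """Binary linear regression CCC = b0 + b1 * CHL-VI + b2 * LAI-VI.

    Coefficients are estimated by ordinary least squares.
    """
    X = np.column_stack([np.ones(len(chl_vi)), chl_vi, lai_vi])
    coef, *_ = np.linalg.lstsq(X, np.asarray(ccc, dtype=float), rcond=None)
    return coef


def predict_blr(coef, chl_vi, lai_vi):
    """Apply fitted coefficients [b0, b1, b2] to new index values."""
    return coef[0] + coef[1] * np.asarray(chl_vi) + coef[2] * np.asarray(lai_vi)


# Synthetic inputs: two non-collinear predictors and an exactly linear CCC.
chl_vi = np.array([1.5, 2.0, 2.5, 3.0, 3.5, 2.2])
lai_vi = np.array([0.3, 0.7, 0.4, 0.8, 0.5, 0.9])
ccc = 10.0 + 40.0 * chl_vi + 150.0 * lai_vi
coef = fit_blr(chl_vi, lai_vi, ccc)
print(np.round(coef, 6))  # approximately [10, 40, 150]
```

The same two-column predictor matrix (without the intercept column) is the kind of input a random forest regressor, such as scikit-learn's `RandomForestRegressor`, would take; this sketch stops at the linear case.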

Influence of LAI-VI Selection on VI Combinations
The key to the success of the combined approach was the strong performance of the LAI-VIs in reducing the saturation associated with high LAIs. The combination of the MTCI and LAI-VIs was more sensitive to CCC than the MTCI alone. As shown in Figure 5, high LAIs are the main cause of saturation when estimating CCC. LAI-VIs were added to increase the sensitivity of the MTCI to high CCC values in the presence of a high LAI. Therefore, we investigated the performance of each VI combination in CCC estimation. The results in Sections 3.3 and 3.4 show that the modified NDVIs, especially the LNDVI, performed better than the NDVI. The RDVI and MTVI2, which also proved resistant to saturation, likewise performed better than the NDVI when used in the combined VIs. The sensitivity of the LAI-VIs at high LAIs can affect the performance of the trained regression model. As demonstrated in previous studies, the relationships between LAI-VIs and the LAI can also be affected by other factors, such as canopy coverage, and different LAI-VIs differ in their capability to estimate the LAI [28,64,65]. The MTVI2 is a modification of the triangular vegetation index (TVI) that preserves sensitivity at high LAIs and reduces the effects of soil contamination [16]. Figure 12 shows the relationships between the LAI and the LAI-VIs based on the simulated spectra, field spectra, and MERIS satellite data: in agreement with previous studies, the MTVI2 was more closely related to the LAI than the other LAI-VIs were. This might explain why the combined VI that included the MTVI2 gave the best results.

Figure 12. Coefficient of determination produced by the linear regression between the LAI and LAI-VIs (NDVI, LNDVI, SNDVI, RDVI, and MTVI2) for the simulation dataset, field spectra, and MERIS satellite data.
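For reference, a minimal sketch of the NDVI and of the MTVI2 as it is commonly written (the formulation cited as [16] in the text), with reflectance terms originally defined near 800, 670 and 550 nm. Which MERIS bands stand in for those wavelengths is not specified in this section and is left to the caller as an assumption; the function names and sample reflectances are illustrative.

```python
import math


def ndvi(r_nir, r_red):
    """Normalized difference vegetation index."""
    return (r_nir - r_red) / (r_nir + r_red)


def mtvi2(r_nir, r_green, r_red):
    """Modified triangular vegetation index 2.

    Terms correspond to reflectance near 800 nm (r_nir), 550 nm (r_green)
    and 670 nm (r_red); the choice of sensor bands is an assumption.
    """
    numerator = 1.5 * (1.2 * (r_nir - r_green) - 2.5 * (r_red - r_green))
    # The square-root term is the soil-adjustment factor of the index.
    denominator = math.sqrt(
        (2.0 * r_nir + 1.0) ** 2 - (6.0 * r_nir - 5.0 * math.sqrt(r_red)) - 0.5
    )
    return numerator / denominator


# Illustrative reflectances for a moderately dense canopy.
print(round(ndvi(0.4, 0.04), 4), round(mtvi2(0.4, 0.08, 0.04), 4))
```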

Comparing the Performance of the Proposed CCC Estimation Models for Satellite Data with Variable LAIs
It is important to examine the performance of CCC estimation models when they are applied to satellite data. Satellite data are affected by several factors, such as noise and adjacency effects, and applying retrieval models to satellite data remains challenging [66]. The results obtained in this study using the MTCI alone confirm the findings of previous studies that both low-LAI and high-LAI conditions can cause difficulties when using VIs to estimate vegetation chlorophyll [19][20][21][24]. The results indicate that the RFR model using a VI combination gave the greatest improvement in CCC estimation, although the improvement for the MERIS satellite data varied with the LAI (Figure 13b). The points in Figure 13b lie closer to the 1:1 line when LAI > 2, indicating that the MTCI and MTVI2 combination (with random forest regression) reduced the overestimation of CCC that occurred at high LAIs when the MTCI alone was used in the ULR model (Figure 13a). The proposed RFR method thus substantially reduced the large overestimations. However, some overestimation still occurred at low LAIs (LAI < 2) (Figure 13b). This is mainly because, when the canopy is sparse or has low coverage, the red-edge reflectance used to calculate the chlorophyll VI is more strongly affected by the soil background than the NIR reflectance is [25].
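The LAI-dependent behaviour described above (overestimation at LAI < 2 that the RFR model only partly removes) can be quantified by splitting validation residuals at an LAI threshold. The sketch below is a hypothetical diagnostic, not the procedure used in this study; the threshold of 2 follows the text, and the sample values are invented for illustration.

```python
import numpy as np


def mean_bias_by_lai(measured, predicted, lai, threshold=2.0):
    """Mean bias (predicted - measured) below and at/above an LAI threshold.

    Positive values indicate overestimation. Returns NaN for an empty class.
    """
    measured = np.asarray(measured, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    lai = np.asarray(lai, dtype=float)
    error = predicted - measured
    low = lai < threshold
    high = ~low

    def _mean(values):
        return float(values.mean()) if values.size else float("nan")

    return {"low_lai": _mean(error[low]), "high_lai": _mean(error[high])}


# Illustrative values only (not results from this study).
bias = mean_bias_by_lai(
    measured=[50.0, 60.0, 200.0, 250.0],
    predicted=[80.0, 85.0, 210.0, 245.0],
    lai=[1.0, 1.5, 3.0, 4.0],
)
print(bias)  # {'low_lai': 27.5, 'high_lai': 2.5}
```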

Limitations of Model Simulations and Validation Data
This study mainly aimed to enhance the sensitivity of the MTCI to both leaf chlorophyll information and the LAI for CCC estimation. Therefore, wide ranges of LCC, LAI, canopy structure, and soil background were used in the PROSAIL simulations to analyze the performance of the approach combining the MTCI and LAI-VIs. Other leaf parameters, such as the leaf structure parameter (N), equivalent water thickness (Cw), and dry matter content (Cm), were set to default values for wheat and soybean crops (Table 5). Variations in these parameters were of little importance for evaluating the proposed methods for fusing LAI and chlorophyll information. However, they may affect simulation-based CCC estimation models for other crops. Previous studies have demonstrated that Cw has little influence on the visible and red-edge bands commonly used for vegetation chlorophyll estimation [67], whereas N and Cm can affect chlorophyll estimation [27,68,69]. Leaf parameters such as N should therefore be adjusted for the specific crop type if the proposed VI combination approach is applied to other crops.
Although this study considered variation in the solar zenith angle to represent daily and seasonal changes, the simulations and test data used for modeling and validation were limited to nadir observations. The ground spectra were acquired at nadir, and the seven-day 300 m MERIS surface-reflectance product lacked view zenith angle information. Therefore, the simulation data in this study were also based on nadir observations. However, the view zenith angle of remote sensing data, especially satellite imagery, varies widely. MERIS reflectance data may not have been acquired from a nadir or near-nadir view [70], which increases the uncertainty of the validation results for the MERIS satellite data. Although the view zenith angle was fixed at nadir, the results from the simulated data, field spectra, and MERIS data all show the effectiveness of the proposed VI combination method for CCC estimation. The influence of the view zenith angle on different CCC estimation models should be evaluated in future work.
Although ENVISAT MERIS is no longer operational, its archived data can still be used to produce long time-series products. In addition, Sentinel-3 OLCI, the successor to MERIS, retains the MERIS band settings and has similar spectral response functions [58]. Therefore, the CCC retrieval methods developed for MERIS can generally be applied to Sentinel-3 datasets. In future work, we hope to collect field measurements synchronous with Sentinel-3 overpasses to further test our algorithm.
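The transfer to Sentinel-3 rests on the MTCI bands being available on both sensors. As a sketch, assuming nominal band centres (MERIS bands 8, 9 and 10 and OLCI bands Oa10, Oa11 and Oa12 all at 681.25, 708.75 and 753.75 nm), the same MTCI function can be applied to either sensor by swapping the band lookup. The MERIS dictionary keys ("b8" etc.) are hypothetical labels, and differences between the two sensors' spectral response functions are ignored here.

```python
# Band names used for the MTCI on each sensor. Nominal centres for both:
# red 681.25 nm, red edge 708.75 nm, NIR 753.75 nm.
MTCI_BANDS = {
    "MERIS": {"red": "b8", "red_edge": "b9", "nir": "b10"},  # hypothetical keys
    "OLCI": {"red": "Oa10", "red_edge": "Oa11", "nir": "Oa12"},
}


def mtci(reflectance, sensor):
    """MERIS Terrestrial Chlorophyll Index: (R754 - R709) / (R709 - R681).

    `reflectance` maps band names to surface reflectance values.
    """
    bands = MTCI_BANDS[sensor]
    r_red = reflectance[bands["red"]]
    r_edge = reflectance[bands["red_edge"]]
    r_nir = reflectance[bands["nir"]]
    return (r_nir - r_edge) / (r_edge - r_red)


# The same illustrative pixel expressed with each sensor's band names.
meris_pixel = {"b8": 0.05, "b9": 0.15, "b10": 0.35}
olci_pixel = {"Oa10": 0.05, "Oa11": 0.15, "Oa12": 0.35}
print(mtci(meris_pixel, "MERIS"), mtci(olci_pixel, "OLCI"))
```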

Conclusions
This paper proposes combining the MTCI and LAI-VIs in univariate linear, binary linear, and random forest regression models to fuse leaf area index and chlorophyll information and thereby improve CCC retrieval, with the models built from PROSAIL-D simulations. The validation results based on both field spectra and MERIS satellite data show that the vegetation index combinations effectively improved the accuracy of all three regression models, although the most suitable VI combination can vary with the type of regression model. The combined multiplicative VI, MTCI × MTVI2, performed better in the univariate regression model (field spectra for soybeans and wheat: RMSE = 51.68 µg cm⁻²; MERIS satellite data for soybeans: RMSE = 88.44 µg cm⁻²) than the MTCI alone (field spectra: RMSE = 77.10 µg cm⁻²; MERIS satellite data: RMSE = 136.54 µg cm⁻²). Moreover, combining the MTCI and LAI-VIs in random forest regression models showed greater potential for retrieving CCC than either of the two linear regression models. The MTCI and MTVI2 combination with random forest regression performed best, achieving an RMSE of 37.76 µg cm⁻² for the field spectra and 47.96 µg cm⁻² for the MERIS satellite data. This study indicates that random forest regression models using a combination of the MTCI and LAI-VIs have an advantage in fusing LAI and leaf chlorophyll information and can produce accurate and robust CCC estimates. Owing to its simplicity, a single combined multiplicative VI with an empirical regression model also shows potential for estimating CCC. The proposed vegetation index combination method can be applied to other chlorophyll-related vegetation indices to improve their performance in CCC estimation. Because the simulation data used in this paper were mainly related to wheat and soybean crops, the CCC retrieval models built from these simulated datasets must be adjusted before being applied to other vegetation types.