1. Introduction
The interaction between global environmental change and terrestrial ecosystems has always been one of the central issues in the study of global change [
1]. Vegetation, which covers 70% of the global land area, is an essential indicator of change in the terrestrial ecological environment. It is also a major target of earth observation with remote sensing techniques. Ecological processes related to the exchange of material and energy in plants, for instance, photosynthesis, transpiration, respiration, and primary productivity, are closely connected with the biophysical and biochemical parameters of the vegetation. Among these parameters, chlorophyll is a crucial antenna pigment, which is responsible for light absorption and transfer in photosynthesis. Changes in the leaf chlorophyll content (LCC) thus directly affect biochemical processes such as photosynthesis and primary productivity [
2]. In agricultural remote sensing research, chlorophyll is also used as an important index of crop growth conditions [
3], and its content variations are related to crop stress, the aging process, and nitrogen nutrition [
4]. Therefore, quantitative analysis of LCC has important significance, not only for understanding the process of material and energy exchange between plants and the environment, but also for monitoring crop growth, nutritional status, and stress conditions in agricultural applications.
Owing to the remarkable absorption characteristics of chlorophyll in the visible range, nondestructive estimation of LCC is possible with spectral analysis and remote sensing techniques, and numerous studies have focused on chlorophyll retrieval methods [5,6,7,8]. Generally, the retrieval approaches can be classified into four methodological categories: parametric regression methods, nonparametric regression methods, physically based model inversion methods, and hybrid regression methods [9], and each has received varying degrees of attention in chlorophyll assessment with multi-spectral or hyperspectral datasets acquired from ground-based, airborne, and space-borne sensors.
Parametric regression methods, such as vegetation indices (VIs) and the first-order and second-order derivatives of spectra, have been extensively used for chlorophyll retrieval. For example, Pu and Gong [10] analysed the relationships between LCC and hyperspectral reflectance together with its first-order and second-order derivatives, and found that the first-order derivative at 725 nm and the second-order derivative at 705 nm had the highest correlations with LCC, with correlation coefficients both higher than 0.7. Based on Medium Resolution Imaging Spectrometer (MERIS) satellite data, Dash and Curran [
11] proposed the MERIS terrestrial chlorophyll index (MTCI) using red and red-edge band data, and found that MTCI was suitable for accurate estimation of the crop chlorophyll content. Gitelson et al. [
5] established two chlorophyll indices, the green and red-edge chlorophyll indices (CIgreen and CIred-edge), using a conceptual model, and these two indices showed excellent performance in canopy chlorophyll content retrieval. Yu et al. [
4] proposed a ratio of the reflectance difference index (RRDI) based on the multiple scatter correction (MSC) theory. The results indicated that RRDI was accurate for LCC assessment, and it could alleviate the effect of structural characteristics on LCC retrieval to some extent.
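To make the parametric approach concrete, the following minimal sketch computes three of the indices mentioned above from band reflectances. The formulas are the commonly cited forms of CIgreen, CIred-edge, and MTCI, not definitions reproduced from the cited papers, and the reflectance values are hypothetical and chosen only for illustration.

```python
def ci_green(nir, green):
    """Green chlorophyll index, commonly written as NIR / green - 1."""
    return nir / green - 1.0


def ci_red_edge(nir, red_edge):
    """Red-edge chlorophyll index, commonly written as NIR / red-edge - 1."""
    return nir / red_edge - 1.0


def mtci(r_nir, r_red_edge, r_red):
    """MERIS terrestrial chlorophyll index.

    Commonly written with the MERIS bands centred near 753.75 nm (r_nir),
    708.75 nm (r_red_edge), and 681.25 nm (r_red).
    """
    return (r_nir - r_red_edge) / (r_red_edge - r_red)


# Hypothetical canopy reflectances (unitless, 0-1), for illustration only.
sample = {"green": 0.08, "red": 0.04, "red_edge": 0.20, "nir": 0.45}

print("CIgreen   :", ci_green(sample["nir"], sample["green"]))
print("CIred-edge:", ci_red_edge(sample["nir"], sample["red_edge"]))
print("MTCI      :", mtci(sample["nir"], sample["red_edge"], sample["red"]))
```

In a parametric retrieval, such an index would then be regressed against measured chlorophyll content, and the fitted relationship applied to new observations.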
Unlike parametric methods, which use spectral features established from several specific bands, nonparametric methods take advantage of full-spectrum information based on training data to optimize regression algorithms [
12]. For instance, Tang et al. [
13] investigated and compared multiple linear regression (MLR), back propagation and radial basis function neural networks (BPNN and RBFNN), and partial least squares regression (PLSR) for assessing LCC in soybean plants. Their results suggested that these regression algorithms with wavelet analysis could achieve good estimation results. Among them, RBFNN and PLSR with a Gaussian kernel function showed the best accuracy and stability for LCC retrieval. Zhao et al. [
6] utilized three methods, i.e., the Bayesian model average (BMA), PLS, and stepwise multiple regression (SMR), for LCC assessment with abundant measured leaf data. It was found that these three models achieved a good estimation accuracy. Moreover, the BMA algorithm could alleviate the overfitting problem and improve the generalization of the established LCC model compared with PLS and SMR; thus, it was more suitable for LCC retrieval. Based on spaceborne Compact High Resolution Imaging Spectrometer (CHRIS) data and airborne Compact Airborne Spectrographic Imager (CASI) data, Verrelst et al. [
14] investigated and tested the Gaussian process regression (GPR) algorithm for LCC estimation. Their results suggested that GPR was suitable for LCC retrieval.
Physically based model inversion is built on radiative transfer models (RTMs). RTMs are quantitative models that describe the physical relationship between spectral reflectance and vegetation biophysical and biochemical parameters. These models can be used to perform abundant simulations based on a robust understanding of physical, chemical, and biological processes [15]. Simulating leaf- or canopy-level reflectance from plant input parameters is called the ‘forward’ mode, and inversion is the reverse process, in which the parameters are retrieved from reflectance. Among all RTMs, the leaf optical properties model PROSPECT and the canopy bidirectional reflectance model SAIL (Scattering by Arbitrary Inclined Leaves) are widely used in the remote sensing community. Darvishzadeh et al. [
16] tested the capability of PROSAIL RTM and ALOS AVNIR-2 multispectral image data using a lookup-table (LUT) approach for assessing the canopy chlorophyll content in paddy rice. Their results demonstrated the ability of the PROSAIL inversion method to estimate the canopy chlorophyll content in paddy rice using ALOS AVNIR-2 multispectral data. To alleviate the ill-posedness of LUT-based RTM inversion, Rivera et al. [17] analyzed different regularization strategies for LCC estimation, including varied cost functions (CFs), different levels of added noise, and the use of multiple best solutions. Their results showed that LUT-based RTM inversion combined with these regularization strategies clearly improved the estimation accuracy, and that a normalized “L1-estimate” CF achieved the best estimation, with a relative error of 17.6%. Zhang and Wang [
18] assessed LCC in Tamarix ramosissima via inversion of the PROSPECT RTM by introducing a merit function. They used a calibrated version of PROSPECT-4 instead of the original and found that the calibrated model was more accurate for the retrieval of LCC, with a root mean square error (RMSE) of 28.79 mg/m². Croft et al. [
19] evaluated the capability of LUT-based RTM inversion methods for LCC assessment with multi-spectral Landsat-8 imagery. They adopted a two-step inversion process using coupled PROSPECT and SAIL RTMs, which yielded an accurate estimation of LCC (RMSE = 16.18 μg/cm²) with Landsat-8 data.
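The forward and inverse steps of a LUT-based inversion can be sketched as follows. This is an illustration under strong assumptions: `toy_forward` is a hypothetical stand-in with made-up coefficients, not PROSPECT or SAIL, and the parameter ranges, LUT size, and fraction of best solutions are arbitrary choices rather than the settings of any cited study.

```python
import math
import random


def toy_forward(lcc, lai):
    """Hypothetical forward model returning (green, red, nir) reflectance.

    NOT a real RTM: the coefficients are invented so that visible
    reflectance falls with chlorophyll and NIR rises with leaf area.
    """
    green = 0.05 + 0.15 * math.exp(-0.02 * lcc)
    red = 0.03 + 0.20 * math.exp(-0.05 * lcc)
    nir = 0.20 + 0.30 * (1.0 - math.exp(-0.5 * lai))
    return (green, red, nir)


def build_lut(n, seed=0):
    """Forward step: simulate n spectra from randomly drawn parameters."""
    rng = random.Random(seed)
    lut = []
    for _ in range(n):
        lcc = rng.uniform(10.0, 80.0)  # assumed LCC range
        lai = rng.uniform(0.5, 6.0)    # assumed LAI range
        lut.append((lcc, toy_forward(lcc, lai)))
    return lut


def rmse_cost(measured, simulated):
    """Root mean square error between two spectra."""
    sq = sum((m - s) ** 2 for m, s in zip(measured, simulated))
    return math.sqrt(sq / len(measured))


def invert(measured, lut, cost=rmse_cost, best_fraction=0.01):
    """Inverse step: average LCC over the best-matching LUT entries."""
    ranked = sorted(lut, key=lambda entry: cost(measured, entry[1]))
    k = max(1, int(len(ranked) * best_fraction))
    return sum(lcc for lcc, _ in ranked[:k]) / k


lut = build_lut(5000)
observed = toy_forward(45.0, 3.0)  # pretend measurement with known LCC
estimate = invert(observed, lut)
print("retrieved LCC:", round(estimate, 2))
```

Averaging over several best-matching entries rather than taking the single best match is one of the regularization strategies mentioned above for an ill-posed inversion.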
Hybrid regression methods take advantage of both physically based techniques and machine learning regression algorithms (MLRAs). That is, these approaches utilize abundant synthetic data simulated by RTMs instead of measured data collected from field campaigns for training machine learning regression models, so as to improve the generalization and computational efficiency of the models. For instance, Malenovsky et al. [
8] investigated the combination of continuum removal and RTM for LCC retrieval from the data acquired by Airborne Imaging Spectroradiometer (AISA) Eagle. They applied a continuum removal technique to PROSPECT-DART (discrete anisotropic radiative transfer) simulations and then used these data to train an artificial neural network (ANN). Their ground validation results showed that the ANN and PROSPECT-DART hybrid approach was accurate for LCC estimation, with an RMSE value of 2.18 μg/cm² and a relative RMSE (RRMSE) value of 4.18%. To mitigate the computational cost of MLRAs, especially when the amount of RTM training data is extremely large, Verrelst et al. [
20] employed active learning (AL) techniques to optimize sample selection from simulated Sentinel-3 Ocean and Land Colour Instrument (OLCI) data for training kernel-based MLRAs. Their results suggested that AL methods were more efficient than random sampling in choosing appropriate samples for training the MLRAs, since MLRAs trained with AL-selected samples exhibited better estimation accuracy than those trained with randomly sampled data. Research conducted by Upreti et al. [
21] for LCC retrieval with Sentinel-2 data also supported the conclusion that the AL technique was efficient in selecting samples for training MLRAs.
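One family of AL criteria selects samples by diversity, that is, by how different a candidate spectrum is from those already in the training set. The sketch below illustrates a simple Euclidean-distance-based variant of this idea. The function names and the toy two-band spectra are hypothetical, and the retraining and validation steps that a full AL loop would include are omitted.

```python
import math


def euclidean(a, b):
    """Euclidean distance between two spectra given as equal-length tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def select_most_diverse(train, pool, n_add):
    """Greedily pick the pool samples farthest from the training set.

    At each step, the candidate whose nearest training sample is most
    distant is moved from the pool into the training set.
    """
    train = list(train)
    pool = list(pool)
    chosen = []
    for _ in range(min(n_add, len(pool))):
        best = max(pool, key=lambda c: min(euclidean(c, t) for t in train))
        pool.remove(best)
        train.append(best)
        chosen.append(best)
    return chosen


# Toy two-band spectra, for illustration only.
initial_train = [(0.10, 0.10)]
candidate_pool = [(0.10, 0.12), (0.50, 0.50), (0.90, 0.90), (0.88, 0.90)]

print(select_most_diverse(initial_train, candidate_pool, n_add=2))
```

Near-duplicate candidates such as (0.10, 0.12) and (0.88, 0.90) are skipped because they add little new information, which is how such criteria can keep the training set small.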
The above-mentioned literature has indeed enriched the methodologies for LCC assessment with remote sensing techniques. Nevertheless, each retrieval method has its own drawbacks that need to be avoided or overcome. For parametric regression methods, the representativeness of experimental samples and the physical mechanism of remote sensing models are crucial to the effectiveness and universality of these models. For nonparametric methods, flexible model definitions may lead to overfitting of the training data collected from field experiments. To mitigate this overfitting issue, various advanced machine learning algorithms have been considered for LCC retrieval [
12]. In terms of RTM inversion, the inversion process is actually an ill-posed problem, since different combinations of leaf-level and canopy-level parameters could lead to very similar simulations of canopy reflectance. Moreover, simplifications and idealization of some processes in RTMs could produce inaccuracies for canopy reflectance modeling [
22]. LUT-based RTM inversion combined with different regularization strategies may help to mitigate the ill-posedness and to better handle the inversion process. For hybrid regression methods, it should be noted that these approaches do not alleviate the main issues of RTMs; they merely use all available data simulated by RTMs to train machine learning regression models. Moreover, the main shortcoming of hybrid methods is the computational cost. AL approaches, which are intended to select optimal samples from a training data pool, are promising for obtaining an optimized training set and increasing the computational efficiency of hybrid methods. Thus, the different LCC retrieval methods need to be optimized and improved, particularly when they are applied to new sensors for LCC estimation.
In recent years, with the rapid advance of earth observation technologies, newly launched satellite sensors, such as the Gaofen (GF) series in China, the Sentinel series in Europe, and Landsat-8 in the US, offer huge potential for enriching LCC retrieval methodologies. The Landsat-8 Operational Land Imager (OLI) is the most recent sensor to date in the Landsat observation project. Compared to previous sensors, the Landsat-8 OLI sensor has improved spectral bands and radiometric resolution and a better signal-to-noise ratio, and it has been used for various purposes in terrestrial ecosystem studies [
23]. Nevertheless, few studies have investigated Landsat-8 OLI data or Landsat series datasets for plant LCC retrieval [19,24,25]. Research on the potential and capability of LCC modelling using Landsat-8 OLI data is important in two respects: on the one hand, robust and accurate LCC models from different satellite sensors could be used together for deriving high-frequency LCC products for rapid monitoring of agricultural crops; on the other hand, these models could provide methods and technical support for applications of similar multispectral sensors onboard unmanned aerial vehicles (UAVs) for LCC estimation at a specific fine scale. Therefore, the aim of the present study was to assess the capability of Landsat-8 OLI data for LCC modelling with different retrieval methods. The specific objectives were to: (i) investigate the performance of broadband vegetation indices in LCC assessment with Landsat-8 OLI data; (ii) inspect the ability of machine learning regression algorithms in LCC retrieval; (iii) establish LUT-based RTM inversion based on Landsat-8 OLI data using different regularization strategies to optimize LCC estimation; and (iv) explore the feasibility of hybrid methods using computationally demanding MLRAs with different active learning strategies for LCC retrieval.
4. Discussion
Landsat-8 OLI is one of the most remarkable sensors among the Earth Observation projects. Data acquired from this platform have been used for a variety of agricultural applications, such as crop leaf area index estimation, soil moisture retrieval, and crop monitoring [
45,
46,
47]. Nevertheless, its potential for crop leaf chlorophyll content estimation has not been fully explored. The present study used Landsat-8 OLI imagery and the corresponding field experimental data to comprehensively evaluate its capability for LCC modeling using four different retrieval methods: VIs, MLRAs, LUT-based inversion, and hybrid regression approaches. Overall, the LCC estimation results exhibited good accuracy, in agreement with the research of Croft et al. [
19] and Yin et al. [
48], suggesting that Landsat-8 OLI data are suitable for crop LCC retrieval.
For LCC assessment, VIs composed of blue, green, red, and NIR bands were considered on account of the band settings of the Landsat-8 OLI sensor. Even though some VIs, for instance, MTVI2, MSAVI, and EVI2, were not intended for chlorophyll retrieval, they still exhibited good accuracy for LCC estimation. MTVI2 was constructed to increase the sensitivity to the leaf area index while minimizing chlorophyll influence [
38]. MSAVI aims to increase the dynamic range of vegetation signals and minimize soil background influences [
37]. EVI2 was put forward to increase the sensitivity of vegetation features to high biomass regions while decoupling background signals and reducing atmosphere influences [
35]. Compared with NDVI, these three indices showed much better results for LCC estimation, suggesting that their modifications improved LCC estimation accuracy. This is particularly notable for MSAVI and EVI2, which are composed of the same red and NIR bands as NDVI. The center of the OLI red band is close to the absorption peaks of chlorophyll a and b at 662 nm and 644 nm [
2], which could partly explain the good performance of these VIs. Furthermore, the LCC values used in this study were converted from SPAD readings, while SPAD readings were calculated from the transmission features of red (650 nm) and infrared (940 nm) light [
49]. This could also account for the good performance of MTVI2, MSAVI, and EVI2 in LCC estimation even though they were not originally designed for LCC assessment. It is worth noting that the combination of red and NIR bands gave better results than the combination of green and NIR bands, since NDVI was more accurate than GNDVI. Overall, these VI results suggest that the red and NIR bands are critical for LCC assessment with Landsat-8 OLI data.
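For reference, the broadband indices discussed in this paragraph can be written out as below. These are the closed forms commonly cited in the literature and may differ in detail from the definitions used in this study; the reflectance values are hypothetical.

```python
import math


def ndvi(nir, red):
    return (nir - red) / (nir + red)


def gndvi(nir, green):
    return (nir - green) / (nir + green)


def evi2(nir, red):
    """Two-band enhanced vegetation index (no blue band)."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def msavi(nir, red):
    """Modified soil-adjusted vegetation index, closed-form version."""
    a = 2.0 * nir + 1.0
    return (a - math.sqrt(a ** 2 - 8.0 * (nir - red))) / 2.0


def osavi(nir, red):
    """Optimized soil-adjusted vegetation index with soil factor 0.16."""
    return (nir - red) / (nir + red + 0.16)


def mtvi2(nir, red, green):
    """Modified triangular vegetation index 2."""
    num = 1.5 * (1.2 * (nir - green) - 2.5 * (red - green))
    den = math.sqrt((2.0 * nir + 1.0) ** 2 - (6.0 * nir - 5.0 * math.sqrt(red)) - 0.5)
    return num / den


# Hypothetical surface reflectances for a green canopy.
green, red, nir = 0.08, 0.04, 0.45

for name, value in [
    ("NDVI", ndvi(nir, red)),
    ("GNDVI", gndvi(nir, green)),
    ("EVI2", evi2(nir, red)),
    ("MSAVI", msavi(nir, red)),
    ("OSAVI", osavi(nir, red)),
    ("MTVI2", mtvi2(nir, red, green)),
]:
    print(f"{name:6s} {value:.4f}")
```

NDVI, EVI2, MSAVI, and OSAVI use only the red and NIR bands, so the differences among them come entirely from how the two bands are combined.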
Compared with VI methods, MLRAs generally had slightly better results, since they utilized all band information and nonlinear transforms. NN gained attention for agronomic parameter modelling and operational products in previous studies [
29,
50,
51]. Here, FNN did not outperform other MLRAs and gave estimates rather similar to those of PLSR, RF, and SVR, suggesting that it might not be the most adequate algorithm. The methodologies of PLSR, RF, and SVR differ from each other, and they exhibited different performances in the retrieval of various agronomic parameters in previous studies using hyperspectral data [
4,
52,
53]. Here, they exhibited very similar estimation results. This might be attributed to the limited number of broad bands (i.e., six bands) used in these models. In comparison, KRR and GPR showed even better estimation results. These accurate results accord with their performance in previous research [
54,
55]. Among all MLRAs, GPR is the most capable, not only maintaining very good numerical performance and stability but also largely overcoming the black-box issue by providing a ranking of the features (bands) used in the model [
14]. According to the GPR sigma band analysis, the red and NIR bands were the two most frequently used bands in the GPR models, which indicates that these two bands are critical for GPR modeling. This is consistent with the finding that VIs composed of red and NIR bands showed good results for LCC estimation.
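The band ranking can be illustrated with an anisotropic squared-exponential kernel, in which each band has its own length-scale (sigma). A smaller sigma means that the modelled function varies faster along that band, so the band is more relevant. The sketch below assumes this kernel form; the band names and sigma values are hypothetical and are not results from this study.

```python
import math


def ard_rbf_kernel(x, y, sigmas, signal_var=1.0):
    """Anisotropic squared-exponential kernel with one length-scale per band."""
    sq = sum(((a - b) / s) ** 2 for a, b, s in zip(x, y, sigmas))
    return signal_var * math.exp(-0.5 * sq)


def rank_bands(band_names, sigmas):
    """Order bands from most to least relevant (smallest sigma first)."""
    pairs = sorted(zip(band_names, sigmas), key=lambda p: p[1])
    return [name for name, _ in pairs]


# Assumed six OLI bands and made-up length-scales, for illustration only.
bands = ["blue", "green", "red", "nir", "swir1", "swir2"]
sigmas = [4.0, 2.5, 0.6, 0.9, 3.0, 5.0]

print(rank_bands(bands, sigmas))
```

In a fitted model the sigmas would be hyperparameters learned from the training data, and the ranking would be read off after training.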
In terms of LUT-based inversion methods, ten CFs with different multiple best solution regularization strategies showed varied behaviors for LCC retrieval. The results suggest that the “root mean square error” CF, which was widely used in previous studies [
56,
57], might not be the optimal CF for LCC inversion with Landsat-8 OLI data since it exhibited rather poor estimation. In comparison, CFs such as “Pearson chi-square”, “Geman and McClure”, and “K(x) = −log (x) + x” that belong to three different families, had much better estimates. Among them, “K(x) = −log (x) + x” showed the best inversion accuracy. These results accord with the works of Rivera et al. [
17] and Verrelst et al. [
58]. The use of multiple solution regularization strategies did improve the inversion accuracy of different CFs as compared with the cases without them. Moreover, high values of multiple solutions appeared to be more effective than low values in regularizing LUT-based inversion, since most CFs achieved good estimations when high values (i.e., 30%) were used. For noise regularization, a Gaussian noise model was used, and the same noise criterion (details in Section 2.4) was adopted for both LUT-based inversion methods and hybrid regression approaches so that they could be compared. Generally, LUT-based inversion methods were more effective than hybrid regression approaches in LCC retrieval with Landsat-8 OLI data, since LUT-based inversion methods with most CFs exhibited better LCC estimation. This might be largely due to the different data sizes used by the two methods: LUT-based inversion used all the simulated data (n = 121,500) for modelling, whilst only a subset of the simulations (n = 2500) was used for establishing hybrid regression models. Compared with the results from the full training data set with GPR, the use of AL methods with GPR led to superior retrieval accuracies, and all AL techniques exhibited similar estimation results for ground validation. AL methods from the diversity family showed consistent results for the cross-validation and ground-validation processes, whereas the uncertainty AL methods differed considerably between the two processes, especially EQB with GPR, suggesting that it might be unstable although it achieved the best accuracy for ground validation. Even though differences existed between the AL methods for training GPR models, we can conclude that AL approaches were fairly effective and accurate for LCC retrieval with hybrid regression methods.
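To show how CFs from the three families differ in form, the sketch below writes out one plausible version of each. These expressions are assumptions based on the general definitions of the families and are not copied from Section 2.4. In particular, applying K(x) to the ratio of simulated to measured reflectance and subtracting 1 so that a perfect match costs zero are choices made here for illustration. The spectra are hypothetical.

```python
import math


def pearson_chi_square(measured, simulated):
    """'Information measures' family (assumed form)."""
    return sum((s - m) ** 2 / s for m, s in zip(measured, simulated))


def geman_mcclure(measured, simulated):
    """'M-estimates' family (assumed form): bounded influence of large residuals."""
    return sum((m - s) ** 2 / (1.0 + (m - s) ** 2) for m, s in zip(measured, simulated))


def contrast_neg_log_plus_x(measured, simulated):
    """'Minimum contrast' family with K(x) = -log(x) + x (assumed form).

    K is applied to x = simulated / measured per band; 1 is subtracted
    so that the cost is zero when the spectra are identical.
    """
    total = 0.0
    for m, s in zip(measured, simulated):
        x = s / m
        total += (-math.log(x) + x) - 1.0
    return total


def rank_lut(measured, lut_spectra, cost):
    """Return LUT indices ordered from best to worst match under a cost."""
    return sorted(range(len(lut_spectra)), key=lambda i: cost(measured, lut_spectra[i]))


# Hypothetical measured spectrum and three candidate LUT spectra.
measured = (0.08, 0.04, 0.45)
lut_spectra = [(0.10, 0.06, 0.40), (0.08, 0.05, 0.44), (0.12, 0.03, 0.50)]

for cf in (pearson_chi_square, geman_mcclure, contrast_neg_log_plus_x):
    print(cf.__name__, rank_lut(measured, lut_spectra, cf))
```

With realistic noise and thousands of LUT entries, the CFs weight residuals differently across bands and can therefore select different best-matching entries.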
5. Conclusions
In this study, the potential and capability of Landsat-8 OLI multispectral data for LCC assessment in winter wheat were comprehensively investigated and evaluated, using different retrieval methods including broadband VIs, MLRAs, LUT-based inversion, and hybrid regression approaches. Overall, the LCC estimation results exhibited good accuracy, although variations existed between different retrieval methods. Among the selected VIs, MTVI2 showed the best estimation accuracy with an RMSE of 5.99 μg/cm² and an RRMSE of 10.49%. VIs (i.e., MSAVI, EVI2, OSAVI) established from red and NIR bands also exhibited good accuracy for LCC estimation. MLRAs generally had slightly better results compared to those of VIs. GPR best captured the variations in LCC with the highest accuracy for LCC retrieval (RMSE = 5.50 μg/cm², RRMSE = 9.62%). Furthermore, the red and NIR bands outweighed other bands in GPR modelling, suggesting these two bands are of great importance for LCC retrieval. LUT-based inversion methods with different CFs exhibited varied results. The “K(x) = −log (x) + x” CF, which belongs to the “minimum contrast estimates” family, had the best accuracy (RMSE = 8.08 μg/cm², RRMSE = 14.14%), followed by the “Pearson chi-square” and “Geman and McClure” CFs from the “information measures” and “M-estimates” families, respectively. Moreover, the addition of multiple solution regularization strategies improved the inversion accuracy compared with the cases without them. Owing to the computational cost and limited simulated data for modelling, hybrid regression methods with GPR exhibited inferior estimation compared to the results of LUT-based inversion. Nevertheless, the use of AL techniques together with GPR for LCC modelling significantly increased the estimation accuracy compared with the results from the full training data set with GPR, and the combination of EQB and GPR had the best accuracy for ground validation (RMSE = 12.43 μg/cm², RRMSE = 21.77%).
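The two accuracy metrics quoted above can be sketched as follows, assuming that RRMSE is the RMSE divided by the mean of the observed values and expressed as a percentage; this definition is an assumption here. Under that assumption, each reported RMSE and RRMSE pair implies a mean observed LCC, which is computed at the end as a consistency check.

```python
import math


def rmse(observed, predicted):
    """Root mean square error."""
    sq = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    return math.sqrt(sq / len(observed))


def rrmse(observed, predicted):
    """Relative RMSE in percent (assumed: RMSE / mean of observations)."""
    mean_obs = sum(observed) / len(observed)
    return 100.0 * rmse(observed, predicted) / mean_obs


# Toy example with made-up values.
obs = [50.0, 60.0]
pred = [53.0, 56.0]
print("RMSE :", round(rmse(obs, pred), 3))
print("RRMSE:", round(rrmse(obs, pred), 3), "%")

# (RMSE in ug/cm2, RRMSE in %) as reported in the conclusions above.
reported = {
    "MTVI2": (5.99, 10.49),
    "GPR": (5.50, 9.62),
    "LUT, K(x) CF": (8.08, 14.14),
    "EQB + GPR": (12.43, 21.77),
}

# Mean observed LCC implied by each pair under the assumed definition.
implied_means = {k: r / (rr / 100.0) for k, (r, rr) in reported.items()}
for name, mean in implied_means.items():
    print(f"{name:14s} implied mean LCC = {mean:.2f}")
```

All four pairs imply nearly the same mean, which suggests that the reported values are mutually consistent with this definition.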
On the basis of all tests carried out in this work with different retrieval methods, it can be concluded that Landsat-8 OLI multispectral data can be used for accurate crop LCC retrieval.