Leaf Area Index (LAI), defined as one half the total green leaf area (double-sided) per unit horizontal ground surface area of vegetation canopy [1
], is an essential biophysical variable used extensively in soil-vegetation-atmosphere modeling [3
]. In agroecosystems, the total leaf area of the crop canopy, as quantified through LAI, is one of the key constraints on carbon assimilation and transpiration rates, which together drive the accumulation of crop primary productivity [6
]. Therefore, LAI is commonly required to estimate photosynthesis, evapotranspiration, crop yield, and many other physiological processes in agroecosystem studies [8
LAI has historically been measured for crop canopies using in situ (destructive or optical) approaches or remote sensing techniques [12
]. Although the in situ approaches are accurate and easy to implement, they are also labor and time intensive, and the sample-based measurements are spatially discontinuous [17
]. In contrast, remote sensors onboard satellites or aircraft can make spatially complete measurements of surface reflectance, which is related to canopy greenness. Therefore, there has been continuous interest in estimating LAI using images acquired from airborne and spaceborne sensors [19
To estimate LAI using remotely sensed data, two types of methodologies have been adopted: the process-based approach and the empirical approach based on Vegetation Index (VI), also called the VI approach [24
]. Process-based approaches obtain LAI estimates by inverting a radiative transfer model forced with canopy reflectance data retrieved remotely [28
]. Radiative transfer models simulate the (bidirectional) reflectance of the land surface through a series of physical or mathematical descriptions of the physical and radiometric properties of the background (i.e., soil or snow surface), the object (i.e., canopy or other surfaces), and the atmosphere, as well as solar and sensor geometries [34
]. However, unknown model variables usually outnumber reflectance observations, leaving model inversion unsolved or with multiple solutions—an issue referred to as the “ill-posed problem” [40
]. Therefore, although the process-based approach benefits from detailed physical descriptions of the atmosphere-canopy-soil system, its robustness largely depends on the accuracy of model parameters, which has limited its applicability at large scales [41
Unlike the process-based approach, the VI approach provides a simple yet effective alternative by establishing a statistical relationship between remotely sensed VIs and observed LAI values, hereafter referred to as an LAI-VI relationship [19
]. VIs are constructed from reflectance of two or more spectral bands, and can be used to estimate biophysical/biochemical characteristics of vegetation, such as LAI, biomass, and canopy chlorophyll content [44
]. A number of VIs have been shown to correlate well with LAI. The earliest attempts used VIs such as the Simple Ratio (SR) [48
] and Normalized Difference Vegetation Index (NDVI) [49
], which were designed to accentuate the difference between red and near-infrared (NIR) reflectance. More optimized VIs were later proposed with increased sensitivity to vegetation characteristics (e.g., LAI) and reduced effects from confounding factors (e.g., canopy geometry, soil, and atmosphere). These include the Soil Adjusted Vegetation Index (SAVI) [50
], Atmospherically Resistant Vegetation Index (ARVI) [51
], Enhanced Vegetation Index (EVI) [44
], Modified Triangular Vegetation Index (MTVI2) [54
], Wide Dynamic Range Vegetation Index (WDRVI) [55
], and EVI2 [56
]. Beyond the choice of VI, LAI-VI relationships also take various mathematical forms, such as linear, exponential, logarithmic, or polynomial [57
Numerous studies, mostly at local scales, have tested the VI approach and demonstrated its effectiveness at various locations around the world, using either field-measured or remotely sensed reflectance data [60
]. Nevertheless, the LAI-VI relationship is not unique, particularly in agricultural settings, but rather represented by a family of equations as a function of the specific geographical, biological, and environmental setting of a study. As a result, the VI approach requires a new set of LAI measurements to be made at each new location, each time, and for each crop, rendering its application impractical over large areas or multiple time periods [64
This “one place, one time, one equation” issue has been identified as a major limitation of the VI approach to map LAI with remotely sensed observations [27
]. While our knowledge of the dependence of the LAI-VI relationship on plant/crop types or canopy geometry continues to advance [43
], a number of questions remain. For example, to what extent can we spatially and temporally generalize the LAI-VI relationships? Can the variation caused by a number of environmental factors be controlled within an acceptable range? Is there a “one-size-fits-all” relationship that suits a global sample across different crop types and time periods? To answer these questions, we (1) synthesized a global dataset of in situ crop LAI measurements and remotely sensed VIs derived from Landsat Thematic Mapper (TM) and Enhanced Thematic Mapper Plus (ETM+) images; (2) established LAI-VI relationships for different crop types and VIs; and (3) evaluated the universality and diversity of these LAI-VI relationships. For consistency with the general LAI research community, we have followed the Committee on Earth Observation Satellites Land Product Validation (CEOS LPV) LAI protocol throughout the data collection, analysis, and evaluation stages [2
2. Data and Methodology
2.1. LAI Data Collection and Quality Control
We assembled a global dataset of in situ crop LAI measurements from a number of sources, including regional flux networks, research campaigns, peer-reviewed journals, sample data in crop models, and investigators who collected the LAI data (Table S1
). To be included in our dataset, a set of LAI measurements had to have: (1) an accurate geographical location; (2) information on the date of measurement, crop type, and method of measurement; (3) cloud-free Landsat images available at the measurement location within 15 days of the measurement time; and (4) ancillary information about the experimental design.
We then conducted a thorough data quality control, using four rules to identify and eliminate data with potential quality issues:
Rule 1: LAI values less than 0.1 m2/m2 or greater than 6 m2/m2 are beyond the predictive power of VIs and were thus eliminated (for details, see Section 3.2.4).
Rule 2: At each site from which in situ data were obtained, we examined the local LAI-VI relationships and used auxiliary data to identify potentially low-quality data with respect to remote sensing applications. Our study was based on the widely supported assumption that a statistically significant LAI-VI relationship will exist at a given site; the lack of a significant relationship, therefore, indicates potential in situ or satellite data quality issues. In addition, since our LAI definition is restricted to green leaves, we checked the phenological stage at which LAI was measured at each site, and removed LAI measured during the senescence stage, when leaves were no longer green. Where no specific information on the phenological stage was available, we checked the time series of both LAI and VIs to make sure that all observations stopped at or shortly after the peak growth period.
Rule 3: Any crop type with a sample size less than 1% of the overall dataset was eliminated.
Rule 4: After Rules 1–3 were checked, we applied a binned interquartile range (IQR) approach to eliminate additional outliers. First, we grouped LAI into 0.5 m2/m2 bins. Within each bin, the LAI data were ranked according to their corresponding NDVI values from the lowest to the highest, and the median, first quartile (Q1), and third quartile (Q3) of NDVI were computed. Outliers were then defined as samples with NDVI values below Q1 − 1.5 × IQR or above Q3 + 1.5 × IQR, where IQR = Q3 − Q1.
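The binned IQR screen in Rule 4 can be sketched as follows. This is a minimal illustration rather than the processing code used in the study: the function names, the linear-interpolation quantile rule, and the reading that the quartiles are taken over NDVI within each LAI bin are all assumptions.

```python
import math
from collections import defaultdict


def quantile(sorted_vals, q):
    """Linear-interpolation quantile of an already sorted list (0 <= q <= 1)."""
    pos = q * (len(sorted_vals) - 1)
    lo, hi = math.floor(pos), math.ceil(pos)
    return sorted_vals[lo] + (sorted_vals[hi] - sorted_vals[lo]) * (pos - lo)


def binned_iqr_filter(samples, bin_width=0.5, k=1.5):
    """Keep (lai, ndvi) pairs whose NDVI lies inside [Q1 - k*IQR, Q3 + k*IQR],
    where the quartiles come from the NDVI values sharing the same LAI bin."""
    bins = defaultdict(list)
    for lai, ndvi in samples:
        bins[int(lai // bin_width)].append((lai, ndvi))

    kept = []
    for members in bins.values():
        ndvi_sorted = sorted(ndvi for _, ndvi in members)
        q1 = quantile(ndvi_sorted, 0.25)
        q3 = quantile(ndvi_sorted, 0.75)
        iqr = q3 - q1
        lower, upper = q1 - k * iqr, q3 + k * iqr
        kept.extend((lai, ndvi) for lai, ndvi in members if lower <= ndvi <= upper)
    return kept
```

Different quantile conventions give slightly different fences for small bins, so the exact set of rejected samples may differ from that of the original analysis.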
Since LAI beyond 6 m2/m2 is not uncommon for crops like maize, wheat, and rice [36
], we produced another version of the dataset with slightly different quality control measures, in which Rule 1 was replaced by an IQR approach over LAI. This version is hereafter referred to as the full-range dataset. We built LAI-VI relationships for both datasets separately.
2.2. Remotely Sensed Data
We used both Landsat TM and ETM+ images to generate VIs for the in situ LAI observations. Both sensors share the same band designations and spatial/temporal resolutions; thus, we did not address between-sensor variability [73
]. We selected the closest-in-time Landsat image within 15 days of each LAI measurement date. Each image was subjected to three levels of radiometric/atmospheric correction: (1) at-sensor radiance; (2) Top of Atmosphere (TOA) reflectance; and (3) surface reflectance. These corrections were made using the Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) software [74
]. LEDAPS first converts digital number (DN) values to at-sensor radiance, which is then converted to TOA reflectance based on solar zenith angle, Earth-Sun distance, bandpass, and solar irradiance. Finally, atmospheric correction routines based on the 6S radiative transfer algorithm [76
] convert at-sensor radiance to surface reflectance. Clouds were masked using the Automatic Cloud-cover Assessment (ACCA) algorithm [75
], which is a part of the LEDAPS processing package.
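The first two correction levels follow the standard Landsat calibration equations, which can be sketched as below. This is an illustration of those textbook formulas, not of the LEDAPS source code; the function names are hypothetical, and real gain, bias, and solar irradiance (ESUN) values must be taken from the image metadata and the sensor calibration tables.

```python
import math


def dn_to_radiance(dn, gain, bias):
    """At-sensor spectral radiance from a digital number (linear rescaling)."""
    return gain * dn + bias


def radiance_to_toa_reflectance(radiance, esun, sun_zenith_deg, earth_sun_dist_au):
    """Top-of-atmosphere reflectance.

    radiance          at-sensor spectral radiance for the band
    esun              band-mean exoatmospheric solar irradiance (same radiometric units)
    sun_zenith_deg    solar zenith angle in degrees
    earth_sun_dist_au Earth-Sun distance in astronomical units
    """
    cos_zenith = math.cos(math.radians(sun_zenith_deg))
    return math.pi * radiance * earth_sun_dist_au ** 2 / (esun * cos_zenith)
```

The third level, surface reflectance, requires an atmospheric radiative transfer model such as 6S and is not reproduced here.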
We selected five VIs that are commonly employed in LAI related studies: the Simple Ratio (SR) [48
], Normalized Difference Vegetation Index (NDVI) [49
], Enhanced Vegetation Index (EVI) [44
], EVI2 [57
], and Green Chlorophyll Index (CIGreen
] (Table 1
). SR and NDVI were selected as two of the earliest and simplest VIs, widely used in remote sensing applications. EVI is representative of many soil-line based VIs, such as SAVI and ARVI. Compared to NDVI, EVI is less sensitive to soil background and atmospheric noise, and less saturated at high LAI values. EVI2 is a version of EVI that does not require the blue band to facilitate the use of data from sensors without that capability [56
]. Recent studies demonstrated that EVI2 and EVI perform comparably in LAI estimation at local scales [58
]. Therefore, we aimed to evaluate EVI2 over multiple locations at large scales using our in situ global dataset. CIGreen
was originally designed to exploit the relationship of canopy chlorophyll content and visible green reflectance [78
]. It has been shown to outperform many other VIs for predicting LAI at field scales [57
], but has not yet been tested at a global scale. These VIs have proven effective in estimating crop LAI in many previous investigations (Table S2).
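For reference, the five indices can be computed from band reflectances as in the sketch below. The formulas are the commonly published definitions and are assumed to match Table 1; the function names are illustrative only.

```python
def simple_ratio(nir, red):
    """SR = NIR / Red."""
    return nir / red


def ndvi(nir, red):
    """NDVI = (NIR - Red) / (NIR + Red)."""
    return (nir - red) / (nir + red)


def evi(nir, red, blue):
    """EVI = 2.5 (NIR - Red) / (NIR + 6 Red - 7.5 Blue + 1)."""
    return 2.5 * (nir - red) / (nir + 6.0 * red - 7.5 * blue + 1.0)


def evi2(nir, red):
    """EVI2 = 2.5 (NIR - Red) / (NIR + 2.4 Red + 1); no blue band required."""
    return 2.5 * (nir - red) / (nir + 2.4 * red + 1.0)


def ci_green(nir, green):
    """CIGreen = NIR / Green - 1."""
    return nir / green - 1.0
```

All inputs are reflectances on a 0 to 1 scale; EVI and EVI2 in particular are not meaningful when computed directly from DN values because of their additive constants.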
2.3. Establishment of the Global LAI-VI Relationships
2.3.1. Exploratory Analysis: Symbolic Regression
In the exploratory analysis, we used symbolic regression to establish LAI-VI relationships for each VI (derived from DN, at-sensor radiance, TOA reflectance or surface reflectance) and each crop type or group of crops: maize, soybean, wheat, rice, cotton, pasture, row crops (all except pasture), and all crops. Symbolic regression is a semi-supervised method that searches a space of mathematical expressions to find the simplest relationship that minimizes estimation errors. Unlike traditional regression methods, symbolic regression does not require the mathematical form of the relationship to be defined.
In this study, the symbolic regression of the LAI-VI relationships was performed using the Eureqa® software, which identifies the optimal functions based on samples of dependent and independent variables, a set of operators (e.g., addition, subtraction, exponential, power, sine, cosine), and an error metric. We used LAI as the dependent variable and VI as the independent variable, and defined an operator set consisting of addition, subtraction, multiplication, exponential, logarithmic, and power functions. For simplicity, we excluded division, and selected invertible functions with only one term (i.e., VI) and a maximum of three coefficients. We used mean squared error (MSE) as the error metric, which Eureqa®
aimed to minimize during the search. After Eureqa®
obtained the functional forms of the relationships, regression coefficients were estimated using Least Absolute Deviation (LAD) regression [84
]. LAD regression minimizes the sum of absolute errors and provides a robust estimation more resistant to outliers [85
]. The relationships established through symbolic regression and LAD regression are hereafter referred to as the best-fit functions. Each function is restricted to a reasonable VI range that only produces LAI between 0 and 6 m2/m2, to avoid extrapolation.
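The difference between LAD and ordinary least-squares estimation can be written compactly. In the sketch below, f denotes whichever functional form the symbolic regression returned and θ its coefficient vector (at most three coefficients); this notation is introduced here for illustration and is not taken from the original text.

```latex
\hat{\theta}_{\mathrm{LAD}}
  = \arg\min_{\theta} \sum_{i=1}^{n}
    \left| \mathrm{LAI}_i - f(\mathrm{VI}_i;\theta) \right|
\qquad \text{versus} \qquad
\hat{\theta}_{\mathrm{LS}}
  = \arg\min_{\theta} \sum_{i=1}^{n}
    \left( \mathrm{LAI}_i - f(\mathrm{VI}_i;\theta) \right)^{2}
```

Because residuals enter the LAD objective linearly rather than quadratically, a single large residual has less influence on the fitted coefficients.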
The best-fit functions were evaluated using three goodness-of-fit (GOF) metrics: R2, root mean squared error (RMSE), and mean absolute error (MAE). GOF metrics were calculated via a split-sample cross-validation method, which used 75% of the samples for regression and the remaining 25% for testing. We reported the mean values of the GOF metrics over 500 iterations of the cross-validation. Since most of the best-fit functions are non-linear, R2 (calculated as the regression sum of squares divided by the total sum of squares) was not used to compare models but rather to describe the percentage of the total variance of LAI explained by the LAI-VI relationships [87
]. In addition, we also produced the median and quantiles of absolute residuals and their distributions along the LAI range as additional model evaluation statistics following the CEOS LPV LAI protocol [2
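The split-sample cross-validation described above can be sketched as follows. This is a simplified illustration: an ordinary least-squares line stands in for the best-fit functions, the function names are hypothetical, and computing R2 against the mean of the held-out observations is an assumption.

```python
import math
import random


def fit_line(x, y):
    """Least-squares intercept and slope (stand-in for any fitted LAI-VI model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def gof_metrics(obs, pred):
    """R2 (regression SS / total SS), RMSE, and MAE for one test split."""
    n = len(obs)
    mean_obs = sum(obs) / n
    ss_reg = sum((p - mean_obs) ** 2 for p in pred)
    ss_tot = sum((o - mean_obs) ** 2 for o in obs)
    rmse = math.sqrt(sum((o - p) ** 2 for o, p in zip(obs, pred)) / n)
    mae = sum(abs(o - p) for o, p in zip(obs, pred)) / n
    return ss_reg / ss_tot, rmse, mae


def split_sample_cv(vi, lai, train_frac=0.75, n_iter=500, seed=0):
    """Mean GOF metrics over repeated random 75/25 train/test splits."""
    rng = random.Random(seed)
    idx = list(range(len(vi)))
    n_train = int(round(train_frac * len(idx)))
    totals = [0.0, 0.0, 0.0]
    for _ in range(n_iter):
        rng.shuffle(idx)
        train, test = idx[:n_train], idx[n_train:]
        b0, b1 = fit_line([vi[i] for i in train], [lai[i] for i in train])
        metrics = gof_metrics([lai[i] for i in test],
                              [b0 + b1 * vi[i] for i in test])
        totals = [t + m for t, m in zip(totals, metrics)]
    r2, rmse, mae = (t / n_iter for t in totals)
    return {"r2": r2, "rmse": rmse, "mae": mae}
```

Fixing the random seed makes the reported means reproducible across runs.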
2.3.2. Refined Models of LAI-EVI and LAI-EVI2 Relationships
In order to account for the measurement errors in both LAI and VI data, and solve the issue of non-constant residual variance found in many of the best-fit functions, we adopted a rigorous statistical method to construct refined models for LAI-EVI and LAI-EVI2 relationships, as EVI and EVI2 were more effective than other VIs (see Section 3.2.2
). This method was based on simple linear regression and Theil-Sen estimator after transformations of dependent and/or independent variables, as recommended to the remote sensing community by Fernandes and Leblanc [88
We first applied power transformations to LAI and/or EVI/EVI2 to eliminate non-linearity, non-normality of the error terms, and non-constancy of the error variance. Selection of the optimal transformation forms was based on the Box-Cox model of power transformations for the response variable (i.e., LAI) and the Box-Tidwell model of power transformations for the predictor variable (i.e., EVI, EVI2) [89
]. A score test (Cook-Weisberg test) for non-constant error variance was also used to ensure homoscedasticity in the selected transformations and models [89
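For readers unfamiliar with these transformations, their standard textbook forms are sketched below. The symbols λ (response power), γ (predictor power), β0, β1, and ε are generic notation introduced here; the powers actually selected in this study are not implied.

```latex
% Box-Cox family applied to the response
\mathrm{LAI}^{(\lambda)} =
\begin{cases}
  \dfrac{\mathrm{LAI}^{\lambda} - 1}{\lambda}, & \lambda \neq 0, \\[6pt]
  \ln \mathrm{LAI}, & \lambda = 0,
\end{cases}
\qquad
% Box-Tidwell power on the predictor, followed by a simple linear model
\mathrm{LAI}^{(\lambda)} = \beta_0 + \beta_1\, \mathrm{VI}^{\gamma} + \varepsilon
```

Both λ and γ are typically estimated by maximum likelihood, after which the transformed variables are related through a simple linear regression.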
We then used the Theil-Sen estimator to estimate coefficients in the simple linear regressions. The Theil-Sen estimator is a traditional robust regression tool, which estimates the slope of the regression line as the median slope of the lines through all pairs of sample data points [91
]. It is an unbiased estimator of the true regression slope, and is robust to up to 29% of the samples being outliers [91
]. The refined models for LAI-EVI and LAI-EVI2 relationships (based on only surface reflectance data) were used in the following evaluations and analysis.
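A minimal sketch of the Theil-Sen estimator is given below. The slope follows the definition in the text; taking the intercept as the median of the offsets y − slope·x is one common convention and is an assumption here, as is the function name. In the refined models the inputs would be the transformed LAI and EVI/EVI2 values.

```python
from itertools import combinations
from statistics import median


def theil_sen(x, y):
    """Return (intercept, slope) of the Theil-Sen line through the points (x, y)."""
    # Slope: median of the slopes of the lines through every pair of points.
    slopes = [
        (y2 - y1) / (x2 - x1)
        for (x1, y1), (x2, y2) in combinations(zip(x, y), 2)
        if x2 != x1  # skip pairs with identical x, whose slope is undefined
    ]
    slope = median(slopes)
    # Intercept: median offset of the points from a line with that slope.
    intercept = median(yi - slope * xi for xi, yi in zip(x, y))
    return intercept, slope
```

The pairwise loop is quadratic in the number of samples, which is acceptable for datasets of a few thousand points.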
2.4. Evaluation of the Global LAI-VI Relationships
The temporal mismatch between LAI measurement and satellite overpass and the different methods used in LAI measurement are two potential error sources in the VI and LAI data, respectively. To assess the effects of these two potential measurement errors on the LAI-VI relationships, we analyzed the regression residuals of the overall LAI-EVI relationship (including all crop types) and conducted an ANOVA test with Welch’s correction for non-homogeneity of variances. The pairwise comparisons were made using Dunnett's Modified Tukey-Kramer pairwise multiple comparison test (α = 0.05), which is suitable for unequal sample sizes and does not assume equal population variances.
Since we did not have an independent testing set with globally distributed samples, we adopted a site-based evaluation procedure to evaluate the validity of the approach to build global LAI-VI relationships, and assess the universality of these relationships. In this analysis, we used four crop types with large sample size: maize, soybean, wheat, and pasture. For each crop type, we used data only from sites with at least 10 samples. Then for each crop and each site, we first fitted a model using the method described in Section 2.3.2
and data from all other sites, and then calculated GOF metrics of the model using data for the site of interest. We compared coefficients and GOF metrics of models constructed using different sets of data. This analysis reveals the universality of the LAI-VI relationships as well as the leverage each individual site has over the global relationships.
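The site-based evaluation amounts to a leave-one-site-out loop, sketched below for a single crop type. An ordinary least-squares line stands in for the refined model of Section 2.3.2, only RMSE is reported, and the function names and record layout are hypothetical.

```python
import math
from collections import defaultdict


def fit_line(x, y):
    """Least-squares intercept and slope (stand-in for the refined model)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def site_based_evaluation(records, min_samples=10):
    """records: iterable of (site_id, vi, lai) tuples for one crop type.

    For every site with at least min_samples samples, fit on all other
    eligible sites and score the model on the held-out site.
    """
    by_site = defaultdict(list)
    for site, vi, lai in records:
        by_site[site].append((vi, lai))
    eligible = {s: p for s, p in by_site.items() if len(p) >= min_samples}
    if len(eligible) < 2:
        raise ValueError("need at least two sites with enough samples")

    results = {}
    for held_out, test_pairs in eligible.items():
        train = [p for s, pairs in eligible.items() if s != held_out for p in pairs]
        intercept, slope = fit_line([v for v, _ in train], [l for _, l in train])
        errors = [lai - (intercept + slope * vi) for vi, lai in test_pairs]
        rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
        results[held_out] = {"intercept": intercept, "slope": slope, "rmse": rmse}
    return results
```

Comparing the coefficients across held-out sites indicates how much leverage each site has over the pooled relationship.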
2.5. Preliminary Validation and Example Applications at Three Spatial Scales
We applied the LAI-VI relationships to remote sensing data at three different spatial resolutions/scales. This served as a preliminary validation effort and as examples of potential applications of our LAI-VI relationships for readers who may be interested in applying the relationships in their own studies. Note that in this analysis, the reference LAI data, albeit modeled in nature, were treated as reliable sources of LAI estimates with credible scientific basis as opposed to in situ measurements that would provide direct evidence of the error and bias in our estimates.
2.5.1. Field Scale Application
We measured the LAI of the maize canopy weekly and obtained high spatial resolution imagery (0.8 m) from an airborne sensor in two maize fields located northwest of DeForest, Wisconsin, USA (43.27° N, 89.40° W). This site and the LAI monitoring efforts are described in detail in [63
During the experiment, multispectral images were collected using a 6-sensor Tetracam Multi Camera Array (MCA) system (Tetracam Inc., Chatsworth, CA, USA) mounted on the underside of a Cessna 3-passenger airplane. The MCA sensors were centered at 450, 570, 620, 650, 670, and 860 nm with a uniformly averaged 10 nm bandwidth. Images were collected on four dates during the 2012 growing season (5/25, 6/22, 7/30, 8/29) and eight dates during the 2013 growing season (6/4, 7/2, 7/24, 8/1, 8/13, 8/20, 9/5, 9/23) from ~1200 m (~4000 ft) above the ground surface, leading to a ground instantaneous field of view (GIFOV) of ~0.8 m. Each sensor produced a separate image; individual images were co-registered using Tetracam’s Pixelwrench software and georeferenced in ArcGIS 10 (ESRI, Redlands, CA, USA). To convert MCA images from DN to surface reflectance, we used four control points at the study site that were present in each image: surface water, tarmac road, concrete parking area, and healthy green grass. At least nine spectral reflectance measurements at 1 nm resolution were collected for each of these control points using an ASD handheld spectrometer (Analytical Spectral Devices, Inc., Boulder, CO, USA). These measurements were then averaged over the 10 nm wavelength intervals corresponding to each of the six MCA sensors. Spectrometer-derived mean surface reflectance was linearly regressed against MCA-derived DN values to produce a DN-surface reflectance relationship for each image. Two images (6/22/2012 and 6/4/2013) were discarded due to poor fits between MCA and spectrometer data (R2 < 0.60). The retained images had a mean R2 of 0.72. The relationships developed using linear regression were applied to convert the image DN values to surface reflectance, which was then used to compute VIs.
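The DN-to-reflectance calibration described above is a simple linear regression over the control targets, which can be sketched as follows. The function names are hypothetical, and fitting one line per band per image is an assumption; the text only states that one relationship was produced for each image.

```python
def fit_empirical_line(dn, reflectance):
    """Least-squares fit of reflectance = slope * DN + intercept.

    dn, reflectance: one value per control target (e.g., water, tarmac,
    concrete, grass) for a single band of a single image.
    Returns (slope, intercept, r2).
    """
    n = len(dn)
    mean_dn = sum(dn) / n
    mean_rf = sum(reflectance) / n
    sxx = sum((d - mean_dn) ** 2 for d in dn)
    sxy = sum((d - mean_dn) * (r - mean_rf) for d, r in zip(dn, reflectance))
    slope = sxy / sxx
    intercept = mean_rf - slope * mean_dn
    ss_res = sum((r - (slope * d + intercept)) ** 2 for d, r in zip(dn, reflectance))
    ss_tot = sum((r - mean_rf) ** 2 for r in reflectance)
    return slope, intercept, 1.0 - ss_res / ss_tot


def dn_to_reflectance(dn_value, slope, intercept):
    """Apply the fitted line to an image DN value."""
    return slope * dn_value + intercept
```

An image would be discarded, as in the text, when the returned r2 falls below 0.60.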
The LAI measurements were made using a Li-Cor LAI-2200 (Li-Cor Biosciences, Lincoln, NE, USA) plant canopy analyzer at approximately weekly intervals throughout the 2012 and 2013 growing seasons. LAI measurements were taken as the average of 20 below- and 20 above-canopy readings, and were collected under diffuse light conditions (sunrise, sunset, or full cloud cover). LAI measurements were collected the same day as MCA image collection when possible; when LAI and MCA images were not collected on the same day, LAI values were linearly interpolated between measurement dates to estimate the LAI at the time of image collection.
Finally, local relationships between field measured LAI and each VI were established following the same processes described in Section 2.3.2
, and used to produce LAI maps for comparison with maps produced using the global LAI-VI relationships.
2.5.2. Local Scale Application
The second validation process was conducted using Landsat imagery acquired in the Central Valley of California. The study area features one Landsat footprint, which has a heterogeneous agricultural landscape with various crop types. The reference dataset was an LAI map obtained from the Provisional Landsat LAI Products developed by the NASA Earth Exchange (NEX) program [39
]. It was produced from a Landsat ETM+ image acquired in July 2005 using the radiative transfer algorithm adapted from the MODIS LAI product. We produced an LAI map using the global LAI-VI relationship and the same Landsat ETM+ surface reflectance image used in the NEX product. We chose to use the global overall relationships only (i.e., not crop-specific) as no crop-type map was available in this area.
2.5.3. Regional Scale Application
The third application was implemented at 1 km resolution using MODIS data within the northwestern corner of the state of Iowa (USA), a region spanning 15 counties. This area is composed primarily of maize and soybean fields. A crop map of this region for 2009 was extracted from the Cropland Data Layer (CDL), a crop type dataset derived from AWiFS data at 56 m spatial resolution [94
]. To be consistent with the MODIS images, the CDL map was aggregated to 1 km resolution using a square-wave filter, and the resulting map shows maize and soybean cultivated area fractions for each 1 km pixel.
We used the MODIS Collection 5 Nadir BRDF-adjusted reflectance (NBAR) product (MCD43A4) [95
] to produce the LAI maps using the global LAI-VI relationships. The NBAR data were produced as 8-day composites at 500 m spatial resolution, but were aggregated to 1 km resolution and a 16-day interval before LAI processing. Based on MODIS reflectance and the crop map, two sets of LAI maps were produced: one based on the global overall LAI-VI relationship (i.e., not crop-specific), and the other based on global LAI-VI relationships for maize and soybean. In the latter maps, LAI was computed as a weighted average of maize and soybean LAI based on the fractions determined from the crop map.
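The crop-specific pixel value can be sketched as a fraction-weighted average. Normalizing by the summed maize and soybean fraction, so that other land cover in the pixel does not dilute the estimate, is an assumption, as is the function name; the text only states that a weighted average was used.

```python
def fraction_weighted_lai(lai_maize, lai_soybean, frac_maize, frac_soybean):
    """Weighted average of crop-specific LAI estimates for one 1 km pixel.

    lai_maize, lai_soybean   LAI from the maize and soybean LAI-VI relationships
    frac_maize, frac_soybean cultivated area fractions from the aggregated crop map
    Returns None when the pixel contains neither crop.
    """
    total = frac_maize + frac_soybean
    if total <= 0:
        return None
    return (frac_maize * lai_maize + frac_soybean * lai_soybean) / total
```

For example, a pixel that is 60% maize (LAI 4.0) and 20% soybean (LAI 2.0) would be assigned an LAI of about 3.5 under this convention.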
These maps were then compared to the reprocessed MODIS Collection 5 LAI products by the Beijing Normal University Land-Atmosphere Interaction Research Group [96
]. This LAI product is an improved version of the original MODIS LAI product [97
], which overcomes the issues of data noise and gaps by applying a modified temporal/spatial filter. It has a 1 km spatial and 8-day temporal resolution, but was aggregated to 16-day intervals. Besides comparing the LAI maps, we also constructed and compared growing season LAI time series for the two dominant crops in Iowa (maize and soybean) using both the global LAI-VI relationships and the MODIS BNU LAI products. Since MODIS has a coarse resolution of 1 km, which is larger than most of the soybean and maize fields, the LAI time series used average values of only the pure maize or soybean pixels, defined as pixels with more than 90% of their area occupied by a single crop.
In this study, we developed a dataset containing spatiotemporally explicit in situ crop LAI measurements gathered worldwide to assess the global universality of LAI-VI relationships. In the exploratory analysis, we built best-fit functions between LAI observations and five vegetation indices (SR, NDVI, EVI, EVI2, and CIGreen) generated from Landsat data to depict global LAI-VI relationships for a number of crop types. Results reveal that the global LAI-VI relationships explain more than half of the variance in field-measured LAI using only remotely sensed observations. The LAI-VI relationships are crop specific and are most effective using EVI or EVI2 from surface reflectance. To account for measurement errors in both LAI and VIs, we further refined the EVI and EVI2 models using power transformations and the Theil-Sen estimator; the final models have RMSE mostly below 1.0 m2/m2. We provided three examples that applied the global LAI-EVI/EVI2 relationships at local to regional spatial scales, and found them to be effective in generating LAI maps. Based on the preliminary validation and site-based evaluation, we found that the LAI-VI relationships we built possess global universality, with random errors reflecting the diverse nature of agro-ecosystem landscapes.
The major contributions of this work include synthesizing a large number of in situ LAI observations and vegetation indices from various locations and developing a set of globally applicable statistical relationships. The simplicity of generating VI using remotely sensed images and applying simple statistical relationships adds to the practical value of this research, especially when essential variables needed for process-based methods are rarely present and hard to measure [27
]. Moreover, to the best of our knowledge, the work presented here is the first to compile a large global dataset of crop LAI and VIs and analyze the universality and diversity of the LAI-VI relationships globally. These findings not only support the CEOS Land Product Validation framework for the validation of remote sensing LAI products but also contribute to a larger community of users that are interested in producing LAI maps from remote sensing but do not have access to measured LAI data [57
]. Moreover, as our analysis was based on Landsat images with 30 m spatial resolution, the global LAI-VI relationships support the production of large-scale, fine-resolution LAI maps, which are essential to agricultural applications, especially in regions where crop fields are relatively small. The ability to produce LAI maps at this level offers the potential to assess additional biophysical variables and processes, including biomass, net primary production (NPP), evapotranspiration, and crop yields, at the individual plot/field level, which is more suitable for decision making than aggregated values over a large area. The easy accessibility, low cost, long historical coverage, and continuity of the Landsat mission also render our findings useful to scientific, governmental, and commercial applications. Finally, the analysis and findings here only apply to broadband VIs. As more medium to high resolution sensors become available with additional narrow spectral bands (e.g., the red edge band), there will be great opportunities to establish efficient models for global LAI estimation with various hyperspectral VIs [57