MODIS Leaf Area Index (LAI) Estimation: A Statistical Perspective

Understanding the impact of vegetation mixture and misclassification on leaf area index (LAI) estimation is crucial for algorithm development and the application community. Using the MODIS standard land cover and LAI products, global LAI climatologies and statistics were obtained for both pure and mixed pixels to evaluate the effects of biome mixture on LAI estimation. Misclassification between crops and shrubs does not generally translate into large LAI errors (<0.37 or 27.0%), partly due to their relatively lower LAI values. Biome misclassification generally leads to an LAI overestimation for savanna, but an underestimation for forests. The largest errors caused by misclassification are also found for savanna (0.51), followed by evergreen needleleaf forests (0.44) and broadleaf forests (~0.31). Comparison with MODIS uncertainty indicators shows that biome misclassification is a major factor contributing to LAI uncertainties for savanna, while for forests, the main uncertainties may be introduced by algorithm deficits, especially in summer. The LAI climatologies for pure pixels are recommended for land surface modeling studies. Future studies should focus on improving the biome classification for savanna systems and refining the retrieval algorithms for forest biomes.


Introduction
Leaf area index (LAI) quantifies the amount of live green leaves in the canopy per unit of ground surface. It is an important parameter in various vegetation ecosystem and land surface process models [1]. Global LAI products have been operationally provided through several satellite remote sensing projects, such as MODIS [2,3], CYCLOPES [4] and GLOBCARBON [5,6]. Understanding the uncertainties of these products is critical in order to assimilate LAI into the ecosystem and land surface models [1,7]. Product uncertainty information can be categorized into two types, theoretical and physical [8]. Theoretical uncertainties are caused by uncertainties in the input data and model imperfections and are usually estimated and reported in the quantitative quality indicators (QQIs) [4,9,10]. Physical uncertainties are derived through comparison with values representing the ground truth: field measurements or estimations from higher-resolution imagery. In practice, both uncertainties are used as complementary indicators of product quality [1].
Land cover information is commonly used to parameterize canopy radiative transfer models or models that require land cover stratification, such as MODIS [3], GLOBCARBON [5] and ECOCLIMAP [17]. Errors in classifying land cover type may thus propagate into LAI uncertainties during the parameter retrieval process. Given the modest accuracy of current global land cover products (overall accuracy 60%-80%) [18,19], understanding the impact of land cover misclassification on LAI estimation is fundamental to improving LAI retrieval from remote sensing imagery.
Previous studies of the effect of land cover misclassification on LAI estimation have adopted either a deterministic or a statistical approach. The deterministic approach simulates the physical radiative transfer processes within vegetation canopies that depend on land cover type. The relationship between misclassification and LAI uncertainty can be explored using numerical experiments and trial-and-error methods [20]. Depending on the LAI estimation methods used, biome misclassification may lead to incorrect selection of look-up tables, inappropriate radiative transfer models and/or estimation algorithms [16,20,21]. The statistical approach directly explores the relationship between the input biome classification map and the resulting LAI uncertainty, which avoids the usually complex radiative transfer simulation and parameter retrieval processes. The effects of biome misclassification can be evaluated through the retrieval index, mean LAI and the histogram of the retrieved LAI distribution [9,22]. Despite its simplicity, few studies have been conducted systematically using this approach on a global scale.
In this context, the focus of this paper is to explore the effect of biome misclassification on MODIS LAI estimation using a novel statistical approach. The MODIS products contain collocated land cover and LAI products and were thus explored in this study. A confusion matrix was constructed to investigate the correspondence between biome misclassification and LAI uncertainties. Global LAI climatologies were compared for pure and mixed pixels to explore the impact of different biome misclassifications on LAI errors.

MODIS LAI and Land Cover Products
The 8-day synergistic LAI (MCD15 C5, 1 km) and land cover (MCD12Q1 C5, 500 m) combined products from the TERRA and AQUA platforms for 2003-2009 were available from the NASA WIST (Warehouse Inventory Search Tool) website (WIST, Available online: http://wist.echo.nasa.gov (accessed on 1 March 2012)). The MODIS LAI retrieval algorithm is based on the inversion of a 3D radiative transfer model that simulates radiation absorption and scattering in vegetation canopies [2,23,24]. The main algorithm employs a look-up table (LUT) method that searches for LAIs for specific solar and view zenith angles, observed bidirectional reflectance factors (BRFs) at certain spectral bands and biome type. The output is the LAI mean value averaged over all acceptable solutions. The standard deviation (LaiStdDev) serves as a measure of accuracy and is provided as a product QQI. The current Collection 5 LAI products use eight biome types as a priori information to constrain the vegetation structural and optical parameter space: (1) grasses and cereal crops, (2) shrubs, (3) broadleaf crops, (4) savanna, (5) deciduous broadleaf forest (DBF), (6) evergreen broadleaf forest (EBF), (7) deciduous needleleaf forest (DNF) and (8) evergreen needleleaf forest (ENF) [13].
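The LUT search described above can be illustrated with a toy sketch. It is not the operational MODIS code: the table entries, the two-band (red, NIR) setup, the function name `retrieve_lai` and the acceptance rule (mean squared, uncertainty-normalized BRF difference of at most 1) are all assumptions made for illustration. Only the final step, reporting the mean and standard deviation over the acceptable solutions, follows the description in the text.

```python
from statistics import mean, pstdev

# Hypothetical LUT for one biome and one sun/view geometry:
# each entry pairs a candidate LAI with modeled (red, NIR) BRFs.
# The numbers are illustrative only, not values from the MODIS LUTs.
TOY_LUT = [
    (0.5, (0.080, 0.22)),
    (1.0, (0.065, 0.27)),
    (2.0, (0.050, 0.33)),
    (3.0, (0.042, 0.37)),
    (4.0, (0.038, 0.40)),
    (5.0, (0.036, 0.42)),
]


def retrieve_lai(observed, sigma, lut=TOY_LUT):
    """Return (mean, std) of LAI over the LUT entries that match the observation.

    An entry is accepted when the mean squared, uncertainty-normalized
    difference between modeled and observed BRFs is at most 1.
    This acceptance rule is an assumption of the sketch.
    """
    accepted = []
    for lai, modeled in lut:
        chi2 = mean(((m - o) / s) ** 2 for m, o, s in zip(modeled, observed, sigma))
        if chi2 <= 1.0:
            accepted.append(lai)
    if not accepted:
        return None  # no acceptable solution in the table
    return mean(accepted), pstdev(accepted)


# Observed (red, NIR) BRFs and their assumed uncertainties.
result = retrieve_lai(observed=(0.045, 0.35), sigma=(0.005, 0.03))
```

In this toy case two table entries (LAI 2.0 and 3.0) pass the test, so the retrieved mean is 2.5 with a spread of 0.5; the spread plays the role of LaiStdDev.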
MODIS global land cover products are generated utilizing a supervised classification method that exploits a global training database obtained from high-resolution imagery in association with ancillary data [18]. The classification algorithm uses spectral and temporal information from MODIS nadir BRDF-adjusted reflectance (NBAR) bands 1−7, supplemented by the enhanced vegetation index (EVI) and the MODIS land surface temperature (LST), to obtain the land cover types and phenology information. Several classification schemes are adopted, principally, the International Geosphere Biosphere Program (IGBP) (Type 1), the University of Maryland (UMD) (Type 2) and LAI/FPAR biomes (Type 3). In the IGBP classification scheme, in addition to the primary types, an alternative secondary type was assigned for each pixel when the confidence in the primary type is not high [25]. Pixels with no secondary label were considered unequivocal with high confidence [25]. The paired primary and secondary land cover types make it possible to study the potential LAI uncertainty caused by biome misclassification. In this study, 'misclassification' represents both subpixel mixture and biome misclassification.

Data Analysis
The uncertainty of LAI retrieval resulting from biome misclassification was analyzed on the basis of the primary and secondary land cover types. For ease of analysis, the Type 3 MODIS LAI/Fraction of Absorbed Photosynthetically Active Radiation (FPAR) eight biome types were used in the confusion matrix and subsequent analysis. Since the secondary types are only provided in the Type 1 IGBP scheme, they were converted to the Type 3 MODIS LAI/FPAR biome types [18]. Figures 1 and 2 show the global primary and secondary land cover types, respectively, in the Type 3 classification system in 2003. The secondary biome type indicates a potential vegetation mixture or a misclassification contributing to LAI uncertainties. For the secondary biome types, land cover types included in Type 1, but with no equivalent Type 3 class, were not included in the analysis in order to minimize the discrepancies between different classification schemes. These types include the mixed forest, permanent wetlands and cropland/natural vegetation mosaic (white in Figure 2). Pixels with high confidence (pink in Figure 2) and pixels with identical primary and secondary types were regarded as representing 'pure' biome types with minimal biome mixing or misclassification. Otherwise, the pixels were regarded as 'mixed' or 'misclassified'; both terms are used interchangeably in the text. Assuming that the secondary type mainly indicated biome misclassification, a confusion matrix was constructed to investigate the impact of such misclassification on LAI uncertainties. We consider the primary biome type as the actual class and the secondary type as the predicted class in the confusion matrix. LAI uncertainty was calculated as the difference between the predicted and actual vegetation LAIs (Equation (1)):
$$\text{LAI Uncertainty} = \text{LAI}_{\text{mixed}} - \text{LAI}_{\text{pure}} \quad (1)$$
where $\text{LAI}_{\text{mixed}}$ and $\text{LAI}_{\text{pure}}$ are the average LAI values for the mixed (predicted) and pure (actual) pixels, respectively. LAI uncertainties were then examined by comparing their climatologies for all biomes. The misclassification induced errors (MIEs) were further compared with the theoretical uncertainties reported in MODIS LAI QQIs. Our goals in calculating the global LAI climatology were to obtain an overview of the LAI uncertainties and to assess them against the MODIS LAI QQIs.
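As a minimal sketch of how Equation (1) could be evaluated from collocated pixels, the snippet below averages LAI for every (primary, secondary) biome pair and subtracts the pure-pixel mean of the primary type. The function names and the sample pixels are hypothetical. High-confidence pixels without a secondary label are assumed to be encoded by repeating the primary label.

```python
from collections import defaultdict
from statistics import mean


def mean_lai_matrix(pixels):
    """Average LAI for every (primary, secondary) biome pair.

    `pixels` is an iterable of (primary, secondary, lai) tuples.
    """
    groups = defaultdict(list)
    for primary, secondary, lai in pixels:
        groups[(primary, secondary)].append(lai)
    return {pair: mean(values) for pair, values in groups.items()}


def lai_uncertainty(matrix):
    """Equation (1): LAI_mixed - LAI_pure for each off-diagonal cell."""
    errors = {}
    for (primary, secondary), lai_mixed in matrix.items():
        lai_pure = matrix.get((primary, primary))
        if primary != secondary and lai_pure is not None:
            errors[(primary, secondary)] = lai_mixed - lai_pure
    return errors


# Made-up pixels for illustration (not values from the study).
pixels = [
    ("savanna", "savanna", 1.0),
    ("savanna", "savanna", 1.2),
    ("savanna", "shrubs", 1.6),
    ("savanna", "shrubs", 1.8),
    ("shrubs", "shrubs", 0.4),
]
matrix = mean_lai_matrix(pixels)
errors = lai_uncertainty(matrix)
```

A positive entry in `errors` means the mixed pixels carry a higher mean LAI than the pure pixels of the same primary type, i.e., an overestimation in the sense used in this paper.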
Only LAI values retrieved from the main algorithm and 'good quality' land cover data based on the quality assessment layer were analyzed in this study. Considering the yearly land cover variation, the collocated LAI and land cover data from 2003 to 2009 were used for the calculation.
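The main-algorithm screening can be sketched as a bit-mask test on the LAI quality byte. The bit layout used here (a 3-bit SCF_QC field in bits 5-7, with 0 and 1 denoting main-algorithm retrievals without and with saturation) is an assumption that should be checked against the product user guide; the text does not state whether saturated retrievals were kept, and this sketch keeps both. The function names are hypothetical.

```python
def is_main_algorithm(qc):
    """True when the SCF_QC field (assumed bits 5-7) reports the main algorithm.

    Assumed coding: 0 = main algorithm, best result; 1 = main algorithm
    with saturation; larger values = backup algorithm or not produced.
    """
    scf_qc = (qc >> 5) & 0b111
    return scf_qc in (0, 1)


def filter_main_algorithm(lai_values, qc_values):
    """Keep only LAI values whose QC byte passes is_main_algorithm."""
    return [lai for lai, qc in zip(lai_values, qc_values) if is_main_algorithm(qc)]


# Three sample pixels: main algorithm, backup algorithm, main with saturation.
kept = filter_main_algorithm([1.2, 3.4, 0.8], [0b00000000, 0b01000000, 0b00100000])
```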

Misclassification between Different Biome Types
Table 1 shows the statistics of the primary and secondary biome types. Pure pixels are located on the diagonal cells, and all others are considered mixed pixels. The last row shows the percentage of pure pixels over the globe. The table reveals that secondary biome types are common and correspond reasonably to the primary types. The total percentage of high confidence (HC) pixels is only 0.92% over the globe. Overall, only 28.74% are pure pixels, leaving most (71.26%) as mixed pixels. Grasses, crops, shrubs and savannas are easily confused with each other, but unlikely to be misclassified as forests. For example, grasses/cereal crops (35.69%) and broadleaf crops (36.94%) are likely to be confused with savannas, but only 6.15% of savannas are mistaken for deciduous broadleaf forest. Because of the mixed composition of trees and grasses, 49.37% of savannas, higher than all other biome types, are classified as pure pixels.
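The percentages in a table of this kind can be derived from per-pixel label pairs, as in the following sketch. It assumes that each row is normalized by the number of pixels of the primary type and that the global pure share is the fraction of pixels whose two labels agree; the function name and the sample labels are hypothetical.

```python
from collections import Counter


def secondary_percentages(label_pairs):
    """Percentage of each secondary type within every primary type.

    `label_pairs` is a list of (primary, secondary) labels, one per pixel.
    Returns (row_percent, pure_percent): a dict keyed by (primary, secondary)
    and the global share of pixels whose two labels agree.
    """
    pair_counts = Counter(label_pairs)
    primary_counts = Counter(primary for primary, _ in label_pairs)
    row_percent = {
        (p, s): 100.0 * n / primary_counts[p] for (p, s), n in pair_counts.items()
    }
    pure = sum(n for (p, s), n in pair_counts.items() if p == s)
    pure_percent = 100.0 * pure / len(label_pairs)
    return row_percent, pure_percent


# Made-up labels for eight pixels (not values from Table 1).
labels = (
    [("savanna", "savanna")] * 2
    + [("savanna", "shrubs")] * 2
    + [("shrubs", "shrubs")] * 1
    + [("shrubs", "savanna")] * 3
)
row_percent, pure_percent = secondary_percentages(labels)
```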
Of the forest biomes in Table 1, deciduous broadleaf forest contains the highest percentage of pure pixels (26.72%). However, all forest types are often confused with one another. For example, evergreen broadleaf forest is easily mistaken for deciduous broadleaf forest (10.64%) and vice versa (8.96%), and evergreen needleleaf forest is often mistaken for deciduous needleleaf forest (16.21%). Deciduous broadleaf forests are very likely to be misclassified as deciduous needleleaf forests (16.69%), and deciduous needleleaf forests are also likely to be misclassified as deciduous broadleaf forests (12.21%). Forest biomes are likely to be misclassified as shrubs (>21%) or savanna (>16%) (Table 1). It is rarely possible to obtain pure pixels for evergreen broadleaf forest (3.87%), because of their confusion with shrubs (25.09%) and savanna (31.72%). Deciduous needleleaf forests are most likely to be misclassified as shrubs (33.84%) and savanna (30.96%).
Table 1. MODIS primary biome types and the percentage of each collocated secondary biome type from 2003-2009. The last row shows the percentage of pure pixels (diagonal) over the globe. The HC (high confidence) column shows the percentage of high confidence pixels for each primary biome type. The last column shows the percentage of each primary biome type over the globe. EBF, DBF, ENF and DNF stand for evergreen broadleaf forest, deciduous broadleaf forest, evergreen needleleaf forest and deciduous needleleaf forest, respectively.

Misclassification Induced LAI Errors (MIEs)
Table 2 shows the LAI mean values for both the pure and mixed pixels and the variability in LAI values induced by potential vegetation mixture and misclassification. The table provides an insight into the extent to which the confusion between two biome types affects LAI retrievals. The impact of biome misclassification varies between biome types. The bias due to the misclassification of grasses/cereal crops ranges from −0.29 (−51.8%) for shrubs to 0.14 (25.0%) for broadleaf crops. The misclassification of shrubs as savanna leads to an overestimation of up to 60%. By contrast, the misclassification of broadleaf crops as shrubs underestimates the LAI by 0.37 (−27.0%). Misclassification of savannas as any of the herbaceous types overestimates the LAI (>0.57), with the greatest bias being 0.84 (100.0%) for broadleaf crops.
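The relation between the absolute and relative errors quoted above can be checked with a short calculation, taking the relative error against the pure-pixel (diagonal) mean. The pure-pixel mean of 0.56 for grasses/cereal crops used below is back-calculated from the reported pair (−0.29, −51.8%) and is an inferred value, not one read from Table 2; the function name is hypothetical.

```python
def relative_error(lai_mixed, lai_pure):
    """Relative LAI error in percent, taken against the pure-pixel mean."""
    return 100.0 * (lai_mixed - lai_pure) / lai_pure


# Pure-pixel mean back-calculated from the reported pair (-0.29, -51.8%);
# it is an inferred value, not one read from Table 2.
lai_pure_grass = 0.56
as_shrubs = round(relative_error(lai_pure_grass - 0.29, lai_pure_grass), 1)
as_broadleaf_crops = round(relative_error(lai_pure_grass + 0.14, lai_pure_grass), 1)
```

With this inferred mean, both reported pairs for grasses/cereal crops are mutually consistent: a bias of −0.29 corresponds to −51.8% and a bias of 0.14 to 25.0%.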
Forests, especially evergreen forests, are easily misclassified as shrubs or savannas. Significant underestimation of LAI is observed when evergreen broadleaf forest (0.67, −16.1%) or evergreen needleleaf forest (0.62, −31.2%) is mixed with shrubs. The mixture with savanna leads to the largest underestimation (0.87, −43.7%) for evergreen needleleaf forest. The large errors related to shrubs and savanna for the evergreen forests could be due to the selection of wrong LUTs [26]. Evergreen broadleaf forest can be confused with deciduous broadleaf forest, which causes an underestimation of LAI up to 0.41 (−9.8%). However, misidentifying deciduous broadleaf forest as evergreen broadleaf forest has nearly no effect on LAI values (0.03, −1.4%). Overall, confusion between broadleaf forest and needleleaf forest generally leads to an LAI underestimation. Confusion between evergreen and deciduous needleleaf forests causes an underestimation of about 0.24 (−12.1%). When evergreen needleleaf forest is misclassified as broadleaf forest, the underestimation is up to 0.23 (−11.6%). Misclassification of deciduous needleleaf forest as broadleaf forest slightly affects LAI values (0.07, −4.2%).

LAI Climatologies for Pure and Mixed Pixels
Figure 3 illustrates the global LAI climatologies for both pure and mixed pixels in different months. For each primary biome type, the climatologies show similar temporal profiles and seasonal variations. Seasonally, except for savanna, the deviations between pure and mixed LAI values are generally higher in summer than in winter. For grasses/cereal crops and broadleaf crops, misclassification of these two biomes as any other types produces LAI deviations less than 0.70. Similarly, the overestimation of LAI for shrubs is also small (~0.19). The greatest effect of subpixel mixture on LAI estimation is for savanna, for which the mixed LAI values are consistently higher than those of the pure pixels, ranging from about 0.41 (37.3%) when misclassified as grasses/cereal crops in August to

Comparison of MIEs and MODIS LAI Uncertainty Indicators
Table 3 shows the monthly average of MIEs calculated from the pure and mixed profiles in Figure 3 with Equation (1). The last column shows the average of the absolute errors from Equation (1). Biome misclassification generally leads to an LAI overestimation for savanna (0.51), but an underestimation for forests (0.08−0.44). Misclassification of woody biomes (≥0.19) generally causes higher errors than those for herbaceous types (≤0.16). The largest errors caused by misclassification are found for savanna (0.51), followed by evergreen needleleaf forest (0.44) and broadleaf forests (~0.31). To acquire a better understanding of both quality indicators, the uncertainties induced by misclassification are compared with those reported in QQIs (Figure 4). QQIs mainly reflect uncertainties induced by the retrieval algorithms and the input reflectance, because the retrieval algorithm is applied for each biome type separately. In this figure, the misclassification induced errors (MIEs) are represented by the absolute differences between the pure and mixed profiles in Figure 3.
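The monthly errors and their mean absolute value can be sketched as follows, applying Equation (1) month by month to a mixed and a pure climatology. The function names and the four-month profiles are hypothetical and serve only to show why the signed mean and the mean absolute error can differ.

```python
from statistics import mean


def monthly_mie(mixed_profile, pure_profile):
    """Equation (1) applied month by month to two LAI climatologies."""
    return [m - p for m, p in zip(mixed_profile, pure_profile)]


def mean_absolute_error(monthly_errors):
    """Average of the absolute monthly errors."""
    return mean(abs(e) for e in monthly_errors)


# Illustrative 4-month profiles (not values from Figure 3).
pure = [1.0, 1.5, 2.0, 1.5]
mixed = [1.5, 1.0, 2.5, 2.0]
errors = monthly_mie(mixed, pure)
signed_mean = mean(errors)
mae = mean_absolute_error(errors)
```

Here the monthly errors are 0.5, −0.5, 0.5 and 0.5, so the signed mean is 0.25 while the mean absolute error is 0.5; errors of opposite sign partly cancel in the former but not in the latter.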
MIEs and QQIs are generally small (<0.20) and consistent for grasses/cereal crops, shrubs and broadleaf crops. This signifies that biome misclassification and algorithm limitations contribute evenly to LAI uncertainties for these biome types. For savanna, the MIEs (0.51) are significantly higher than those of QQIs (0.20). This reveals that biome misclassification is a major factor contributing to LAI uncertainties for savanna. This contrasts with earlier analyses with a deterministic approach claiming that MIEs do not exceed uncertainties in the model, e.g., [21]. The discrepancies are partly due to the different sensor (MISR) and study area (Africa) explored in [21]. For forest biomes, MIEs are generally smaller than QQIs for most of the season, indicating the robustness of the MODIS LAI algorithm to forest misclassification. Slightly higher MIEs can only be observed in winter when the LAI values and, hence, QQIs are usually lower. For needleleaf forests, the lower MIEs (<0.30) in comparison to QQIs (0.75~0.85) in July indicate that, other than biome misclassification, the uncertainties may be largely introduced by algorithm deficits in summer.
[...] between these two biomes does not cause large errors (<0.30), which is consistent with deterministic findings from earlier LAI collections [21,30,31].
The primary classification confusion is related to savanna and forest. The ambiguity between savanna and forest properties has been reported in other studies [18,32]. The main source of classification error has been attributed to the continuum of fractional cover and canopy structure [30,32]. These issues, as shown in the present study, result in an overestimation of LAI values consistently exceeding 0.37 for savanna. Misclassification of forest as any other biome type, however, generates an underestimation in LAI retrievals, especially for evergreen needleleaf forest (Table 3). Better characterization of the savanna biome [33] and refinements of the LAI algorithms for woody vegetation [34] may help address some of these concerns.
While other forest biomes have been shown to be relatively robust to land cover uncertainties, evergreen needleleaf forest LAI exhibits the highest sensitivity to biome misclassification. This is related to the similarity of the top-of-canopy reflectances of needleleaf forests with a low ground cover and a bright (green) understory [21]. Our earlier local studies have shown that misclassification of needleleaf forest leads to an underestimation of LAI, whereas misclassification of broadleaf forest leads to an overestimation [20]. These local trends were confirmed for needleleaf forests globally in the present study. The discrepancies for broadleaf forests are attributable to differences in the Type 1 and Type 3 MODIS classification systems and the small study area in [20].

Future Prospects
LAI has long been used by the modeling community in land surface modeling studies [15,35,36]. Land surface process models use climatology-based LAI values and uncertainties as input. It should be noted that, in most land surface models, LAI is defined for pure pixels or a mosaic of pure vegetation types [37,38]. Direct calculation of remote sensing LAI climatology based on the MODIS LAI biome types obtains the climatology of mixed biome types. The difference between pure and mixed climatologies may cause land surface models to produce incorrect simulations, especially for savanna and forest biomes (Figure 3). Improved model parameterization will need to consider the biome mixture and assign more realistic climatology values. Table 4 shows the global LAI climatologies derived from the pure pixels (2003−2009). The climatologies for pure pixels are useful for land surface models that treat pixels as a mosaic of pure vegetation types, e.g., [37]. The prescribed LAI values in such land surface models may be updated with the LAI climatologies derived here.
In this study, pixels with the same primary and secondary biome types were treated as pure pixels; others were considered to be a mixture of primary and secondary biome types. The results of this study thus depend on the proportion or percentage of the two types in the mix. Confidence in our results will be enhanced when data for the proportion of the secondary biome become available. Information about the fractional vegetation cover (FVC) can be explored to study the impact on LAI retrievals [31]. The effects of FVC will be quantified when the full MODIS vegetation continuous field (VCF) products (MOD44B), including the percentage of trees, grasses and bare ground, are refined (MODIS Land. Available online: http://modis-land.gsfc.nasa.gov/vcc.html (accessed on 1 February 2013)). It is expected that the increasing quality of the MODIS land cover products will lead to improved LAI retrieval in the future. It is important to note that the sources of uncertainties, such as biome misclassification and retrieval algorithms, are interrelated. Rigorous quantification of the physical relationship between these factors and the influence of misclassification on LAI estimation can be made with a deterministic approach through the retrieval algorithms.

Conclusions
Through an analysis of global MODIS products, this study has quantified the LAI discrepancies induced by potential subpixel mixture and biome misclassification. The statistics show that 28.74% of LAI products are for pure pixels and the other 71.26% are retrieved as mixed biome types. When misclassification between distinct biome types occurs, it does not generally translate into strong disagreement in LAI retrievals. Misclassification between herbaceous types has minimal impact on LAI retrievals (<0.37 or 27.0%). Biome misclassification generally leads to an LAI overestimation for savanna, but an underestimation for forests. The largest errors caused by misclassification are found for savanna (0.51), followed by evergreen needleleaf forest (0.44) and broadleaf forests (~0.31). Biome misclassification is a major factor contributing to LAI uncertainties for savanna, while for forests, the main source of uncertainties may be due to algorithm deficits, especially in summer. To reduce the LAI uncertainties, further efforts should therefore be focused on improving the biome classification for the structurally complex savanna systems and refining the retrieval algorithms for forest biomes.

Figure 1. Global distribution of the primary biome types based on the MODIS Leaf Area Index (LAI)/Fraction of Absorbed Photosynthetically Active Radiation (FPAR) (Type 3) classification system. Data from the MODIS (MCD12Q1 C5) land cover product in 2003 (1 km).

Figure 2. Global distribution of the secondary biome types based on the MODIS LAI/FPAR (Type 3) classification system. Data converted from the MODIS (MCD12Q1 C5) International Geosphere Biosphere Program (IGBP) (Type 1) classification system (1 km, 2003). Pink pixels show the primary biome types with high confidence. White areas are IGBP classes (e.g., mixed forest) with no equivalent MODIS LAI/FPAR classes.

Table 2. (a) Confusion matrix for LAI mean values for pure (bold) and mixed pixels and (b) the relative LAI errors induced by biome misclassification. Statistics based on data from 2003−2009. Mean values for mixed pixels are significantly different from those of the pure pixels (t-test, p < 0.001). Relative errors are calculated relative to the diagonal pure values. The cells in brackets indicate a smaller number of pixels (<5%) from Table 1.

Table 3. Monthly average of the misclassification induced LAI errors (MIEs) for the eight primary biome types (2003−2009). The mean absolute errors (MAEs) are calculated from the average of the absolute monthly errors.