A Model to Estimate Leaf Area Index in Loblolly Pine Plantations Using Landsat 5 and 7 Images

: Vegetation indices calculated from remotely sensed satellite imagery are commonly used within empirically derived models to estimate leaf area index in loblolly pine plantations in the southeastern United States. The data used to parameterize the models typically come with observation errors, resulting in biased parameters. The objective of this study was to quantify and reduce the effects of observation errors on a leaf area index (LAI) estimation model using imagery from Landsat 5 TM and 7 ETM+ and over 1500 multitemporal measurements from a Li-Cor 2000 Plant Canopy Analyzer. Study data comes from a 16 quarter 1 ha plot with 1667 trees per hectare (2 m × 3 m spacing) fertilization and irrigation research site with re-measurements taken between 1992 and 2004. Using error-in-variable methods, we evaluated multiple vegetation indices, calculated errors associated with their observations, and corrected for them in the modeling process. We found that the normalized difference moisture index provided the best correlation with below canopy LAI measurements (76.4%). A nonlinear model that accounts for the nutritional status of the stand was found to provide the best estimates of LAI, with a root mean square error of 0.418. The analysis in this research provides a more extensive evaluation of common vegetation indices used to estimate LAI in loblolly pine plantations and a modeling framework that extends beyond the typical linear model. The proposed model provides a simple to use form allowing forest practitioners to evaluate LAI development and its uncertainty in historic pine plantations in a spatial and temporal context.


Introduction
Foliage biomass is an ecological parameter often used as an indicator of forest productivity in commercial forest plantations [1,2]. Its direct determination requires expensive destructive sampling methods that are difficult to measure and laborious to acquire, which is why practitioners turned to indirect biomass estimation methods that rely on light reflectance as captured by remote sensing devices for its determination [3]. The increased availability of surface reflectance products has allowed for the use of higher quality data in biophysical modeling due to the correction of atmospheric distortions, providing more consistent, and less variable data [4,5]. Over the last two decades, use of remotely sensed data has increased as products become more readily available, allowing commercial forest plantation managers to use it as part of their decision-making process [6][7][8][9]. Estimates of structure, productivity, disturbance, and phenology of Pinus taeda L. (henceforth loblolly pine) plantations are some of the applications for these data [10][11][12][13][14][15][16][17]. Nevertheless, in application, most indirect methods aim at estimating leaf area index (LAI). LAI, the onesided leaf surface area over a fixed ground area, is a biophysical parameter that has been used to understand productivity drivers in loblolly pine stands [1,[18][19][20]. LAI serves as a proxy for the processes involving intercepted photosynthetically active radiation (PAR) and atmospheric gas and moisture exchange [21][22][23]. As an indicator for stand productivity, LAI has shown a positive correlation with stem volume increment, biomass production, site index, basal area, and stand density index [18,19,[24][25][26][27]. Similarly, LAI is responsive to changes in resource availability due to silvicultural inputs like fertilization [28][29][30].
Previous models to estimate LAI from passive remotely sensed data have utilized varying spectral, spatial, and temporal resolutions of satellite imagery and different modeling techniques to estimate in situ LAI measurements [11,20,[31][32][33]. Estimation of LAI and other biophysical parameters from satellite imagery rely on the algebraic manipulation of spectral reflectance bands to form vegetation indices (VI) which are then used to model ecological processes. Vegetation indices, such as the normalized difference vegetation index (NDVI) and the simple ratio (SR), are used to separate vegetation dynamics from background processes allowing researchers to use remotely sensed data to model the underlying biological phenomena [11,[34][35][36][37]. To better understand the behavior of canopy spectral reflectance as sampled from satellite imagery, we need to start at the underlying structure of the individual leaf and the behavior of electromagnetic energy. Whether it is reflected, absorbed, or transmitted, the behavior of electromagnetic energy in a specific wavelength depends on the surface with which it interacts [38]. At the individual leaf level, physiological properties, including cell structure, concentration of pigments, air, and water within the leaf determine the reflectance behavior [39]. When scaled to the canopy level, orientation and concentration of leaves, type and concentration of background matter, and the angle of illumination affect reflectance behavior due to the potential for shadows [39]. Additionally, the ability to detect differences in reflectance is determined by sensor parameters. With these sources of potential variation in mind, measuring leaf and canopy reflectance to estimate biophysical parameters must take advantage of known spectral responses to remove noise. One common source of variation in spectral reflectance is a function of the atmospheric scattering which varies spatially and temporally [5,40]. To reduce the effects of these perturbations, surface reflectance data in satellite imagery is created from top of atmosphere reflectance using radiative transfer models and information about atmospheric variables to correct for absorption and scattering of radiation as it passes through the atmosphere [41]. Furthermore, variations in sensor and solar angles can lead to directional effects in the observed reflectance values [40]. These effects can be reduced using ansiotropic functions, such as the bidirection reflectance distribution function (BRDF), to normalize all the pixels in a image to nadir reflectance. After outside influences of radiation are corrected, VIs such as NDVI work by associating known reflectance characteristics, such as high absorption of visible light and high reflectance of near infrared, to build ratio equations that relate in situ conditions to remotely sensed estimates [34,39]. In this case, NDVI provides an estimate of the sample area chlorophyll concentration from healthy, green vegetation having high reflectance rates for near-infrared and high absorption rates for green visible light.
Other considerations for developing a prediction model based on remotely sensed data include selection of a VI that best relates in situ conditions to the dependent variable. VIs commonly suffer from saturation effects in densely vegetated stands where light in the red bands is absorbed by chlorophyll resulting in losses of sensitivity in sites with LAIs greater than 2 [42][43][44]. This can lead to poor estimates of LAI, where it typically ranges in loblolly plantations between 1 and 4 m 2 /m 2 , with a theoretical maximum of 5 m 2 /m 2 in the southeast United States [18,45]. To avoid issues with saturation, Gitelson et al. [46] recommended the use of VIs constructed without bands in the red portion of the spectrum. Other considerations for selection of a VI include the ability to differentiate changes in leaf physical properties (i.e., changes in concentration of pigments due to stress) and changes in amount of leaf area (i.e., reduction of foliage biomass and increasing amount of background reflectance) [39].
The most commonly used model for predicting loblolly pine LAI in the southeastern United States utilized a linear model framework and the simple ratio vegetation index [11]. In spite of the large coefficient of determination found for the model in Flores et al., some sources of error were omitted in model development that influence its practical application [11]. These errors include data acquisition errors in the sensors and processing errors from geometric or radiometric rectifications which lead to biased predictions [11,47,48]. Curran and Hay [49] discussed the inadequacies of simple regression for remote sensing modeling resulting from violation of the measurements without errors assumption, and when that assumption is violated, parameter estimates will be underestimated. To overcome this limitation, parametric and nonparametric techniques, including modified least squares, Thiel-Sen, and Simulation-Extrapolation (SIMEX), have been used to quantify and reduce measurement error effects [48,50].
The objective of this research was to develop an unbiased satellite-to-LAI calibration model using Landsat 5 TM and 7 ETM+ images, and to test different spectral indices in their ability to predict multitemporal Li-Cor 2000 Plant Canopy Analyzer LAI values. The novelty of this research was using error-in-variable methods to correct for the quantified measurement error associated with an observation on the model parameters. The goals were to identify the vegetation index that best correlated with LAI in loblolly pine plantations, determine the model form that best estimates LAI using the vegetation index, and to remove the effects of observation errors on model parameter estimates.

Study Site
For this study, we used leaf area index measurements taken at the Southeast Tree Research and Education Site (SETRES) study. The study was established in 1992 on a flat, excessively well drained loblolly pine plantation in Scotland County, North Carolina with an initial density of 1667 trees per hectare (2 m × 3 m spacing) ( Figure 1). A total of sixteen 50 m × 50 m treatment plots were installed to cover four replications of a 2 × 2 factorial of irrigation and fertilization [19]. The study site's 30-year annual precipitation average was 1210 mm and annual average temperature was 17 • C [19]. Complete study site details are documented in Albaugh et al. [19]. Plot LAI measurements were made using a Li-Cor 2000 Plant Canopy Analyzer and were remeasured from March 1992 to September 2004 for a total of 1673 samples for all 16 study plots [19,51]. The individual plot LAI measurements were the average of twenty LAI measurements sampled along a transect within each plot. The complete LAI sampling protocol is documented in Albaugh et al. [19]. The age range covered by the in situ LAI measurements was between 7 and 20 years old. Stand level attributes throughout the time of this study are reported in Albaugh et al. [28].

Satellite Data
Landsat 5 Thematic Mapper (TM) and Landsat 7 Enhanced Thematic Mapper Plus (ETM+) imagery were selected for this study due to the temporal overlap between in situ measurements and image dates. Landsat Level-1 Tier 1 products were used to ensure the highest radiometric and geometric calibration across sensors for time-series analysis. The Landsat Level-1 Tier 1 products were atmospherically corrected to provide surface reflectance data, correcting for atmospheric distortions and allowing for the comparison with the in situ measurements [5]. U.S. Geological Survey (USGS) Landsat 5 and 7 Surface Reflectance Level 1 Tier 1 data were queried in the Google Earth Engine from 1 January 1991 to 31 December 2005 [52]. Nadir BRDF-adjusted reflectance values were calculated using the c-factor approach to provide temporal and geographic consistency across the thirteen years of observation [40]. Data were further filtered to remove cloud and shadow anomalies based on quality assurance bands and excessive haziness based on the atmospheric opacity band. Using individual polygons for the 16 study plots, the weighted average of the within plot pixel values was calculated and returned to account for the study plots covering multiple partial pixels. Above canopy satellite observations were paired to the in situ LAI estimates by matching observation dates. To account for the below canopy in situ LAI measurements that did not have matching satellite imagery sample dates, the four most previous and the four following satellite observations were subset and fitted with a thin plate spline. The thin plate spline was weighted by the associated standard deviation of the plot level reflectance summarization. The interpolated VI estimates and associated interpolation standard errors were extracted from the thin plate spline model for the in situ observation date. Using the interpolated and matched values, multiple vegetation indices were calculated using the individual band values for the polygons (Table 1) and fit to the paired in situ LAI estimate in a simple linear model form. The determination of the best performing VI to be used in the remainder of the methods occurred following the evaluation of all VIs in this step. VIs were evaluated on their correlation with LAI and the top performer is subsequently referred to as the "selected VI".

LAI Estimation Model
The model used the annual peak leaf area index (pLAI) values for each plot. The pLAI values, which typically occurred in August or September of any given year in the study, were identified and subset at the individual plot and year level to provide a series of pLAI values for each of the 16 plots, for a total of 208 pLAI observations. The pLAI data were randomly split into individual training (n = 156) and validation (n = 52) sets. Using the selected VI and observed pLAI, several linear, segmented, and nonlinear models were fit to the training data using the maximum likelihood framework ( Table 2). Two segmented regression models were fit to the data set (Models 2 and 3, Table 3). The segmented models included a linear and nonlinear component, joined at a common point (θ) such that the line was continuous and smooth across the join point. To test the effect of silviculture on model performance, an indicator variable referred to as FERT, was included in several of the models to denote if the area under evaluation is under optimum nutrition management (1) or not (0) depending on the study treatment it is receiving. The variance structure was assumed to be non-constant for all but two models, Models 0 and 2 ( Table 2). Model selection was based on information criterion (Akaike's and Bayesian) and minimizing root During the model testing and diagnostics procedures, one pLAI observation was found to be highly influential and was removed from the training pLAI data set (n = 155). Table 2. Model forms and error structure evaluated using the normalized difference moisture index (NDMI) to predicted leaf area index (LAI). An indicator variable, FERT, is used to denote if the area under evaluation is under optimum nutrition management (1) or not (0).

Measurement Error Estimation
To estimate the amount of measurement error associated with the selected VI, the Kalman filter was used to step through the series of observations at each plot to determine the amount of error associated with the measurements and selected model. The Kalman filter, a recursive algorithm for data processing, provides mean and variance estimates of a known process under the influence of process and measurement errors through an iterative modeling process [63][64][65]. The selected VI values were fit to a three-parameter logistic equation (Equation (1)) to model the overall trend of the asymptotic VI development observed at the plot level from the multitemporal observations [65]: where α is the asymptotic maximum value for VI, β is the location parameter for rate increase, γ is the shape parameter of the relation, t is time, and V is the VI. The Kalman filter utilized the algebraic differenced form of the equation (Equation (2)) to provide one step ahead estimates of VI for data with different starting points: The generalized form of model was used to incorporate process error into the formula: Using the generalized formulation (Equation (3)), the VI for time n was projected using initial starting parameters, followed by an updated variance estimate. A new VI measurement was observed and input into the system. The observed measurement and projected estimate were assimilated, weighted by the inverse of their variances. The assimilated value was the new estimate of VI and the process was repeated by projecting it forward in time along with a variance update until the next VI observed measurement time. The recursive process was repeated until the system state was optimal given a set of constraints, which in this case minimized the negative log-likelihood. Output from the Kalman filter included updated estimates of the equation parameters in addition to estimates of the process and observation error.

Bias Correction
To correct for the effects of measurement error on the parameter estimates from VI observations, an error invariable approach was used to account for the fact that observations in the analysis come with errors. SIMEX was chosen as the methodology to solve this errors-in-variables issue due to its heuristic based approach for reducing biased parameters [66]. SIMEX permits flexibility in modeling because it does not require specifying an error structure model or a ranking of groups [48,66]. SIMEX, a simulation-based, error invariant method of manipulating the estimated model parameters to reduce the effects of measurement errors, can reduce bias in models with measurement error-prone variables [50,67,68]. The Kalman filter estimate of observation error associated with the selected VI time series for all 16 plots was used in conjunction with the interpolation errors in the SIMEX algorithm to correct for bias in the initial model parameters. SIMEX works with two main steps: a simulation step and an extrapolation step. Using the initial estimate of measurement variance and fitted parameters, SIMEX simulates increasing amounts of variance in the data and re-parameterizes the model. After several iterations of reparameterizing the model with inflated variance, SIMEX extrapolates across the estimated (and biased) parameters to find the parameters with no variance [50,67]. To implement the SIMEX algorithm, the selected model and framework were refit with modified observations simulated to represent the inflated observation error estimate [69]. To create these modified observations, synthetic noise (U * (λ l ) ) was generated following U * (λ l ) ∼ I ID N(0, λ l σ 2 ) and was added to the observation [69]. The values for λ l ranged from 0 to 2 and were used to weight the amount of generated noise being added to the observation. A total of twenty steps between 0 and 2 were simulated 1000 times each for λ l . After simulating the observation error inflation, the refit coefficients were averaged for each λ l and plotted against the observation error values. A quadratic linear model was fit to correlate the new coefficient values to the λ l . The unbiased estimates were then extrapolated from the trend to the value of −1, as if the observations were measured completely without error.
To evaluate the robustness of the methodology and the predictive power of the model, the SIMEX parameter estimates were used to calculate mean square error (MSE) on the training data set and mean square prediction error (MSPR) E ∑ n i=1 (g(x i ) − g(x i )) 2 on the validation data set. If MSE and MSPR comparisons are reasonable, the methodology is repeated with a combination of the two data sets.
To compare the proposed model to the current industry standard for LAI estimation from Landsat 5 TM/7 ETM+ imagery, the same procedures for handling in situ LAI measurements without exact satellite imagery matches were followed, Landsat 5 and 7 Top of Atmosphere (TOA) data were queried, matched, and interpolated to find corresponding SR values. Using the Flores et al. [11] model, LAI was estimated and compared to observed LAI.

Results
A total of 1673 measured observations were used to test the relationship between LAI and the individual vegetation indices. Of the 1673 observations, 1546 did not have exact date matches between in situ LAI measurements and satellite imagery (92%). The VI with the highest correlation with LAI was the normalized difference moisture index (NDMI), with a Pearson's r value of 0.765, indicating a strong positive relationship, followed by simple ratio 2 (SR2), −0.756, and the simple ratio (SR), 0.735. When fitted to the 1673 observations, the relationship between NDMI and LAI had an adjusted R 2 of 0.6172 ( Figure 2). For this study, we shifted NDMI by one to be in the positive domain. September (n = 125) and August (n = 54) had the highest frequency of pLAI values (Figure 3). q q q q q q q q q q q q q qq q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q q  Little improvement was observed in model performance as order increased and form changed as compared to the simple linear model until the treatment variable was included ( Table 3). The model that best estimated pLAI based on lowest AIC, BIC, and RMSE using was Model 9 (Table 3). In addition to including NDMI, Model 9 included an indicator variable for FERT, referring to the optimum nutrition treatment that was applied to the fertilization and combination plots. With FERT in the model, a slight diverging relationship was observed in estimated LAI between the fertilized and unfertilized plots (Figure 4). The unfertilized plots maintained an overall linear relationship between the two variables, with lower expected LAI for a given NDMI as compared to the fertilized plots.  The addition of FERT into the model improved performance as seen by reductions in AIC and RMSE (Table 3). Using Model 6 as an example, the addition of FERT (Model 7) reduced RMSE by 0.0877 units of LAI, decreasing the AIC from 221.74 to 173.42. A further reduction in AIC and RMSE occurred from the change in how FERT was incorporated into the model (Table 3). Model 9 used a nonlinear function of the VI to model the effect of FERT rather than as a constant, reducing the overall RMSE to 0.3947 and AIC to 163.15 (Table 3).
Estimated observation error for the individual treatments were combined following additive error propagation protocol to a final estimate of 0.0019 units NDMI (σ 2 obs ). The SIMEX algorithm modified and unbiased the intercept and slope coefficient values to reduce the effects of measurement error in the NDMI estimation model, Model 9 (Table 3 (Figure 5)). The effects of increasing the amount of measurement error, or noise, on the individual parameters influenced how the parameters behaved, with well-defined trends in not only the mean value from the simulations, but also in the distribution of the parameter values in the simulations. For example, for β 0 , as λ increases from 0 to 2, the mean estimates for β 0 increase from approximately −4.12 to −2.91, while standard deviations increased from approximately 0.04 to 0.10 ( Figure 5).
Residual analysis for the model showed no heteroskedasticity and no obvious bias comparing predicted and observed LAI across all observed levels of LAI (Figures 6 and 7).

Discussion
NDMI was a better predictor of LAI in loblolly pine plantations in the southeastern United States when compared to more commonly used vegetation indices, such as NDVI and SR, supporting the findings in [70]. Whereas literature has supported the use of NDMI to predict vegetation density and canopy moisture, the authors of [71] hypothesized that the shortwave infrared band reflectance is more sensitive to shadows in the image, warranting further investigation of the various influences of canopy geometry on NDMI [57]. Other literature has supported the use of short wave infrared vegetation indices in retrieving canopy physiological properties, providing greater sensitivity to canopy moisture and vegetation anomalies, especially in drought prone areas [72][73][74]. NDMI, calculated by the difference ratio between near infrared (0.76-0.90 µm) and shortwave infrared (1.55-1.75 µm) bands, was a good predictor of live leaf biomass, total aboveground biomass, leaf moisture, and canopy moisture in other species [57,[74][75][76]. Compared with the near-infrared band's propensity to be reflected by the canopy, the shortwave infrared band wavelengths are absorbed by water, providing a measure of water (or moisture) content on the surface of interest [77,78]. The shortwave infrared band is sensitive to vegetation density in forests and to total biomass (brown and green), which is what plant canopy analyzers, such as the Li-Cor 2000, actually measure [71]. Optimum nutrition management influenced the behavior of the vegetation indices. Changes in the relationship between the VI and LAI when under optimum nutrition management may be a result of the increased photosynthetic capacity of the canopy, changes in leaf morphology, or allometric changes within stand [79][80][81]. Physical or physiological changes, such as transpiration, chlorophyll concentrations, foliar elemental concentrations, radiation use efficiency, or leaf size and orientation, may be the driving force behind the fertilization effect [39,80,[82][83][84]. In other crops, a fertilization effect on plant spectral reflectance properties has been observed in the near-infrared and middle infrared wavelengths, with increases in reflectance for the middle infrared and decreases for the near-infrared when nitrogen fertilization had been applied [85][86][87]. Using the study level average reflectance for plots receiving (1) and not receiving (0) fertilization treatments, our study showed similar results, with decreased reflectance of the middle infrared band (Band 5 in Landsat 5 and 7), and increased reflectance in in the near infrared (Band 4) (Figure 8). Using error invariable methods to correct for observations with error is a necessary step to provide unbiased estimators for modeling remotely sensed data. The SIMEX algorithm provided a flexible method for estimating the effects of increasing amounts of error on the individual parameters for the selected model, allowing for the extrapolation of the trend to find the value of the unbiased parameter. Including the estimated error for the interpolated values due to temporal misregistration and the overall process error represents two ways to account for the several sources of error that may affect remotely sensed data [49]. Overall, estimated errors from the Kalman filter and thin plate spline interpolations were relatively low, resulting in a slight change in model fit (Figure 9). Accounting for the errors in remotely sensed data provides the means to avoid violating the basic assumptions of linear modeling and reduces the influence of error prone variables.  While the data in this study come from a relatively small geographical area, the high temporal sampling provided a great number of LAI observations for the 16 plots. Note that the treatments applied in this study are not considered operational for loblolly pine plantations in the southeastern U.S. Consequently, our results may not well represent the relationship between remotely sensed VIs and observed LAI measurements in operational settings. However, the measured experimental data covers the range of LAI observed in operational plantations giving some confidence that these data would be useful to forest managers. The long timeline of this study may also introduce errors due to orbit drift in Landsat 5 for which we did not explicitly manage [88,89]. The study plots were relatively small compared to the pixel size used by Landsat imagery. The 50 m by 50 m measurement plot size did not allow for adequate sampling of the 30 m Landsat pixels to reduce spatial autocorrelation between pixels, for which a 3 by 3 pixel sampling area is recommended [49]. New sensors with higher resolution imagery will provide better sampling of the small study plots typically used for forestry research and should avoid this issue. While the weighted average of the pixels falling within the study plot were used to calculate the VIs, a further potential source of errors may come from interference in reflectance from areas surrounding the study plot under evaluation. Results from [90] corroborate the general underprediction of [11], thus the model presented in this research (Equation (4)) should provide an update to sites relying on legacy data from Landsat 5 TM and 7 ETM+. Further improvements in leaf area index predictions may be realized through the use of advanced machine learning techniques, but the ease of use was the driving force behind the development of the model proposed in this research. Furthermore, the framework for reducing the effects of observation errors provided in this research can be applied to new sensors, such as Landsat-8 and Sentinel-2, providing up to date imagery of current plantations.

Conclusions
This study used error-in-variable methods to reduce the effects of measurement errors in a leaf area index estimation model from Landsat 5 and 7 satellite imagery. A difference equation using shortwave infrared and near-infrared bands was the vegetation index with the highest correlation to in situ loblolly pine plantation leaf area index measurements (76.4%). A model form including an indicator variable for optimum nutrition management was the best framework for modeling the relationship between NDMI and LAI, achieving a RMSE of 0.4181 (m 2 /m 2 ). The Kalman filter was used to quantify the amount of measurement error associated with the vegetation index, 0.0019 units NDMI. SIMEX provided a methodology to reduce the effects of vegetation index measurement error on the final model parameter estimates by establishing a relationship between increasing amounts of error and its subsequent effect on parameter behavior. The proposed model provides an improvement in loblolly pine plantation LAI estimation across the Landsat 5 TM and 7 ETM+ time frame in the southeastern United States. The improvement in spatial and temporal LAI retrieval increases the understanding of forest productivity by providing historical context to the development of loblolly pine plantations allowing for better management of forest resources.