Vegetation Cover Analysis of Hazardous Waste Sites in Utah and Arizona Using Hyperspectral Remote Sensing

This study investigated the usability of hyperspectral remote sensing for characterizing vegetation at hazardous waste sites. The specific objectives were to: (1) estimate the leaf-area-index (LAI) of the vegetation using three different methods (i.e., vegetation indices, red-edge positioning (REP), and machine learning regression trees), and (2) map the vegetation cover using machine learning decision trees based on either the scaled reflectance data or mixture tuned matched filtering (MTMF)-derived metrics and vegetation indices. HyMap airborne data (126 bands at 2.3 × 2.3 m spatial resolution), collected over the U.S. Department of Energy uranium processing sites near Monticello, Utah and Monument Valley, Arizona, were used. Grass and shrub species were mixed on an engineered disposal cell cover at the Monticello site, while shrub species were dominant in the phytoremediation plantings at the Monument Valley site. Regression trees resulted in the best calibration performance of LAI estimation (R² > 0.80). The use of REPs failed to accurately predict LAI (R² < 0.2). The use of the MTMF-derived metrics (matched filter scores and infeasibility) and a range of vegetation indices in decision trees improved the vegetation mapping when compared to the decision tree classification using just the scaled reflectance. Results suggest that hyperspectral imagery is useful for characterizing biophysical characteristics (LAI) and vegetation cover on capped hazardous waste sites. However, the vegetation mapping would likely benefit from higher spatial resolution hyperspectral data because many of the vegetation patches found on the sites are small (<1 m).

Open Access. Remote Sens. 2012, 4.


Introduction
Humans produce large amounts of hazardous waste. In 2007, the United States generated 47 million tons of hazardous waste, with Louisiana and Texas responsible for more than 50% of the total [1]. Hazardous waste from current and historic activities is often isolated in landfills or disposal cells using a capping system consisting of barriers that limit deep percolation of precipitation and mobilization of the hazardous constituents [2,3]. In addition to the thousands of capping systems in existence, in 2004 the EPA estimated that almost 300,000 additional waste sites were expected to require remediation [4]. Many of these sites, as well as the older sites, contain residual contamination or are undergoing some form of in situ remediation. In all cases, the management of water infiltration into the waste area is a key issue to prevent the migration of the hazardous constituents into the environment.
Historically, the vegetation component of a hazardous waste capping system has been viewed as a means to stabilize the surface soils and prevent erosion. However, for some capping systems in arid and semi-arid climates, the vegetative cover has taken an increasingly functional role through the construction of evapotranspiration or water balance cover systems [5,6]. In these systems, the vegetative cover and soil system are constructed to maintain a hydrologic balance, with the vegetation withdrawing water from the underlying soils on an annual basis, thereby minimizing deep infiltration. Proper functioning of these types of systems depends on the development and maintenance of a robust plant community that can maintain water withdrawal capacity over the life of the capping system. Some in situ remediation strategies are also being implemented whereby the migration of subsurface contaminants is dependent on the water withdrawal capability of vegetation. Considered to be a type of phytoremediation [7], these strategies may be applicable where subsurface contaminants are potentially mobile, and management of vegetation can result in reduced infiltration and subsequent hydraulic control of the migration of contaminants in soils and shallow groundwater [8].
In all such cases, the maintenance of a high evapotranspiration capacity through well-adapted and healthy plant communities is key to the proper and long-term stabilization of the wastes. Monitoring of these systems is commonly conducted through ground-level observations by trained professionals and is becoming a significant cost element in the management of such systems. Consequently, there is a growing demand for an efficient and reliable approach to vegetation monitoring at waste remediation and stabilization sites. Remote sensing technology can provide a cost-effective tool for this type of monitoring, complementing information obtained from in situ investigation.
Multispectral (several bands) and hyperspectral (hundreds of narrow bands) remote sensing has been used for monitoring hazardous sites [3,9-11] as well as typical environmental resources such as water, land, and vegetation [12-15]. In particular, remote sensing-derived vegetation products can provide valuable information regarding vegetation health and dynamics when monitoring hazardous waste sites [16]. Various classification approaches have been investigated for vegetation mapping, including maximum likelihood classification [17], subpixel analysis [18], machine learning [19], and object-based methods [20]. The phenological cycle of vegetation has also been studied to better map vegetation dynamics [21].
Several approaches have been investigated for modeling vegetation biophysical parameters, such as biomass and leaf-area-index (LAI), using remotely sensed spectral data. These approaches include empirical methods such as statistical regression, spectral positioning, artificial intelligence, and physical modeling [22]. Simple linear regression analysis has been widely adopted to correlate vegetation biophysical parameters measured in situ with various vegetation indices, such as the Normalized Difference Vegetation Index (NDVI) [23]. More advanced regression techniques, including principal component regression and partial least squares regression, have also been examined [24,25]. Some scientists have focused on identifying the spectral reflectance red-edge position (REP) because of its close association with chlorophyll content and its seasonal variations, which directly influence vegetation health [26]. Artificial intelligence methods, such as neural networks and regression trees, can incorporate field training samples to estimate the vegetation parameters [27]. These methods are relatively simple, but have some limitations, including the fact that the relationships are based on representative training samples and the methods are sensitive to atmospheric conditions, sensor viewing geometry, and the spatial resolution of the remote sensor data. Therefore, the methods generally need to be calibrated each time a new remote sensing dataset is acquired [28].
Physical models are theoretically based on leaf scattering and absorption mechanisms associated with biochemistry [29]. A representative type of model is the radiative transfer model, which simulates radiation transfer processes in vegetation by computing the interaction between plants and solar radiation. Vegetation biophysical parameters can be retrieved through inversion of the radiative transfer model (e.g., PROSPECT). Simulated reflectance databases have been frequently used with model inversion techniques [30]. Model inversion approaches may result in a multitude of different possible solutions, increasing uncertainty [31].
Vegetation is typically characterized by slow rates of change and affected by other slow processes such as climate change or soil acidification. However, the vegetation cover on a hazardous waste site may change rapidly due to unanticipated conditions (e.g., soil subsidence, biointrusion). This change must be quickly detected. The objective of this research was to demonstrate the usefulness of hyperspectral remote sensing to provide long-term monitoring capability for Department of Energy (DOE) remediation and waste sites. This study investigated characteristics of vegetation cover on hazardous waste sites with regard to species type and LAI distribution using various technical approaches.

Study Area and Data
Two US Department of Energy sites were investigated in this study (Figure 1): (a) a uranium mill tailings disposal cell capping system near Monticello, UT, and (b) a phytoremediation planting of desert shrubs near Monument Valley, AZ, USA. The yellow symbols in Figure 1 are in situ sampling locations.
To limit percolation into underlying tailings, the Monticello capping system relies on the water-storage capacity of a 163-cm sandy clay loam soil and rock "sponge" layer overlying a 38-cm coarse sand capillary barrier, and native sagebrush steppe vegetation to seasonally remove stored precipitation [32]. The capillary barrier increases the water-storage capacity of the soil "sponge" [33]. The topsoil has favorable edaphic properties for a sustainable plant community. Percolation flux, measured within a 3-ha embedded lysimeter, was approximately 0.5 mm yr⁻¹ from 2000 through 2009 [32]; the capping system has performed well in the short term. Detecting temporal changes and spatial patterns in plant species and LAI on a landscape scale will be important for performance monitoring in the long term.
At the arid Monument Valley site, two deep-rooted native shrubs, Sarcobatus vermiculatus (black greasewood) and Atriplex canescens (fourwing saltbush), are part of the remedy for nitrate contamination in soil, where a uranium mill tailings pile once stood, and in an alluvial aquifer spreading away from the source area soil [8]. When protected from livestock grazing, populations of these phreatophytic shrubs transpire enough water from the source area soil to limit recharge and nitrate leaching [34], and from the alluvial aquifer to slow the spread of the nitrate plume [35]. Monitoring the long-term performance of phytoremediation at Monument Valley will include tracking responses of phreatophyte health and transpiration rates to changing land management practices over many hectares [36].
HyMap hyperspectral remote sensing data were collected by HyVista, Inc. at Monument Valley, AZ on 2 June 2008 and Monticello, UT on 3 June 2008. Ground reference data were collected at these sites at the same time as data acquisition. An additional ground-level dataset was collected at the Monticello, UT site the previous week. Field data included vegetation composition (percent canopy cover) and LAI (n = 54 on the Monticello site and n = 19 on the Monument Valley site; refer to Table 1). The dominant species included Artemisia tridentata (big sagebrush), Ericameria nauseosa (rubber rabbitbrush), and Pascopyrum smithii (western wheatgrass) on the Monticello site and Sarcobatus vermiculatus (black greasewood) and Atriplex canescens (fourwing saltbush) on the Monument Valley site.
The HyMap hyperspectral data consisted of 126 bands from 440 to 2,500 nm at 2.3 × 2.3 m nominal spatial resolution. The HyMap radiance data were radiometrically corrected to scaled reflectance using the HYCORR algorithm with EFFORT spectral polishing [37]. The scaled reflectance data were then geometrically rectified to a Universal Transverse Mercator (UTM) projection using 15 to 20 Ground Control Points (GCPs) collected from the 2006 National Agricultural Imagery Program (NAIP) Digital Orthophoto Quarter Quadrangle (DOQQ) data (1 × 1 m spatial resolution) over the two study sites, resulting in a root mean square error (RMSE) of < 1 pixel.

Leaf Area Index (LAI) Estimation
One of the major goals of this research was to map the spatial distribution of vegetation biomass on the waste sites using remote sensing-derived indices and algorithms in conjunction with in situ derived LAI used as a surrogate for vegetation biomass. LAI, the total area of one-sided green leaves in relation to the ground below them, directly quantifies the vegetation canopy structure and is closely related to diverse canopy processes including water interception, photosynthesis, evapotranspiration, and respiration.
Three approaches were investigated to estimate vegetation LAI: (1) vegetation index (VI)-based methods, (2) red-edge positioning (REP) methods, and (3) machine learning regression trees. Although there are numerous vegetation indices, we evaluated two basic vegetation indices:
$$\mathrm{VI1} = \frac{B_2}{B_1}$$
$$\mathrm{VI2} = \frac{B_2 - B_1}{B_2 + B_1}$$
These are extended versions of the simple ratio (SR) and normalized difference vegetation index (NDVI), respectively [14,38]. Unlike SR and NDVI, which only use one red band and one near-infrared band, we evaluated VI1 and VI2 with all 126 × 125 (= 15,750) possible band combinations in the entire spectral range from 400 to 2,500 nm as long as the value of $B_1$ was smaller than $B_2$. While the correlation between VI2 and LAI results in a symmetric pattern when $B_1$ and $B_2$ are switched, the correlation between VI1 and LAI is slightly asymmetric with the switch of $B_1$ and $B_2$. However, the asymmetric characteristic of the VI1-LAI relationship was not considered in the study because the asymmetry was slight and we were only interested in the band combinations that resulted in the highest correlation.
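The exhaustive band-pair search described above can be illustrated with a minimal sketch. It assumes that VI1 is the ratio of the two band reflectances and VI2 their normalized difference, and that each field sample is paired with one pixel spectrum; the function names and the in-memory list layout are hypothetical and are not taken from the study's processing chain.

```python
from itertools import combinations
from math import sqrt


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant index carries no information about LAI
    return sxy / sqrt(sxx * syy)


def best_band_pair(spectra, lai, index="nd"):
    """Return (b1, b2, r_squared) for the band pair best correlated with LAI.

    spectra: list of per-sample reflectance lists (samples x bands)
    lai:     list of field-measured LAI values, one per sample
    index:   "ratio" for B2 / B1 (assumed VI1 form),
             "nd" for (B2 - B1) / (B2 + B1) (assumed VI2 form)
    Only pairs with b1 < b2 are visited, so each pair is tested once.
    """
    n_bands = len(spectra[0])
    best = (None, None, 0.0)
    for b1, b2 in combinations(range(n_bands), 2):
        if index == "ratio":
            vi = [s[b2] / s[b1] for s in spectra]
        else:
            vi = [(s[b2] - s[b1]) / (s[b2] + s[b1]) for s in spectra]
        r2 = pearson_r(vi, lai) ** 2
        if r2 > best[2]:
            best = (b1, b2, r2)
    return best
```

With 126 bands the loop visits 7,875 unordered pairs per index, which is inexpensive for a few dozen field samples.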
The REP is the spectral position between the red (~680 nm) and near-infrared (~800 nm) wavelengths where the maximum slope is found [29]. The REP is sensitive to the biophysical and biogeochemical properties of vegetation such as LAI and leaf nitrogen content. Phenological change and/or vegetation stress can affect the REP. There are several methods for computing REP, including derivative function-based methods [39] and Gaussian model-based methods [40]. Some studies have reported that the REP is not a single location, but multiple wavelengths [39,41]. In this study, three REP techniques were tested to estimate LAI: (1) a linear four-point interpolation (LI_REP) [42], (2) a three-point Lagrangian interpolation (LG_REP) [43], and (3) a linear extrapolation (LE_REP) [26].
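LI_REP assumes that the reflectance curve at the red-edge can be simplified to a straight line centered near the midpoint between the NIR reflectance and the minimum reflectance of the chlorophyll absorption. It was computed using the four-point interpolation [42], commonly written as:
$$R_{re} = \frac{R_{670} + R_{780}}{2}$$
$$\mathrm{LI\_REP} = 700 + 40\,\frac{R_{re} - R_{700}}{R_{740} - R_{700}}$$
where $R_{\lambda}$ is the reflectance at wavelength $\lambda$ (nm).
LG_REP uses the point with the maximum first derivative reflectance (FDR), $(\lambda_i, D_{\lambda(i)})$, and the two points on either side, $(\lambda_{i-1}, D_{\lambda(i-1)})$ and $(\lambda_{i+1}, D_{\lambda(i+1)})$, where $D$ denotes the FDR value. A second-order polynomial was fitted using the three points, and the wavelength where the second derivative equals zero was determined to be the REP:
$$\mathrm{LG\_REP} = \frac{A(\lambda_i + \lambda_{i+1}) + B(\lambda_{i-1} + \lambda_{i+1}) + C(\lambda_{i-1} + \lambda_i)}{2(A + B + C)}$$
where
$$A = \frac{D_{\lambda(i-1)}}{(\lambda_{i-1} - \lambda_i)(\lambda_{i-1} - \lambda_{i+1})}, \quad B = \frac{D_{\lambda(i)}}{(\lambda_i - \lambda_{i-1})(\lambda_i - \lambda_{i+1})}, \quad C = \frac{D_{\lambda(i+1)}}{(\lambda_{i+1} - \lambda_{i-1})(\lambda_{i+1} - \lambda_i)}$$
LE_REP was based on a double-peak feature in the first derivative reflectance resulting from the discontinuity in the REP and foliar-nitrogen relationship. Four points, two at the far-red peak (680-700 nm) and two at the NIR peak (725-760 nm), were used to create two straight lines. The wavelength corresponding to the intersection of the two lines was the REP:
$$\mathrm{LE\_REP} = \frac{-(c_1 - c_2)}{m_1 - m_2}$$
where $c$ and $m$ were the intercept and slope of the lines, respectively, with subscripts 1 and 2 denoting the far-red and NIR lines.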
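The three REP techniques can be sketched in a few lines of code. The sketch below is illustrative only: it assumes the widely used four-point form of the linear interpolation (reflectance at 670, 700, 740, and 780 nm), the standard three-point Lagrangian solution for the stationary point of the fitted polynomial, and a plain two-line intersection for the linear extrapolation. The function names and argument layouts are hypothetical, and the specific HyMap bands used in the study are not assumed.

```python
def li_rep(r670, r700, r740, r780):
    """Linear four-point interpolation (assumed standard form).

    The red-edge inflection reflectance is taken as the midpoint of the
    670 nm and 780 nm reflectances, then located on the 700-740 nm line.
    """
    r_re = (r670 + r780) / 2.0
    return 700.0 + 40.0 * (r_re - r700) / (r740 - r700)


def lg_rep(wavelengths, fdr):
    """Three-point Lagrangian interpolation on first derivative reflectance.

    wavelengths: (l_prev, l_max, l_next) around the maximum FDR band
    fdr:         matching first derivative reflectance values
    Returns the wavelength where the fitted parabola has zero slope.
    """
    (l0, l1, l2), (d0, d1, d2) = wavelengths, fdr
    a = d0 / ((l0 - l1) * (l0 - l2))
    b = d1 / ((l1 - l0) * (l1 - l2))
    c = d2 / ((l2 - l0) * (l2 - l1))
    return (a * (l1 + l2) + b * (l0 + l2) + c * (l0 + l1)) / (2.0 * (a + b + c))


def le_rep(far_red_points, nir_points):
    """Linear extrapolation: intersection of far-red and NIR FDR lines.

    Each argument is a pair of (wavelength, FDR) points defining one line.
    """
    def slope_intercept(p, q):
        m = (q[1] - p[1]) / (q[0] - p[0])
        return m, p[1] - m * p[0]

    m1, c1 = slope_intercept(*far_red_points)
    m2, c2 = slope_intercept(*nir_points)
    return -(c1 - c2) / (m1 - m2)
```

For example, `lg_rep((700, 710, 725), (-215, -15, -90))` recovers the vertex of a parabola centered at 715 nm, which is a convenient sanity check before applying the function to image spectra.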
Machine learning regression trees typically use a binary recursive partitioning process to generate rule-based models, based on user-supplied training samples, for estimating a target variable such as LAI [27,44]. Cubist by RuleQuest Research Inc. was used in this study. The usefulness of Cubist for creating robust regression trees has been documented in the remote sensing literature [27,45,46].
The coefficient of determination (R²), representing the goodness-of-fit of a model, and RMSE were used to measure calibration performance of the three methods. Non-linear exponential and logarithmic regression approaches were also tested for the VI-based method along with linear regression, as vegetation indices are often better correlated with non-linear regression models [15]. Due to the limited number of field data points, leave-one-out cross-validation was used to assess the three approaches.
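The difference between calibration and leave-one-out cross-validation statistics can be made concrete with a small sketch. It assumes a simple linear model between one predictor (e.g., a vegetation index) and LAI; the function names are hypothetical, and the regression-tree models used in the study are not reproduced here.

```python
from math import sqrt


def fit_line(x, y):
    """Ordinary least squares fit of y = slope * x + intercept."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((a - mx) ** 2 for a in x)
    slope = sum((a - mx) * (b - my) for a, b in zip(x, y)) / sxx
    return slope, my - slope * mx


def rmse(observed, predicted):
    """Root mean square error between observed and predicted values."""
    n = len(observed)
    return sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def calibration_and_loocv(x, y):
    """Return (r_squared, rmse_calibration, rmse_cross_validation).

    x and y must be lists. Calibration uses all samples; cross-validation
    refits the line n times, each time predicting the one held-out sample.
    """
    slope, intercept = fit_line(x, y)
    cal_pred = [slope * a + intercept for a in x]
    my = sum(y) / len(y)
    ss_res = sum((o - p) ** 2 for o, p in zip(y, cal_pred))
    ss_tot = sum((o - my) ** 2 for o in y)
    r_squared = 1.0 - ss_res / ss_tot

    loo_pred = []
    for i in range(len(x)):
        s, c = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        loo_pred.append(s * x[i] + c)
    return r_squared, rmse(y, cal_pred), rmse(y, loo_pred)
```

For least squares fits the cross-validated RMSE is never smaller than the calibration RMSE, so a large gap between the two is a useful warning sign of overfitting when the sample size is small.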

Vegetation Mapping
Two different classification approaches were investigated to map vegetation on the Monticello, UT, and Monument Valley, AZ, USA, hazardous waste sites. Both approaches employed machine learning decision trees, but one used scaled reflectance data as input variables while the other used mixture tuned matched filtering (MTMF)-derived metrics and a suite of vegetation indices as input variables.
Decision trees have wide application for classification problems because they divide a complex decision into a hierarchy of simple and interpretable decisions [47-52]. See5 by RuleQuest Research Inc., a widely used machine learning decision tree software package, was used to generate decision trees for image classification.
MTMF is a hybrid classification method based on a combination of linear mixture theory and matched filtering, which is based on a partial unmixing approach with user-defined targets [53]. One of the advantages of MTMF is that not all of the endmembers (i.e., spectral reflectance characteristics for spectrally pure materials) within a scene need to be identified, because MTMF uses each endmember independently and models the pixel at each endmember as a mixture of the endmembers and an undefined background material [54,55]. MTMF typically uses the minimum noise fraction (MNF) results extracted from the reflectance data. In this study, the cumulative 80% variation threshold was used to determine the subset of MNF results to be used in the MTMF analysis for each site. Consequently, the first 18 and 25 MNF transformed images were used in the MTMF analysis for the Monticello and Monument Valley sites, respectively. The MTMF output includes a matched filter (MF) score and an infeasibility value for each endmember. Ideally, pixels with a high MF score value and a low infeasibility value have a high percent cover of each endmember (e.g., sagebrush). Pixels with high MF score values and high infeasibility values may be false alarms. We used two image-derived endmembers for each class based on the percent cover data, which resulted in 8 endmembers for the Monticello site and 6 endmembers for the Monument Valley site.
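The two selection rules in this paragraph, choosing MNF components by a cumulative variation threshold and screening MF scores with infeasibility, can be sketched as follows. This is an illustration under stated assumptions, not the software workflow used in the study: the eigenvalue list, the function names, and the screening thresholds `mf_min` and `infeasibility_max` are hypothetical.

```python
def n_components_for_threshold(eigenvalues, threshold=0.80):
    """Smallest number of leading components whose eigenvalues reach the
    given fraction of the total (e.g., 0.80 for an 80% cumulative threshold).
    """
    ordered = sorted(eigenvalues, reverse=True)
    total = sum(ordered)
    cumulative = 0.0
    for k, value in enumerate(ordered, start=1):
        cumulative += value
        if cumulative / total >= threshold:
            return k
    return len(ordered)


def screen_pixel(mf_score, infeasibility, mf_min=0.5, infeasibility_max=5.0):
    """Label one pixel for one endmember from its MTMF outputs.

    The default thresholds are placeholders chosen for illustration only.
    """
    if mf_score < mf_min:
        return "background"
    if infeasibility > infeasibility_max:
        return "possible false alarm"  # high MF score but infeasible mixture
    return "likely target"
```

In the study itself the MF scores and infeasibility values were passed to decision trees as continuous inputs rather than thresholded by hand, so the screening function only illustrates how the two metrics are read together.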
The MF scores and infeasibility values were used as input variables along with a suite of vegetation indices. A total of 11 vegetation indices were used, two of which (i.e., VI1 and VI2) were developed from the LAI estimation in this study (Table 2). The original scaled reflectance data (126 bands) were also used as input variables in the decision tree classifications for comparison.

Decision trees are known to be sensitive to the characteristics of the training samples, especially when there are a limited number of training samples [47]. To improve the stability of the decision tree classifier, an aggregation approach was introduced [55,63-66] in which multiple decision trees were generated with different sets of training samples and a majority rule was used to determine the class for each pixel. Due to the relatively small number of training data samples, we used 50% of the reference data through random selection to train the decision tree and the remaining 50% to test the decision tree. A total of 40 decision trees were generated and 20 trees were selected based on the testing results for each site (i.e., the decision trees with higher testing accuracy were selected). These 20 trees were used for the voting process to generate the final vegetation map for each site.
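The aggregation step can be sketched as a tree-selection routine followed by a per-pixel majority vote. The sketch assumes that each tree's testing accuracy and per-pixel class label are already available; the function names and data layout are hypothetical, and tree training itself (done with See5 in this study) is outside its scope.

```python
from collections import Counter


def select_top_trees(test_accuracies, keep):
    """Indices of the `keep` trees with the highest testing accuracy,
    ordered from best to worst."""
    order = sorted(range(len(test_accuracies)),
                   key=lambda i: test_accuracies[i], reverse=True)
    return order[:keep]


def majority_vote(labels):
    """Winning label among the votes, or None when the top count is tied."""
    counts = Counter(labels).most_common()
    if len(counts) > 1 and counts[0][1] == counts[1][1]:
        return None  # undecided pixel; needs a separate tie-breaking step
    return counts[0][0]


def ensemble_label(tree_predictions, selected_trees):
    """Class for one pixel from the selected trees.

    tree_predictions: list where element t is the label from tree t
    selected_trees:   tree indices returned by select_top_trees
    """
    return majority_vote([tree_predictions[t] for t in selected_trees])
```

Returning `None` for tied votes keeps undecided pixels explicit, so that they can be resolved afterwards, for example by repeating the vote with fewer, better-performing trees.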
Accuracy assessment of the vegetation maps included the determination of commission and omission errors for each class, overall accuracy (%), and the Kappa Coefficient of Agreement (κ). A Kappa Z-test was used to determine if there was a significant difference between the two Kappa Coefficients associated with each site.
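These accuracy measures follow directly from an error matrix. The sketch below assumes rows hold the classified labels and columns the reference labels, and it takes the asymptotic standard errors as given inputs to the Z statistic rather than deriving them; the function names are hypothetical.

```python
from math import sqrt


def accuracy_metrics(matrix):
    """Overall accuracy, Kappa, and per-class commission/omission errors.

    matrix[i][j] is the number of samples classified as class i (row)
    whose reference class is j (column). Assumes every class appears at
    least once in both the classified and the reference data.
    """
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance from the row and column marginals
    expected = sum(r * c for r, c in zip(row_totals, col_totals)) / n ** 2
    kappa = (overall - expected) / (1.0 - expected)
    commission = [1.0 - matrix[i][i] / row_totals[i] for i in range(k)]
    omission = [1.0 - matrix[j][j] / col_totals[j] for j in range(k)]
    return overall, kappa, commission, omission


def kappa_z(kappa1, ase1, kappa2, ase2):
    """Z statistic for the difference between two independent Kappa values,
    given their asymptotic standard errors (ASE)."""
    return abs(kappa1 - kappa2) / sqrt(ase1 ** 2 + ase2 ** 2)
```

A Z value above 1.96 indicates a significant difference at the 95% confidence level; with large standard errors, as expected for small reference samples, even a sizeable Kappa difference may fall below that value.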

LAI Estimation
The results of the correlation matrices produced between the vegetation indices and LAI for all possible band combinations are shown in Figure 3. The greater the correlation of the ground reference LAI value with the two bands used to compute the vegetation index, the bluer the pixel in the diagram. Both VI1 and VI2 exhibited very similar patterns for each site. While relatively high correlations between the vegetation indices and LAI were found in the region between 1,500 and 1,800 nm for the Monticello site, they were found in the region between 900 and 1,400 nm for the Monument Valley site (Figure 3). Interestingly, the bands in the red and near-infrared regions, which are assumed to have more information regarding vegetation health, did not produce higher correlations than the bands in the middle-infrared region. This might be because the study sites are located in semi-arid/arid areas and thus the spectral response of each vegetation type may be distinguishable in the middle-infrared region, where the spectral response of vegetation to water is more sensitive. Table 3 summarizes the best band combination and associated statistics (i.e., R² and RMSEs from calibration and cross-validation) of the two vegetation indices for each site. It was not surprising that one of the two best bands was found in the water absorption region (~1,200 nm) [67] for the Monument Valley site, because some of the sample locations at this site were irrigated while others were not. The vegetation index approach resulted in better performance for the Monument Valley site than the Monticello site (e.g., R² = 0.501, accounting for about 50% of the variance, with r ≈ 0.7). This might be because the vegetation distribution and structure were more dynamic at the Monticello site than at the Monument Valley site. In particular, the grass and shrub species were highly mixed at the Monticello site, and this made it difficult to calibrate the in situ LAI data with the HyMap data at the 2.3 × 2.3 m resolution. Slight differences between the field and the pixel locations might have also introduced errors in LAI estimation. While the non-linear regression models did not result in a better fit than the linear regression for the Monticello site, the logarithmic regression model yielded slightly higher R² and lower RMSE values for the Monument Valley site (Table 3). The best band combination for each site was consistent regardless of the regression method and the vegetation index used.
The scatterplots between each of the three REP approaches and LAI are shown in Figure 4. The REP approach did not predict LAI well, resulting in low correlations (<0.2). Similar to the vegetation index approach, REP methods resulted in better performance for the Monument Valley site than the Monticello site. While the LG_REP resulted in the best performance for the Monticello site, the LI_REP produced the highest accuracy in LAI estimation for the Monument Valley site. However, there was no significant relationship between any of the REPs and LAI (p > 0.05). One of the reasons for the poor performance of the REP approach might be background soil spectral influence within pixels. The REP approach typically works well for vegetation with full canopies [68,69]. Many of the sample pixels had vegetation cover between 70% and 90%, and thus the background soil could have influenced the spectral response associated with the red-edge. The fact that most of the sample pixels contained multiple vegetation species may have also caused the REP approach to fail to estimate LAI, because each species typically has a unique REP characteristic.
Unlike the results of the VI- and REP-based LAI estimation, the regression trees resulted in very good LAI estimation performance (R² > 0.8) (Figure 5). The regression trees generated three rules for the Monticello, UT site and two rules for the Monument Valley, AZ site. Interestingly, one of the three rules for the Monticello site was associated mainly with the grass species samples (i.e., wheatgrass and litter) while the other two rules were applied to most of the shrub species samples (i.e., sagebrush and rabbitbrush). Eight bands were used to generate the multivariate equations for the Monticello site, which was inefficient and likely inflated the apparent fit of the models.
Conversely, only two bands (709 and 754 nm) in the red-edge region were used to generate the multivariate equations in the regression trees for the Monument Valley site. One rule was applied to relatively high LAI (>2) samples for the Monument Valley site, while the other was applied to the lower LAI samples. The red-edge bands effectively divided the samples into the two groups. Although the calibration using the regression trees outperformed those using the vegetation index and REP data, the cross-validation did not correspond to the calibration results. The RMSEs through cross-validation using the regression trees were higher than those using the other approaches. While most of the folds for cross-validation generally resulted in low errors, some of the folds (~25%) resulted in high LAI estimation errors (e.g., >2), which consequently increased the total RMSE. Overfitting often occurs with regression trees, especially when a small number of samples is used [27]. This is also related to the well-known sensitivity of decision/regression trees to the training data configuration [70].
Glenn et al. [71] investigated black greasewood and fourwing saltbush over the Monument Valley site using 2007 MODIS data. They measured LAI using the traditional direct method and found good agreement with the scaled Enhanced Vegetation Index (EVI) from MODIS data based on a simple linear regression (R² = 0.77; n = 55; p < 0.001). Figure 6 shows the LAI distribution maps estimated using the regression trees for the Monticello and Monument Valley sites.
Although all possible two-band combinations were tested for VI1 and VI2 in this study, the most valuable VI might consist of more than two narrow hyperspectral bands. We tested a few narrow band-derived VIs, such as the Vogelmann Red Edge Index 2 [72], which uses spectral data at more than two wavelengths. However, the additionally tested indices did not outperform the VIs used in this study. Optimization of multiple wavelengths associated with such narrow band-derived VIs might further improve the performance of LAI estimation.

Vegetation Mapping
Figures 7 and 8 show the relationship between the percent cover and the matched filter scores for the Monticello, UT and Monument Valley, AZ Sites, respectively.The percent covers of wheatgrass and litter were correlated with the matched filter scores at the 90% and 95% confident levels, respectively.However, the percent covers of sagebrush and rabbitbrush failed to be significantly correlated with the corresponding matched filter scores.There might be two reasons for this: the spectral separability of sagebrush and rabbitbrush from other species was not strong.In addition, most of the sagebrush and rabbitbrush samples were dominated by the corresponding species (i.e., percent cover > 70%), while a few of them were mixed with other species (i.e., percent cover < 60%).MTMF is known to be sensitive to the amount of green vegetation present within a pixel [38,55].Litter (dead plant materials) for the Monticello site and soil for the Monument Valley site could affect the determination of the matched filter scores of the healthy vegetation species.Locational errors could also influence the relatively lower correlation between the percent covers and the matched filter scores.Interestingly, the percent covers of saltbush were well correlated with the matched filter scores (Figure 8(b)).Excluding one outlier, the percent covers of greasewood were also well correlated with the scores.The MTMF approach has been successfully applied to map single vegetation species resulting in good relationships between percent cover and matched filter scores [73], but it typically results in more variation and confusion between species when multiple species are considered [74].In this study, while several vegetation species were mixed at each sampling location in the Monticello site, one species was dominant at each sampling location in the Monument Valley Site.Consequently, there was less influence from other species in the relationship between the percent cover and the matched filter 
scores for the Monument Valley site than for the Monticello site.However, because the matched filter scores were not sufficient for separating each species from the others, the use of a suite of vegetation indices was expected to improve classification accuracy using decision tree logic.Figure 9 shows the performance variations of the multiple (i.e., 20) decision trees using different sets of training samples for each site.Since the size of the reference data was small (i.e., 53 samples of four classes for the Monticello site and 43 samples of three classes for the Monument Valley Site), the performance variation of the multiple decision trees for training and testing was slightly large.For both sites, the MTMF-derived metrics and the vegetation indices (labeled MV in Figure 9) resulted in better performance (for both training and testing) than the original scaled reflectance (labeled REF in Figure 9) in decision tree classification.Table 4 lists the key input variables to the decision trees classifications for each site.The PRI and NDNI among the vegetation indices were very useful in the decision tree classification for the Monticello site.The MF scores of sagebrush and rabbitbrush also contributed to the decision tree generation for the Monticello site.When the original scaled reflectance data were used in the decision tree classification for the Monticello site, the red (663.4nm) and red-edge bands (709 nm) contributed most to the classification, followed by the blue (443.3nm) and middle-infrared band (2,477.5 nm).A different pattern of contributing variables was found for the Monument Valley site: The WBI, NDLI, and NDWI contributed most to the MTMF and vegetation index-based decision tree classification.The infeasibility of greasewood was also very useful.When the reflectance data were used, the bands near the water absorption features (i.e., around 1,400 and 1,940 nm) contributed most to the classification.This may be because the Monument Valley site is 
located in an arid area and some of the sampling locations were irrigated while others were not.That is why the water-related vegetation indices and reflectance were very useful in vegetation mapping for the Monument Valley Site.The chlorophyll absorption features and related vegetation indices contributed moderately to the decision trees for both sites.A majority rule was applied to produce the final species distribution map based on the multiple decision trees.Because some pixels had multiple maximum votes, additional processing was necessary.A three-step approach was used: a majority rule was first applied using all of the 20 decision trees, and then another majority rule was applied to the undecided pixels using the top 10 decision trees based on their testing performance.Finally, the best decision tree was used to determine the classes for the still undecided pixels and the final vegetation maps were created for each site.
Figure 10 shows the vegetation maps over the Monticello site using the two sets of decision trees (i.e., one set using the MTMF variables and vegetation indices and the other using the scaled reflectance). Sagebrush appeared to be somewhat overestimated through visual inspection when the MTMF variables and vegetation indices were used (Figure 10(a)). On the other hand, rabbitbrush and litter were generally overestimated when the scaled reflectance data were used for decision tree classification (Figure 10(b)). The vegetation maps of the Monument Valley site using the two sets of the decision trees are shown in Figure 11. The classes were more clumped in the map using the scaled reflectance than using the MTMF variables and vegetation indices. Soil appeared to be overestimated when the reflectance data were used. These classification maps exhibited some discrepancies with the actual field conditions. For example, while the disposal cell cover (central region) in Figure 10 appears close to the field conditions, a monoculture of rabbitbrush on the side slopes does not agree with the field conditions. At the Monument Valley site, greasewood appears to be over classified. The limited quantity and quality of the field reference data might account for the discrepancies, including: (1) the small sample size resulted in variations in performance for multiple decision tree classifications; and (2) each ground reference sample was measured using a circular plot with a diameter of 1 m, which is smaller than the hyperspectral image pixel size (2.3 × 2.3 m). Given the small shrub and grass patches in the sites, this could result in significant confusion in classification. Since all of the reference data might have been used to train decision trees (50% on average), the classification accuracy based on the assessment using the reference data might be slightly inflated.
The classification accuracy assessment for the Monticello site is presented in Table 5. Interestingly, although the MTMF variables and the vegetation indices outperformed the original scaled reflectance, based on the individual decision trees for both training and testing (refer to Figure 9), the accuracy assessment results were similar between the two maps, resulting in the overall accuracy of around 87% and Kappa of around 0.82. Similar to the visual inspection of the classification maps, the commission error of sagebrush when using the MTMF variables and vegetation indices was large (~36.4%), while the omission error of sagebrush when using the reflectance data was large (~37.5%). Due to the small size of the reference data, the overestimation of rabbitbrush and litter when using the scaled reflectance was not clearly indicated by the error matrix. Wheatgrass was confused with sagebrush and litter in the classification using the reflectance data. There was no significant difference between the two Kappa values (Table 5(c)).
Table 6 presents the accuracy assessment results of the classification maps for the Monument Valley site. Saltbush was confused with soil for both maps. This might be because some of the saltbush sample locations were grazed and not irrigated, causing their spectral response to be similar to soil. Saltbush was also confused with greasewood when the reflectance data were used in the decision tree classification. However, the confusion was much improved when the MTMF variables and vegetation indices were used in the decision tree classification. There was no significant Kappa difference between the two classifications due to the relatively large asymptotic standard errors (ASE), a measurement of uncertainty, even though there was a 10% difference between the two Kappa values (Table 6(c)).
For vegetation mapping over hazardous waste sites, the omission errors of shrub could be more serious than the commission errors. The commission errors could be false alarms for biointrusion on the capped materials, but the omission errors might indicate undetected biointrusion, which requires quick response and treatment. From this point of view, the use of the MTMF-derived metrics and vegetation indices was better than the use of the scaled reflectance for vegetation mapping. The omission errors of rabbitbrush and saltbush (both shrub species) were relatively large when the scaled reflectance data were used.
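The accuracy measures discussed above (overall accuracy, Kappa, commission and omission errors, and the Kappa Z-test) can be sketched as follows. This is a minimal illustration under stated assumptions: the error matrix, the Kappa values, and the ASE values are made-up numbers, not those of Tables 5 and 6, and the function names are hypothetical. The Z statistic uses the commonly applied form for two independent Kappa estimates, Z = |K1 − K2| / sqrt(ASE1² + ASE2²), which is assumed here rather than taken from the text.

```python
import math


def accuracy_report(matrix, classes):
    """Summarize an error matrix.

    matrix[i][j] -- number of reference samples of class j mapped as class i
                    (rows = map classes, columns = reference classes).
    Assumes every row and column total is non-zero.
    """
    k = len(classes)
    n = sum(sum(row) for row in matrix)
    row_tot = [sum(matrix[i]) for i in range(k)]
    col_tot = [sum(matrix[i][j] for i in range(k)) for j in range(k)]

    overall = sum(matrix[i][i] for i in range(k)) / n
    # Agreement expected by chance, from the row and column totals.
    chance = sum(row_tot[i] * col_tot[i] for i in range(k)) / n ** 2
    kappa = (overall - chance) / (1 - chance)

    # Commission: mapped as the class but belonging to another class.
    commission = {classes[i]: 1 - matrix[i][i] / row_tot[i] for i in range(k)}
    # Omission: belonging to the class but mapped as another class.
    omission = {classes[j]: 1 - matrix[j][j] / col_tot[j] for j in range(k)}
    return overall, kappa, commission, omission


def kappa_z(kappa_a, ase_a, kappa_b, ase_b):
    """Z statistic for the difference between two independent Kappa values."""
    return abs(kappa_a - kappa_b) / math.sqrt(ase_a ** 2 + ase_b ** 2)


# Made-up two-class error matrix (rows = map, columns = reference).
overall, kappa, commission, omission = accuracy_report(
    [[8, 2], [1, 9]], ["shrub", "soil"])
print(round(overall, 2), round(kappa, 2))

# Made-up Kappa and ASE values: a 0.10 difference with large ASEs.
z = kappa_z(0.82, 0.08, 0.72, 0.08)
print(z < 1.96)  # below the 95% critical value, so not significant
```

With these illustrative numbers, a Kappa difference of 0.10 is not significant at the 95% confidence level because the standard errors are large relative to the difference, which is the situation described for the Monument Valley classifications.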
At the Monticello site, rabbitbrush is an early successional shrub adapted to disturbed, unstructured soils. Sagebrush is a later successional shrub that appears to be increasing in abundance as soil structure develops in the engineered soil cover, creating preferential flow pathways for water to move deeper in the profile and, hence, gradually creating a more favorable habitat for sagebrush. The phytoremediation study at the Monument Valley site was designed, in part, to compare the two dominant native desert phreatophytes: the obligate black greasewood and the facultative four-wing saltbush. Consequently, for long-term monitoring of these sites, differentiating rabbitbrush from sagebrush at the Monticello site, and greasewood from saltbush at the Monument Valley site, is critical.
When considering the small shrub and grass patches (~1 m and sometimes <1 m in size) found in the study sites, the spatial resolution of the HyMap imagery (2.3 × 2.3 m) appears to be a bit coarse. Although there is a concern that higher spatial resolution data may actually reduce classification accuracy by increasing within-class spectral variability [75], the Monticello and Monument Valley sites should benefit from higher spatial resolution data (e.g., ~1 × 1 m) for vegetation mapping, because small grass and shrub patches (not tall vegetation) are dominant and their cover is relatively dense (percent cover > 70%).
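The scale mismatch can be made concrete with a short calculation comparing the area of a 1 m diameter circular reference plot with the area of one image pixel. The sketch below uses only the plot and pixel dimensions given in the text; the function name `plot_to_pixel_fraction` is hypothetical, and the calculation ignores how a plot is actually positioned within a pixel.

```python
import math


def plot_to_pixel_fraction(plot_diameter_m, pixel_side_m):
    """Ratio of a circular plot's area to the area of one square pixel."""
    plot_area = math.pi * (plot_diameter_m / 2) ** 2
    pixel_area = pixel_side_m ** 2
    return plot_area / pixel_area


# HyMap pixel (2.3 x 2.3 m) versus a 1 x 1 m pixel, for a 1 m diameter plot.
print(round(plot_to_pixel_fraction(1.0, 2.3), 3))  # about 0.148
print(round(plot_to_pixel_fraction(1.0, 1.0), 3))  # about 0.785
```

A reference plot therefore describes only about 15% of the area of a 2.3 × 2.3 m pixel, but close to 80% of a 1 × 1 m pixel.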

Conclusions
This study evaluated the usefulness of HyMap hyperspectral data for characterizing the vegetation cover (i.e., LAI estimation and vegetation species mapping) on two hazardous waste sites. The Cubist regression trees resulted in the best calibration accuracy for estimating LAI (R² > 0.80), but the instability of the models due to the small sample size was a concern. The vegetation index approach to estimating LAI revealed that reflectance data in the middle-infrared region were more useful than reflectance data in the red or near-infrared region. More sophisticated narrow band-derived vegetation indices need to be investigated further. Aggregated decision trees were successfully used to map the vegetation species with a limited amount of reference data (overall accuracy > 85%). The MTMF approach with a suite of vegetation indices improved the classification accuracy.
Automated monitoring of vegetation cover on hazardous waste sites using hyperspectral remote sensing data and modeling techniques appears feasible, but requires further investigation using different remote sensing data sources, higher spatial resolution hyperspectral data, and more advanced modeling techniques. Site characteristics must be carefully considered when determining the remote sensing data to be collected and the approaches to be used. Future research includes applications of multi-sensor data fusion (e.g., high density LiDAR data + hyperspectral imagery) and/or different modeling techniques (e.g., artificial immune networks, support vector machines, and artificial neural networks) for monitoring hazardous waste sites. In addition, while this study provided the preliminary results and single-date baseline data associated with monitoring of the phytoremediation systems at the Monticello and Monument Valley sites, linking remote sensing methods with actual monitoring tasks in a hazardous waste context should be further examined. Such links include: (1) detecting changes in the spatial distribution of plant species and LAI through time at the landscape scale using < 1 × 1 m multiple-date hyperspectral remote sensor data, and (2) tracking the response of phreatophyte health and evapotranspiration rates to changing land management practices.

Figure 2. Flow diagram of the digital image processing methods used to (a) predict the spatial distribution of LAI, and (b) map the vegetation species present on the two hazardous waste sites.

Figure 3. The correlation matrices using the vegetation index approach to estimate LAI: using (a) VI1, and (b) VI2 for the Monticello, UT site; and using (c) VI1 and (d) VI2 for the Monument Valley, AZ site.

Figure 4. The scatterplots between each of the red-edge positions (REP) and LAI: using (a) LI_REP, (b) LG_REP, and (c) LE_REP for the Monticello, UT site; and using (d) LI_REP, (e) LG_REP, and (f) LE_REP for the Monument Valley, AZ site. The R² and RMSEs from calibration (CAL) and cross-validation (CV) are also provided.

Figure 5. LAI estimation using the regression tree approach for (a) the Monticello, UT site, and (b) the Monument Valley, AZ site. The R² and RMSEs from calibration (CAL) and cross-validation (CV) are summarized in the plots.

Figure 6. The estimated LAI distribution maps for (a) the Monticello site, and (b) the Monument Valley site. The dirt road and other land cover classes were masked out for the Monticello site.

Figure 7. The relationships between the matched filter scores and the percent cover of the vegetation species for the Monticello site: (a) sagebrush, (b) rabbitbrush, (c) wheatgrass, and (d) litter.

Figure 8. The relationships between the matched filter scores and the percent cover of the vegetation species for the Monument Valley site: (a) greasewood and (b) saltbush.

Figure 9. Box plots showing the performance variation of the multiple decision trees using different sets of training and testing samples: (a) for the Monticello site, and (b) for the Monument Valley site. MV represents the decision trees using the MTMF-derived metrics and vegetation indices and REF represents the decision trees using the original scaled reflectance data.

Figure 10. The vegetation species distribution maps for the Monticello site based on the decision trees (a) using the MTMF-derived metrics and vegetation indices, and (b) using the original scaled reflectance data. The road and other land cover classes were masked out.

Figure 11. The vegetation species distribution maps for the Monument Valley site based on the decision trees (a) using the MTMF-derived metrics and vegetation indices, and (b) using the original scaled reflectance data.

Table 3. The best band combination and associated statistics of both VI1 and VI2 for the Monticello and Monument Valley sites. Statistics from the non-linear regression models are given in parentheses. While the exponential regression model performed similarly to the linear regression model for the Monticello site, the logarithmic regression model produced slightly higher R² and lower RMSE than the linear model for the Monument Valley site.

Table 4. Key input variables to the decision tree classifications: (a) for the Monticello site, and (b) for the Monument Valley site.

Table 5. Accuracy assessment results of the decision tree classifications for the Monticello site: (a) using the MTMF variables and vegetation indices, (b) using the scaled reflectance data, and (c) Kappa Z-test between the two classifications.

Table 6. Accuracy assessment results of the decision tree classifications for the Monument Valley site: (a) using the MTMF variables and vegetation indices, (b) using the scaled reflectance data, and (c) Kappa Z-test between the two classifications.
* no significant difference between the two Kappa values at the 95% confidence level.