Mapping Mountain Pine Beetle Mortality through Growth Trend Analysis of Time-Series Landsat Data

Disturbances are key processes in the carbon cycle of forests and other ecosystems. In recent decades, mountain pine beetle (MPB; Dendroctonus ponderosae) outbreaks have become more frequent and extensive in western North America. Remote sensing has the ability to fill the data gaps of long-term infestation monitoring, but the elimination of observational noise and attributing changes quantitatively are two main challenges in its effective application. Here, we present a forest growth trend analysis method that integrates Landsat temporal trajectories and decision tree techniques to derive annual forest disturbance maps over an 11-year period. The temporal trajectory component successfully captures the disturbance events as represented by spectral segments, whereas decision tree modeling efficiently recognizes and attributes events based upon the characteristics of the segments. Validated against a point set sampled across a gradient of MPB mortality, 86.74% to 94.00% overall accuracy was achieved with small variability in accuracy among years. In contrast, the overall accuracies of single-date classifications ranged from 37.20% to 75.20% and only become comparable with our approach when the training sample size was increased at least four-fold. This demonstrates that the advantages of this time series work flow exist in its small training sample size requirement. The easily OPEN ACCESS Remote Sens. 2014, 6 5697 understandable, interpretable and modifiable characteristics of our approach suggest that it could be applicable to other ecoregions.


Introduction
Understanding patterns of forest disturbances is important for assessing past and future ecosystem structure, function, productivity and diversity [1,2].In recent years, mountain pine beetle (MPB) outbreaks have occurred over extensive areas, from British Columba in Canada to New Mexico in the United States.MPB is endemic to North America; however, the recent outbreaks have reached epidemic levels that have affected millions of hectares of forests [3].The effects of MPB infestation range from altered surface fuel and wildfire hazards [4][5][6][7], changed vegetative composition [8], converted live carbon sinks to dead and slowly decaying carbon sources [9][10][11], impacted nutrient cycling and water quality [12,13], shifted evapotranspiration and albedo, modified local surface energy balance [14] and changed regional climate [15].Those effects are predicted to increase as a consequence of the direct and indirect effects of climatic changes [16][17][18].However, unlike wildfires that have received intensive mapping efforts at different scales, characterizations of the extent and severity of MPB using multi-temporal data have only begun to emerge [19][20][21][22][23].
Remote sensing, both with aerial photography and satellite imagery, has been widely acknowledged as a useful way to map tree mortality over large areas [24].Landsat time series stacks (LTSS) are well suited to this type of analysis, because they are freely available, cover a long time period, have a broad spatial extent, are multispectral and have temporal continuity [20,25].Two frequently used disturbance mapping approaches are (1) trajectory-reconstruction and (2) decision trees (DTs).Trajectory-based change detection examples include the Landsat-based detection of trends in disturbance and recovery algorithm (LandTrendr) [26] and the Vegetation Change Tracker [27][28][29].Both methods recognize disturbance events by the rate of changes in time series of spectral properties.This makes the first and last years of the trajectory more difficult to judge, because the attribution for a particular year is entirely determined with regard to its spectral deviation from the two adjacent years [26].In addition, the event labeling procedure depends on pre-implemented modules in the software and, thus, is not flexible in adjusting to user-defined functions.DT approaches include work by Goodwin et al. [30] who utilized the annual changes in normalized difference moisture index and a set of thresholds to classify the presence of MPB infestation over 14 years, resulting in overall accuracies ranging from 71% to 86%.DT-based approaches have the capacity to integrate expert knowledge and flexible classification schemes.However, poor image quality from sensor drift, geometric misregistration and topographic and cloud shadows will introduce distortions to images and affect the accuracy of change detection.
In this study, we describe a forest growth trend analysis method that integrates the robust change detection strengths of the trajectory reconstruction approach and the flexible user-defined advantages of DT approaches to characterize MPB mortality with an 11-year LTSS.We were interested in determining whether or not this proposed method could reach an accuracy target of 85% or greater.Additionally, we compared its accuracy with single-date image classifications, which are still important and commonly used in disturbance mapping, especially in two-date post-classification change detection.We are aware that the accuracy of single-date image classification is not only determined by the input variables and classifiers, but is also heavily dependent on the quality and quantity of training samples [31], which may have larger impacts than the implemented techniques [32].Thus, we also assessed the effect of different sample sizes on the accuracy of single-date image classifications, and we tested how much our proposed method can improve sampling efficiency as compared with the single-date classifiers.

Study Area
Our study area was located in Grand County, in north-central Colorado (Figure 1), with a total land area of 4783 km 2 .This landscape is dominated by evergreen forests mainly composed of lodgepole pine (Pinus contorta), along with other tree species, as well as shrub and grass cover types.Summer in Grand County is typically hot and dry, which is suitable for MPB development.The average precipitation per year is approximately 400 mm, whereas the U.S. average is 930 mm.This county has one of the highest concentrations of lodgepole pines and has been heavily affected by the MPB outbreak that started around 2002 and reached the peak of tree mortality between 2005 and 2008 [33].

Landsat Time Series Stack
Grand County lies entirely within Landsat path 34 row 32.We restricted the image acquisition season from June to September, because this snow-free period is the recommended time window to detect MPB mortality [20].For each year, one cloud-free or mildly contaminated image was selected to build the LTSS, and if no such image existed, several partly cloudy images with a similar phenology were used as replacements (Table 1).All data were processed to Level 1T (terrain corrected).Radiometric calibration and atmospheric correction for cloud and aerosol effects were performed using the Landsat Ecosystem Disturbance Adaptive Processing System [34].Clouds and their shadows were automatically screened using the Function of mask (FMASK) algorithm [35].Cloud removal effects were judged satisfactory via visual checking, though the extent of cloud cover was slightly overestimated.We accepted this false positive error, because commission errors were more desirable than omission errors in cloud filling.The number of clear observations in the LTSS over the period of 2000 to 2011 is shown in Figure 2, and 98% of the study region contained more than 10 cloud-free observations.Since MPBs attack a limited number of tree species, we confined our analysis to the extent of their host species in the LANDFIRE Refresh 2001 Existing Vegetation Type data layer (EVT), which was developed using circa 2001 (1999-2003) Landsat imagery [36].The spatial resolution of the LANDFIRE data matches the LTSS, and the time period of the imagery from which the EVT data were derived depicts the pre-disturbance distribution of vegetation in Grand County.Currently, MPB mortality is concentrated in lodgepole and ponderosa pine (Pinus ponderosa) forests, but other pine species, like limber pine (Pinus flexilis) and bristlecone pine (Pinus aristata), have also been found to be suitable hosts [37].We included all EVT pixels that contain any of those species (Table S1), and smoothed the image with a 3 × 3 majority filter, which was designed to reduce observed salt-and-pepper effects and to minimize omission errors that are much higher than commission errors for the selected classes [38].

Reference Sample Selection
We randomly placed training samples over forested areas and reconstructed their disturbance history with the aid of normalized burn ratio (NBR) time series from the LTSS and the 1-m resolution U.S. Department of Agriculture (USDA) National Agricultural Imagery Program (NAIP) imagery (obtained from the USDA Geospatial Data Gateway), which were available for years 2005, 2009 and 2011 in Grand County.The NBR time series were calculated using bands 4 and 7 from the Landsat imagery [39] and then multiplied by 100, so they could be represented as integer values instead of floating point values.For each training location, we identified the disturbance event type, duration and onset time (Table 2).Disturbance event types included: healthy, MPB mortality or clearcuts.Fire was not included, because large fires did not occur in our study area during the time period of this study (2000 to 2011), as indicated by the U.S. Geological Survey Monitoring Trends in Burn Severity database [40].In total, we collected 106 training samples, among which, 53 were sites with MPB mortality, 37 were clearcuts and 16 were healthy.We selected the relatively small number of training samples, because many studies suggested that trajectory-based methods perform well when trained with a relatively small sample size [23,26,41].Statistical classifiers require more training data to fully represent each class in the feature space [42].Thus, we expanded the original training sample size by 10 times to a total of 1060 training samples.For those newly added samples, we only identified their disturbance types in the three years for which NAIP imagery were available instead of the whole NBR trajectory from the LTSS.
For validation purposes, we randomly sampled points in strata that were formed in the complete temporal-spectral space as a representation of the gradient of MPB mortality.In this manner, our test samples were able to cover a complete spectrum of disturbance events and a range of mortality.Considering the large data volume of the 11-year LTSS, for which the dimension was 2964 (rows) × 2901 (columns) × 102 (bands), we employed an efficient unsupervised classification tool, Clustering Based on Eigenspace Transformation, to improve the speed of clustering over conventional K-means without losing accuracy [43].Fifty strata were developed, and we proportionately allocated a total of 100 points to each stratum.In the absence of sufficient ground and ancillary information, we visually interpreted the status for each point at each individual year.By checking the proportion of points in the three classes, we found that they are almost equally distributed without severe over-or under-representation; the mean number of validation samples in each disturbance type each year was 39 (healthy), 38 (MPB mortality) and 23 (clearcuts; Table S2).

Temporal Segmentation
We developed time series of disturbance maps by combining the temporal trajectory analysis approach of LandTrendr [26] with a decision tree labeling procedure.Temporal trajectory analysis was used to decompose often noisy time series into a sequence of straight-line segments with distinct features designed to capture the event that happened for the duration of each segment.NBR [39] was chosen to represent the vegetation condition, because of its sensitivity to disturbance [26] and its capacity to minimize radiometric differences among images.The general procedure was to use a set of parameters to enhance the signal-to-noise ratio of the NBR time series in order to capture the salient events happening in each period or segment.Details can be found in Kennedy et al. [26], and the final effect of temporal segmentation, whether it was under-segmentation (split up into too few parts) or over-segmentation (subdivided into too many parts), was determined by the parameters listed in Table 3. Optimal LandTrendr parameter settings have been tested in the forests of the Northwest Pacific Region at both Landsat and MODIS scales [26,41].However, because we are not aware of reported parameter results in the literature specific to the Southern Rocky Mountain ecoregion, we tested a range of candidate parameter values (Table 3).[26] and the values tested for this study and ultimately selected for this study to reduce the raw normalized burn ratio (NBR) time series values to a number of linear segments.We ran LandTrendr on all combinations of the candidate parameter values, which resulted in 144 unique combinations in total, and then visually compared their temporal segmentation results with the original NBR time series for the 106 training samples.Thus, the calibration involved 15,264 comparisons.Each individual comparison for a sample was coded in a binary state using two criteria: the overall match in the shape and the timing of vertices.A good match occurred if the disturbance event that happened at that pixel was successfully captured, and the slope and magnitude for each segment deviated little from the central trend.The vertex timing was judged by how well the algorithm captured the onset and ending time of each disturbance event.If both criteria were satisfied, that comparison would be scored as 1, and otherwise, it would be 0. The optimal parameter set was decided based upon the pooled rankings across all training samples.The processing time took about 200 minutes (on 5,376,538 unmasked pixels) for this step, which was run on a workstation with an Intel i7-2600 3.4 GHz quad core CPU and 8 gigabytes of physical memory.

Decision Tree-Based Spectral Segment Labeling
The outputs of the temporal segmentation included the fitted segments, and a set of vertices connecting individual segments.We employed a DT to attribute disturbance events associated with those vertices and segments, with leaf nodes representing class labels and branches representing decision scenarios.The DT was built with predictors characterizing each segment: the disturbance occurrence onset time, duration and the magnitude of NBR change.We defined the onset time as the starting year of a segment.Magnitude was the absolute change in NBR between the ending year and the onset year, whereas duration was the time elapsed over the length of one segment.Different types of disturbance events were characterized by distinct features.For instance, clearcuts could be recognized by segments with a short duration and large, negative change in NBR, whereas MPB mortality segments tended to have a gradual, but long-duration decline in NBR values.
Decision rules were used to identify forest conditions in a top-down sequence (Figure 3), and the definition of key parameters are explained in Table 4.These parameters were manually calibrated with the same training samples described in Section 3.1.Specifically: (1) Segments were separated into regrowth, stable or disturbed status according to their magnitude (mag) at the first level of the DT.If the absolute magnitude of one segment was less than the pre-set threshold (<thre_mag1), it was treated as disturbed.If the magnitude was within a threshold range (±thre_mag1), then it was considered to be stable.Otherwise, it was considered to be in a regrowth stage (>thre_mag1).(2) To label a segment as regrowth or stable, healthy vertices were identified based on an NBR threshold (thre_vertex1).Thus, a vertex in either a regrowth or stable period with its NBR value greater than the thre_vertex1 parameter was labeled as healthy.
(3) Segments classified as disturbed in Step 2, were labeled as clearcuts if their rate of change in NBR was greater than a second threshold (thre_mag2); otherwise, segments were labeled as MPB mortality.The rate of change was defined as the average NBR change per year, which is a reflection of both magnitude and duration.The thre_mag2 parameter was designed based on the assumption that clearcuts always have more abrupt and rapid decreases in NBR than MPB mortality.(4) In the third level of the DT hierarchy, the label from the previous year (pre_label) was critical in determining the label for the current vertex.This was based on the assumption that events are temporally dependent and forests can only logically transition from certain states to another.For instance, clearcuts often result in abrupt declines in NBR followed by persistent, but slow increases in NBR.Although the magnitude of change in the post-clearcut period is not as sharp as that of the pre-clearcut period, the vertices in the subsequent years will still be assigned as clearcut to ensure that the disturbance class labels have temporal consistency.(5) The final step involved attributing the vertices for the first year of the time series.We separated this step, because there was no vertex information prior to the first year.After MPB mortality, standing tree trunks and branch residuals remain on site, whereas clearcuts usually have a significant amount of bare ground.Thus, NBR values in areas with MPB mortality should generally be higher than they are in clearcut areas.The thre_vertex2 parameter defined the cutoff value for separating clearcuts and MPB mortality in these cases.The NBR value for current vertex.pre_label The vertex label in the previous year.thre_mag1 The magnitude threshold that distinguished stable from either disturbance or recovery.thre_mag2 The magnitude threshold that separated MPB mortality from clearcut. thre_vertex1 If the current vertex NBR value was above this threshold, it was treated as healthy, otherwise, it was further analyzed into disturbance types.

thre_vertex2
If first year's vertex value was below this threshold, it was treated as a clearcut pixel and MPB mortality otherwise.

Post-Labeling Process
Both the temporal segmentation and DT were run at the pixel level.Thus, it was possible that two adjacent pixels could simultaneously experience the same event, but were labeled differently due to errors.Since most disturbance events were patchy, the behavior of spatially adjacent pixels could be used to enhance the robustness of the classification.Based on this logic, we filled small holes in a large patch that were undetected or mislabeled.To do this, we first applied a 3 × 3 majority filter to reduce the salt-and-pepper effect and merged pixels into more complete patches, and then, we used a 3-year temporal filter to remove illogical transitions over time.For example, this step would change a pixel classified as MPB mortality in Year 1, healthy in Year 2 and MPB mortality in Year 3, to MPB mortality for all 3 years.

Single-Date Classification
Single-date mapping strategies were assessed using a traditional maximum likelihood classifier (MLC) and a random forest (RF) algorithm.MLC is a parametric method widely used in forest remote sensing [44].In contrast, RF is a machine-learning technique that combines the results of many (thousands) of weak classifiers, such as classification and regression trees [45].RF is often praised for its good overall performance, as well as robustness to noise [46].We implemented the classifiers on both the raw spectral bands and a combination of raw spectral bands and spectral indices that were sensitive to forest defoliation and canopy water content (Table 5 [39,[47][48][49][50][51]). Table 5. Spectral indices used in the single-date classifications, including formulas and references.

Accuracy Assessment and the Sample Size Effect
We conducted both qualitative and quantitative accuracy assessments.First, a confusion matrix was created for each individual year, and associated accuracy measures were generated, including the overall accuracy, kappa, producer's accuracy and user's accuracy.Second, we also visually evaluated the results by comparing the mapped spatio-temporal pattern of mortality with those observed by the Forest Health Monitoring Aerial Detection Survey (ADS) data [52].The ADS data are generated by aerial sketchmapping, a technique for observing and mapping forest damage from an aircraft.Although ADS had been criticized for its imprecision at fine scales due to reasons, such as misregistration and the subjectiveness of the operators [23], it is reliable when used at coarse scales [52].A pilot study initiated in the Rocky Mountain Region demonstrated that all three types of accuracies for the MPB detection in lodgepole pines exceeded 80% when the spatial tolerance was increased to 500 m [52].We thus had confidence that the epicenters and patterns of spread derived from ADS polygons could be used as a reference dataset to compare against our results.We also altered the size of the training sample dataset from 106 to 1060 in increments of 10% and tested the effects of sample size on accuracy.This produced 10 corresponding sets of results to test the effect of sample size on accuracy metrics for both the MLC and RF classifications.

Parameter Calibration for Temporal Segmentation and Decision Tree Labeling
Parameter calibration in the forest growth trend analysis involved two parts.At the temporal segmentation stage, the set of parameters producing the highest score from all 144 combinations was chosen (Table 3).By using the same set of training samples, we also determined the four threshold values in the DT.We set the allowed variability in NBR for stable segments (thre_mag1) to be 20 (the unit for all thresholds was NBR × 100) as reflected from Figure 4.The cut-off NBR value of vertices (thre_vertex1) to distinguish healthy forest from disturbances was 350.Threshold Thre_mag2, used to separate clearcuts from MPB mortality, was set at −150.A higher degree of overlap was observed between the post-clearcut and MPB mortality NBR values, and thus, we set thre_vertex2 to the third quantile of NBR values in clearcuts, which was 50.

The Performance of Forest Growth Trend Analysis
The forest growth trend analysis achieved an overall annual accuracy ranging from 86.74% to 94.00%, and the average was 90.31% (Table 6).Except for the user's accuracy for MPB mortality (84.81%) and the producer's accuracy for clearcuts (77.30%), the mean value for all types of accuracies was above 85%.Larger standard deviations of accuracies among different years were found in the producer's accuracy for healthy forest and the producer's accuracy for clearcuts.As indicated from the confusion matrix (Table S2) and Figure 5, healthy pixels were well distinguished from the two disturbance types in the first few years and became less separable from MPB mortality after 2005.Pixels with MPB mortality were confused with either healthy or clearcut classes, but the proportion was not high.Confusion between clearcuts and healthy classes occurred in the first few years of our study and then between clearcuts and MPB mortality in later years.Visual examination showed that the shifting of MPB outbreak epicenters and the expansion of the outbreak boundaries was congruent between the two independent datasets (ADS and validation points).Maps of the onset year of MPB mortality revealed that the MPB outbreak in Grand County was initiated at multiple locations and spread outward from those locations over time to affect nearly all habitats with suitable hosts (Figure 6).

Comparison with Single-Date Classification and Sample Size Effects
The accuracy achieved by the single-date classification ranged between 37.20% and 75.20% and was lower than the time series classification, running on either the raw or combined spectral bands trained by the same set of samples (Table 7).We also observed that the time-series classification accuracies were relatively stable over time.In contrast, the single-date image classifications had high year-to-year variability in accuracy, which was shown by a distinct, increasing trend of the producer's accuracy for MPB mortality from 2005 to 2011.The producer's accuracy for healthy forest decreased substantially over the same time period.Table 7 also listed the accuracies generated from RF and MLC with only the raw spectral bands and the combined use of raw spectral bands and spectral indices.The accuracy of the RF classification improved when the spectral indices were included as predictors, and this effect was not evident with the MLC.Between the two single-date classification methods, RF had better overall performance than MLC.By varying the number of training samples used to train the MLC and RF classifiers in 10% increments to a maximum of 1060 training samples, we obtained 10 sets of results for the single-date classifications.Figure 7 shows the relationships between accuracy and sample size, for overall accuracy generated from the MLC and RF classifiers, as well as the producer's accuracy and the user's accuracy for the MPB mortality class.Despite some small fluctuations, all accuracies increased with the training sample size, and the changes were especially pronounced for the initial increases in sample size.The highest overall accuracies were achieved when 100%, 100% and 90% of the training samples were used by RF (89.43%, 90.48%, 88.66%) and 100% of the training samples were used by MLC (88.41%, 82.93%, 85.90%).This improvement was substantial compared to the results based on the original set of 106 training samples.The overall accuracy for RF increased by 30% on average from the smallest to the highest training sample size, 8% for the producer's accuracy and 33% for the user's accuracy.For MLC, the average increase was 40% in overall accuracy, 31% in the producer's accuracy and 43% in the user's accuracy between the smallest and largest training sample sizes.We also noticed that after a certain training sample size, the accuracy stopped increasing or even slightly decreased, and the saturation position was around the 40% sample size (or 424 points) for all accuracy metrics and both classifiers.Accuracies after the saturation point were comparable to the time series classification results.

Discussions
For the single-date classification, we found the following: (1) The RF achieved better overall accuracy than MLC in detecting forest disturbances.MLC relies on assumptions about the data distribution (e.g., normally distributed), whereas the ensemble learning techniques used by RF do not [53].Meanwhile, by grouping many individual classifiers, RF combined the strengths and minimized the weaknesses of each [54].(2) MLC is less affected by the selection of input layers than RF.RF relies on all outputs produced by each individual tree that is built upon the random selection of input layers, and thus, a larger amount of features could create a higher probability of -irrelevant‖ features being selected; but, when this occurs, it is usually negated by using the ensemble of results from individual trees.(3) The accuracies of single-date image classifications showed prominent year-to-year variability.The omission error for MPB mortality declined consistently from 2005 to 2011, whereas the omission error for healthy forest increased significantly.During this period, Grand County was in a steady transition from a landscape dominated by healthy forests to a landscape dominated by -grey-stage‖ forests.Not coincidently, the declining trend in the proportion of healthy forest matches the declining tendency of its omission errors over the years.
From the experiment on the sample size effect, on the one hand, we could determine the optimal sample size in the single-date MPB mortality mapping, which, to our knowledge, has not been discussed before.Both MLC and RF were sensitive to sample size, and several studies are in accordance with our conclusion [55][56][57].Parametric algorithms are well known for their requirement of representative samples, and this requirement is usually satisfied by having a sufficiently large sample size [58].Studies on sample-size effects for non-parametric classifiers are relatively rare.Rodriguez-Galiano et al. [59] observed a significant reduction in overall accuracy and the kappa values when the sample size reduced more than 50% of a non-parametric RF applied to Landsat images.
In our case, if we take both accuracy and labor/economic/time costs into consideration, the most optimum sample size for single-date classification is four to five times the original training sample size, which included 106 samples.On the other hand, we could compare the efficiency of single-and multi-date classifications.By using the same 106 training samples, our time series approach substantially outperformed the single-date classification, and their accuracies were similar only after expanding the training sample size to a large number (e.g., 10 times), which matches the finding from another study concluding that either single-or multi-date methods in Landsat-based forest disturbance mapping could result in high classification accuracy [60], but our results confirm their findings only for large training sample sizes.
Although the forest growth trend analysis method is promising in many ways, several shortcomings exist: (1) Regardless of the high overall accuracy, the commission error for clearcuts and the omission error for beetle mortality were relatively high.The commission errors in clearcuts could be caused by the spectral similarity between post-clearcut pixels dominated by successional vegetation and healthy forest pixels.Errors of commission for MPB mortality usually happened in areas hosting a mixture of grey stage trees and bare land, which could lead to a misclassification to clearcuts.(2) Parameter calibration was the most time-consuming part in the whole procedure and requires the expertise of image analysts.In some sense, the identification of various disturbance types using temporal trajectories is more challenging than the visual interpretation of false-color or true color image composites.Several potential solutions include the use of a tool called TimeSync, which was developed to aid with the temporal segmentation calibration and to verify the output from Landtrendr [60].In addition, the Forest Inventory and Analysis (FIA) Program of the U.S. Forest Service consistently collects information on status and trends in forest species, health status, in total tree growth, mortality and removals by harvest.If FIA plots, or similar types of data, and their exact locations are available to users, they could aid the identification of the plots' disturbance history.In terms of the labeling, although the manmade decision tree, as shown in Figure 3, can be easily interpreted, a machine learning-based decision tree could be employed to make this step fully automatic.

Conclusions
Reconstructed forest disturbance histories are needed for a range of applications, and our paper tested an effective and efficient approach to generating disturbance histories using the Landsat time series stack.Our algorithm was designed to take advantage of differences in the magnitude and direction of changes in spectral indices, which are proxies for the physiological processes following different types of forest disturbances.The site-specific accuracy assessment in the three validation years indicated that the overall accuracy ranged from 86.74% to 94.00%, and the mean value of all types of accuracies was above 85%.Especially, our method outperformed standard classification approaches (maximum likelihood classification and random forest) by 15.8% to 52.3% when small training sample sizes (106 samples per study area) were used, which is a cost-saving advantage.Another strength of our strategy is that most parameters, especially at the attributing stage, are easily interpreted and modified, which will especially benefit users tackling similar problems of labeling change detection results.
In conclusion, our results demonstrate the potential for a forest growth trend analysis to characterize forest disturbance events with an optimal sampling scheme, both in economic and time terms, at a satisfying mapping accuracy level.Some short-term goals for our further studies will include improving the labeling step by replacing it with an automated machine learning technique, linking the magnitude of change with more ecologically meaningful properties, such as vegetation cover, extending the algorithm to the ecoregion scale, building spatio-temporal dynamic outbreak models on the time series maps with projections into the future.In addition to those applications, the spatially explicit information produced by our approach could also serve as raw material to facilitate other research to uncover new patterns of disturbances occurring on the landscape and to investigate the drivers generating the patterns of disturbances.

Figure 1 .
Figure 1.The study area used for our analyses.The Southern Rocky Mountain ecoregion is shown in blue, and Grand County, Colorado, USA, is shown in red.The imagery is a false-color composite of one Landsat image acquired on 21 August 2009.

Figure 2 .
Figure 2. Forest type map for Grand County, from (a) the LANDFIRE existing vegetation type data and (b) the number of clear Landsat observations over the period of 2000 to 2011.

Figure 3 .Table 4 .
Figure 3.The decision tree used to attribute temporal segments into the different disturbance classes.

Figure 4 .
Figure 4. Tukey boxplot for (a) the averaged annual normalized burn ratio (NBR, multiplied by 100) changes of stable, regrowth, MPB mortality and clearcut plots; (b) current vertex values of post-MPB mortality, post-clearcut events and healthy plots.The bottom and top of the box are the first and third quartiles, and the band inside the box is the median.The whiskers represent the 1.5 interquartile range of the lower and upper quartile.

Figure 5 .
Figure 5.The proportion of validation samples that were erroneously labeled in other classes.

Figure 6 .
Figure 6.(a) Classification results of the forest growth trend analysis in the years 2000, 2005 and 2011; (b) maps of the onset year, duration and magnitude of mountain pine beetle (MPB) mortality.

Figure 7 .
Figure 7. Relationships between accuracy and sample size for the maximum likelihood classifier (MLC) and random forests (RF) using multiple training sets.Graphs arranged from left to right display the trends of overall accuracy, the producer's accuracy and the user's accuracy for the mountain pine beetle (MPB) mortality class as the training sample proportion increases from 0.1 to one.Blue and solid lines represent RF results, whereas green and dashed lines represent MLC results.Red arrows indicate the mean accuracies produced by the forest growth trend analysis in the years 2005, 2009 and 2011.

Table 1 .
Characteristics of the Landsat images used in this analysis.

Table 2 .
Examples of normalized burn ratio (NBR) trajectories, National Agricultural Imagery Program (NAIP) data and in situ photos for healthy forest, MPB mortality and clearcut classes.

Table 6 .
Accuracy assessment results for individual years of the time series classification.

Table 7 .
Accuracy comparison among the time series classification, random forests (RF) and maximum likelihood classifier (MLC) with the same set of 2005, 2009 and 2011 validation samples.Only the mountain pine beetle (MPB) mortality class is included here.