Evaluating an Automated Approach for Monitoring Forest Disturbances in the Pacific Northwest from Logging, Fire and Insect Outbreaks with Landsat Time Series Data

Forests are the largest aboveground sink for atmospheric carbon (C), and understanding how they change through time is critical to reduce our C-cycle uncertainties. We investigated a strong decline in Normalized Difference Vegetation Index (NDVI) from 1982 to 1991 in Pacific Northwest forests, observed with the National Ocean and Atmospheric Administration’s (NOAA) series of Advanced Very High Resolution Radiometers (AVHRRs). To understand the causal factors of this decline, we evaluated an automated classification method developed for Landsat time series stacks (LTSS) to map forest change. This method included: (1) multiple disturbance index thresholds; and (2) a spectral trajectory-based image analysis with multiple confidence thresholds. We produced 48 maps and verified their accuracy with air photos, monitoring trends in burn severity data and insect aerial detection survey data. Area-based accuracy estimates for change in forest cover resulted in producer’s and user’s accuracies of 0.21 ± 0.06 to 0.38 ± 0.05 for insect disturbance, 0.23 ± 0.07 to 1 ± 0 for burned area and 0.74 ± 0.03 to 0.76 ± 0.03 for logging. We believe that accuracy was low for insect disturbance because air photo reference data were temporally sparse, hence missing some outbreaks, and the annual anniversary time step is not dense enough to track defoliation and progressive stand mortality. Producer’s and user’s accuracy for burned area was low due to the temporally abrupt nature of fire and harvest with a similar response of spectral indices between the disturbance index and normalized burn ratio. We conclude that the spectral trajectory approach also captures multi-year stress that could be caused by climate, acid deposition, pathogens, partial harvest, thinning, etc. Our study focused on understanding the transferability of previously successful methods to new ecosystems and found that this automated method does not perform with the same accuracy in Pacific Northwest forests. Using a robust accuracy assessment, we demonstrate the difficulty of transferring change attribution methods to other ecosystems, which has implications for the development of automated detection/attribution approaches. Widespread disturbance was found within AVHRR-negative anomalies, but identifying causal factors in LTSS with adequate mapping accuracy for fire and insects proved to be elusive. Our results provide a background framework for future studies to improve methods for the accuracy assessment of automated LTSS classifications.


Introduction
Forests contain 90% of the aboveground terrestrial carbon (C) stocks [1], and quantifying forest disturbance history is necessary to understand forest C fluxes.Disturbance events for forests, such as logging, fire, insect outbreaks and storm damage, typically result in a net C release to the atmosphere.It is important to know when the disturbance occurred, the type, intensity, extent and rate of recovery to be able to improve C-cycle dynamics simulated in terrestrial C models [2][3][4].North American forests have been found to be a large C sink that offsets global anthropogenic emissions [5][6][7].Over the past 100+ years, North American forests have increased aboveground biomass density from regrowth due to changes in land use; and this increase is thought to be due primarily to agriculture abandonment [8].This change in land use resulted in a large C sink [9] that could potentially comprise nearly half the global land C sink [6].Disturbance duration, interval and intensity alter forest age structures, reducing standing C stock.Multiple complex events can alter regional budgets, such as fire, which converts C to emissions [10], or logging, where C can be stored in wood products (e.g, building materials or furniture [9]).Biotic events, such as insect outbreaks, which can be regulated by climate, also can result in forest C stock loss due to mortality and successive fire [11].Remote sensing provides the quantitative means to assess regional forest changes relevant to the C-cycle from harvest, wildfire and insect outbreak by extending the assessment to areas that have poor records.
In this study, we focus on forests of the Pacific Northwest that experienced a sustained decline in productivity observed by the Global Inventory Modeling and Mapping Studies (GIMMS) version "g" Normalized Difference Vegetation Index (NDVI) data from the series of Advanced Very High Resolution Radiometers (AVHRRs) [12].Numerous studies have been performed in these forests to understand the extent and/or duration of declines in productivity on the ground and from space, from modeling C stock change and fluxes resulting from fires, to understanding forest age structure [13][14][15][16][17][18][19].Our investigation seeks to map disturbance events in these forests that experienced marked GIMMS (Global Inventory Modeling and Mapping Studies) NDVI (Normalized Difference Vegetation Index) declines during the 1982 to 1991 period.We focus on this period, as it could be from large-scale harvest events that occurred during the late 1980s on private industrial forest lands [20,21].Another possible cause of the decline was a widespread insect outbreak in ponderosa pine (Pinus ponderosa) [22].Insect outbreak could lead to different C pool pathways altering regional C budgets.Bulk forest C density can reside in different pools after disturbance, above or below ground, live or dead, all of which have different rates of flux to the atmosphere and produce different impacts on ecosystem C-cycling [23].A well-documented study of this ecosystem, amid a dense annual Landsat record with limited phenological influence, lends itself to the development of automated methods to classify and evaluate disturbance dynamic types over space and time that have had marked impact on C-cycle processes.We previously developed an automated procedure to capture harvest and insect outbreaks in the Laurentian forests of Wisconsin and Minnesota.This automated approach had 0.72 user's and 0.63 producer's accuracy for capturing insect mortality events, but had limited accuracy (0.56 user's and 0.36 producer's) in capturing insect outbreak that did not cause mortality, because of the temporally sparse Landsat Time Series Stacks (LTSS) [24].In this study, we use a more robust time series that had limited phenology variability and a nearly complete annual anniversary record to evaluate the accuracy of this approach.Our method differs from prior investigations, because we intend to identify both stand replacement disturbance events of fire and clear cuts, which have been identified in the past [20,21], to more subtle and difficult to distinguish long-term regional events, such as insect outbreaks that cause mortality.Insect disturbance has become more widespread in western North American evergreen forests [25], with a recent widespread severe outbreak in British Columbia that has shown substantial loss of ecosystem productivity [11,26].Developing an approach that can capture multiple disturbances in this region could provide a valuable land cover information tool that could be used in biogeochemistry models to resolve the status of C sinks and sources.
Most remote sensing studies have focused on fire disturbance and logging, due to the ease of extraction of large changes in surface reflectance, producing reliable thematic change maps.However, few have addressed insects, pests and pathogens due to the difficulty of discriminating between background populations and outbreak events that can decimate keystone tree species.Widespread bark beetle (Coleoptera: Curculionidae) outbreaks, including mountain pine beetle (Dendroctonus ponderosae) [27] infestations, have occurred in the past in Pacific Northwest forests, but their scale, intensity and the area affected are increasing due to a warmer climate [25].Mountain pine beetle, in combination with spruce beetle (Dendroctonus rufipennis) and pinyon Ips beetle (Ips confuses) have impacted the western U.S. forest C-cycle more than trees killed by fire [28].The area of attack in U.S. forests often annually exceeds one million hectares, and the outbreak area has risen dramatically since 2000 in the western U.S. [22].
Bark beetles are ecologically important to development, senescence and regrowth of western forests.They inhabit many types of evergreen species, including pine species of ponderosa, lodge pole (Pinus contorta), western white (Pinus monticola) and limber pine (Pinus flexilis), among others.
During outbreaks, female beetles bore into the bark of trees and may lay >100 eggs.As larvae hatch, they construct feeding galleries in the phloem (inner bark), that eventually girdle and kill the tree by cutting off the exchange of nutrients between the roots and crown [29].Nutrient restriction results in needle color change from green to red to yellow to gray over a period that can be monitored with satellite imagery (reviewed in Goodwin et al., 2010 [30]).Other types of disturbance, including fungus, such as white pine blister rust (Cronartium ribicola), which impact western white and limber pine, and Blue stain fungus (Grosmannia clavigera), distributed by the mountain pine beetle to lodge pole pine.Numerous other tree and shrub infectious and non-infectious diseases exist from fungal, bacterial, viral pathogens, nematodes and algae for which we do not attempt to account [31].
Integrated measurement of forest disturbances from multi-spectral remote sensing measurements has become easily attainable through the free distribution of orthorectified Landsat data [32].These data provide optical observations of the status of the Earth's land cover that is unmatched in duration and scale.Many studies have used the GeoCover decadal epoch of Landsat data [33] to quantify forest disturbances throughout the North American continent [34], while others have used a dense LTSS focused on specific regions to evaluate change vectors to understand forest health dynamics [35][36][37].A few studies have used time series data from AVHRRs and/or the Moderate Resolution Imaging Spectral radiometer (MODIS) data to identify forest disturbances, for the 1982 to 1999 and 2000 to 2006 periods, respectively [38][39][40][41].A more recent study has observed multiple disturbances for the entire 30+ year AVHRR record with trend changes in vegetation productivity [42].The use of Landsat and ancillary data to identify disturbance type in conjunction with the coarse resolution AVHRR data have also shown promise [24,43].Disturbance information could enable more advanced terrestrial C-cycle models to capture these C-cycle dynamics.
Many different disturbance agents reduce and change forest C stock, including climate-induced impacts from drought, storm damage and fire, to more direct biophysical impacts from human deforestation, to insect pests and pathogens, which have a wide variety of implications, from defoliation to mortality [4,44].Assessment is critical to the C state knowledge of forested ecosystems [9].Many existing remote sensing studies of forest disturbance using LTSS lack causal information that is critical for C-cycle models to quantify C-cycle dynamics [35,45,46].For example, Huang et al., 2010 [35], developed an algorithm, vegetation change tracker (VCT), and implemented it wall-to-wall for the continental U.S. It has been shown to be very effective at mapping clear cut harvest, but has also been noted to have difficulty in complex heterogeneous landscapes [47].This algorithm uses the concept of normalized values for pixels based on the mean and standard deviation of known forest pixels from the same scene and uses multiple images to improve the forest/non-forest classification.Another algorithm, LandTrendr, by Kennedy et al., 2010 [46], used an automated trajectory-based image analysis that allows users to automatically identify and segment different pixel trends in land cover change through time.However, both of these algorithms do not define the disturbance type.Meigs et al., 2011 [48], used the normalized burn ratio (NBR) with LandTrendr to characterize Landsat spectral trajectories associated with defoliators and bark beetle disturbances, and related measurements to airborne surveys and ground-based measurements of insect-caused tree mortality in the same region as this study.They documented a number of different spectral trajectories that are associated with insect disturbance.However, they did not map the distribution nor report the detection accuracy with independent reference data using good practices [49].
To investigate productivity declines in Pacific Northwest forests observed by GIMMS "g" AVHRR NDVI, we developed a series of questions.The overall goal was to advance the application of automated approaches to extract causal factors from LTSS and resolve the capabilities of detecting, monitoring and reporting with uncertainties the forest disturbances that have C-stock consequences.Our specific objectives were to answer the following questions: 1. Are long-term productivity declines in Pacific Northwest forests observed with GIMMS "g" AVHRR NDVI an error or real? 2. Can an image classification algorithm be used on annual anniversary LTSS with common spectral indices to extract forest disturbance type?If so, what is the change map accuracy with uncertainty?3. What were the dominant agents of disturbance at the coarse scale of AVHRR observations?
Here, we intend to understand and identify dominant drivers of large-scale declines in GIMMS NDVI in Pacific Northwest forests.Many of diagnostic models that use AVHRR NDVI to estimate Net Primary Productivity (NPP) do not have Land-Cover Land-Use Change (LCLUC) incorporated in them.Disturbances, both natural and anthropogenic, are implicitly included through variability observed by AVHRR.Changes in forest cover area can have large impacts on modeled carbon fluxes [50], and integrating cover type dynamics could result in the greater accuracy of post-disturbance aboveground biomass density.Our objective was to evaluate if a simple and efficient algorithm developed for another U.S. forested ecosystem [24] is sufficient for providing disturbance type information for Pacific Northwest forests.We also highlight the importance of a good practice accuracy assessment to inform mapping uncertainty.

Study Area
Our study region, the Cascade Range of upland evergreen forests of Oregon and California, contains dense forest stands with some of the most productive vegetation within the United States [51], which are actively managed for timber production.Productivity declines during the 1980s have been assumed to be driven by anthropogenic and natural disturbance events, as they are stochastic in time and space throughout this region.Numerous events have been reported over the past 25 years, mostly due to increased post outbreak salvage logging rates, resulting in larger cut sizes and increased openness of the landscape [20].Wide-scale disturbances have been reported by forest inventory assessments and other Landsat-based studies evaluating forest harvest and fire disturbance [21].We believe these events were observed by historical NDVI declines in coarse-resolution AVHRR.Due to the complex nature of multiple disturbance events that have been observed at the coarse scale, we have attempted to understand and quantify these dynamics at the human Landsat plot-scale by defining the type of event.We extract the nature and trend of spectra upon specific forest cover types.We provide a map of our study region with data coverage in Figure 1.

AVHRR Data
Bi-monthly AVHRR NDVI data were used from the GIMMS version "g" [12], because a consistent inter-calibrated record is needed to calculate reliable anomalies containing changes in the photosynthetic capacity of vegetation that impact the C-cycle.NDVI was calculated as: These data, mapped to an 8-km spatial resolution, have been processed to account for orbital drift, minimize cloud cover, compensate for sensor degradation, be consistent among many AVHRR instruments and correct for the effects of stratospheric volcanic aerosols [12,52,53].Two steps of analysis were performed using these data, including: (1) calculation of anomalies; and (2) identification of similar bi-monthly NDVI trends with Landsat data.Anomalies were calculated in a similar manner to other studies, as the average growing season (May to September) value with a least squares linear fit per-pixel from 1982 to 1991, with values excluded from analysis that do not meet the 98% confidence interval fit [54,55].The Pacific Northwest area was selected for study because: (1) it comprised a large contiguous forest region that had greater than 2000 km 2 of forests with a negative NDVI trend less than −0.05; and (2) near annual Landsat 5 data exist, as well as aerial photo accuracy assessment data and aerial insect detection survey data (ADS) are available.

Landsat Data
Landsat 5 data were acquired from the United States Geological Survey (USGS) [56] for a sixscene block, where a large concentration of AVHRR 8-km NDVI-negative 1982 to 1991 trends were present (Table 1).Selection focused on minimal cloud cover and peak growing season months.Data were orthorectified for time series land cover change analysis.We selected data from July, August or September from 1984 to 1995, resulting in 10 to 13 images per path row for analysis.Each image was processed in the following way: (1) Landsat Ecosystem Disturbance Adaptive Processing System (LEDAPS) time series surface reflectance calculation [34,57]; (2) cloud and shadow screening building upon automatic cloud-cover assessment masks [58] output from LEDAPS; (3) spectral index generation of NDVI, tasseled cap (brightness, greenness and wetness), normalized burn ratio and a disturbance index; and (4) disturbance index dynamic characterization based upon its trajectory through time.We generated indices to evaluate the optical properties of disturbance, including: NDVI, tasseled cap (TC) (brightness   , greenness   , wetness   ), normalized burn ratio (NBR) and a normalized disturbance index (DI').These indices have been used to identify forest disturbances in other regions [59].NDVI from Landsat was calculated as: TC used coefficients from Crist and Cicone, 1984 [60], and were implemented for Landsat 5 surface reflectance data.Transforming images to the TC principal components space aided image normalization, because weighted components are sensor specific and not scene dependent [61].In our analysis, the third component,   , is considered the equivalent of shadow in a linear mixing model analysis and is orthogonal to   .We use these orthogonal components to identify the forest disturbances that we are interested in, specifically insect outbreaks, logging and fire.
Our DI' is based on the concept that a different spectral response of a regrowing forest stand following disturbance will have higher reflectance or   and a lower reflectance of   or shadows [62,63].We first normalized   and   indexes by subtracting the mean and dividing by the standard deviation of the 75% most stable NDVI forest pixels per image date.Then, similarly to Hais et al., 2009 [59], we calculated DI' as follows: It has been noted that forest stands impacted by bark beetles are better detected using indices that enhance the SWIR wetness, as it emphasizes shortwave infrared reflectance and has been identified as a reliable indicator of both forest structure and forest structure change [64][65][66], and using DI' can enhance the spectral response of insect forest disturbances.Forest stands impacted by bark beetles have been detected using indices that employ reflectance information from the 1.55-1.75-µmand 2.1-2.3-µmLandsat Thematic Mapper bands, and using DI' can better differentiate insect forest disturbances from logging [59].
To capture burn events, as frequently used in the literature [67], we use NBR, which has proven useful for identifying burn severity in a number of different biomes when a time series of Landsat is used [68,69].NBR is derived as the difference between the near-infrared and shortwave infrared to capture leaf chlorophyll and soil moisture changes after a burn [70] and calculated from Landsat as: NBR has been used to identify insect outbreak by Meigs et al., 2011 [48], and the contrast in between NIR and SWIR bands is similar to the DI' used in other studies [30,71,72].We note that NBR and DI' can have similar spectral trajectories and attempt to apply another approach to distinguish between burned area and insect outbreak.

Landsat Forest Mask
We delineated forest from agriculture through the simple mean of the time series threshold values on   and NDVI, located in 1992 National Land Cover Database (NLCD) forest cover classes, by applying a similar approach as Vieira et al., 2003 [73].A   threshold value of greater than 3500 was derived from the mean forest   from NLCD and removed young bright stands of forest less than 3 years old as well as fallow agriculture.A NDVI threshold of less than 0.45 removed sparsely-vegetated areas and was derived from mean NLCD forest cover classes, as well.We applied these thresholds, because the 1992 NLCD does not account for changes in forest cover over our 1982 to 1991 GIMMS NDVI anomaly study period.In addition, we used agriculture and wetlands classes from NLCD to exclude these areas from our analysis.

Cloud Masking
We employed an automated cloud and cloud-shadow detection method for Landsat data that was successfully used in our previous study [24].This approach largely avoided post-classification manual corrections for cloud and cloud-shadow artifacts and builds upon concepts developed in the prior automatic cloud cover assessment (ACCA) algorithm [57] to improve the discrimination of cloud and shadow on a per-pixel basis [58].This method relies on information from multi-date imagery, and clouds were identified as deviations from the mean TC brightness and thermal band values through time for each Landsat pixel.Pixels where the brightness values were more than two standard deviations above the long-term mean and the temperature more than one standard deviation below were identified as cloud tops.The solar zenith angle was then used to determine the direction to cloud shadows.Pixels in the direction of cloud shadows that experienced brightness declines of more than one standard deviation relative to the long-term mean were identified as cloud shadow.A final filter was applied to each 3 × 3 moving pixel window, where the center pixel would only be classified as cloud or cloud shadow if more than half of the nine pixels in the sample window were also classified as cloud or shadow.This filter reduced spurious per-pixel errors.We also buffered defined cloud and shadow with a 5 × 5 moving pixel window to account for edge artifacts.This approach of using the standard deviation through LTSS works best if the data are as near to the anniversary date as possible to limit phenology and changes in solar azimuth.

Landsat Disturbance Classification
To understand disturbance dynamic type, which is relevant to the ecosystem long-term C pathway, we employed two primary spectral indices in forested regions.DI' was first used to define anomalous change in forest cover that could be induced by logging, fire or insect outbreak.The spectral trajectory derived from the Landsat stack of DI' we believe contains important information on disturbance type.Rapid declines in DI' from one year to the next we believe are typically logging or fire events; analogously sustained declines could be a result of forest thinning, insect outbreak or stress.We used this simple spectral vector approach to extract the disturbance dynamic type in an automated manner and to check its accuracy.We provide a simplified schematic of the process (Figure 2) and the hierarchical thresholds of our decision tree at each junction in the classification.
where (∆′) is equal to change in annual normalized disturbance index subtracted from one year � ′  � to the next (′ +1 ).
∆  = ∆′ >  ℎ (6) where logging disturbance (∆  ) is equal to a change in annual normalized disturbance index (∆′) greater than an adjusted logging DI' threshold value ( ℎ ) determined from training data and ensemble iterations.Within the remaining Ld pixels, ∆NBR is used to determine if the event was a result of fire disturbance and represented by the following equation: where (∆) is equal to change in the annual normalized burn ratio subtracted from one year �  � to the next ( +1 ).
∆  = (∆′ − ∆) >  ℎ (8) where fire disturbance (∆  ) is determined by ∆ DI' locations that have had a substantial ∆NBR beyond an estimated threshold ( ℎ ).∆ DI' and the ∆NBR have similar values in logged and burned locations due to similar shortwave infrared reflectance.To reduce the amount of logging and fire disturbance agreement between indices that increased classification error, we applied a size threshold to NBR to capture large fire events greater than 10 hectares, which are relevant to modeling forest C emissions.We found that, due to the small patches of forest clear cuts in our study region and the lack of large fire events during our study period, as confirmed with Monitoring Trends in Burn Severity (MTBS), applying a threshold for patch size could be used as a tool for fire and harvest discrimination.MTBS mapping methods require analyst interpretation of burn perimeters and do not attempt to include small fires, less than ~200 hectares, as ~95% of burned area is captured when excluding those events [67].Our threshold approach is specific to this region for this time period, as the disturbance regime in both size and frequency could render this simple filter ineffective.MTBS data also report that a minimal amount of burned area occurred in our study region over the 1984 to 1995 period, and because of this, we focused our efforts on improving the detection and discrimination of insect outbreak and logging.Forest insect outbreak detection was applied with linear regression on a per-pixel basis in locations with declining over time.Prior studies have found ∆ DI' to be sensitive to insect outbreaks [71].A DI' decline through time over a maximum interval of five years we designated as an insect outbreak.Our assumption is that this approach will efficiently distinguish insect outbreaks from fire and logging, which often have a strong decline within one or two years, followed by an extended recovery.This approach is limited to events that occur within multiple years and is dependent upon at least three successive years of DI' change; shorter duration events could be missed as the average mortality time from bark beetles in Colorado was three years [74].To reduce computational time, regressions on the DI' were only run on pixels that had a decline of less than −750 and a slope () of less than −150.Producing separate regressions based upon this metric allowed independent alteration of decision tree splits to optimize classification accuracy.
We defined insect disturbance with a trajectory approach applying the following equation: where (∆Id) is a change in insect disturbance for a maximum five-year period.First, to reduce computation time, the change in DI' over time (∆DI'5 yrs) was defined as <−750.A linear regression was then fit to each pixel meeting this criterion.Pixels were classified as insect disturbed if the slope over time ( 5  ) was less than −150 with a significance ( 5  ) below an ensemble threshold ( ℎ ) ranging from 0.01 and 0.05 increments.The date of disturbance is labeled as the last year in the decline sequence.Note that a limitation exists with this method, as pixels with less than three values per five-year interval are not computed.These pixels could be contaminated by clouds or cloud shadow.This method was applied incrementally for each Landsat stack (i.e., 1984 to 1988 and 1985 to 1989).We established the change in slope values by using ancillary insect air survey data for training.

Figure 2.
Chart illustrating the image classification procedure adapted from our previous study [24].
We also found through initial accuracy assessment of preliminary outputs that determining salvage logging from natural disturbance was a critical component to enhancing accuracy assessment, because many locations had an initial disturbance signature followed by a rapid decline.Without a salvage logging class, many cleared sites were defined as insect outbreak.To account for this, we defined those events with the following equation: where �∆  �, salvage logged, has a decline half as great as the logging threshold.All salvage logging classed pixels were then later reclassified as logging disturbance for accuracy assessment purposes.

Running Landsat Classifications
We applied a threshold to determine an initial value to identify burn events in our decision tree classification and compared it to independent data for a few select regions.We first identified any obvious errors through a basic visual assessment to ensure that there was value in proceeding in more detail with an ensemble of calculations.We ran our classification with an ensemble mode, producing eight iterations by varying thresholds of ∆DI' (−3000 = moderate intensity, −2000 = low intensity) for observed spectral logging signatures, ∆NBR (−200 = low severity, −350 = moderate-low severity) for fire and p (0.01, 0.05) for systematic forest insect decline, in effect adjusting tree splits of change and no-change, and then compared the resultant classifications to our reference dataset to provide optimized accuracy assessment.This produced 48 classifications, from eight iterations over six path/rows, for error assessment with all classification ensembles sieved to 90 × 90 m to reduce speckle in classified outputs.We believe this approach justified decision tree splits statistically with reference data as opposed to introducing analyst biases in the generation of training data.This approach also provides error bounds of classification accuracy.We then compared all six maps to our high-resolution (<2 m) air photo dataset.

Multi-Temporal High Resolution Air Photos
LTSS studies are often restricted in their ability to have a comprehensive independent reference dataset, and many studies use the Landsat data itself with analyst interpretation to validate their classification [20,34,35].This limitation is imposed because of the lack of a dense annual time series in other higher resolution remote sensing products.To alleviate this, we used a combination of historical aerial interpreted photographs with disturbance maps from land management agencies.Change or disturbance-based classifications require time series reference data from which change areas are automatically segmented or manually digitized by experienced image analysts [34,35,39].We collected aerial photography from the National Historical Air Photo Archive (NHAP) for the early 1980s and the National Agriculture Imagery Program (NAIP) for the early 1990s to validate the logging and fire disturbances that we detected.A subset of one NHAP/NAIP image stack was used for threshold training.We provide dates and image center coordinates in Table 2. Eight sites were selected with air photo stacks ranging from two to three images per site.Air photos were less than 2 m in resolution and were panchromatic and false color infrared.The air photo stack locations were limited by the availability of growing season imagery where multiple overlapping images existed.Partial disturbance found within a Landsat pixel was considered disturbed; if a mixture of disturbances was found within the Landsat pixel, the analyst determined the dominant agent.In mountainous regions with topographic variation >500 m, which is common in the Cascades, these data were orthorectified using Geomatica's Orthoengine, with camera calibration information derived from the USGS to remove terrain spatial error artifacts that could be introduced in automated per-pixel reference cross comparisons [75].

Monitoring Trends in Burn Severity Data
Additional datasets were also used when assessing time series-related classification products.Monitoring trends in burn severity data [76] were used as an independent source to confirm the accuracy of our fire disturbance classification.Monitoring trends in burn severity is a multi-year project designed to consistently map burn severity perimeters at 30 × 30 m for fires larger than 400 ha across the U.S. from 1984 through 2012.These data are based on Landsat Thematic Mapper data and use NBR in a similar manner as this study.The NBR index has been found to be a useful indicator to define the spatial distribution of burn severity and has been actively used in a number of long-term studies [69].

Aerial Detection Survey Data
Due to the limited time record of aerial photography and the reduced ability to detect subtle multi-year insect disturbance, we used the Oregon Insect Disturbance Aerial Detection Survey Data (ADS) derived from the Forest Health Protection Pacific Northwest Region, United States Department of Agriculture (USDA) Forest Service [77], which has mapped annual insect outbreak from 1980 in Oregon.These data are derived from aerial surveys in which interpreters hand-delineate outbreak areas in conjunction with site visits on the ground to confirm accuracy.The primary disadvantage of these data is that they are subjective with accuracy depending on the competence of the sketch mapper and the conditions under which the data were obtained [78,79].These uncertainties limit the ability of these data to be useful in quantitative simulations of ecosystem productivity with insect disturbance.
The assumption that the reference image used in photo interpretation is 100% correct is rarely valid [80][81][82].It is critical that the accuracy of the reference information be considered together with the overall accuracy of the classification.

Good Practice Classification Accuracy Assessment
We applied "good practice" recommendations for designing and implementing an accuracy assessment of our change maps and estimating area based on reference sample data.We applied three main considerations when we developed our classification reference routine sampling strategy, including: (1) sampling design; (2) response design; and (3) analysis.The primary components of the good practice classification evaluation that we undertook included: (i) implementing a sampling design that could achieve accuracy and area estimation while considering practical constraints on available ADS and air photo reference data; (ii) implementing a response design that provides sufficient spatial and temporal representation to accurately label each unit in the sample; (iii) implementing an analysis that is consistent with sampling and response design protocols; (iv) summarizing the accuracy assessment by reporting the estimated error matrix in terms of proportion of area, estimates of overall accuracy, user's accuracy (or commission error) and producer's accuracy (or omission error); (v) estimating the area of classes based on the reference classification of sample units; (vi) quantifying uncertainty by reporting confidence intervals for accuracy area parameters; (vii) evaluating variability and potential error in the reference classification; and (viii) documenting the deviations from good practice that could affect the results.We attempted to account for all of these components in our accuracy assessment.More details on how to construct a good practice accuracy assessment can be found in Olofsson et al., 2014 [49], who have presented a synthesis of the current science of accuracy assessment and area estimation.
The sampling design is the protocol for selecting the subset of spatial units that can be pixels and/or polygons that will form the basis of the accuracy assessment [49].To define the sampling design, a subset area within an image scene is often selected [34].Two-stage cluster sampling, sub-sampling within the original sample, is generally undertaken to reduce the amount of reference data required [83].Sampling units for image classification are typically clusters of pixels used as a single sample (e.g., a 3 × 3 pixel block) to reduce the effect of registration error between the reference data and map locations [84][85][86][87].We applied a stratified sampling design with a simple random automated sample and a systematic manual class-stratified sample to ensure that all change classes were adequately represented.We provide a diagram in Figure 3 that describes our workflow.
We analyzed classification accuracy from values extracted from an error matrix with area proportions, overall accuracy (a combination of both user's and producer's accuracies, related to commission and omission errors) [88][89][90], report the area of each class as determined by the map and assess the impact of reference data uncertainty on the accuracy and area estimates.All of these strategies were implemented in our semi-automated accuracy assessment.

Automated Classification Evaluation
Our reference routine was based on the analysis of both randomly-generated 90 × 90 m polygons and manually-selected polygons in pixels that we defined as forested within systematically-selected subset areas for each path/row-based image scene (Figure 1, yellow polygons; workflow in Figure 3).This approach ensured that classification output was checked for underrepresented forest change classes and ensured a distributed reference sample.Our sampling design consisted of five disturbed and five non-disturbed 90 × 90-m polygons randomly chosen to represent automatic reference selection and were visually assessed against aerial photographs to determine initial cover type and possible change.Stratified sampling was used to delineate ten polygons of cover type changes visually identifiable from aerial imagery; ten additional reference polygons were acquired for disturbance types that were not adequately represented.A final analysis of cover-type disturbance and date of change was determined from aerial photos with associated analyst confidence.Type of disturbance was broken into six categories: no change stable forest, logging, fire, insect, salvage logging and unknown, which could later be reclassified based on ADS and MTBS data.

Manual Classification Evaluation and Statistical Analysis
A number of parameters were added to the attribute table to summarize information from the classified image, such as the number of pixels associated with the major disturbance.In addition, information from insect outbreak and fire polygons was added to the attributes of the reference polygons with the percentage ranging for interpreters confidence for identifying the change type.This information aided in determining whether disturbances were associated with insect outbreak defoliation or fire, and the resultant tree mortality was visually identified in either the aerial photos or classified Landsat TM imagery.As a result, the visually-identified "unknown" class, reference polygons with low confidence for any change type were updated to known disturbance classes with ADS (>1 dead tree per hectare) and MTBS data acting as the ground truth.In addition the time series from ADS and MTBS were added as an attribute to reference data polygons to aid in determining associations with other disturbance classes.Reference polygons were converted to a raster and used to generate a per pixel summary of disturbance class for each reference polygon.All possible comparisons between disturbance classes of the classification and those identified from aerial photographs were populated and used to generate reference statistics.

Drivers of AVHRR NDVI Declines
Our automated classification approach revealed widespread disturbance dynamics similar to those reported in previous studies [20,91].However, we discovered additional time series information through simple linear regression of DI' to detect insect disturbance in many regions that had a long-term 1982 to 1991 decline in AVHRR NDVI (Figure 4).We selected classification iteration 8 to report our results.Our study region was comprised of 61,200 km 2 of forested area, with 1984 to 1995 disturbed area totaling 13,417 km 2 (22%).Within our forest defined area, logging accounted for 8745 km 2 , insect outbreak totaled 4252 km 2 and fire accounted for ~400 km 2 .
Peak insect disturbance occurred in 1989 and covered ~3% percent of the forested area in that year.In Figure 5, we provide an example of the AVHRR 8 × 8-km sub-pixel heterogeneity observable with our Landsat classification.This subset is a part of the Deschutes National Forest on the eastern flank of the Cascade Range, which has many strato-or composite volcanoes.The center of the image is Newberry, a large shield-shaped volcano.The Landsat imagery is displayed as a 4,3,2 red, green and blue band combination.Heterogeneous patch logging of ponderosa pine forest (classified in green) intermixed with our assessment of insect outbreak (purple) displayed in the 1984 to 1995 change layer.All insect disturbance followed by salvage logging was reclassified as logging, and no fire disturbance was identified within this subset.The lower right images correspond to the region of air photo date overlap between National Historical Air Photo Archive (NHAP) and National Agriculture Imagery Program (NAIP) used for accuracy assessment.

Ensemble Accuracy Assessment Error Matrices
The ensemble approach removed user interpretation biases that can develop when deciding threshold values for decision tree splits through semi-automated accuracy assessment of all ensemble outputs.This approach allowed quantitative assessment of classification splits, while enhancing the understanding of commission and omission error in the disturbance type classification.We applied a two-value variance split parameter to reduce analyst accuracy assessment effort, although more splits could potentially further enhance our classification accuracy.Our approach had moderate 0.65-0.77user's accuracy for harvest detection to low 0.19-0.42user's accuracy for insect outbreak with acceptable performance afterward (overall accuracy 0.74) (Table 3).
Ensemble classification accuracy assessment revealed a distribution in the decision tree threshold splits that were important to improve accuracy.We selected the last iteration, which had thresholds of ∆ DI' > −3000, ∆NBR of −350 and p < 0.05, because of how it represented map classes compared to other iterations.Overall user accuracy was 0.90 vs. 0.93 and producer accuracy 0.64 vs. 0.81.Details of our classification results are presented in Table 3 as an error matrix populated by estimated proportions of area.Figure 6 displays the commission and omission errors by cover-type as a percentage of the forested area domain.Forests 2014, 5 3187

Index Performance by Disturbance Type and Cover Type
The distribution of AVHRR negative trends by cover type was dominated by ponderosa pine (70%) and to a lesser degree by Douglas fir (18%) and fir spruce (12%).Most of the anomaly had strong negative values less than −0.05 with the distribution centering near −0.1 displayed in Figure 7. ∆DI' and ∆NBR appeared most sensitive to logging disturbances due to large changes in near-infrared reflectance.Fire disturbance had a strong signature, as well, which made it difficult to distinguish.The application of the segmentation spatial filter alleviated this problem substantially, as prior classification iterations before this method had introduced producers and users accuracies less than 0.5 for these change classes.This is a primary limitation to this spectral vector approach; redundant information exists between indexes and capturing distinct disturbance dynamics requires additional spatial-temporal and spectral analysis.We acknowledge that our interpretation of insect disturbance is limited to a specific response that the dominant species of ponderosa pine has had after a wide outbreak of beetle infestation, as identified with aerial surveys.Air survey data also reported spruce budworm outbreak in the region.Insect disturbance type could not be distinguished in Landsat with our approach, and their signal alone or in combination impacted the DI' signal by an unknown amount.This approach may or may not be consistent with other forest types or other types of insect disturbance.We provide an example of our classification with accuracy assessment data in Figure 8.

Spatial Disturbance Type Trends
Our annual Landsat time series classifications revealed a reduction in logging from 1984 to 1995.Logging and burned area determinations rely on ΔDI' and ΔNBR.The similarities of these indices in the shortwave infrared limit the ability to distinguish between burned area and harvest.NBR and DI' are highly correlated (R 2 = 0.72-0.77,p < 0.001), as found through a random sample of 10,000 pixels from Path 45 Row 31 for each year in the 1985 to 1994 time series.The abrupt nature of these events with an annual record limited our ability to differentiate burned area from logging with this algorithm.Insect disturbance cannot be classified prior to 1988, because three of five successive years are needed for the regression slopes.We were limited to the Landsat record and did not extend our analysis back to the multi-spectral scanner 1972 to 1982 period, because tasseled cap wetness cannot be calculated from these earlier data.We found a measure of agreement between our estimated insect outbreak areas with Bureau of Land Management (BLM) data, although their estimates indicated that 60% of the BLM forested area was affected from 1984 to 1995.

Discussion
We quantified forest disturbances from harvest, fire and insect outbreak in a block area of six Landsat scenes in Oregon and Northern California.Co-variation of ΔNBR and ΔDI' results from similar spectral reflectance in near-infrared and shortwave infrared Landsat bands limited the ability to distinguish between fire and logging disturbances.Similar large declines in leaf chlorophyll and shadow variations occur from harvest and burn disturbance, which make it difficult to distinguish between events.We could discriminate between fire and clear-cut harvest disturbance types with a 10-ha spatial filter, because most fires in our study area were above this size threshold, as found with monitoring trends in burn severity data, while few logged sites were this size or larger.We believe we achieved a low producer's accuracy of 0.23 ± 0.07 for burned area because of the similar spectral signal for ΔDI' and ΔNBR in near-infrared wavelengths.Forests in our study area affected by beetle outbreaks were defined with Landsat spectral bands and confirmed through ancillary data.Our low accuracy (user's 0.38 ± 0.05, producer's 0.21 ± 0.06) for insect disturbance we believe indicates that we are not capturing events in LTSS that were found in ADS; also, these events could be missed in the temporally sparse ~2-m air photo reference data.More work is needed to understand if a more dense LTSS record can capture post-outbreak insect disturbance.Future studies should investigate inter-annual needle color change to track the progression of insect outbreak, and this may be possible with a virtual constellation approach of merging records from Landsat 8 and Sentinel 2. Unfortunately, the Landsat record was not dense enough over the 1984 to1995 period for these path/rows to apply this approach.
We found that the GIMMS "g" NDVI record captures forest disturbances in the Pacific Northwest, primarily due to stand clearance events, which have large changes in near-infrared reflectance.We also found the ability of AVHRR to detect insect disturbance.We do not directly evaluate complex cascading events, such as insect damaged stands susceptible to fire disturbance, although we have implemented a salvage logging component, which reduced classification errors between insect outbreak and logging [92].Future efforts could evaluate temporal vectors for these types of events.Insect outbreaks that cause mortality are important ecosystem state variables that need to be accounted for in remote sensing observations that are integrated in modeling studies of forest C fluxes.
We observed the growth of a forest disturbance trend at the 30-m scale that invoked a negative AVHRR NDVI trend at the 8-km scale.The productivity trend was apparent when the growing season time series of Landsat is available to understand the spatial heterogeneity of disturbed and undisturbed stands at the 8-km scale.Disturbance heterogeneity is compounded because of reoccurring events and the growing extent of the disturbed area in the late 1980s into the early 1990s.We added a salvage logging class after we found that many forested pixels classified as insect disturbance later had a rapid decline in DI' that is indicative of salvage logging.This was necessary because of the dynamic interaction of disturbances in this region.If we did not include a disturbance change class, our mapping accuracies for the logging stratum would be greatly reduced.Oregon has a dense archive, unlike our previous efforts in Wisconsin [24], but the overall accuracy did not change (an overall accuracy for both studies >75%).Multiple disturbance agents were identified, including insects that reached outbreak populations, causing forest damage and mortality.We found insect damage or defoliation to be a widespread disturbance agent that cannot be distinguished between the two with a LTSS ΔDI', and we believe time series density has a direct impact on our results.Our low accuracy for insect outbreak could be due to our approach also capturing stress from climate, acid deposition, pathogens, etc., and the difficulty of identifying multiple types of insect outbreak in panchromatic imagery.The severity of the disturbance must be great enough to impose a multi-year change in surface reflectance with the statistical trend change approach.Image compositing and other more recent complex approaches by Brooks et al., 2013 [93], that use more multi-temporal Landsat data could prove useful in detecting degradation, thinning or insect defoliation events.
We found multiple LTSS path/rows where a strong long-term negative AVHRR NDVI anomaly existed, and our methods captured disturbance dynamics well, because of the strong spectral signatures of mortality and stand clearance.However, distinguishing between bark beetle and budworm defoliation impose additional mapping problems when using LTSS to define disturbance agents.One potential solution would be implementing multi-temporal high-resolution commercial data to map individual tree crowns through time.This has been successfully done by Wulder et al., 2008 [92], for monitoring mountain pine beetle mortality.Future studies could utilize an archive of commercial high-resolution data with LTSS to identify which insect outbreak types have discernible multi-temporal signals in LTSS.This has become more of a possibility with no-cost access to U.S. commercial high-resolution imagery by U.S.-funded researchers [93] and with WorldView-3 satellite, which has eight ~1-m multispectral bands and twelve 4-m shortwave infrared bands.The enhanced number of spectral bands through the shortwave infrared and ~1-m resolution of visible bands could provide the sub-pixel Landsat resolution information needed to distinguish disturbance type on a stem-by-stem basis.

Conclusions
A majority of the 1982 to 1991 GIMMS "g" AVHRR NDVI decline in the Pacific Northwest existed in forests, and we believe that a progressive increase of the clear cut area and insect disturbance at Landsat resolution reduced the AVHRR NDVI signal.We initially had three questions when we undertook this study.First, were the declines in AVHRR NDVI an error or real?From our LTSS classification results, we found widespread disturbance from logging and insect outbreak that induced forest stand mortality followed by salvage logging.A large proportion of forest disturbance in the region occurred during the negative AVHRR NDVI trend.Second, can an algorithm be used on annual Landsat time series data to extract forest disturbance type?We found that our approach can capture stable undisturbed forest and harvest for multiple path/rows, but only harvest events have adequate accuracy in this region using this approach.Fire disturbance was minimal in this region over our period of study and could not be easily distinguished from logging with DI' and NBR.Widespread insect infestation was observed with airborne sketch mapping surveys, but not readily captured in our disturbance classification due to the variation in outbreak intensity.An increased defoliated area and/or insect-induced mortality could reduce NDVI through the reduction of leaf area and increased background soil reflectance.This result we believe was due to the difficulty in classifying multiple types of insect disturbance in temporally-sparse air photo classification evaluation data.BLM air surveys suggest our estimate to be low, but it is known these data overestimate the insect disturbed area.Third, what dominant disturbance agent caused the NDVI decline at the coarse scale?We found a combination of disturbances from clear cut harvest, insect outbreak-induced mortality and burn scars in patches of a heterogeneous landscape and suspect that they all have combined to reduce the large-scale productivity of forests in the Pacific Northwest from 1982 to 1991.Our results for overall accuracy are comparable to our previous study in the Laurentian forests in Minnesota and Wisconsin [24], although this region has had NDVI recovery over the 1990s and 2000s, and it appears as though an annual anniversary LTSS record does not substantially increase classification accuracy for insect outbreak, as our trend approach may be capturing other disturbances that have imposed forest stress.
We believe our approach is important to future forest disturbance mapping studies, as it provides a robust area-based accuracy assessment and implements a stratified sampling design using air photo reference data that uncovered the limitations of this approach for mapping forest disturbances.Most modeling studies use static forest land cover dynamics, and our method provides the means to include location-specific disturbance information that could improve terrestrial biogeochemistry models.More research is needed to understand the spatial temporal signatures in LTSS for different disturbances, forest types and regions.We demonstrated the difficulty of transferring change attribution methods to other ecosystems and found that this has implications for the development of large-scale automated detection/attribution approaches that are applied over a range of disturbance types and ecosystems.The accuracy of mapped disturbance information provided to C modeling could vary from ecosystem to ecosystem if the same approach is applied.Annual Landsat monitoring is now practical for large forested areas due to new data policies and improved computational resources.Combining coarse resolution remote sensing as a "hotspot" detector with temporally-dense Landsat data to understand the disturbance dynamics could potentially reduce large C-cycle uncertainties that result from static forest cover information used in models [50].The high spatial and temporal specificity that Landsat data provides on forest cover extent could also improve our ability to estimate the C content of forests when combined with LiDAR and SAR (synthetic aperture radar) data.

Figure 1 .
Figure 1.Global Inventory Modeling and Mapping Studies (GIMMS) Normalized Difference Vegetation Index (NDVI) anomalies with a negative trend less than −0.05, for 1982 to 1991 (black boxes), overlaid upon Landsat images displayed in a 4,3,2 RGB band combination for 1992.An aerial insect disturbance survey from 1980 to 1994 (purple and orange) and aerial photo accuracy assessment sites (yellow boxes) are shown.The lower right inset displays the 1992 National Land Cover Data (NLCD) for the region with a legend of cover types.

Figure 4 .
Figure 4. AVHRR NDVI time series for six path/rows with a mean of negative trends less than −0.05 indicted with green circles for growing season months May to September and the solid green line indicating the three-year running average.The number of AVHRR sampled pixels meeting the criteria is indicated by the title for each path row.Vertical grey lines indicate image dates, and the bimonthly standard deviation is shown in black circles.

Figure 5 .
Figure 5. Example of Landsat classification for a subset region of Path 44 Row 30, which covers one AVHRR pixel, displayed in 3D with the shuttle radar topography mission (SRTM) digital elevation model.The yellow inset box corresponds to the region of air photo date overlap between NHAP and NAIP used for accuracy assessment (Figure 1).

Figure 6 .
Figure 6.Error summary reported as the percent of domain.

Figure 7 .
Figure 7. (A) Stacked histogram of forest type distribution by negative AVHRR anomaly rescaled to 1 × 1 km, with an inset pie chart of the distribution of forest cover type with an AVHRR anomaly of less than −0.05.

Figure 8 .
Figure 8. Example of classification evaluation with high-resolution air photo data and air survey data located in the orange box displayed in Figure 1.

Table 1 .
Landsat 5 images used in classification.

Table 2 .
Aerial photo evaluation data.

Table 3 .
Error matrix populated by estimated proportions of area for iteration eight, ensemble threshold for logging ∆DI >−3,000, fire ∆NBR >−350 and insect outbreak p < 0.05.