Consistent Classification of Landsat Time Series with an Improved Automatic Adaptive Signature Generalization Algorithm

Classifying land cover is perhaps the most common application of remote sensing, yet classification at frequent temporal intervals remains a challenging task due to radiometric differences among scenes, time and budget constraints, and semantic differences among class definitions from different dates. The automatic adaptive signature generalization (AASG) algorithm overcomes many of these limitations by locating stable sites between two images and using them to adapt class spectral signatures from a high-quality reference classification to a new image, which mitigates the impacts of radiometric and phenological differences between images and ensures that class definitions remain consistent between the two classifications. We refined AASG to adapt stable site identification parameters to each individual land cover class, while also incorporating improved input data and a random forest classifier. In the Research Triangle region of North Carolina, our new version of AASG demonstrated an improved ability to update existing land cover classifications compared to the initial version of AASG, particularly for low intensity developed, mixed forest, and woody wetland classes. Topographic indices were particularly important for distinguishing woody wetlands from other forest types, while multi-seasonal imagery contributed to improved classification of water, developed, forest, and hay/pasture classes. These results demonstrate both the flexibility of the AASG algorithm and the potential for using it to produce high-quality land cover classifications that can utilize the entire temporal range of the Landsat archive in an automated fashion while maintaining consistent class definitions through time.


Introduction
The areal extents and distributions of different land cover types play a key role in many of the ecological and biophysical processes operating within the Earth system, including the absorption of solar radiation by the Earth surface [1], partitioning of absorbed radiation to latent and sensible heat fluxes [2], hydrologic processes such as infiltration and overland flow [3], biodiversity [4], and the movement of organisms within and among landscapes [5]. In addition, land cover maps serve as critical inputs for constraining many models of Earth system processes [6–8]. Remote sensing provides the means to map and monitor land cover and land use change (LCLUC) over large regions, at scales ranging from the regional [9] to the global [10–13].
Remote sensing of LCLUC poses several challenges, particularly when classifying images from multiple dates. Developing accurate reference data for training and accuracy assessment of land cover maps is an expensive and time-consuming task with tradeoffs that often force researchers to sacrifice ideal data collection procedures in order to remain within budget and time constraints [14]. The challenges inherent in the classification of land cover from single-date remotely sensed imagery are compounded when attempting to monitor LCLUC using multi-temporal imagery. For example, when classifiers are trained independently for each image, inconsistent class definitions among training datasets increase the potential for semantic errors in change detection. Differences in cover class designation between two maps from different dates could therefore result either from a genuine change in land cover or from spurious change due to inconsistent class definitions between the two maps. Automated approaches based on signature extension use class spectral libraries to map land cover in multiple images based on a single set of class-specific spectral signatures. These approaches attempt to enforce spectral consistency in class definitions between images, but are very sensitive to variation in illumination, sun-sensor geometry, and atmospheric and phenological conditions between images [15,16]. These signature extension approaches therefore depend on time-consuming atmospheric correction procedures as well as anniversary date imagery, which may not be available in some regions due to frequent or persistent cloud cover. Unlocking the potential of remote sensing for mapping and monitoring LCLUC hinges upon the development of algorithms that overcome these limitations in a consistent and efficient manner.
The automatic adaptive signature generalization (AASG) algorithm [17] offers the potential to develop automated, multi-temporal land cover classifications while maintaining consistent class definitions and adjusting to radiometric variation among images. AASG uses a simple change detection technique to identify pixels that remain stable between two image dates, using these as the basis for extracting class-specific spectral signatures from subsequent, spatially coincident imagery and a high-quality reference classification. These generalized signatures adapt to the radiometric idiosyncrasies of the target imagery and serve as automated training data from which a new classification can be obtained, thus maintaining consistency in class definitions between images despite potentially large radiometric differences. Gray and Song [17] found that, unlike traditional signature extension, AASG can produce accurate land cover maps even when atmospheric effects went uncorrected and when large phenological differences existed between images (e.g., summer-winter image pairs in a temperate region).
In this study, we refined the AASG algorithm initially developed by Gray and Song [17] in order to increase accuracy and consistency of multi-temporal image classification, specifically through the use of stable site identification parameters that are tuned for each individual land cover class rather than globally defined thresholding parameters. We also incorporated improved input data, including multi-season imagery and additional topographic information, and a random forest (RF) classifier in place of maximum likelihood classification. We tested the updated version of AASG, hereafter referred to as AASG2, in the Raleigh-Durham-Chapel Hill region of North Carolina, USA using Landsat 5 TM imagery (path 16, row 35), a digital elevation model (DEM) from the Shuttle Radar Topography Mission (SRTM), and the National Land Cover Database (NLCD) for reference land cover classifications and comparison.

Automatic Adaptive Signature Generalization
The automatic adaptive signature generalization (AASG) algorithm automates the workflow for conducting multi-temporal land cover classifications with minimal reference data requirements [17]. AASG ensures consistency in class definitions between images by automatically adapting class spectral signatures to each image, thereby reducing the potential for semantic errors in LCLUC detection despite potentially large radiometric differences between images. Because frequent cloud cover may limit the availability of anniversary-date imagery, AASG's signature generalization approach allows temporally irregular imagery, including scenes in very different phenological states, to guide land cover change detection. AASG operates under the assumption that within a sufficiently large scene, conversion from one land cover class to another is relatively rare, and thus the majority of the landscape will remain stable between the dates of a reference image (I_1) and a spatially coincident target image (I_2).
Remote Sens. 2016, 8, 691
Stable sites are selected in two steps. First, a larger group of potential stable sites is selected from those pixels located near the mean of the image difference histogram (I_2 − I_1; Figure 1). Specifically, Gray and Song [17] identified stable pixels as those falling within a predefined number of standard deviations from the mean of the image difference histograms, that is, within the interval µ_k ± c × σ_k, where µ_k and σ_k are the mean and standard deviation of the difference values for class k, respectively, and c is a globally defined thresholding parameter determining the width of the interval. Second, due to the potential for confusion arising from image misregistration and edge effects, especially along class boundaries in multi-temporal imagery, a class-specific erode filter based on a user-defined kernel is employed to ensure that only core pixels are retained as stable sites.
Once the sets of pixels corresponding to stable sites between the training and target images are determined, they can be coupled with their respective class labels from a high-quality reference classification (C_1) and used to develop class-specific spectral signatures in the target image (I_2). While the spectral signatures of stable cover classes may have changed greatly between the two image dates due to variation in atmospheric and ground moisture conditions, vegetation phenology, illumination/view geometry, etc., the identity of those classes will have remained unchanged. Thus, the class-specific spectral signature sampled from stable sites in the target image (I_2) can be used as adaptive training data (in conjunction with the consistent class definitions determined from the reference classification) to parameterize a unique classifier for the entire target image (including non-stable sites) (Figure 2). The generalized signature derived for each target image is adaptive in that it reflects the specific radiometric profile of the target image, and automated because sampling in all subsequent imagery is entirely algorithmic and obviates the need for user-delineated training data and image-specific radiometric correction procedures. The accuracy and efficiency of AASG for performing multi-temporal image classification make it especially attractive for large-scale operational approaches.
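The two-step stable site selection described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (`candidate_stable`, `erode`), the nested-list raster layout, the use of the population standard deviation, and the choice to treat image-border pixels as non-core are all hypothetical choices made for the example.

```python
from statistics import mean, pstdev


def candidate_stable(i1, i2, classes, c=0.25):
    """Step 1: flag pixels whose difference (I2 - I1) lies within
    mu_k +/- c * sigma_k, with mu_k and sigma_k computed per class k."""
    rows, cols = len(i1), len(i1[0])
    diffs = {}  # class label -> list of difference values
    for r in range(rows):
        for q in range(cols):
            diffs.setdefault(classes[r][q], []).append(i2[r][q] - i1[r][q])
    stats = {k: (mean(v), pstdev(v)) for k, v in diffs.items()}
    stable = [[False] * cols for _ in range(rows)]
    for r in range(rows):
        for q in range(cols):
            mu, sigma = stats[classes[r][q]]
            stable[r][q] = abs(i2[r][q] - i1[r][q] - mu) <= c * sigma
    return stable


def erode(stable, classes, size=3):
    """Step 2: keep a pixel only if every pixel in its size x size window
    is a candidate stable site of the same class (a class-specific erosion).
    Border pixels, whose window leaves the image, are dropped (assumption)."""
    rows, cols = len(stable), len(stable[0])
    h = size // 2
    core = [[False] * cols for _ in range(rows)]
    for r in range(h, rows - h):
        for q in range(h, cols - h):
            k = classes[r][q]
            core[r][q] = all(
                stable[rr][qq] and classes[rr][qq] == k
                for rr in range(r - h, r + h + 1)
                for qq in range(q - h, q + h + 1)
            )
    return core
```

The class labels of the surviving core pixels would then be paired with the target image values at the same locations to form the training set.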

Class-Specific Thresholds for Stable Site Identification
In the initial version of AASG, hereafter referred to as AASG1, Gray and Song [17] tested global c thresholding parameters of 0.25, 0.5, and 1.0. However, for scenes with large variation in both the prevalence and within-class heterogeneity of land cover classes, these globally defined parameters may be inadequate. Rather, the thresholds for identifying stable pixels can be tuned for each land cover class to balance the need for an adequate training sample size against the need to minimize the probability of contaminating the training set with mislabeled pixels. We therefore tested a simple approach that scales the proportion of pixels identified as stable for a given land cover class as a function of the total number of pixels in the scene that are assigned to that class in the reference classification. For land cover class k with N_k pixels in the reference classification, we define a set proportion, p_k, of N_k to identify as stable sites, where n_max is a user-defined maximum number of pixels that can be identified as stable, and a and b are parameters that scale p_k between 0.1 and 0.5. The proportion of pixels considered stable therefore declines as the prevalence of a given class increases, reflecting the need to achieve an adequately large sample size for less prevalent land cover classes while reducing the probability of incorrectly including unstable pixels in the training set as class prevalence increases. Thus, for land cover classes that constitute a very small portion of the landscape (N_k ≤ 100), half of the pixels are used as candidate training pixels in AASG2 (Figure 3). However, we note that for such classes there may be high variance in the training set due to small N_k and a higher likelihood of including unstable and mislabeled pixels, and it may be appropriate to consider removing such a class from further analyses or combining it with a similar class. For land cover classes that comprise a large proportion of the entire scene (N_k > 10 × n_max), a maximum of n_max pixels are identified as stable and used as training candidates. For classes with intermediate representation in the landscape, we scale the proportion of pixels used as candidates from 0.5 to 0.1 as N_k increases. Once p_k has been determined, the p_k × N_k pixels nearest to the class mean of the image difference histogram are considered to be stable. As in AASG1, the stable site maps are eroded using a 3 pixel by 3 pixel filter window to minimize errors due to potential image misregistration and mixed pixels along class boundaries.
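A minimal sketch of the class-specific rule follows. The two limits (p_k = 0.5 for N_k ≤ 100, and at most n_max pixels for N_k > 10 × n_max) come from the description above; the functional form between them is an assumption. Here p_k is taken to decline linearly with log10(N_k), with a and b solved so that p_k equals 0.5 at N_k = 100 and 0.1 at N_k = 10 × n_max. The published parameterization may differ, and the function names are hypothetical.

```python
import math


def stable_proportion(n_k, n_max):
    """Proportion p_k of class-k pixels to treat as stable candidates."""
    lo, hi = 100, 10 * n_max
    if hi <= lo:
        raise ValueError("this sketch assumes n_max > 10")
    if n_k <= lo:
        return 0.5                      # rare class: use half of its pixels
    if n_k > hi:
        return n_max / n_k              # common class: cap at n_max pixels
    # ASSUMPTION: log-linear interpolation between (lo, 0.5) and (hi, 0.1).
    b = (0.1 - 0.5) / (math.log10(hi) - math.log10(lo))
    a = 0.5 - b * math.log10(lo)
    return a + b * math.log10(n_k)


def nearest_to_mean(diffs, p_k):
    """Indices of the p_k * N_k difference values closest to the class mean."""
    mu = sum(diffs) / len(diffs)
    n_keep = int(round(p_k * len(diffs)))
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i] - mu))
    return sorted(order[:n_keep])
```

With n_max = 1000, for example, this sketch gives p_k = 0.5 for a class of 50 pixels, 0.3 for a class of 1000 pixels, and 0.01 for a class of 100,000 pixels.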

Topographic Metrics
Topography is an important driver of soil moisture [18], vegetation distribution [19,20], and human land use [21]. Since topography plays such an important role in the movement of water over and through landscapes, topographic indices can improve classification of land cover classes, particularly those characterized by surface water [22] and wetlands [23,24]. In AASG1, the topographic wetness index (TWI) was used as ancillary input data. In AASG2, we used a 30 m DEM from SRTM to derive an expanded suite of topographic indices for use as predictor variables. In addition to elevation and slope, we also used the upslope accumulated area (UAA), a modified UAA index (mUAA), and the TWI as predictor variables in AASG2 classifications. UAA and TWI capture important aspects of water flow and accumulation and are frequently used in watershed hydrology and spatial scaling of ecosystem fluxes [25–28]. We calculated UAA, mUAA, and TWI using the System for Automated Geoscientific Analyses (SAGA), a freely available geographic information system [29].
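For orientation, the standard definition of the topographic wetness index is TWI = ln(a / tan β), where a is the upslope area per unit contour width and β is the local slope. The sketch below applies that textbook formula to a single cell; it is not the SAGA implementation used in this study, and the function name, the use of cell size as the contour width, and the minimum-slope floor are assumptions made for the example.

```python
import math


def twi(upslope_area_m2, slope_deg, cell_size_m=30.0, min_slope_deg=0.1):
    """Topographic wetness index for one cell: ln(a / tan(beta))."""
    # Specific catchment area: upslope area per unit contour width.
    # ASSUMPTION: contour width is approximated by the cell size.
    sca = upslope_area_m2 / cell_size_m
    # ASSUMPTION: floor the slope so flat cells do not divide by zero.
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(sca / math.tan(beta))
```

Cells with large contributing areas and gentle slopes receive high values, which is why the index helps separate wetlands from upland forest.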

Multi-Season Imagery
In AASG2, we utilize imagery from multiple seasons per year as training data to increase the robustness of classification predictions of target imagery. The use of multi-season, rather than single-date, imagery reduces the potential for confusion and spurious classification, especially for land cover classes that demonstrate large intra-annual variability and ephemeral reflectance characteristics [9,30]. The use of multi-season imagery to inform class spectral signatures is especially effective for classifications of phenologically variable cover types like agriculture. Without the ability of multi-season imagery to capture a potentially short growing season for certain crops, a single-date image from a pre-leaf-out or post-harvest date would likely fail to adequately capture crop cover [31]. Similarly, the inclusion of both leaf-off and leaf-on imagery from temperate regions allows for better prediction of sub-canopy cover types, such as ephemeral water bodies or impervious surfaces, that would otherwise be obscured by the upper canopy in leaf-on imagery [32–34]. The inclusion of multi-season imagery for training and prediction is in keeping with other aspects of the AASG algorithm, which seek to minimize ephemeral reflectance characteristics and instead capture a generalized spectral signature that is consistent for the entire time period of the training dataset.

Random Forest Classification
In AASG2, we replaced the maximum likelihood classifier used in AASG1 with a random forest (RF) classifier. The RF, which consists of an ensemble of classification trees, is a machine-learning technique that can be used in a variety of contexts for classification of multidimensional datasets [35]. Each "tree" in the "forest" is grown from a bootstrapped sample of the observations with the remaining observations left "out-of-bag" (OOB). The final classification for a new or withheld observation is then determined based on the majority vote of the trees. The OOB observations are also used to estimate the importance of each predictor variable by randomly permuting each predictor, thus destroying its information content, and measuring the increase in error of OOB observations as compared to the original classifier. RF classifiers are not as susceptible as individual classification trees to over-fitting or "overtraining" [35–38], and they are especially good at modeling nonlinearity and interactions among predictor variables [39]. RF classifiers have demonstrated higher predictive accuracy than more traditional classification methods in both ecology [39] and remote sensing [40], and have demonstrated comparable predictive accuracy to other machine-learning methods (e.g., support vector machines) [36,41]. In addition, RF models are relatively robust to error or noise in training datasets [35,37,42], making them particularly appealing for mitigating potential errors in stable site identification or assignment of class labels to stable sites.
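The three RF mechanics described above (bootstrap sampling with out-of-bag observations, majority voting, and permutation-based importance) can be illustrated with a short standard-library sketch. It does not grow any trees and is not the RF software used in this study; all function names are hypothetical, and `predict` stands in for any fitted classifier that maps a list of predictor values to a class label.

```python
import random
from collections import Counter


def bootstrap_split(n, rng):
    """Draw n indices with replacement; the indices never drawn are OOB."""
    in_bag = [rng.randrange(n) for _ in range(n)]
    oob = sorted(set(range(n)) - set(in_bag))
    return in_bag, oob


def majority_vote(votes):
    """Final class label: the label predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]


def error_rate(predict, rows, labels):
    """Fraction of observations that predict() labels incorrectly."""
    return sum(predict(r) != y for r, y in zip(rows, labels)) / len(rows)


def permutation_importance(predict, rows, labels, col, rng):
    """Increase in error after shuffling one predictor column.
    Each row must be a list of predictor values."""
    base = error_rate(predict, rows, labels)
    shuffled = [r[col] for r in rows]
    rng.shuffle(shuffled)
    permuted = [r[:col] + [v] + r[col + 1:] for r, v in zip(rows, shuffled)]
    return error_rate(predict, permuted, labels) - base
```

On average about 37% of observations (roughly 1/e) fall out-of-bag for each tree, which is what makes the OOB error a nearly free validation estimate.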

Study Area and Remotely Sensed Data
We evaluated the performance of AASG2 in a 60 km × 60 km study area of the Research Triangle region in North Carolina, USA. The Triangle has experienced rapid population growth and land cover change over the past several decades, making it an ideal study area for validation of AASG2. Specifically, over the past thirty years, the Triangle region experienced accelerating LCLUC trends driven by urbanization and interacting trends in timber and agricultural markets [43]. Changes were unevenly distributed throughout the study area, with the greatest cover change occurring due to urbanization near cities like Raleigh, Durham, Cary, and Apex, primarily at the expense of surrounding agricultural fields and secondary forests [9]. Agricultural and forest land cover (including planted pine, oak-pine mixed, and upland hardwood forests) experienced large transformations prior to and during the study period owing to urban development as well as conversion between the two cover types following market forces [9,43].
We obtained atmospherically corrected Landsat 5 TM surface reflectance over our study area from the LEDAPS archive [44,45]. While atmospheric correction and conversion to surface reflectance are not necessary for the development of high-quality land cover classifications with AASG [17], they are necessary for the accurate calculation of spectral vegetation indices. We selected early-, mid-, and late-season images from three single-year windows centered on the target classification dates of 2006 and 2011 (Table 1). While we attempted to collect anniversary date images for each season, image dates were offset by a month or more due to the presence of clouds. Cloud-free summer images were unavailable in 2006 and 2011, so we instead used images from 2007 and 2010, respectively. We transformed each image in the multi-season stack into "brightness" (KT1), "greenness" (KT2), and "wetness" (KT3) components using the Kauth-Thomas tasseled cap transformation [46,47]. For reference and comparison with the AASG classifications, we obtained the National Land Cover Database (NLCD) classifications of our study region for the years 2001 [48], 2006 [49], and 2011 [50].
We excluded developed open space (class 21) from further analyses since it is spectrally similar to other grass-dominated classes [51,52] and classification requires additional contextual information such as municipal boundaries [9].This class was masked from the reference classification, and any pixels classified as developed open space in NLCD were reassigned to their most likely alternative class based on the spectral signatures of the remaining NLCD classes.

Analyses
AASG requires a high-quality reference classification in order to extract class spectral signatures for a new image. Here, we used the 2001 NLCD as the reference classification (C_1) with a 2001 mid-season Landsat reference image (I_1). We then used both AASG1 and AASG2 to classify land cover for 2006 and 2011. First, for AASG1, we followed the original version proposed by Gray and Song [17]. We used a global c thresholding parameter of 0.25 to identify stable sites in image difference histograms based on the reflectance in the red band (Landsat 5 TM band 3). In AASG1, we used a maximum likelihood classifier with ten predictors, including mid-season surface reflectance from the six mid-season solar reflective bands of Landsat TM plus four additional spectral transformations and ancillary datasets used in Gray and Song [17]: the normalized difference vegetation index, the simple ratio vegetation index, the structural index [53], and TWI from SRTM. Second, for AASG2, we used the updated procedures detailed in Sections 2.2–2.5. Stable sites were determined from image difference histograms calculated from mid-season KT1 using class-specific thresholding parameters. We tested several values of n_max for determining stable sites. AASG2 was based on an RF classifier with 14 predictors: three KT indices from each image for the three seasons plus five topographic indices (elevation, slope, UAA, mUAA, and TWI).
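As a small illustration of the predictor sets described above, the sketch below computes the two vegetation indices from their standard definitions (NDVI = (NIR − red) / (NIR + red); simple ratio = NIR / red, with Landsat 5 TM band 4 as near-infrared and band 3 as red) and enumerates the 14 AASG2 predictors. The function names, the predictor name strings, and the handling of a zero denominator are assumptions made for the example; the structural index [53] is omitted.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index from NIR and red reflectance."""
    total = nir + red
    return (nir - red) / total if total != 0 else float("nan")


def simple_ratio(nir, red):
    """Simple ratio vegetation index: NIR divided by red reflectance."""
    return nir / red if red != 0 else float("nan")


# Hypothetical labels for the AASG2 predictor stack:
# three tasseled cap components for each of three seasons,
# plus five topographic indices.
SEASONS = ("early", "mid", "late")
KT_COMPONENTS = ("KT1", "KT2", "KT3")
TOPOGRAPHIC = ("elevation", "slope", "UAA", "mUAA", "TWI")

AASG2_PREDICTORS = [
    f"{kt}_{season}" for season in SEASONS for kt in KT_COMPONENTS
] + list(TOPOGRAPHIC)
```

Enumerating the names this way makes it easy to confirm that the stack holds nine spectral and five topographic layers.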
Gray and Song [17] showed that AASG is capable of producing accurate land cover maps, with overall accuracies that meet or exceed those of traditional signature extension approaches. To demonstrate that AASG2 further improves the ability to automatically update existing land cover classifications, we compared predicted land cover from AASG1 and AASG2 to NLCD land cover for 2006 and 2011, which were not used in the training process. While the NLCD also contains error and cannot be regarded as a "true" source of reference data, these comparisons provide a good benchmark for testing whether AASG2 improves the ability to extend high-quality classifications to other time periods compared to AASG1. Following standard accuracy assessment procedures, we present these results in confusion matrices, which are normalized by the proportion of the landscape that is mapped as each class [54]. From these confusion matrices, we calculated overall agreement (OA), user's agreement (UA), and producer's agreement (PA) between the AASG and NLCD classifications for each year.
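The agreement measures can be sketched as follows. The layout is an assumption: rows of the matrix are taken to be the AASG map classes and columns the NLCD classes, and each row of counts is rescaled so that it sums to the proportion of the landscape mapped as that class. The function names are hypothetical, and the exact normalization used in [54] may differ in detail.

```python
def normalize(counts, map_props):
    """Rescale each row of counts to sum to that class's mapped proportion."""
    out = []
    for row, w in zip(counts, map_props):
        total = sum(row)
        out.append([w * n / total if total else 0.0 for n in row])
    return out


def agreement(p):
    """Overall, user's, and producer's agreement from a proportion matrix."""
    k = len(p)
    nan = float("nan")
    oa = sum(p[i][i] for i in range(k))                  # sum of the diagonal
    row_tot = [sum(p[i]) for i in range(k)]              # mapped proportions
    col_tot = [sum(p[i][j] for i in range(k)) for j in range(k)]
    ua = [p[i][i] / row_tot[i] if row_tot[i] else nan for i in range(k)]
    pa = [p[j][j] / col_tot[j] if col_tot[j] else nan for j in range(k)]
    return oa, ua, pa
```

For a two-class example with counts [[45, 5], [10, 40]] and mapped proportions 0.6 and 0.4, this gives an overall agreement of 0.86 and user's agreements of 0.9 and 0.8.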

Results
AASG1 and AASG2 were generally successful in reproducing the land cover patterns of the NLCD in both 2006 (Figure 4) and 2011 (Figure 5). The major water bodies, which include Falls Lake in the northeast and Jordan Lake in the southwest, were clearly evident in all classifications. Both AASG algorithms reproduced extensive areas of woody wetlands to the north of Jordan Lake and west of Falls Lake in both years, though with varying degrees of success, as discussed below. The AASG models also reproduced extensive agricultural fields, primarily hay/pasture, in the northwestern portion of the study region. The major metropolitan areas of Raleigh, Durham, and Chapel Hill were successfully mapped by both AASG models. With few exceptions, the total areal extent of each class as mapped by AASG2 was more similar to the NLCD class extents than that mapped by AASG1 (Figures 4 and 5, and Tables 2 and 3). This improvement is particularly evident for low intensity development, mixed forest, grassland/herbaceous, and woody wetlands. The extents of low intensity development and mixed forest were substantially underestimated by AASG1, while AASG2 and the NLCD mapped very similar proportions of these cover classes in both 2006 and 2011. The extent of grassland/herbaceous was overestimated by AASG1 and underestimated by AASG2 in both 2006 and 2011 compared to the NLCD. AASG1 overestimated the extent of woody wetlands in 2006 but underestimated its extent in 2011, while AASG2 produced woody wetland extents that were more stable from year to year and more similar to the NLCD. The most notable differences in proportional land cover between the NLCD and AASG2 occurred in the two agricultural classes. Relative to the NLCD, AASG2 overestimated the extent of hay/pasture and underestimated the extent of cultivated crops.
Overall agreement between AASG2 and the NLCD was higher than that between AASG1 and the NLCD in both 2006 (70.2% vs. 60.2%; Table 2, Tables S1 and S2) and 2011 (67.9% vs. 60.1%; Table 3, Tables S3 and S4). When aggregated from the modified Anderson Level 2 classification system to Anderson Level 1 (which combines the sub-divided developed, forest, agriculture, and wetland classes into a single class each), the agreement with the NLCD increased to 74.8% in 2006 and 76.1% in 2011 for AASG1 and 81.8% in 2006 and 80.9% in 2011 for AASG2. Overall agreement between the NLCD and AASG2 exhibited a monotonic increase with increasing n_max (Figure 6), suggesting that the RF model benefits from having a large training sample set that captures a large range of within-class variability. The ability to identify a large training dataset is a major advantage of AASG. Traditional manual delineation of training sets typically attempts to find the purest pixels representing a land cover type, therefore losing the inherent within-class variability. The high within-class variability included in large training samples would be a major challenge for conventional parametric classifiers, such as the maximum likelihood classifier. However, since the RF is a nonparametric machine-learning algorithm, the large within-class variability can be used in the classification, leading to improved classification accuracy.
Both AASG1 and AASG2 exhibited relatively high agreement with the NLCD for the surface water class (UA ≥ 86.9% and PA ≥ 70.4%), which tends to be spectrally distinct from other classes and exhibits low within-class spectral variability, though AASG1 had a much higher UA than PA for this class. AASG1 had relatively low agreement with NLCD on low intensity developed (UA ≤ 70.3% and PA ≤ 64.9%). Pixels classified as low intensity developed by the NLCD were most often confused with medium intensity developed, evergreen forest, and grassland/herbaceous by AASG1. For both versions of AASG, there was considerable confusion among forest classes (particularly with mixed forest), though AASG2 had higher UA and PA than AASG1 for every forest class in both 2006 and 2011. When aggregated to a single forest class, both the AASG1 and the AASG2 classifications showed high agreement with NLCD classifications (UA ≥ 87.4% and PA ≥ 81.7%). AASG2 offered considerable improvement over AASG1 in the ability to map woody wetlands, which were frequently confused with deciduous forest by AASG1. Both AASG1 and AASG2 frequently disagreed with NLCD on mapping the least prevalent classes: barren land, shrub/scrub, grassland/herbaceous, cultivated crops, and emergent herbaceous wetland. For a given land cover class, the magnitudes of UA and PA from AASG2 classifications were typically quite similar. Asymmetries between UA and PA most often occurred for classes with low prevalence in the landscape (e.g., cultivated crops and emergent herbaceous wetland), with the exceptions of evergreen forest and hay/pasture, each of which had higher total areal extents in the AASG2 classifications than in NLCD and which had higher PA than UA.
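The agreement statistics reported above (overall agreement, UA, and PA, at both classification levels) can be illustrated with a minimal sketch. It assumes that class labels follow the standard two-digit NLCD codes, so that the leading digit gives the Level 1 group; the function names and the eight toy pixels are hypothetical and are not taken from the study.

```python
from collections import Counter

def agreement_metrics(reference, predicted):
    """Return overall agreement plus per-class user's and producer's accuracy."""
    if len(reference) != len(predicted):
        raise ValueError("reference and predicted must have the same length")
    pairs = Counter(zip(reference, predicted))  # (reference, predicted) -> pixel count
    ref_totals = Counter(reference)             # pixels per class in the reference map
    pred_totals = Counter(predicted)            # pixels per class in the new map
    overall = sum(n for (r, p), n in pairs.items() if r == p) / len(reference)
    # UA: share of pixels mapped as class c that the reference also labels c.
    users = {c: pairs[(c, c)] / pred_totals[c] for c in pred_totals}
    # PA: share of reference pixels of class c that the new map labels c.
    producers = {c: pairs[(c, c)] / ref_totals[c] for c in ref_totals}
    return overall, users, producers

def to_level1(code):
    """Collapse a two-digit NLCD-style code to its leading digit (assumed Level 1 group)."""
    return code // 10

# Toy data: hypothetical class codes for eight pixels, not values from the study.
reference = [11, 41, 41, 43, 43, 90, 81, 82]
predicted = [11, 41, 43, 43, 41, 41, 81, 81]

overall_l2, ua_l2, pa_l2 = agreement_metrics(reference, predicted)
overall_l1, ua_l1, pa_l1 = agreement_metrics(
    [to_level1(c) for c in reference], [to_level1(c) for c in predicted]
)
```

In this toy case, confusion among forest sub-classes and between the two agricultural classes disappears after aggregation, which is why the Level 1 agreement is higher than the Level 2 agreement.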
The OOB training data from the RF model indicated that the mid-season "greenness" index (KT2) was the most important predictor variable for classification of land cover in our study region while, on average, topographic indices did not improve agreement between AASG2 and the NLCD (Figure 7). However, predictor variable importance varied substantially among classes. For classifying surface water, the accuracy of AASG2 declined by more than 20% when the KT1 "brightness" index from any of the three seasons was excluded. For the developed and forest classes, mid-season KT2 contributed most to AASG2 accuracy, though early- and late-season KT2 were also important predictors for the high intensity developed class. Multi-season imagery (including early- and mid-season KT1, mid-season KT3, and late-season KT2) was particularly important for mapping hay/pasture. The annual cycles of planting and harvesting likely imposed a phenological signal in this class that distinguished it from spectrally similar classes like grassland/herbaceous and low intensity developed, and the multi-season imagery was therefore able to separate these classes in AASG2.
While topography played a relatively minor role in mapping most classes, it was among the most important variables for mapping woody wetlands. This class tends to be spectrally similar to the other forest classes but occurs in distinctly different, lower elevation landscape positions where water is likely to accumulate and allow periodic saturation. Since high mUAA and TWI indicate regions where water is likely to accumulate, these two variables, along with elevation, improved the agreement between the NLCD and AASG2 (UA ≥ 60.2%; PA ≥ 50.7%). While this class was still frequently confused with other forest classes (particularly deciduous forest), the topographic indices included in AASG2 offered substantial gains in agreement over AASG1, which exhibited very little skill in mapping woody wetlands (UA ≤ 48.3%; PA ≤ 41.4%).
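To illustrate why a high TWI flags positions where water tends to accumulate, the following sketch uses the commonly cited definition TWI = ln(a / tan β), where a is the upslope contributing area per unit contour width and β is the local slope. The exact computation used for this study is not shown in this section, so the function name, the slope clamp, and the two example cells are assumptions for illustration only.

```python
import math

def topographic_wetness_index(specific_catchment_area, slope_radians, min_slope=1e-3):
    """TWI = ln(a / tan(beta)); a is upslope area per unit contour width, beta is slope."""
    if specific_catchment_area <= 0:
        raise ValueError("specific catchment area must be positive")
    # Clamp flat cells so tan(beta) is never zero; min_slope is an illustrative choice.
    slope = max(slope_radians, min_slope)
    return math.log(specific_catchment_area / math.tan(slope))

# Hypothetical cells: a gentle valley bottom versus a steep ridge.
valley = topographic_wetness_index(5000.0, math.radians(1.0))
ridge = topographic_wetness_index(30.0, math.radians(20.0))
```

A large contributing area combined with a gentle slope yields a high index, which is the landscape position described above for woody wetlands.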

Discussion
Fully realizing the potential of the global remotely sensed archive for land cover classification depends on the development of efficient, accurate algorithms that can overcome challenges associated with radiometric differences among scenes, phenological differences in non-anniversary date imagery, and semantic differences among class definitions that occur when classifiers are trained independently. AASG overcomes many of the challenges inherent in classification of multi-temporal remotely sensed imagery by mitigating the needs for atmospheric correction and anniversary date imagery through the adjustment of spectral signatures to each individual image, by maintaining consistent class definitions throughout the image time series, and by providing an automated workflow.
The viability of AASG for multi-temporal image classification has been demonstrated in previous research [17]. Compared to a photo-interpreted reference dataset, the initial version of AASG (AASG1) achieved an overall accuracy of at least 66%, which matched or exceeded the accuracy achieved with traditional signature extension approaches [17]. In this study, we demonstrated that our updated AASG2 workflow produced maps with higher overall agreement with the withheld NLCD classifications in both 2006 and 2011 compared to AASG1. Additional experiments confirm that both the updated input data (i.e., multi-season imagery and additional topographic metrics) and the updated methodology (i.e., class-specific stable site identification parameters and the RF classifier) contributed to the overall improvement of AASG2 (Tables S5 and S6). Most of the improved performance in 2006 was attributable to updated input data (Table S5), while the improvement in 2011 was mostly attributable to the updated stable site identification and classification methodologies (Table S6).
The improvements made to the AASG workflow further enhance the potential for developing consistent, temporally dense time series of land cover classifications from high-quality reference maps, such as the NLCD. Applications that require frequent land cover maps could therefore use AASG to extend the NLCD, which has a large spatial extent but low temporal frequency, over the entire Landsat archive in an automated manner that maintains consistent class definitions among different image dates. This research also highlights the flexibility of the AASG method for application with a variety of potential classifiers, including traditional parametric methods (e.g., maximum likelihood classification) and nonparametric machine-learning methods like random forests, neural networks, or support vector machines. AASG also handles as many or as few input variables as are available to the user, so long as those inputs are spatially compatible with one another (i.e., same resolution, extent, and projection).
While we explored a simple approach for generating class-specific stable site thresholding parameters, future refinements of AASG could implement approaches that optimize the selection of stable sites to maximize agreement with withheld reference data or to minimize overlap of spectral signatures among classes. Additionally, to date, AASG has been applied as a "hard" classifier (i.e., where each pixel is mapped as a single class). Previous research has demonstrated that sub-pixel land cover fractions can be derived from fuzzy membership functions [55,56] or the posterior probabilities of class membership from hard classifiers [57][58][59]. The AASG algorithm developed in this research could therefore be adapted for prediction of sub-pixel land cover fractions using the posterior "vote" distributions from the individual classification trees in ensemble classifiers like the RF.
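The idea of reading sub-pixel fractions from ensemble votes can be sketched as follows. This is only an illustration of the concept, not an implementation from this study: the function names, the three class names, and the ten hypothetical tree votes for a single pixel are all assumptions.

```python
from collections import Counter

def vote_fractions(tree_votes, classes):
    """Convert one pixel's per-tree class votes into per-class vote fractions."""
    if not tree_votes:
        raise ValueError("at least one tree vote is required")
    counts = Counter(tree_votes)
    n_trees = len(tree_votes)
    # Classes that receive no votes get a fraction of 0.
    return {c: counts[c] / n_trees for c in classes}

def hard_label(fractions):
    """Recover the usual hard classification: the class with the most votes."""
    return max(fractions, key=fractions.get)

# Hypothetical votes from a ten-tree ensemble for a single pixel.
classes = ["deciduous forest", "mixed forest", "woody wetlands"]
votes = ["woody wetlands"] * 5 + ["deciduous forest"] * 3 + ["mixed forest"] * 2
fractions = vote_fractions(votes, classes)
label = hard_label(fractions)
```

The hard classifier keeps only `label`, while the full `fractions` dictionary retains the information that half of the trees voted for another class, which is the quantity a sub-pixel adaptation would interpret as cover fractions.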

Conclusions
In this study, we improved the automatic adaptive signature generalization (AASG) algorithm through incorporation of a simple procedure for adapting stable site identification parameters to each land cover class (in place of globally defined thresholding parameters). We also incorporated improved input data (multi-season imagery and an expanded suite of topographic data) and a nonparametric random forest machine-learning classifier. We tested the updated algorithm in the Research Triangle region of North Carolina, where our refinements to AASG resulted in substantial increases in agreement with the National Land Cover Database (NLCD) maps from 2006 and 2011 compared to the original formulation. The areal extent of each class in the withheld NLCD classifications was better replicated by AASG2 than by AASG1, and these areal extents tended to remain more consistent and stable from year to year in the AASG2 classifications. Compared to AASG1, overall agreement between the withheld NLCD classifications and AASG2 was 10.0 percentage points higher in 2006 and 7.8 percentage points higher in 2011. The increase in agreement was concentrated within several classes (including woody wetlands, low intensity developed, and mixed forests) that AASG1 struggled to classify but that AASG2 skillfully reproduced. The inclusion of additional topographic indices improved the ability to distinguish woody wetlands from other forest classes, and the use of multi-season imagery improved the ability of AASG to classify water, developed classes, forest classes, and hay/pasture. These results demonstrate both the flexibility of the AASG approach for multi-temporal classification as well as the potential for

Figure 1. Identification of stable pixels based on image difference histograms (I_2 − I_1) for each land cover class. Stable pixels are those within a given distance from the mean (µ_k) of class k.
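
The stable-pixel rule summarized in the Figure 1 caption can be sketched as below. The caption only says "a given distance" from the class mean; this sketch assumes the distance is a multiple c of the class standard deviation of the difference image, and the function name, the single shared value of c, and the toy data are hypothetical.

```python
from statistics import mean, pstdev

def stable_pixels(diff_values, class_labels, c=1.0):
    """Return indices of pixels whose difference lies within c std devs of their class mean."""
    by_class = {}
    for i, (d, k) in enumerate(zip(diff_values, class_labels)):
        by_class.setdefault(k, []).append((i, d))
    stable = []
    for members in by_class.values():
        values = [d for _, d in members]
        mu_k = mean(values)       # class mean of the image difference
        sigma_k = pstdev(values)  # class spread (population standard deviation)
        stable.extend(i for i, d in members if abs(d - mu_k) <= c * sigma_k)
    return sorted(stable)

# Toy difference-image values (I_2 - I_1) for two hypothetical classes.
diff = [0.0, 1.0, -1.0, 10.0, 2.0, 2.0, 2.0, 8.0]
labels = ["forest"] * 4 + ["water"] * 4
stable = stable_pixels(diff, labels, c=1.0)
```

Pixels with a large change relative to the rest of their class (the values 10.0 and 8.0 here) are excluded, so only presumably unchanged sites would be used to train the classifier on the new image. A class-specific threshold could be passed by replacing the single `c` with a per-class lookup.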

Figure 3. Determination of class-specific thresholding factors. (a) The proportion of pixels used as stable sites (p_k) as a function of the total number of pixels per class (N_k); (b) The total number of pixels used as stable sites (N_k p_k) as a function of the total number of pixels per class (N_k). Both graphs (a) and (b) are shown with n_max = 10,000 pixels.

Figure 6. Overall agreement between NLCD and AASG2 as a function of n_max.

Figure 7. Mean decrease in agreement between NLCD and AASG2 when a given variable is excluded.

Table 1. Landsat 5 TM images used in this study.