The Effects of Point or Polygon Based Training Data on RandomForest Classification Accuracy of Wetlands

Wetlands are dynamic in space and time, providing varying ecosystem services. Field reference data for both training and assessment of wetland inventories in the State of Minnesota are typically collected as GPS points over wide geographical areas and at infrequent intervals. This status-quo makes it difficult to keep updated maps of wetlands with adequate accuracy, efficiency, and consistency to monitor change. Furthermore, point reference data may not be representative of the prevailing land cover type for an area, due to point location or heterogeneity within the ecosystem of interest. In this research, we present techniques for training a land cover classification for two study sites in different ecoregions by implementing the RandomForest classifier in three ways: (1) field and photo interpreted points; (2) fixed window surrounding the points; and (3) image objects that intersect the points. Additional assessments are made to identify the key input variables. We conclude that the image object area training method is the most accurate and the most important variables include: compound topographic index, summer season green and blue bands, and grid statistics from LiDAR point cloud data, especially those that relate to the height of the return.


Introduction
Wetlands are dynamic in both space and time, providing important ecosystem services, which vary depending upon location. These valuable ecosystems help mitigate flooding, provide filtration of polluted waters from waste and run-off, recharge groundwater supply, and provide habitat for many aquatic organisms [1][2][3][4][5].
The dynamic nature of wetlands makes it particularly important to update wetland inventories more frequently than has been done in the past. The hydroperiod, or water level duration and frequency, is the most important attribute of a wetland's function and biodiversity and is heavily influenced by climate patterns both large and small [5]. Climate conditions, land use practices, and topographical characteristics affect the location and seasonality of a wetland [6,7].
Wetland maps have been conventionally made using manual photo interpretation and heads-up (i.e., on-screen) digitizing and are not frequently updated [8]. Traditional wetland inventories, such as the US National Wetlands Inventory (NWI), are mapped and updated irregularly and typically under-represent ephemeral and forested wetlands due to poor timing of image data acquisition (e.g., during drier conditions in the mid-summer, under full leaf-on canopy conditions that obstruct the view of understory wetlands, and/or under cloud cover). Such inventories lack the incorporation of alternate remotely sensed data, such as light or radio detection and ranging (LiDAR or radar, respectively), that can describe the vertical structure of these diverse ecosystems [9,10]. Traditional pixel-based land cover classifications are unsupervised, supervised, or a combination of both (hybrid), and these classifications do not often incorporate spatial context in the classification [11]. Object-based image analysis (OBIA) groups pixels that have similar data value properties (i.e., radiance/reflectance, elevation, slope, etc.) into areas or objects. Classifications then can be done by object, instead of pixel by pixel, and can potentially provide higher mapping accuracy [12][13][14].
For both traditional and OBIA types of classification, the integration of remotely sensed data from multiple sources can provide a baseline for mapping wetlands and can improve upon the use of single date optical imagery for land cover classification. Furthermore, wetland type can be better resolved by integrating aerial orthophotographs from multiple dates, LiDAR multiple return point cloud data [15,16], topographic derivatives [14,17,18], and combinations of all of these data with other ancillary data, such as hydric soil information [6,19].
LiDAR is well suited for creating highly accurate digital elevation models (DEMs) and normalized digital surface models (nDSMs). The unique information from pulse return elevation and intensity is currently under-utilized and has growing potential for identifying vegetation structure and hydrologic condition [16,20,21,22]. By fully utilizing the data provided by LiDAR and its derivatives, such as standard deviation of all returns within a grid cell, intensity of the returns, slope, nDSM, and topographic indices, the precision of wetland mapping is expected to increase significantly. Intensity is an additional attribute from the LiDAR point cloud data that has been shown to be valuable for distinguishing among different land cover types and water inundation levels [16,21,22,23]. Another useful measure is the compound topographic index (CTI), complementary to elevation and slope, that estimates potential wetness based on the flow of water across a landscape and the total contributing area for a downslope point [18]. This index has been shown to increase wetland mapping accuracy [14,17,18,24].
At this time, the State of Minnesota is one of nine states that have completed statewide LiDAR collects, and one of over 20 states that have plans to acquire statewide LiDAR in the near future. Minnesota also collects both spring and summer aerial imagery (every one to two years), a practice in growing demand across the country as more regions routinely collect spring aerial imagery. In this paper we demonstrate techniques for mapping wetlands that take advantage of these types of data acquisitions. The techniques are meant to be affordable, relatively simple, applicable for different study areas, and thus more easily repeated at larger geographic scales. Much of the training and classification process can be automated using open source software that can implement classification algorithms such as randomForest (RF), provided the reference and input model data are preprocessed in advance. The RF classification is computationally fast, can handle multiple data types, and does not require much user-based knowledge for generating a wetland classification.
There are several ways to perform RF classification without great concern of over-fitting, which makes it flexible, including: using point or area training data and producing pixel or object-based classifications. RF also allows for an assessment of input data importance. The choice of classifier training approach may be less flexible, depending on the training data available (usually point reference data), potentially constraining the desired level of accuracy. Here, we compare the results of a land cover classification by training an RF classifier in three ways: (1) field and photo interpreted points using a single pixel; (2) average values for pixels within a fixed window of field and photo-interpreted points; and (3) average values for pixels within image objects that intersect field and photo-interpreted points. Assessments are made for two study sites, a forested region and an agricultural region, to identify the key input variables for accurate classification of upland, water, and wetlands, and for sub-classifying upland and wetland type. The results from this study can be used to provide classification maps that will inform natural resource managers charged with monitoring wetland ecosystems. In addition, the techniques tested in this study will aid in the design of wetland mapping programs and ultimately, provided the current image acquisition programs retain the frequency of their data collection efforts at every one to two years, increase the efficiency and consistency of more frequent mapping and monitoring of these valuable ecosystems.

Study Areas
The study areas are representative of two ecoregions with different hydrological patterns, as defined by the Minnesota Department of Natural Resources Ecological Classification System [25]: Laurentian Mixed Forest (Cloquet) and Prairie Parkland (Mankato) provinces.

Cloquet
This study area surrounds the small city of Cloquet (Figure 1) in the Laurentian Mixed Forest province of northeastern Minnesota. It is dominated by conifer forests, some mixed hardwood-conifer forests, and conifer bogs and swamps. Wetland loss in this region is not as large as in the southern portion of the state; greater than 80% of the pre-settlement wetland area remains [26]. The elevation across the study area is 330-450 m above sea level (mean of 392 m), with slopes averaging less than 1.7 degrees.
Precipitation over the study site during the 2009 water year (October 2008-September 2009) was below normal by about 5 cm, but precipitation during the first part of the 2010 water year was above normal (79 cm) by about 10 cm. Precipitation over the study site during the 2011 water year was above normal by about 5 cm [27].

Mankato
Many of the agricultural fields surrounding the study area near Mankato (Figure 1) are drained cultivated wetlands [26]. The Prairie Parkland province is in western Minnesota and is heavily modified by human activity, specifically for large-scale conventional agriculture. Most of the prairie wetlands have been drained with tile systems [28], though the drainage extent is unknown. It is estimated that less than 50% of the pre-settlement wetland area remains in this portion of the state [26]. The elevation across the study area is 233-316 m above sea level (mean of 296 m), with slopes averaging less than 1.3 degrees. Precipitation over the study site during the 2010 water year (October 2009-September 2010) was above normal by about 25 cm, but precipitation during the 2011 water year was below normal (84 cm) by about 5 cm [27].

LiDAR-Derived Input Data
We used LiDAR point cloud data to generate several topographic raster datasets used in the RF models, including: digital elevation model (DEM), local ground slope, normalized digital surface model (nDSM), slope of the nDSM, grid statistics on Z (height; minimum, mean, maximum, and standard deviation), grid statistics on intensity, and the compound topographic index (CTI). Since wetlands tend to be located in low-lying flat or depressional areas on the landscape, we used elevation and slope data in the RF classifier.
The Cloquet site LiDAR data were acquired 3-5 May 2011, by Woolpert, Inc. Flight lines had 25% overlap and multiple returns were recorded for each laser pulse along with an intensity value for each return. The nominal point spacing of the LiDAR pulses is reported to be 1.5 m; the horizontal accuracy is ±1.2 m (95% confidence level), and the vertical accuracy root mean square error (RMSE) is 5.0 cm. The Mankato site LiDAR data were acquired 26-28 April 2010, by Quantum Spatial, Inc. Flight lines had 50% overlap and multiple returns were recorded for each laser pulse along with an intensity value for each return. The reported nominal point spacing of the LiDAR pulses is 1.3 m, the horizontal accuracy is ±0.5 m, and the vertical accuracy root mean square error (RMSE) is about 10.0 cm.
The DEMs used for both study sites were provided by the Minnesota Department of Natural Resources as a 1 m product and were reportedly generated by extracting vendor-classified bare earth and model key points from the point cloud data and hydro-flattened using the edge of water breaklines. Prior to calculating the CTI, all sinks in the DEM were filled to avoid interference with hydrologic flow. We used the Fill Sinks XXL tool in the software program SAGA (System for Automated Geoscientific Analyses; v. 2.1.0) because of its method of filling depressions by maintaining a downward slope along the flow path [29]. The CTI data layers were calculated for both study areas using the following formula [30]: CTI = ln[α/tan(β)], where α = contributing upslope area and β = local slope. The algorithm we used to calculate α was the triangular multiple flow direction algorithm (MD∞), which allows for multiple neighboring cells to estimate the downslope cell's flow direction [30]. We processed the CTI layers using the software Whitebox (v. 1.0.7). In addition, we ran a 3 × 3 cell window low-pass filter using a mean rule on the final CTI data layers to minimize anomalous values.
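As an illustration only, the CTI formula and the 3 × 3 mean filter can be sketched in a few lines of Python. This is not the Whitebox implementation: the function names, the minimum-slope floor used to avoid division by zero on flat cells, and the edge handling of the filter are all assumptions made for this sketch, and the upslope area α is taken as already computed by a flow-routing algorithm.

```python
import math

def compound_topographic_index(upslope_area, slope_deg, min_slope_deg=0.01):
    """CTI = ln(alpha / tan(beta)) for a single cell.

    upslope_area  : contributing upslope area (alpha), must be > 0
    slope_deg     : local slope (beta) in degrees
    min_slope_deg : assumed floor so that flat cells do not divide by zero
    """
    beta = math.radians(max(slope_deg, min_slope_deg))
    return math.log(upslope_area / math.tan(beta))

def mean_filter_3x3(grid):
    """Low-pass filter: replace each cell by the mean of its 3 x 3 window.

    Edge cells average only the neighbors that exist (an assumption).
    """
    rows, cols = len(grid), len(grid[0])
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            window = [grid[i][j]
                      for i in range(max(0, r - 1), min(rows, r + 2))
                      for j in range(max(0, c - 1), min(cols, c + 2))]
            out[r][c] = sum(window) / len(window)
    return out
```

For a fixed contributing area, a flatter cell yields a larger index, which is why high CTI values tend to mark potentially wet locations.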
For the remaining topographic datasets used, we generated the data using QuickTerrain Modeler (Applied Imagery, Silver Spring, MD, USA; v. 7.1.6). We imported the LiDAR point cloud data into QT Modeler by including all returns except the LiDAR points from the overlapping flight lines (25% for Cloquet and 50% for Mankato). Through exploratory data analysis, we found that avoiding the overlap class reduced the amount of noise and outlying spikes in intensity due to scan angle at the edges of flight lines. We created a DSM by using the maximum Z value of all returns. We chose 5 m grid spacing to maintain enough returns within a grid cell and retain confidence in calculating statistical summaries, while still preserving moderately high resolution. For cells that did not have any LiDAR returns, a proprietary algorithm in QT Modeler (adaptive triangulation interpolation) was used to fill gaps. The interpolated surface is then approximated as a TIN (triangular irregular network) and QT Modeler uses an anti-aliasing (AA) routine to refine the precision by taking into account triangulation within cells. Before exporting the DSM raster, a smoothing algorithm, natural neighbor, was performed to smooth and curve triangulated lines based on the elevation levels of neighboring cells. These processes result in slopes and building edges being represented more realistically. An nDSM was then created by subtracting the DEM from the DSM.
Additional grid statistics were calculated from the LiDAR point cloud, including the minimum, maximum, mean, and standard deviation of the Z and intensity values of all returns within each 5 m grid cell. We used QT Modeler's Grid Statistics Tool to run these statistical operations using a 5 m grid on the attributes from the imported point cloud. The resulting grid statistics were exported in raster format with the 5 m grid spacing.
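The per-cell statistics and the nDSM subtraction can be illustrated with a short Python sketch. It does not reproduce QT Modeler's Grid Statistics Tool or its interpolation and smoothing steps; the function names, the `(x, y, value)` point layout, and the use of the population standard deviation are assumptions made for this example.

```python
import math
from collections import defaultdict

def grid_statistics(points, cell_size=5.0):
    """Summarize point values (Z or intensity) per grid cell.

    points    : iterable of (x, y, value) tuples (hypothetical layout)
    cell_size : grid spacing in the units of x and y (5 m in the text)
    Returns {(col, row): {"min", "max", "mean", "std"}}.
    """
    cells = defaultdict(list)
    for x, y, value in points:
        cells[(int(x // cell_size), int(y // cell_size))].append(value)
    stats = {}
    for key, values in cells.items():
        mean = sum(values) / len(values)
        # Population variance; the variant used by the vendor tool is unknown.
        variance = sum((v - mean) ** 2 for v in values) / len(values)
        stats[key] = {"min": min(values), "max": max(values),
                      "mean": mean, "std": math.sqrt(variance)}
    return stats

def normalized_dsm(dsm, dem):
    """nDSM = DSM - DEM, cell by cell, for two rasters of equal shape."""
    return [[s - g for s, g in zip(dsm_row, dem_row)]
            for dsm_row, dem_row in zip(dsm, dem)]
```

Cells without any returns simply do not appear in the result here, whereas the workflow described above fills such gaps by interpolation.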

Optical Input Data
For both study sites, we used summer 2010 aerial orthophotos from the U.S. Department of Agriculture, Farm Service Agency's National Agriculture Imagery Program and spring digital orthophoto quarter quads (DOQQs) from the Minnesota Department of Natural Resources. The DOQQs for the Cloquet site were collected in June 2009 (early leaf onset) and in 2011 for the Mankato site. The spring 2009 and 2011 imagery was acquired with visible and near infrared bands (blue, green, red, NIR), whereas the summer 2010 imagery was collected only in visible bands (blue, green, red). All optical imagery used in this study was orthorectified and radiometrically balanced by the respective supplier prior to distribution. The NIR band is often used for calculating spectral indices, such as the Normalized Difference Vegetation Index (NDVI). Many studies have shown that NDVI is particularly useful for separating vegetated versus non-vegetated areas and wet versus dry areas [10,11,31]. We used the red and near-infrared bands to calculate NDVI for both sites (spring season 2009 for Cloquet and spring season 2011 for Mankato). For all optical images used in this study, we used 5 m spatial resolution to correspond with the resolution used for the LiDAR data and derivatives.
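For reference, NDVI is the standard normalized difference of the NIR and red bands, (NIR − red)/(NIR + red). A minimal Python sketch follows; the function names and the choice to return 0.0 where both bands are zero are assumptions made for this example.

```python
def ndvi(nir, red):
    """Normalized Difference Vegetation Index for one pixel."""
    total = nir + red
    if total == 0:
        return 0.0  # assumed fill value where both bands are zero
    return (nir - red) / total

def ndvi_raster(nir_band, red_band):
    """Apply ndvi() cell by cell to two rasters of equal shape."""
    return [[ndvi(n, r) for n, r in zip(nir_row, red_row)]
            for nir_row, red_row in zip(nir_band, red_band)]
```

Values range from -1 to 1; dense green vegetation reflects strongly in the NIR and gives high values, while open water typically gives low or negative values.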

Land Cover Classification Schemes
We performed two levels of land cover classification for both study sites that differentiated among upland, water, and wetland areas (Level 1), and sub-classified upland and wetland types (Level 2). Upland classes included agriculture, forest, grassland, shrub, and urban. Wetland classes included emergent, forested, and scrub/shrub wetlands. This wetland class scheme was modified from the Cowardin wetland classification scheme [32] and included the three most common wetland classes in the study area according to the NWI: emergent, forested, and scrub/shrub wetlands (where palustrine unconsolidated bottom and palustrine aquatic bed classes were merged into one emergent wetland class and the riverine unconsolidated bottom class was merged with the water class).
We took a hierarchical approach for classification using the following steps: (1) water and non-water areas were classified; (2) the non-water class was sub-classified into upland and wetland areas; (3) the wetland class was sub-classified into three wetland types; and (4) the upland class was sub-classified into five upland types. At each step, different sets of input data were used to optimize model performance. More details about the datasets used in each hierarchy level are described in the Random Forest Classification section, and each of the data layers included will be explained in detail in the respective subsections of the Input Datasets and Process Flow section.

Random Forest Classification
We used the meta-classifier RF for our study [33] and implemented the algorithm (v. 4.6-7) in the software package R (v. 2.15.1). We executed the classifier by bringing in the input layers as data frames, where all input data layers are required to be scaled to the same spatial resolution, and used three different types of reference training data. The RF classifier constructs decision trees using a random sample of input variables at each node in each tree. The number of variables sampled was the square root of the total number of input variables. Each decision tree is fully grown using a bootstrap sample (drawn with replacement) that contains about two-thirds of the training data (in-bag). The remaining training data (out-of-bag, about one-third) are used to calculate cross-validation accuracy per tree, which is averaged to estimate the relative accuracy of each model prior to formal accuracy assessment. After trial-and-error revealed minimal change in cross-validation accuracy with more than 500 trees, we decided to remain at the algorithm's default number of 500 trees. Each of the trees produced a "vote", and the final classification result was the class that had the most votes [33]. We built one RF model per test of training method by integrating different combinations of remotely sensed data.
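The three mechanical ingredients named above (the number of variables tried at each node, the in-bag and out-of-bag split, and the majority vote) can be sketched in plain Python. This is not the R randomForest code used in the study and it grows no trees; all function names are hypothetical, and the sketch only lets the bookkeeping be checked empirically.

```python
import math
import random
from collections import Counter

def variables_per_split(n_variables):
    """Variables sampled at each node: floor of the square root of the total."""
    return max(1, int(math.sqrt(n_variables)))

def bootstrap_split(n_samples, rng):
    """Draw n_samples indices with replacement (in-bag);
    indices that were never drawn form the out-of-bag set."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    drawn = set(in_bag)
    out_of_bag = [i for i in range(n_samples) if i not in drawn]
    return in_bag, out_of_bag

def majority_vote(votes):
    """Final class = the label predicted by the most trees."""
    return Counter(votes).most_common(1)[0][0]

if __name__ == "__main__":
    rng = random.Random(42)
    in_bag, out_of_bag = bootstrap_split(1000, rng)
    print("out-of-bag share:", len(out_of_bag) / 1000)
    print("variables per split for 25 inputs:", variables_per_split(25))
    print("vote:", majority_vote(["wetland", "upland", "wetland"]))
```

Running the script prints the share of samples left out of one bootstrap draw, which can be compared with the in-bag and out-of-bag proportions described in the text.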
The combination of data input for each classification step was selected based on expert knowledge of remotely sensed data and success from a previous study of land cover classification [10]. The same datasets were used for both study sites to make inferences on the relative power of each input data layer for two ecological regions. The following datasets were used in each step of the hierarchical classification: (1) water and non-water areas were classified using a DEM, the CTI, and spring and summer aerial orthophotos; (2) the non-water class was sub-classified into upland and wetland areas using slope gradient, the CTI, spring and summer aerial orthophotos, and LiDAR grid statistics; (3) the wetland class was sub-classified into three wetland types using a DEM, slope, the CTI, an nDSM, slope of the nDSM, spring and summer aerial orthophotos, and LiDAR grid statistics; and (4) the upland class was sub-classified into five upland types using slope, an nDSM, the slope of the nDSM, spring and summer aerial orthophotos, normalized difference vegetation index (NDVI) of the spring imagery, and LiDAR grid statistics.

Training and Reference Data
We tested a hierarchical land cover classification technique using three methods for classifier training: (1) point training using a single pixel value per point; (2) buffer area training using the average value for pixels within a 5 × 5 pixel window (approximately 12.5 m buffer radius) surrounding the reference data points; and (3) polygon area statistics within image objects that intersect reference data points. For all three methods, we used the same stratified random sample of 75% of the reference point data for training and 25% of the reference point data for an independent accuracy assessment on the results. This percentage split of the data was used to ensure there were sufficient training points per class, which was particularly relevant for the Level 2 classification (Tables 1 and 2). Point training data for the RF classifier included a single pixel value for each input data layer. Buffer area training data included the mean value for all pixels within the buffer area for each input data layer. Image object area training included the minimum, maximum, mean, and standard deviation values calculated for each image object.
[26], and newly generated points using photo interpretation. The procedure for field crew reference data collection during the summers of 2009-2011 involved: locating randomly generated reference points with a Trimble Juno SB GPS unit (3-5 m real-time and 1-3 m post-processed accuracy); identifying the dominant land cover type; recording basic characteristics of the site on the GPS unit; taking representative photographs; and maintaining a back-up recording of the point ID, photo ID, land cover classification, and GPS coordinates in a field book. Points were added to the reference dataset via photo interpretation to ensure adequate representation of each land cover class and to maintain an appropriate spatial distribution of data points. The RF model built using these point training data was applied per pixel to the whole study area for both sites to produce a pixel-based classification.
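A stratified 75%/25% split of reference points, as described in this section, might look like the following Python sketch. The `(point_id, class_label)` record layout, the function name, and the rounding rule for the per-class training count are assumptions; the sampling code actually used in the study is not described.

```python
import random
from collections import defaultdict

def stratified_split(points, train_fraction=0.75, seed=0):
    """Split reference points into training and test sets, class by class.

    points : list of (point_id, class_label) tuples (hypothetical layout)
    Returns (train, test); each class contributes about train_fraction
    of its points to the training set.
    """
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for point in points:
        by_class[point[1]].append(point)
    train, test = [], []
    for label in sorted(by_class):
        members = by_class[label][:]
        rng.shuffle(members)
        n_train = round(len(members) * train_fraction)  # assumed rounding rule
        train.extend(members[:n_train])
        test.extend(members[n_train:])
    return train, test
```

Because the split is made within each class, rare classes keep the same training share as common ones, which matters most for the Level 2 classes with few reference points.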

Buffer Area Training Data
We used a fixed window around each of the reference training points to incorporate contextual information in the training phase of the RF classifier. The additional information provided by including an area surrounding a training point, rather than using the data from a single pixel corresponding to a training point, has been reported to increase the representativeness of the training data and improve the accuracy of the classification [34][35][36]. After trial-and-error revealed minimal sensitivity to buffer size, we used a 5 × 5 cell window at 5 m spatial resolution (approximately 12.5 m buffer radius). The average value for each buffer area was calculated for each input data layer and the classifier used that value. The RF classifier was applied per pixel using the buffer area training data to produce pixel classifications.
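The buffer value for one training point and one input layer can be sketched as a windowed mean. The function name and the decision to clip the window at the raster edge are assumptions made for this illustration.

```python
def window_mean(raster, row, col, size=5):
    """Mean of the size x size window centered on (row, col).

    raster : list of rows of numeric cell values (one input data layer)
    The window is clipped at the raster edge (an assumption).
    """
    half = size // 2
    rows, cols = len(raster), len(raster[0])
    values = [raster[r][c]
              for r in range(max(0, row - half), min(rows, row + half + 1))
              for c in range(max(0, col - half), min(cols, col + half + 1))]
    return sum(values) / len(values)
```

With 5 m cells, the default 5 × 5 window spans 25 m on a side, that is, 12.5 m from the window center to each edge.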

Image Object Area Training Data
Pixels that were contained within the previously described buffer areas surrounding reference training points provided contextual information irrespective of ecosystem transition zones and edge effects. However, image segmentation algorithms generate objects containing relative contextual homogeneity present in the input data [12]. These boundaries contain more relevant information within them about the landscape compared to a single point and are more representative of the feature than a fixed window (buffer area).
Image segmentation has been used in land cover classification for several decades [14,37,38,39,40,41]. The methods available for image segmentation and the applications for OBIA have become more broadly sophisticated over the last decade [12,42,43,44,45]. The OBIA process tends to rely heavily on expert knowledge of software algorithms, photointerpretation skills, knowledge of remotely sensed data, and time invested in trial and error for parameter settings.
The image segmentation procedure we used involved several steps and employed multiple segmentation algorithms in the software package eCognition Developer 64 (Trimble Navigation Limited, Westminster, CO, USA; v. 8.8). We describe this process as dynamic, iterative, and minimally knowledge-based at multiple scales. In this case, 'scale' refers to creating image objects that represent features at different spatial scales, i.e., fine-scaled features (trees and buildings) and larger features (agricultural fields, water bodies, and wetland complexes). Figure 2 outlines the approach in six steps that are explained in more detail as follows.
We used Contrast Split segmentation on the nDSM. This effectively separated tall areas from shorter areas, where the eCognition software uses iterations of different threshold values to optimize the split. eCognition also uses an "Edge Difference" method for finding the borders of image objects.
We used an nDSM slope data layer with the multi-threshold segmentation algorithm to segment the tall features (greater than 1 m in height, based on user assessment of the distribution of surface feature heights throughout the nDSM) into areas of low slope (<3°) and high slope (>3°). The value of 3° of slope was chosen by trial and error and confirmed by other studies that found this value to be an appropriate threshold for wetland mapping [46]. To isolate buildings from trees and other natural features, we applied the quadtree segmentation algorithm to spring and summer optical data. This algorithm aggregates pixels with similar spectral properties and produces rectangular objects, leveraging the fact that natural features tend to be more rounded (less angular) than anthropogenic features.
We used multi-resolution segmentation with spring and summer optical data on all tall features (low and high slope). This is an optimization algorithm that consecutively segments to minimize the heterogeneity within image objects and merges segments to maximize the homogeneity between neighboring image objects. The parameters used in this algorithm were: scale parameter = 10, shape = 0.3, and compactness = 0.5. Image objects were reviewed by four expert photo interpreters and co-authors of this study [45], who verified that the image objects resulting from parameters used in segmentation approximated the features of interest. For the not-tall features, we used multi-resolution segmentation with both optical and LiDAR intensity values using the same algorithm parameters described above.
In the next step, multi-resolution segmentation results were further refined using spectral difference segmentation, where small image objects were aggregated to create contiguous image objects for large features while maintaining spectral similarity within small features. This further minimizes the heterogeneity within image objects and maximizes the homogeneity between neighboring image objects. We used a different Spectral Difference value for the tall features than for the not-tall features (30 and 15, respectively). For features that are not-tall, a relatively smaller value preserves more objects with greater heterogeneity between neighboring objects, such as a wet ditch between a road and an agricultural field.
To ensure a proper number of pixels for statistical operations within the final objects, we removed all objects that were less than 100 pixels, which is less than half the size for a reasonable minimum mapping unit (1-3 acres). These smaller objects were merged with objects that shared more than 20% of the total boundary or were merged with the objects that fully engulfed the smaller object. As a last step, we removed all labels ("tall, high slope", "tall, low slope", "not tall").
After finalizing the segmentation within both study sites, we intersected the reference training points with the image objects to isolate the set of training objects with known class value for the RF classifier.
The training data used by the RF model included statistics for the minimum, maximum, mean, and standard deviation of all pixels within image object areas. For example, the grid statistic "Z minimum" was used as an input variable, but the training data for the RF model included grid statistics about that variable from all pixels that were contained within each object (the minimum "Z minimum" value within each object, the maximum "Z minimum" value within each object, the mean "Z minimum" value within each object, and the standard deviation of the "Z minimum" value within each object).
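The expansion of each input layer into four per-object training variables can be sketched as follows. The dictionary layout, the function names, the naming pattern of the output variables, and the use of the population standard deviation (which stays defined for single-pixel objects) are assumptions made for this example; the statistics in the study were produced within the segmentation workflow.

```python
import statistics

def object_statistics(pixel_values):
    """Minimum, maximum, mean, and standard deviation of one layer
    over all pixels that fall inside one image object."""
    return {
        "min": min(pixel_values),
        "max": max(pixel_values),
        "mean": statistics.fmean(pixel_values),
        # Population form, assumed here so one-pixel objects do not fail.
        "std": statistics.pstdev(pixel_values),
    }

def object_training_row(layers):
    """Flatten per-layer statistics into one training record.

    layers : {"Z minimum": [pixel values inside the object], ...}
    Returns e.g. {"Z minimum_min": ..., "Z minimum_max": ..., ...}.
    """
    row = {}
    for name, values in layers.items():
        for stat, value in object_statistics(values).items():
            row[f"{name}_{stat}"] = value
    return row
```

Each input layer therefore contributes four variables to the object-based model, compared with one variable per layer in the point and buffer approaches.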
Post classification edits were done on some larger segmented polygons because the objects included features of different land cover classes. These edits were performed on the largest objects only, covering approximately 5% of the resulting image objects, where most of the edits were completed by using photo interpretation to split large objects into two or more smaller objects with different land cover classes. Only the final classification results were edited.

Accuracy Assessment
For each of the three different methods for classifier training (point, buffer area, and object area), we used the same reference test points for an independent accuracy assessment (25% of the total reference dataset). We used traditional accuracy assessment methods, including: constructing error matrices with overall accuracy, 95% confidence intervals (CI), User's and Producer's accuracies, Kappa statistic (K-hat), and comparison significance tests of error matrix K-hat values [47]. We produced summaries of the Producer's and User's accuracies and the overall accuracies for each of the three classifier training methods for both study sites. We also performed pairwise error matrix significance tests to compare the three different methods for classifier training.
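The error-matrix summaries listed above follow standard definitions and can be sketched in Python as below. The function names, the orientation of the matrix (rows = classified map, columns = reference), and the normal-approximation confidence interval are assumptions made for this example; the interval method used in the study is not specified, and the pairwise K-hat significance test is not reproduced here.

```python
import math

def error_matrix(reference, predicted, classes):
    """Rows = classified (map) label, columns = reference label."""
    index = {label: i for i, label in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for ref, pred in zip(reference, predicted):
        matrix[index[pred]][index[ref]] += 1
    return matrix

def accuracy_summary(matrix):
    """Overall, User's, and Producer's accuracy, K-hat, and an approximate CI."""
    k = len(matrix)
    n = sum(sum(row) for row in matrix)
    diagonal = [matrix[i][i] for i in range(k)]
    row_totals = [sum(matrix[i]) for i in range(k)]
    col_totals = [sum(matrix[i][j] for i in range(k)) for j in range(k)]
    overall = sum(diagonal) / n
    users = [diagonal[i] / row_totals[i] if row_totals[i] else float("nan")
             for i in range(k)]
    producers = [diagonal[j] / col_totals[j] if col_totals[j] else float("nan")
                 for j in range(k)]
    chance = sum(row_totals[i] * col_totals[i] for i in range(k)) / n ** 2
    kappa = (overall - chance) / (1 - chance)  # undefined if chance == 1
    # Normal-approximation 95% interval for overall accuracy (assumption).
    half_width = 1.96 * math.sqrt(overall * (1 - overall) / n)
    return {"overall": overall, "users": users, "producers": producers,
            "kappa": kappa, "ci95": (overall - half_width, overall + half_width)}
```

User's accuracy is read along the rows (how often a mapped class is correct on the ground) and Producer's accuracy along the columns (how often a reference class is mapped correctly).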
We assessed the importance of variables using the outputs from RF, which complement the traditional accuracy assessment. The Mean Decrease in Accuracy measure was used to report variable importance. Mean Decrease in Accuracy is calculated from the out-of-bag sample of input data, which was held out of the growth of a decision tree. A variable is ranked higher in importance if the out-of-bag accuracy of the model decreases more when the values of that variable are randomly permuted in the out-of-bag sample [33].
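The idea behind Mean Decrease in Accuracy can be illustrated with a generic permutation test in Python. This is a simplified sketch, not the randomForest internals: it takes any fitted `predict` callable and a held-out set of rows rather than working tree by tree on out-of-bag samples, and every name in it is hypothetical.

```python
import random

def accuracy(predict, rows, labels):
    """Share of rows whose predicted label matches the known label."""
    return sum(predict(r) == y for r, y in zip(rows, labels)) / len(rows)

def mean_decrease_in_accuracy(predict, rows, labels, column, repeats=10, seed=0):
    """Average accuracy lost when one variable's values are shuffled.

    predict : callable mapping a row (list of values) to a class label
    rows    : held-out rows, each a list of input variable values
    column  : index of the variable to permute
    """
    rng = random.Random(seed)
    baseline = accuracy(predict, rows, labels)
    drops = []
    for _ in range(repeats):
        shuffled = [r[column] for r in rows]
        rng.shuffle(shuffled)
        permuted = [r[:column] + [v] + r[column + 1:]
                    for r, v in zip(rows, shuffled)]
        drops.append(baseline - accuracy(predict, permuted, labels))
    return sum(drops) / len(drops)
```

A variable the model never relies on produces a decrease of zero, while shuffling a variable that drives the predictions lowers accuracy and so ranks that variable higher.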

Cloquet
The most accurate classifier training method was image object area training (86%). Table 3 shows a summary of the Producer's and User's accuracies for each of the three methods of training, illustrating that object area training is best for nearly every class. The lowest Producer's accuracy for the object area training method was from the wetland class (81%) and the lowest User's accuracy was from the upland class (77%). The highest Producer's and User's accuracies were from the water class (100%). Pairwise significance test results showed that the object area training was significantly different from the other two methods, at an alpha level of 0.05.
The second best method for classifier training was point training (80%) and the third was buffer area training (78%), but these two methods were not significantly different from one another. These two methods were still significantly more accurate than the original NWI (70%). The lowest Producer's accuracy for the point training method was from the wetland class (76%) and the lowest User's accuracy was from the upland class (72%). The highest Producer's accuracy for the point training method was from the water class (88%) and the highest User's accuracy was from the wetland class (85%). The lowest Producer's accuracy for the buffer area training method was from the water class (68%) and the lowest User's accuracy was from the wetland class (77%). The highest Producer's accuracy for the buffer area training method was a tie between the upland and wetland classes (79%) and the highest User's accuracy was from the water class (88%).
The output classification maps for each of the three methods of classifier training show spatial differences among the three methods. In Figure 3a, the point training has less heterogeneity than the buffer area training (Figure 3b), and the image object area training, by nature, has more homogeneity (Figure 3c). The water class, in particular, has strong spatial uniformity in the image object area training approach, meaning features are not as broken or choppy compared with the point and buffer training methods.
There were several variables that were considered important in all three training methods, including: CTI, summer season green band, Z minimum, Z mean, Z maximum, Z deviation, and intensity deviation. For the point training approach (Table 4, first column), the summer season red and blue bands, and intensity minimum were also found to be among the top ten most important variables. For the buffer area training approach (Table 4, second column), the summer season red and blue bands and the spring season NIR band were also found to be among the top ten most important variables. For the object area training approach (Table 4, third column), the spring season NIR band and intensity minimum were also found to be among the top ten most important variables.
Table 5 shows a summary of the Producer's and User's accuracies for each of the three methods of training at the Mankato site, illustrating that the different methods for classifier training do not vary greatly in their results for each land cover class. The lowest Producer's accuracy for the object area training method was from the water class (80%) and the lowest User's accuracy was from the wetland class (95%). The highest Producer's accuracy for the object area training method was from the upland class (100%) and the highest User's accuracy was from the water class (100%). For the point training method, the lowest Producer's and User's accuracies were from the wetland class (95%) and the highest Producer's and User's accuracies were from the water class (100%). Pairwise significance test results showed that the three methods of classifier training were not significantly different, at an alpha level of 0.05.
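A pairwise comparison of this kind can be sketched as a Z test on the difference between two Kappa statistics. The example below is only an illustration under stated assumptions: the variance uses Cohen's simplified large-sample approximation rather than the full delta-method expression, the two error matrices are treated as independent, and the function names and counts are hypothetical.

```python
import math


def kappa_and_variance(matrix):
    """Cohen's kappa and a simplified large-sample variance.

    The variance po*(1-po) / (n*(1-pe)^2) is an assumption of this sketch;
    the delta-method variance often used in accuracy assessment has
    additional terms.
    """
    size = len(matrix)
    n = sum(sum(row) for row in matrix)
    po = sum(matrix[i][i] for i in range(size)) / n          # observed agreement
    pe = sum(
        sum(matrix[i]) * sum(matrix[r][i] for r in range(size))
        for i in range(size)
    ) / (n * n)                                              # chance agreement
    kappa = (po - pe) / (1.0 - pe)
    variance = po * (1.0 - po) / (n * (1.0 - pe) ** 2)
    return kappa, variance


def pairwise_z(matrix_a, matrix_b):
    """Z statistic for the difference between two independent kappas."""
    k_a, v_a = kappa_and_variance(matrix_a)
    k_b, v_b = kappa_and_variance(matrix_b)
    return abs(k_a - k_b) / math.sqrt(v_a + v_b)


# Hypothetical two-class error matrices, for illustration only.
method_a = [[45, 5], [10, 40]]
method_b = [[48, 2], [4, 46]]
z = pairwise_z(method_a, method_b)
significant = z > 1.96  # two-sided test at alpha = 0.05
```

With real data, a Z value below 1.96 would correspond to the "not significantly different" outcome reported above.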
The least accurate method for classifier training was buffer area training (95%). The lowest Producer's accuracy for the buffer area training method was from the water class (80%) and the lowest User's accuracy was from the wetland class (90%). The highest Producer's accuracy for the buffer area training method was from the upland class (97%) and the highest User's accuracy was from the water class (100%). All three of the methods were still significantly more accurate than the original NWI (70%).
The output classification maps for each of the three methods of classifier training show slight spatial differences among the three methods. Figure 4a,c shows that the point training and object area training methods resulted in more homogeneity than the buffer area training (Figure 4b).
There were several variables that were considered important in all three methods, including: CTI, summer season green and blue bands, Z minimum, Z mean, Z maximum, and Z deviation. For the point training approach (Table 6, first column), intensity minimum, intensity mean, and intensity deviation were also found to be among the top ten most important variables. For the buffer area training approach (Table 6, second column), the summer season red band, intensity minimum, and intensity deviation were also found to be among the top ten most important variables. For the object area training approach (Table 6, third column), only LiDAR related variables were found to be ranked in the top ten important variables, including: all statistics on Z minimum data and the standard deviation of intensity maximum.
For the Level 2 classification of the Cloquet site, the image object area training method was the most accurate approach (77%), and the urban and water classes had the best results. The upland shrub class had the poorest accuracy for all three training approaches for both User's and Producer's accuracies. The output classification results (see Figure 5) showed that the point and buffer area training methods had more spatial variability, with scattered areas classified as agriculture and grassland throughout and overall less wetland area. This visual assessment confirmed the results of the Producer's and User's accuracies (see Table 7). Aside from the upland shrub class, all of the upland classes using the image object area training method were more accurate than the wetland classes. This result may imply that the Level 1 classification of the upland class was more accurate than the wetland class, allowing for higher accuracies in the Level 2 classification of upland subclasses. It may also imply that the image object area training data for the upland subclasses was more representative than the image object area training data was for the wetland subclasses.
The dataset used to sub-classify the wetland class into emergent, forested, and scrub/shrub wetlands included: DEM, slope, CTI, spring and summer aerial orthophotos, and LiDAR grid statistics. The dataset used to sub-classify the upland class included: slope, nDSM, slope of the nDSM, spring and summer aerial photos (including NDVI of the spring imagery), and LiDAR grid statistics. Among all of these variables, the most important variables for Level 2 classification across all three training methods (shown in Table 8) were LiDAR grid statistics on Z and intensity values, nDSM, and slope of the nDSM. In all three training methods, Z deviation was the most important variable, implying that wetland and upland subclasses are distinguishable by looking at the variability of the height of all LiDAR returns within a grid cell. For the image object area training method, the slope of the nDSM was also highly important, implying that horizontal changes in feature height within wetland and upland subclasses are a distinguishing attribute (i.e., transition zones or boundaries). Other variables that were identified as important across all three training methods included: the summer season optical layers (blue, green, and red), and intensity minimum and intensity deviation. The optical variables were important due to the distinguishable visual differences in land cover type during a summer with above-normal precipitation (e.g., full canopy deciduous trees, open grass fields, parking lots and buildings, healthy urban lawns, emergent wetlands along shorelines, coniferous bogs, etc.). Intensity values are different depending on land cover type and target material [16,22], where lower values may be associated with wetter areas (LiDAR pulse is absorbed more) and higher values may be areas with less absorption and more specular reflectance of the LiDAR pulse. Intensity deviation can be an indicator of the land cover types within a grid cell.
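To make the idea of LiDAR grid statistics concrete, the sketch below bins point returns into square cells and summarizes the height (Z) of all returns per cell; Z deviation is the standard deviation of those heights. The 5 m cell size follows the grid cell size mentioned in this paper, while the function name, the use of the population standard deviation, and the sample points are assumptions made for illustration.

```python
import math
from collections import defaultdict
from statistics import mean, pstdev


def grid_z_statistics(points, cell_size=5.0):
    """Group (x, y, z) returns into square cells and summarize Z per cell.

    Returns {(col, row): {"min": ..., "mean": ..., "max": ..., "dev": ...}}.
    """
    cells = defaultdict(list)
    for x, y, z in points:
        key = (math.floor(x / cell_size), math.floor(y / cell_size))
        cells[key].append(z)
    stats = {}
    for key, zs in cells.items():
        stats[key] = {
            "min": min(zs),
            "mean": mean(zs),
            "max": max(zs),
            "dev": pstdev(zs),  # population standard deviation of heights
        }
    return stats


# Hypothetical returns: a flat, open cell and a cell with tall vegetation.
points = [
    (1.0, 1.0, 300.0), (2.0, 3.0, 300.2), (4.0, 4.5, 300.1),   # cell (0, 0)
    (6.0, 1.0, 300.0), (7.5, 2.0, 312.0), (9.0, 4.0, 318.0),   # cell (1, 0)
]
stats = grid_z_statistics(points)
```

In this toy example the vegetated cell has a much larger Z deviation than the open cell, which is the kind of contrast the classifier can exploit to separate subclasses.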

Mankato
The image object training method was the most accurate approach (93%), but the point and buffer area training methods were not significantly different (89% and 88%, respectively). The Producer's and User's accuracies for each class of each method did not reveal a strong pattern of commission or omission error for any of the methods (see Table 9). For all three methods, the water, emergent and forested wetland, agriculture, and urban classes had high User's and Producer's accuracies, whereas the upland and wetland shrub classes had the lowest User's and Producer's accuracies. In terms of a visual comparison (Figure 6), the biggest differences are seen in the grassland and urban classes, where the image object training appears to have included more urban area around Mankato's city center, but the point and buffer area training approaches included more grasslands within agricultural areas.
The dataset used to sub-classify the wetland class into emergent, forested, and scrub/shrub wetlands included (the same set as for Cloquet): DEM, slope, CTI, spring and summer aerial orthophotos, and LiDAR grid statistics. The dataset used to sub-classify the upland class included (the same set as for Cloquet): slope, nDSM, slope of the nDSM, spring and summer aerial photos (including NDVI of the spring imagery), and LiDAR grid statistics. Among all of these variables, the mutually important variable for all three training methods was spring season NDVI (shown in Table 10). The most important variable for the point and buffer area training was the slope of the nDSM and the most important variable for the image object area training was the mean nDSM value. Other than NDVI, optical data did not rank very highly in importance, but nDSM-related attributes did rank highly in all three methods. This result, which shows the high importance of topographical derivatives (i.e., slope, nDSM, CTI, and Z), corroborates what we discussed in the Level 1 section: land cover classes in an agricultural area are more often distinguishable by topographical data alone.
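For readers less familiar with the two derived layers named here, the following sketch shows their standard definitions: NDVI as the normalized difference of the NIR and red bands, and the nDSM as the surface model minus the bare-earth elevation model. The function names, the zero-denominator fallback, and the sample values are assumptions for illustration and do not describe the processing chain used in this study.

```python
def ndvi(nir, red):
    """Normalized difference vegetation index for one pixel."""
    total = nir + red
    # Fall back to 0.0 where both bands are zero (an assumption of this sketch).
    return (nir - red) / total if total != 0 else 0.0


def ndsm(dsm, dem):
    """Normalized DSM: feature height above bare earth, cell by cell."""
    return [
        [surface - ground for surface, ground in zip(dsm_row, dem_row)]
        for dsm_row, dem_row in zip(dsm, dem)
    ]


# Hypothetical reflectance and elevation values, for illustration only.
vegetated = ndvi(nir=0.5, red=0.1)   # dense green vegetation, high NDVI
bare = ndvi(nir=0.2, red=0.2)        # equal NIR and red, NDVI of zero
heights = ndsm(dsm=[[310.0, 302.0]], dem=[[300.0, 301.5]])
```

The first nDSM cell in this example stands 10 m above the ground (for instance, a tree canopy) while the second is only 0.5 m above it (for instance, a crop or grass surface).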

Cloquet
The image object area training method was the most accurate (86%) approach for both the Level 1 and Level 2 classification and was significantly more accurate than the other two approaches at Level 1. Our results show that the image object approach mapped wetland areas accurately, only incorrectly mapping wetlands 7% of the time and omitting wetlands 19% of the time. Uplands, on the other hand, were mapped incorrectly 23% of the time and were omitted about 10% of the time, where the upland shrub class had the poorest accuracy for all three training approaches, in terms of both User's and Producer's accuracies. These results imply that more information is needed, perhaps from other remotely sensed data or additional dates of imagery, to differentiate wetland from upland areas (see Tables 3 and 7).
The dataset that was used to separate the non-water class into upland and wetland areas included: slope, CTI, spring and summer aerial photos, and LiDAR grid statistics. Among these variables, the majority of the most important variables (shown in Tables 4 and 8) were LiDAR grid statistics on Z and intensity values. For the Level 2 classification, the slope of the nDSM was also highly important, implying that horizontal changes in feature height within wetland and upland subclasses are a distinguishing attribute (i.e., transition zones). In all three methods of Level 1 model training, CTI was found to be the most important variable. This finding confirms the usefulness of this index for differentiating areas that have potential wetness from areas that are drier and less conducive to wetland conditions [7,21].
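The compound topographic index is commonly defined as ln(a / tan β), where a is the specific catchment area (upslope contributing area per unit contour width) and β is the local slope. The sketch below follows that common definition; the function name, the small minimum slope used to avoid division by zero on flat terrain, and the sample values are assumptions, and the exact formulation used to derive the CTI layer in this study may differ.

```python
import math


def compound_topographic_index(specific_catchment_area, slope_degrees,
                               min_slope_degrees=0.1):
    """CTI = ln(a / tan(beta)) for one grid cell.

    min_slope_degrees is a hypothetical floor that keeps tan(beta) above
    zero on perfectly flat cells.
    """
    slope = max(slope_degrees, min_slope_degrees)
    return math.log(specific_catchment_area / math.tan(math.radians(slope)))


# Hypothetical cells: a flat valley bottom versus a steep ridge shoulder.
valley_cti = compound_topographic_index(specific_catchment_area=1000.0, slope_degrees=1.0)
ridge_cti = compound_topographic_index(specific_catchment_area=10.0, slope_degrees=30.0)
```

Cells that are flat and collect flow from a large upslope area receive high CTI values, which is why the index helps separate potentially wet areas from drier ones.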
Other variables that were considered important for all three training methods for the Level 1 classification included: the summer season optical layers (blue, green, and red), and spring season NIR band. Though upland and wetland areas may be optically similar (i.e., forested uplands versus forested wetlands; upland shrubs versus wetland shrubs), in all three training methods for a Level 2 classification Z deviation was the most important variable, implying that wetland and upland subclasses are distinguishable by looking at the variability of the height of all LiDAR returns within a grid cell. Another consideration is that some of the optical data over this study site was acquired during a period of above-normal precipitation that may have made wetland areas wetter and more optically differentiable. On the other hand, the spring season optical data (which includes the NIR band) was acquired during below-normal precipitation conditions. The timing of this data collection may have aided in making vegetative land cover types, such as coniferous or deciduous trees, more distinguishable due to leaf-off conditions. The lower amount of precipitation in this dataset may have also aided in distinguishing areas with drier soil conditions from areas that are more permanently wet.

Mankato
The image object training method was the most accurate method for both Level 1 and Level 2 classifications, where for Level 1 the image object training method had the same accuracy as the point training approach (96%) and the Level 2 classification accuracy was only slightly lower at 93%. These results were not significantly different from each other. The Producer's and User's accuracies for each Level 1 class of each method did not reveal a strong pattern of commission or omission error for any of the methods in any of the classes (see Tables 5 and 9); however, the upland and wetland shrub classes had the lowest User's and Producer's accuracies. The largest visual differences are with the Level 1 wetland class, where the image object training approach appears to have included more wetland area around the water bodies but the point and buffer area training approaches include more wetlands scattered within upland areas. For Level 2, the largest visual differences are with the grassland and urban classes, where the image object training approach appears to have included more urban area around Mankato's city center, whereas the point and buffer area training approaches included more grassland scattered within agricultural areas.
The dataset that was used to separate the non-water class into upland and wetland areas was the same set as for Cloquet, where the most important variables were LiDAR grid statistics on Z (shown in Table 6). The most important variable for all three training methods was Z minimum. This result implies that the minimum Z value is not spatially variable over our training sites and is therefore equally important at the pixel level, a fixed window (buffer area) around the pixel, and an object area containing the pixel (feature scale). The high importance of Z attributes (minimum, mean, maximum, and standard deviation within a 5 m grid cell) shows that the land cover classes in Mankato are more often distinguishable by height alone. Among the variables used to sub-classify the wetland and upland classes, the mutually important variable for all three training methods was spring season NDVI. Other than NDVI, optical data did not rank very highly in importance for a Level 2 classification, but nDSM-related attributes did rank highly in all three training methods. This result, which shows the high importance of topographical derivatives (i.e., slope, nDSM, CTI, and Z), corroborates what we found for the Level 1 classification: land cover classes in an agricultural area are more often distinguishable by topographical data alone.
Other variables that were identified as important for all three training methods included: CTI and the summer season blue and green bands. The summer season optical data had been acquired during a year with above-normal precipitation, which may have contributed to the health of the cultivated vegetation and its relative 'greenness'. The above-normal precipitation may have also contributed to the amount of water in the wetland areas; these conditions tend to reveal more change in the values of the blue band in optical imagery. The spring season aerial orthophotos were collected during a period of below-normal precipitation and none of these data layers were among the top variables of importance for any of the three methods for classifier training. This result possibly indicates one of two things: spring season imagery is not as important as summer imagery for an agricultural region, or low precipitation conditions in the spring are not conducive to making optical imagery more useful than topographical data.

Conclusions
This research investigated techniques to produce an accurate land cover classification using methods and datasets that were affordable, relatively simple, moderately reliant on expert knowledge, and, as a result, more easily repeated at larger geographic scales. Given the increased availability of the input datasets used in this research for other regions throughout the United States, including comparable resolution satellite imagery and the lowering costs of collecting LiDAR data, the methods developed in this research are applicable to many other regions for updating land cover classifications more frequently. We found that for both the northern-forested areas surrounding Cloquet, Minnesota and the agricultural areas surrounding Mankato, Minnesota, the image object approach to training was the most accurate method for classifier training. This method uses the information from multiple pixels to train the model, as opposed to information from a single pixel or a fixed window (buffer area) surrounding a field reference point.
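The difference between the three training approaches can be illustrated with a toy raster. In the sketch below, a single input layer is sampled at a reference point in three ways: the value of the pixel itself, the mean of a fixed window around it, and the mean of all pixels in the image object (segment) that contains it. The function names, the use of the mean as the area statistic, the window size, and all values are assumptions for illustration only and do not reproduce the study's segmentation or statistics.

```python
def point_sample(raster, row, col):
    """Point training: the value of the single pixel under the point."""
    return raster[row][col]


def window_sample(raster, row, col, half_width=1):
    """Buffer training: mean of a fixed square window around the point."""
    rows = range(max(0, row - half_width), min(len(raster), row + half_width + 1))
    cols = range(max(0, col - half_width), min(len(raster[0]), col + half_width + 1))
    values = [raster[r][c] for r in rows for c in cols]
    return sum(values) / len(values)


def object_sample(raster, segments, row, col):
    """Object training: mean of all pixels in the segment containing the point."""
    target = segments[row][col]
    values = [
        raster[r][c]
        for r in range(len(raster))
        for c in range(len(raster[0]))
        if segments[r][c] == target
    ]
    return sum(values) / len(values)


# Hypothetical layer with two land cover patches (low values left, high right).
raster = [
    [10, 10, 50, 50],
    [10, 12, 50, 52],
    [10, 10, 48, 50],
    [10, 10, 50, 50],
]
segments = [[0, 0, 1, 1] for _ in range(4)]  # hypothetical segment IDs

point_value = point_sample(raster, 1, 1)
window_value = window_sample(raster, 1, 1)
object_value = object_sample(raster, segments, 1, 1)
```

Because the reference point sits next to a patch boundary, the fixed window mixes values from both patches, whereas the object statistic stays within the patch that contains the point.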
We found that the most important freely available datasets to produce a Level 1 classification (upland, water, and wetland) for both Cloquet and Mankato included: CTI, summer season green and blue bands, and grid statistics from all returns of point cloud LiDAR data, especially those that related to the height of the return. For a Level 2 classification (sub-classifying the upland and wetland classes), the most important input data layers for Cloquet using image object area training included: Z deviation and mean, intensity minimum, and summer season blue and green bands. For Level 2 classification in Mankato, the most important data layers included: nDSM, NDVI, slope, and CTI. These results showed that in a forested region, a variety of data sources (optical, LiDAR height, and LiDAR intensity) were important for maintaining an acceptable degree of accuracy (>75%), but for an agricultural region, topographical models (surface and potential wetness [CTI]) were important and provided very good classification accuracy (>95%).
In terms of increasing classification accuracy using the randomForest classifier, we suggest a few options: improve the image segmentation polygons and therefore improve the area statistics being used for classifier training; include additional remotely sensed input data, such as radar, that shows other feature characteristics (such as structure, density, and below canopy attributes); increase the number of image dates to incorporate more seasonality; and include additional spectral bands from satellite imagery to test further improvement in the accuracy of sub-classifying land cover types. We thoroughly tested training methods and found that using a fixed window (approximate to a buffer radius) is not appropriate for classifier training. For forested regions, we propose that additional data, such as more dates of imagery or additional data types such as radar, be explored to complement the information provided by optical and LiDAR data. We also showed that in some cases, image object area training did not significantly improve the accuracy of land cover classification over point training, particularly in agricultural regions such as Mankato, Minnesota. In such limited cases, a pixel classification approach can yield similar classification accuracy with lower cost. This study has provided detailed information on successful techniques used to aid in the design of programs to map and update maps of wetlands, thereby increasing the efficiency of monitoring these valuable and dynamic ecosystems.

Figure 1. Study areas near Cloquet, MN and Mankato, MN. The aerial images are from the 2010 National Agricultural Imagery Program.

Figure 3. Output Level 1 classifications for point training pixel classification (a); buffer area training pixel classification (b); and object area training polygon classification (c) for the Cloquet site.

Figure 4. Output Level 1 classifications for point training pixel classification (a); buffer area training pixel classification (b); and object area training polygon classification (c) for the Mankato site.

Figure 5. Output Level 2 classifications for point training pixel classification (a); buffer area training pixel classification (b); and object area training polygon classification (c) for the Cloquet site.

Figure 6. Output Level 2 classifications for point training pixel classification (a); buffer area training pixel classification (b); and object area classification (c) for the Mankato site.

Table 1. Summary of reference point data for Cloquet, Minnesota.

Table 2. Summary of reference point data for Mankato, Minnesota.
Reference training and test point data were compiled from several sources, including: randomly generated field sites visited by trained field crews (summers 2009-2010 for Cloquet and summer 2011 for Mankato), plots of an existing wetland monitoring program (centroids from polygons of the 2006-2008 Minnesota Department of Natural Resources Wetland Status and Trends Monitoring Program

Table 3. Cloquet site Producer's and User's accuracies for each Level 1 class, overall accuracy (95% CI in parentheses), Kappa statistic, and Z statistic of the three methods for land cover classification: point training, buffer area training, and object area training.
* Values were significant at an alpha of 0.05.

Table 4. Importance of variables (in decreasing order). Top ten selected from each training method for Level 1 classification of the Cloquet study site.

Table 5. Mankato site Producer's and User's accuracies for each Level 1 class, overall accuracy (95% CI in parentheses), Kappa statistic, and Z statistic of the three methods for land cover classification: point training, buffer area training, and object area training.
* Values were significant at an alpha of 0.05.

Table 6. Importance of variables (in decreasing order). Top ten selected from each training method for Level 1 classification of the Mankato study site.

Table 7. Cloquet site Producer's and User's accuracies for each Level 2 class, overall accuracy (95% CI in parentheses), Kappa statistic, and Z statistic of the three methods for land cover classification: point training, buffer area training, and object area training.
* Values were significant at an alpha of 0.05.

Table 8. Variable importance (in decreasing order). Top ten selected from each training method for Level 2 classification of the Cloquet study site.

Table 9. Mankato site Producer's and User's accuracies for each Level 2 class, overall accuracy (95% CI in parentheses), Kappa statistic, and Z statistic of the three methods for land cover classification: point training, buffer area training, and object area training.
* Values were significant at an alpha of 0.05.

Table 10. Importance of variables (in decreasing order). Top ten selected from each training method for Level 2 classification of the Mankato study site.