Fine-Scale Mapping of Natural Ecological Communities Using Machine Learning Approaches

: Remote sensing technology has been used widely in mapping forest and wetland communities, primarily with moderate spatial resolution imagery and traditional classiﬁcation techniques. The success of these mapping efforts varies widely. The natural communities of the Laurentian Mixed Forest are an important component of Upper Great Lakes ecosystems. Mapping and monitoring these communities using high spatial resolution imagery beneﬁts resource management, conservation and restoration efforts. This study developed a robust classiﬁcation approach to delineate natural habitat communities utilizing multispectral high-resolution (60 cm) National Agriculture Imagery Program (NAIP) imagery data. For accurate training set delineation, NAIP imagery, soils data and spectral enhancement techniques such as principal component analysis (PCA) and independent component analysis (ICA) were integrated. The study evaluated the importance of biogeophysical parameters such as topography, soil characteristics and gray level co-occurrence matrix (GLCM) textures, together with the normalized difference vegetation index (NDVI) and NAIP water index (WINAIP) spectral indices, using the joint mutual information maximization (JMIM) feature selection method and various machine learning algorithms (MLAs) to accurately map the natural habitat communities. Individual habitat community classiﬁcation user’s accuracies (UA) ranged from 60 to 100%. An overall accuracy (OA) of 79.45% (kappa coefﬁcient (k): 0.75) with random forest (RF) and an OA of 75.85% (k: 0.70) with support vector machine (SVM) were achieved. The analysis showed that the use of the biogeophysical ancillary data layers was critical to improve interclass separation and classiﬁcation accuracy. Utilizing widely available free high-resolution NAIP imagery coupled with an integrated classiﬁcation approach using MLAs, ﬁne-scale natural habitat communities were successfully delineated in a spatially and spectrally complex Laurentian Mixed Forest environment.


Introduction
An ecosystem is defined as "a community of organisms and their physical environment interacting as an ecological unit" [1]. Land cover grouped into types and systems by resource managers led Arthur Tansley [2] to coin the term "ecosystem". Ecosystems with spatially related features are considered higher-order, larger-scale ecosystems, referred to as "macroecosystems" [3]. When ecosystems are viewed as macroscale patterns, they can be divided into ecoregions [4]. The term "ecoregion" was first proposed by Orie Loucks [5], a Canadian forest researcher. Ecoregions play an important role in resource conservation and management by enabling consideration of the natural process and patterns of communities which provide ecosystem sustainability in a particular region [6]. Many different factors 2 of 25 (vegetation, soils, spatial and temporal scales, landform and bedrock geology) are utilized to classify these systems, and numerous ecological classification schemes exist. Spatial and temporal dimensions of ecosystem integrity can be addressed using scale (level of detail) and a hierarchical structure approach [7]. Various geographic ordering schemes were developed by Bailey [3,6] to identify and delineate ecoregion boundaries. Additionally, having a hierarchical classification scheme allows ecosystems to be presented at different spatial scales [8,9]. A holistic ecological framework was introduced by Rowe and Barnes [9,10], using a landscape ecosystem or geo-ecosystem approach that incorporates factors such as climate, landforms, soil characteristics, hydrology and biota. An example of a widely used hierarchical classification scheme for wetlands and deep-water habitats for the United States was developed by Cowardin [11]. It divides ecological taxa into hierarchal systems or subsystems to provide mapping uniformity across the United States.
Selecting or developing an appropriate ecological classification scheme is critical for the classification to be useful for the end user. There are numerous existing classification schemes [8,[10][11][12], and selection of an inappropriate scheme can limit the end product's accuracy and utility. To classify a landscape, whether in situ or using remotely sensed data, it is important to have a classification scheme which reduces or eliminates confusion between various landscape features requiring separation [13]. Traditionally, many resource management agencies (federal, state and private) use field-sampled regional vegetation classes focusing on the dominant vegetation species while ignoring associated plants, animals and other organisms which are repeatedly found under similar environmental conditions [14], focusing on describing native ecosystem types minimally impacted by anthropogenic activities [12]. For this study, the hierarchical classification framework of Cohen et al. [12] was utilized. "Natural Communities of Michigan: Classification and Description", published by the Michigan Natural Features Inventory (MNFI), provides detailed information on separating Michigan's complex landscape into understandable and describable groups called natural habitat communities. The foundation of this classification is based on the work completed by Chapman [15] and first published by Kost et al. [16]. It is important to understand the difference between a plant community, such as the ones used in the national land cover classification [17], and a natural habitat community [12]. The latter differs from other hierarchical classification schemes in that Cohen et al. [12] regard such a community as "an assemblage of interacting plants, animals, and other organisms that repeatedly occur under similar environmental conditions across the landscape and are predominantly structured by natural processes rather than modern anthropogenic disturbances".
Along with the classification scheme, it is important to choose appropriate field collection methods and data sources. Commonly used field methods for data collection (e.g., collecting location points via global navigation satellite systems (GNSS) and vegetation sampling) are labor intensive, costly and time consuming. Sampling is confined to small areas due to limited access and safety concerns [18]. With technical advances in geospatial technology, an alternative and/or complementary approach to traditional field data collection techniques is available. Remotely sensed imagery provides a practical, economical approach to monitor and measure biogeophysical factors. Hence, it is efficient for large-area monitoring [19][20][21]. Satellite imagery from multispectral and hyperspectral sensors (Landsat, Sentinel, SPOT, MODIS, and HyMap) as well as LiDAR and RADAR data have been extensively used for land cover mapping at various scales [17,[22][23][24][25][26][27][28][29] Fine-scale mapping is critical to locate and map endangered habitats, particularly with escalating global climate change impacts. Hence, high spatial resolution imagery, such as that of the National Agriculture Imagery Program (NAIP), is important. The NAIP program is managed by the Aerial Photography Field Office (APFO) of the United States Department of Agriculture (USDA). It has 8-bit radiometric and 60 cm spatial resolutions, with four spectral bands (near infrared (NIR), red, green and blue). NAIP data are nominally cloudfree and widely available at no cost [30]. They have been used for wetland mapping, land cover classification, forest-cover-type mapping, forest health monitoring and other resource management projects [31][32][33]. (45°50′27″N, 86°40′30″W) located in the western half of the Hiawatha National Forest, adjacent to the head of Big Bay de Noc. The study area encompasses approximately 3151 ha (77,861 ac) and contains a wide variety of natural habitat communities. Dominant vegetation is composed of upland and lowland deciduous, conifer and mixed forest types as well as palustrine wetlands [55]. The influence of the Great Lakes and various landforms creates distinct climatic zones across the area with summer temperatures ranging from 22 °C (71 °F) near the Great Lakes shoreline to a warmer 27 to 29 °C (81 to 85 °F) inland. Winter temperatures range from an average high of −2 °C (28 °F) to an average low of −11 °C (13 °F). Winters are long and snowy, with average snowfall of 142 cm (56 inches) along the Lakes Michigan and Huron shorelines, to 554 cm (218 inches) near Lake Superior. The study area is part of a glacial lake plain and is a nearly level lake plain that was covered with water from the glacial Lake Algonquin. The soils on the landform are derived from predominantly sandy lacustrine deposits [49,56]. Soil drainage classes range from poorly to excessively drained, and soil pH ranges from neutral to extremely acidic. The area contains numerous wetlands with a complex hydrography.

Imagery
The National Agriculture Imagery Program (NAIP) collects data during the active growing season, also known as "leaf-on" imagery, which is needed for habitat classification. The four-band (red, green, blue and near-infrared) aerial imagery was acquired at approximately 16,000 feet above ground level (AGL) with a Leica ADS100 airborne digital sensor (Leica Geosystems). The image has a spatial resolution of 60 cm with 8-bit radiometric resolution. The spectral range of the 4 bands ranges between 435 and 882 nm ( Table  1). The data were preprocessed by the contractor to reduce radiometric and geometric distortions. NAIP imagery tiles dated 25 July 2018 and 11 August 2018 were downloaded from the United States Geological Service (USGS) EarthExplorer website (https://earthexplorer.usgs.gov/, accessed on 14 July 2021). A LiDAR-derived 1 m digital elevation model (45 • 50 27 N, 86 • 40 30 W) located in the western half of the Hiawatha National Forest, adjacent to the head of Big Bay de Noc. The study area encompasses approximately 3151 ha (77,861 ac) and contains a wide variety of natural habitat communities. Dominant vegetation is composed of upland and lowland deciduous, conifer and mixed forest types as well as palustrine wetlands [55]. The influence of the Great Lakes and various landforms creates distinct climatic zones across the area with summer temperatures ranging from 22 • C (71 • F) near the Great Lakes shoreline to a warmer 27 to 29 • C (81 to 85 • F) inland. Winter temperatures range from an average high of −2 • C (28 • F) to an average low of −11 • C (13 • F). Winters are long and snowy, with average snowfall of 142 cm (56 inches) along the Lakes Michigan and Huron shorelines, to 554 cm (218 inches) near Lake Superior. The study area is part of a glacial lake plain and is a nearly level lake plain that was covered with water from the glacial Lake Algonquin. The soils on the landform are derived from predominantly sandy lacustrine deposits [49,56]. Soil drainage classes range from poorly to excessively drained, and soil pH ranges from neutral to extremely acidic. The area contains numerous wetlands with a complex hydrography.

Imagery
The National Agriculture Imagery Program (NAIP) collects data during the active growing season, also known as "leaf-on" imagery, which is needed for habitat classification. The four-band (red, green, blue and near-infrared) aerial imagery was acquired at approximately 16,000 feet above ground level (AGL) with a Leica ADS100 airborne digital sensor (Leica Geosystems). The image has a spatial resolution of 60 cm with 8-bit radiometric resolution. The spectral range of the 4 bands ranges between 435 and 882 nm ( Table 1). The data were preprocessed by the contractor to reduce radiometric and geometric distortions. NAIP imagery tiles dated 25 July 2018 and 11 August 2018 were downloaded from the United States Geological Service (USGS) EarthExplorer website (https://earthexplorer.usgs.gov/, accessed on 14 July 2021). A LiDAR-derived 1 m digital elevation model (DEM) was obtained from the Natural Resources Conservation Service (NRCS). All datasets were registered to the Universal Transverse Mercator (UTM), Zone 16 North, North American Datum of 1983 (NAD83). The spatial resolution for all the datasets is 60 cm. Where resampling was required, the data were reprojected using a two-dimensional affine coordinate transformation using nearest-neighbor resampling, with a fundamental vertical accuracy of ±24.5 cm, meeting all FGDC (Federal Geographic Data Committee) standards. ERDAS IMAGINE ® 2020 (https://www.hexagongeospatial.com/products/powerportfolio/erdas-imagine, accessed on 14 July 2021) and ArcGIS Pro 2.7 (https://www.esri. com/en-us/store/arcgis-pro, accessed on 14 July 2021) were used for image transformation. IMAGINE's Spatial Model Editor toolbox was used to generate the normalized vegetation index (NDVI) and a modified water index (WINAIP), as well as the gray level co-occurrence matrix (GLCM) textures, aspect and slope (%). ArcGIS Pro tools were used to generate random points (Create Random Points) and to extract multi raster values (Extract Multi Values to Points). The "caret" package [57] of the R programming language [58] was used for MLA implementation.

Utility of Image Transformations
Multiple workflows have been developed by the remote sensing community for delineating and mapping vegetation. They are dependent on environmental factors (e.g., soil characteristics (drainage, slope, pH), hydrography, landform and climatic conditions) [59], as well as field collected data, expert image interpretation (manual and computer assisted) and an appropriate classification scheme [11,12,47,48,[60][61][62]. The delineation and classification workflow used in this study is presented in Figure 2. Inter-class spectral variability is important to differentiate natural communities. However, locally high variance occurs not only due to changes in vegetation but also due to site conditions and the high spatial resolution of the imagery [63]. When selecting training sets, it is important to understand how these factors influence training set statistics and to generate adequate training data to encompass the variation. Having polygon training sets with well-delineated feature boundaries is critical to generate valid training set point matrices for the machine learning algorithms (MLAs) [39].
To fully utilize spectral reflectance information, and reduce redundant information between bands, an integrated approach to spectral transformation techniques (principal components analysis (PCA) and independent components analysis (ICA)) were utilized. PCA is a widely employed transformation using a linear transformation, and it generates uncorrelated components [64,65]. PCA-derived components have been used to map wetlands vegetation, assess change detection and evaluate vegetation anomalies and identify geological features [66][67][68][69][70]. PCA uses second-order statistics and assumes the data are normally distributed and correlated. PCA has been utilized since the launch of the Landsat 2 Multispectral Scanner which consisted of 4 bands [71]. ICA considers higher-order statistics, and each transformed component is considered to be non-Gaussian [72]. ICA has been shown to identify details in an image even when the feature occupies a small area [73]. However, it has not been extensively used in vegetation and land use/cover classification [70,[74][75][76]. Both PCA and ICA were used to draw accurate training classes for the MLA. The components of the PCA transformation were not used in the classification area [73]. However, it has not been extensively used in vegetation and land use/cover classification [70,[74][75][76]. Both PCA and ICA were used to draw accurate training classes for the MLA. The components of the PCA transformation were not used in the classification process, as components 3 and 4 contained information critical to delineating the emergent marsh natural habitat communities.

Vegetation and Moisture Indices
Spectral indices have been used extensively to evaluate, monitor and map vegetation both qualitatively and quantitatively [77]. Two spectral indices, the normalized vegetation index (NDVI) and a modified water index (WINAIP), were derived from the NAIP bands. NDVI indicates the biomass abundance and vigor, as well as differentiating vegetation from non-vegetated areas [78,79]. NDVI was generated using NAIP bands 4 (NIR) and 1 (red). The WorldView water index (WV-WI) was proposed by Wolf [80] and uses WorldView satellite band combinations of coastal blue (425 nm) and the second near-infrared channel (NIR2) (950 nm). Previous water indices for satellite imagery were based on NDWI [81] which used NIR and short-wave infrared (SWIR) channels. Based on Wolf's index, a custom water index was developed for this study by replacing the coastal blue band (425 nm) band with NAIP blue (435 nm) and the NIR2 (950 nm) band with NAIP NIR (882 nm). The modified index is referred to as the WINAIP.

Vegetation and Moisture Indices
Spectral indices have been used extensively to evaluate, monitor and map vegetation both qualitatively and quantitatively [77]. Two spectral indices, the normalized vegetation index (NDVI) and a modified water index (WINAIP), were derived from the NAIP bands. NDVI indicates the biomass abundance and vigor, as well as differentiating vegetation from non-vegetated areas [78,79]. NDVI was generated using NAIP bands 4 (NIR) and 1 (red). The WorldView water index (WV-WI) was proposed by Wolf [80] and uses WorldView satellite band combinations of coastal blue (425 nm) and the second near-infrared channel (NIR2) (950 nm). Previous water indices for satellite imagery were based on NDWI [81] which used NIR and short-wave infrared (SWIR) channels. Based on Wolf's index, a custom water index was developed for this study by replacing the coastal blue band (425 nm) band with NAIP blue (435 nm) and the NIR2 (950 nm) band with NAIP NIR (882 nm). The modified index is referred to as the WINAIP.

Texture
Plant communities often share similar spectral reflectance characteristics, which leads to confusion and misclassification. In such scenarios, texture differences may be useful, even critical, for correct classification. A gray level co-occurrence matrix (GLCM), originally referred to as a "gray-tone spatial dependence matrix" by Haralick et al. [82], uses secondorder metrics to analyze relationships between pairs of pixels and computes both angular relationships and directionality to measure the spectral and physical distance between two neighboring pixels [83]. GLCM texture analysis has been used in numerous studies to classify various features including urban areas, forests, agriculture and wetlands [61,[82][83][84].
The GLCM texture was calculated from the first and second NAIP PCA components. These components explained 94% (component 1) to 96.74% (components 1 and 2) of the variability in the image. Three different GLCM measures of texture (contrast, entropy and standard deviation) were calculated with PCAs 1 and 2. GLCM was performed in ERDAS IMAGINE (Hexagon Geospatial, 2020), with a grayscale level of 32 and 3 × 3, 5 × 5 and 7 × 7 processing windows. Two different Euclidean geometry XY offsets (2, 2 and 2, −2) were used. The derived means from the texture measures were used in the classification.

Topographical Characteristics
At fine scales, natural habitat communities are controlled by soils, local topography, lake effect climate zones and, in some instances, past disturbances such as fire and windthrow. Extensive glaciation in the Upper Midwest has created a wide variety of surficial geologic features. These features are made up of interacting landform patterns, which have been described [56,85] as combinations of "relief-topography (surface shape) and geological parent material". Rowe [10,56,86] stated that "repetitive patterns in vegetation can be traced directly to repetitive patterns of topography associated with specific types of surficial material of landforms". Drawing on this body of research, a 1 m DEM generated from LiDAR imagery collected in 2015 helped identify important topographic features with high positional accuracy. The DEM was downloaded from the National Resources Conservation Service. Hillshade (multi-directional oblique weighted-MDOW), aspect and slope (%) were generated from the DEM. Aspect and slope were evaluated as input ancillary data into the MLAs and hillshade aided in manual training set delineation.

Selection of Input Ancillary Data Layers
When imagery alone cannot adequately classify the data, the incorporation of ancillary data is important. Once ancillary data layers are selected, understanding their contribution towards classification is important [87], with the goal of optimizing and creating an accurate classification. Variable or ancillary data selection has been used in many applications including data mining and machine learning, network anomaly detection, natural language processing, bioinformatics and image processing [87]. There are many different feature selection methods available [88].
In this study, joint mutual information maximization (JMIM) [89], a filter-based variable selection method, was implemented in R using the "praznik" package [90]. This method uses "mutual information" and "maximum of the minimum" criteria to calculate the contribution of each input variable. This method evaluates each variable's importance. The JMIM method was performed prior to running the MLA classification. Variables with higher JMIM values were used as ancillary data inputs. Along with the JMIM method, the "varImp" function was used [89] from the "caret" package [57,91], which evaluates the implemented MLA and identifies the best ancillary data layers for improving the classification [88,91]. The varImp method is implemented after the RF classification is performed. The scores produced for each variable helped select various variable combinations for the MLAs classifications.
Based on joint mutual information maximization (JMIM) and "varImp" scores, the following variables were used as inputs in the machine learning algorithms (MLAs): • • Modified water index (WINAIP) derived from NAIP imagery.
Feature selection methods have been primarily used in geology applications to reduce the complexity of the dataset where there are large numbers of bands (i.e., hyperspectral data) [70]. In this study, both feature selection methods were implemented. However, as this is an uncommon approach for natural resource applications [92][93][94], various combinations of the ancillary data were also manually selected and evaluated, and listed in the results section.

Collection of Training Samples
An adequate number of training samples is crucial to achieve optimum classification results. As a "rule of thumb" in ML, the number of training samples should be 10 times (preferably 100 times) the total number of variables [39,95]. How the training samples are selected also plays a crucial role and requires expert knowledge of the spectral and spatial variation within and between the natural communities [13]. Additionally, the choice of classification algorithm, the number of input variables and the spatial extent of each natural habitat community influences the number of training samples required [39,96]. Researchers have noted that large and accurate training datasets are preferable, regardless of the MLA used [39,97,98]. Training samples were generated within manually delineated training set polygons ( Figure 3) using stratified random sampling in ArcMap 10.7.1, with a minimum of 5 m distance between points. Different spectral enhancement techniques (PCA, ICA), soils, elevation data and ground truth data were utilized to delineate the training polygons. Table 2 presents the natural communities within the study area and the number of training and test points collected for each. The points were assigned natural habitat community class names and IDs. Using the X and Y locations of the training sample points, pixel values for the NAIP imagery and the ancillary data layers were extracted for use in the MLAs.
Feature selection methods have been primarily used in geology applications to reduce the complexity of the dataset where there are large numbers of bands (i.e., hyperspectral data) [70]. In this study, both feature selection methods were implemented. However, as this is an uncommon approach for natural resource applications [92][93][94], various combinations of the ancillary data were also manually selected and evaluated, and listed in the results section.

Collection of Training Samples
An adequate number of training samples is crucial to achieve optimum classification results. As a "rule of thumb" in ML, the number of training samples should be 10 times (preferably 100 times) the total number of variables [39,95]. How the training samples are selected also plays a crucial role and requires expert knowledge of the spectral and spatial variation within and between the natural communities [13]. Additionally, the choice of classification algorithm, the number of input variables and the spatial extent of each natural habitat community influences the number of training samples required [39,96]. Researchers have noted that large and accurate training datasets are preferable, regardless of the MLA used [39,97,98]. Training samples were generated within manually delineated training set polygons ( Figure 3) using stratified random sampling in ArcMap 10.7.1, with a minimum of 5 m distance between points. Different spectral enhancement techniques (PCA, ICA), soils, elevation data and ground truth data were utilized to delineate the training polygons. Table 2 presents the natural communities within the study area and the number of training and test points collected for each. The points were assigned natural habitat community class names and IDs. Using the X and Y locations of the training sample points, pixel values for the NAIP imagery and the ancillary data layers were extracted for use in the MLAs.  The training samples derived from the input ancillary data layers were used to train both random forest (RF) and support vector machine (SVM). Parameter optimization, validation and accuracy assessment were performed for both algorithms. The "caret" package [57] available in the "R" programming language was used to implement the MLAs. RF and SVM classifiers are discussed briefly below.

Random Forest
RF is an ensemble classifier method developed by Brieman [99], which uses a set of nonparametric classification and regression tree (CART) rules to make predictions [100]. Decision trees (DTs) are generated using the values of a random vector sampled independently from the input vector and distributed equally among all the trees in the forest [35,38,101,102]. The algorithm then uses the majority of votes from the tree's assemblages and assigns that value to each of the unknown vectors [39,47]. Random forest works with a "bagging" (or bootstrap aggregating) approach, which generates training datasets by randomly drawing replacements for the original training set for each selected feature/feature combination [38,99,103]. RF uses two thirds of the data to train the trees and the remaining one third of the data to provide an independent estimate of overall accuracy (OA) [34,39]. The classifier is computationally efficient but can be prone to overfitting [104,105]. The forest can grow to a user-defined largest number of trees (Ntree) by optimizing the number of RF-created trees exhibiting high variance and low bias [34,99]. The m try parameter controls the number of variables randomly selected at each split in the tree-building process and has a sensitive influence on RF's performance. It can be adjusted if needed, as part of the tuning process [106,107]. RF's advantages include its ability to work with spatially large, complex datasets with correlated variables, to rank variable importance and to improve classification accuracies [39,99]. We used the default parameters provided with the "rf" method in the caret package and used "center" and "scale" to standardize the ancillary data [57,108]. RF tends to work robustly without optimization parameters [39].

Support Vector Machine
SVM is a supervised machine learning algorithm developed by Vapnik [109], based on statistical learning theory. The classifier is inherently binary and tries to identify a boundary or a hyperplane which separates two classes that are closest in the feature space. The hyperplanes associated with the class are parallel to the optimal separating hyperplane and the samples located on these hyperplanes are called support vectors [109,110].
However, in the real world, data distributions are often non-linear and noisy, and may not be easily separated, which promotes overfitting. Projecting the input data to a higher dimension feature space helps overcome this issue, assuming that a linear boundary exists in the higher dimensional feature space [39,111]. If a linear higher dimensional feature space does not exist, a kernel function can be used, and for this study a radial basis function (RBF) kernel was used [112]. We used the default "svmRadial" method from the caret package [57]. SVM contains two important model parameters: "cost" (C) and "sigma" (σ).
Higher C values can lead to a more complex decision boundary and less generalization [39], whereas a higher σ affects the overall shape of the separating hyperplane and may influence overall accuracy. The "caret" package estimates approximate values for the cost and σ parameters directly [57,108,113]. RBF parameters are determined by a grid search algorithm using an m-fold cross-validation (m-FCV) approach. The grid search method tests different pairs of parameters, and the one with higher cross-validation accuracy is selected [114]. Centering and scaling were performed to standardize the ancillary data [57,108].
The cross-validation procedure prevents overfitting issues with the data [110,115]. To select and evaluate optimum parameters, a 10-fold cross-validation procedure was used for both RF and SVM. This is the recommended number of folds for comparing machine learning algorithm performances [116].

Accuracy Assessment and Classification Differences
Thematic accuracy assessment is an important component in evaluating the "correctness" of a classification. Products with low accuracy have limited or no utility to the end user, as they are composed of misinformation. Stratified random sampling was selected for the accuracy assessment points. This sampling approach ensures unbiased sample selection and adequate sampling for each habitat, since a minimum number of samples is specified for each class [117,118]. Using the methodologies developed by Congalton and Green [13], error matrixes, overall accuracies, kappa and Z-scores with 95% confidence limits were calculated.
Differences in the output classifications occur, as the MLA algorithms do not process the data the same way. Processing differences do create different end products. Differences between the two MLA outputs were assessed using ArcGIS Pro 2.7.

Classification Post-Processing
Classified data, including the natural habitat community classifications resulting from the MLAs, manifested a "salt and pepper" appearance due to the spectral variability encountered by the classifier [66,119]. A single pixel, due to its small area, doesn't provide useful information to a resource manager and may have little or no relation to the actual mapping of a natural habitat community [13,120,121]. It is common practice to remove noise using a majority filter and a specified window size [66] to create homogenous polygons suitable for accuracy assessment. To make the natural habitat community classes easily identifiable, and the final map more useful to the end user, we applied a 7 × 7 moving window majority filter.

Results
Natural habitat community classification was evaluated using RF and SVM with various input ancillary datasets including NAIP spectral bands, topographical layers, textural measures and spectral indices, to classify the natural communities ( Table 3). The training and testing datasets were split 75% and 25%. The appropriateness of the input datasets was evaluated based on the overall accuracy (OA) and kappa (k) of the test samples (5801 pixels). OA indicates the ratio between the total number of pixels and the total number of accurately classified pixels, and the k (kappa) is a measure of the agreement between the accurately classified data and the reference data [13]. Along with OA and k, the UA and PA for each community class were also evaluated. User's accuracy (UA) and producer's accuracy (PA) show the accuracy of each community class, as described by Congalton [13]. UA and PA show the commission and omission errors of individual classes, respectively. Z-scores with a 95% CI were also evaluated. Along with these, a difference map was generated to evaluate the classifier differences. Table 3. Overall accuracy (OA) and kappa coefficient (k) for the input variable combinations. Input 4 (bolded) has the highest OA and k and was used for the final natural communities classification. Abbreviations: Asp-aspect, Slp-slope, Tex-GLCM texture (contrast, entropy, standard deviation 7 × 7), Tex1-contrast (7 × 7), Tex2-contrast (3 × 3), Tex3-contrast (5 × 5).

Ancillary Data and Feature Selection Methods
A total of 19 ancillary datasets ( Figure 4) were evaluated in this study. The contrast texture image provided the most detailed information compared to entropy and SD. Hence, two more images were generated using 3 × 3 and 5 × 5 moving windows to evaluate impact of window size on the natural habitat community MLA classifications. JMIM and varImp scores were calculated for all of these input ancillary datasets from highest to lowest scores. Classifications using RF and SVM were generated using all possible combinations of the above-mentioned ancillary data layers (Table 3) to verify the robustness of the feature selection methods. Ancillary data layers which had lower scores in the feature selection method (Figure 4) showed similarly poor performance in the MLA classifications (Tables 3 and 4).

Classifications Results
The lowest accuracy occurred when only the National Agriculture Imagery Program (NAIP) bands were used for classification and supports the need for ancillary data incorporation. Using only the four multispectral NAIP bands (Input 1) led to a low OA (56.02%, k = 0.46) with RF and with SVM (OA = 59.01%, k = 0.49) ( Table 3). The addition of slope, aspect, normalized difference vegetation index (NDVI), NAIP water index (WINAIP) and texture improved the OA. The highest classification accuracies included those using all the input ancillary data (Input 3) and those using a combination of derived indices and textures. Tables 3 and 4 shows the overall accuracy (OA), kappa (k) and confidence intervals (CI) and other supporting statistics the natural habitat community classifications derived from various input variable combinations. RF has OAs and associated k values ranging from 56.02 % (k = 0.46) for Input 1 to 79.96% (k = 0.75) for Input 3. SVM values were between an OA of 59.01% (k = 0.49) for Input 1 and 75.85% (k = 0.70) for Input 4. Using Cohen's categorization of kappa value ranges, the classifications from Inputs 2, 3 and 4 (Table 4) show substantial agreement.
Incorporating the DEM with the NAIP bands provided a significant improvement in OA (12 to 17% increase) regardless of the MLA. Using slope and aspect with the NAIP bands did not significantly improve the OA. Similarly, vegetation and water indices did not differ significantly compared to those using only the four NAIP bands. When all variables (Table 3), including the four NAIP bands, three topographic layers, ten texture layers and two indices were used, the OA improved to almost 80% for RF and 74% for SVM (Table 3). Figure 5 shows the graphical representation of OA and k for NAIP bands, all ancillary dataset and the final classification approach. Input 4 ancillary data layers were used for the final classification (Figures 6 and 7), giving 79.45% OA for RF and 75.85% for SVM. Between Input 3, which used nineteen ancillary data layers, and Input 4, which used only nine ancillary data layers, no significant statistical differences were observed (Tables 3 and 4). R-red, G-green, B-blue, NIR-near-infrared, DEM-digital elevation model, Slope-slope, Aspect-aspect, C1-contrast texture (PC1, 7 × 7 moving window), C2-contrast texture (PC2, 7 × 7 moving window), Ent1*7-entropy texture (PC1, 7 × 7 moving window), Ent2*7-entropy texture (PC2, 7 × 7 moving window), SD1*7-standard deviation texture (PC1, 7 × 7 moving window), SD2*7-standard deviation texture (PC2, 7 × 7 moving window), NDVI-normalized difference vegetation index, WINAIP-modified water index-NAIP, C1*3-contrast texture (PC1, 3 × 3 moving window), C2*3-contrast texture (PC2, 3 × 3 moving window), C1*5-contrast texture (PC1, 5 × 5 moving window), C2*5-contrast texture (PC2, 5 × 5 moving window). Note: varImp score for Ent 2 = 0 (no bar). Table 4. Kappa values, associated Z-scores and 95% confidence intervals for the input variable combinations. The pairwise Z-score indicates whether the classifications from RF and SVM with the same input variables are statistically different. Abbreviations: Asp-aspect, Slp-slope, Tex-GLCM texture (contrast, entropy, standard deviation 7 × 7), Tex1-contrast (7 × 7), Tex2-contrast (3 × 3), Tex3-contrast (5 × 5).  Z-scores with a 95% confidence limit were calculated for all k values and are shown in Table 4. The scores for each classification, regardless of input variable combination, show the classification to be meaningful and significantly better than a random classification. Pairwise Z-scores are also presented in Table 4. The scores indicate that all of the classification results for RF and SVM with the same input variables were not statistically different, as all have an absolute value < 1.96. Figures 6 and 7 present the final classification results from Input 4 data (Table 3) for RF and SVM. There are five natural habitat community classes (Figures 6 and 7), with four non-habitat community classes (Open Land, Impervious, Inland Lake and Open Water) which are not considered natural habitats under the current classification scheme but reflect human influences on the landscape, such as agricultural/open land and roads, and must be included. Input 3 led to a slightly higher OA, but this was not statistically significant (Z-score) compared to Input 4, which had fewer variables. Fewer variables shorten the computing time, which is an important consideration for classification of large areas. However, the kappa and Z-score are only two measures of classification quality; they only consider the overall classification and not the accuracies of the individual natural habitat community classes. Table 4. Kappa values, associated Z-scores and 95% confidence intervals for the input variable combinations. The pairwise Z-score indicates whether the classifications from RF and SVM with the same input variables are statistically different. Abbreviations: Asp-aspect, Slp-slope, Tex-GLCM texture (contrast, entropy, standard deviation 7 × 7), Tex1-contrast (7 × 7), Tex2-contrast (3 × 3), Tex3-contrast (5 × 5).        Table 5 presents the error matrix, including user's accuracies (UA) and producer's accuracies (PA) for the final RF and SVM natural habitat communities classification. A total of 5,801 randomly selected ground truth points were used for the accuracy assessment (Table 5). Each natural habitat community class had at least the minimum number points to create a 95% confidence interval and was based on the percentage area of each class with respect to the total area.   Table 5 presents the error matrix, including user's accuracies (UA) and producer's accuracies (PA) for the final RF and SVM natural habitat communities classification. A total of 5,801 randomly selected ground truth points were used for the accuracy assessment (Table 5). Each natural habitat community class had at least the minimum number points to create a 95% confidence interval and was based on the percentage area of each class with respect to the total area. The PAs and UAs calculations show the accuracies of the natural habitat communities can be split into two groups: those with accuracies above 85% and those with accuracies between 60 and 75% (Table 5). Open Land (OL), Emergent Marsh (EM), Mesic Northern Forest (MNF), Inland Lake (IL), Open Water (OW) and Impervious Surface (IS) have PAs between 84 and 100%. Rich Conifer Swamp (RCS), Poor Conifer Swamp (PCS) and Northern Shrub Thicket (NST) have PAs between 41 and 81%. These lower PA accuracies can be explained by the presence of speckled alder (Alnus incana), red osier dogwood (Cornus sericea), white pine (Pinus strobus) and red maple (Acer rubrum) across all three communities [12]. Sphagnum mosses (Sphagnum spp.), bunchberry (Cornus canadensis), balsam fir (Abies balsamea), paper birch (Betula papyrifera), black spruce (Picea mariana), huckleberry (Gaylussacia baccata) and sensitive fern (Onoclea sensibilis) are found in both Rich and Poor Conifer Swamp communities [12]. The presence of these species in both communities contributes to the lower UA and PA (Table 5). Also contributing to the Northern Shrub Thicket's lower UA and PA was misclassification between it and Emergent Marsh. This is due to large areas of spatial intermixing of the two communities. Similar observations were made from the classifier difference map (Figure 8), where majority of confusion with the SVM classification was observed in RCS, PCS and NST natural habitat communities. Areas in red show where differences in the natural habitat communities occur between the MLAs. Otherwise, the classification results were in agreement.

Input
both Rich and Poor Conifer Swamp communities [12]. The presence of these species in both communities contributes to the lower UA and PA (Table 5). Also contributing to the Northern Shrub Thicket's lower UA and PA was misclassification between it and Emergent Marsh. This is due to large areas of spatial intermixing of the two communities. Similar observations were made from the classifier difference map (Figure 8), where majority of confusion with the SVM classification was observed in RCS, PCS and NST natural habitat communities. Areas in red show where differences in the natural habitat communities occur between the MLAs. Otherwise, the classification results were in agreement.

Feature Selection and Imprtance of Ancillary Datasets
Ancillary datasets such as NAIP (National Agriculture Imagery Program) multispectral bands, DEM, aspect, slope, spectral indices and PC1-and PC2-based texture layers were selected by implementing the feature selection method. The utility of these input datasets was determined and compared for the automated MLAs. The ancillary data were selected not only based on the variables' importance but also on the OA and k values as well. Both of the feature selection methods (i.e., JMIM and varImp) proved to be important for observing differences in ancillary data contributions to the MLA accuracies (Figure 4a,b).
The JMIM (Figure 4a) scores are independent of the classification algorithm and are generated prior to running the MLAs. The ranking serves as a guide to the potential contribution of the variables and assists in initial variable selection. This is important when there are numerous inputs to choose from. By contrast, the varImp (Figure 4b) ranks the importance of input variables only for random forest and is generated after the classification is performed. This ranking allows confirmation of the input variable's importance and validates the original selection of the input variables. The order of variable listing is not expected to be the same when calculated at different points in the workflow. However, the first nine listed variable rankings are closely similar for both feature selection approaches. The approach also confirms the lower contribution of the useful information of certain variables such as slope and aspect.
Regarding spectral indices, both NDVI and WINAIP were helpful in improving the accuracies. NDVI was critical for calculating the green biomass present in the area, and it assisted in discriminating Mesic Northern Forest from Northern Shrub Thicket. For WINAIP, even though the NAIP bandwidths did not match as the WV-WI bandwidths, initial results showed that the modified index contributed to differentiating standing waterbodies, emergent marsh and shadows efficiently. Of the three calculated GLCM texture measures, contrast provided the most useful information for classification. Emergent Marsh, Northern Shrub Thicket and Water had smooth textures and hence low mean contrast values, whereas Mesic Northern Forest, and Rich and Poor Conifer Swamps had high mean contrast values, due to a rough texture. Average mean contrast values in the study area ranged from 0 to 285. To determine the best texture window size and evaluate finer-scale changes in texture, three different moving window sizes were evaluated (3 × 3, 5 × 5 and 7 × 7) using contrast. The largest window size (7 × 7) performed well for entropy and standard deviation (Figure 4a,b) providing unique information. However, contrast texture outperformed both entropy and standard deviation in differentiating natural habitat communities. Only the first two PCA components were used to generate GLCM texture. PCs 1 and 2 explained 96.74% of the variability in the original NAIP data. The feature selection methods (JMIM and varImp) may not necessarily improve accuracy, but they helped to reduce the model complexity by allowing a selection of variables which contributed the greatest amount of information to the classification.

MLAs Classifier Performance
Overall, random forest (RF) outperformed support vector machine (SVM), including giving the classification with the highest overall accuracy (OA) and kappa value (k). In comparison to the SVM model, the RF model was more robust at handling a higher number of ancillary datasets (Table 3). Our observation showed that the DEM can be considered as important ancillary data in classifying and increasing the model accuracy for the community types. The smallest changes in elevation reflect a variation in soil types and drainage patterns which can influence the natural habitat community of the area [47]. The model which was used for the final classification (Input 4), showed a 6.36% increase in OA for RF and 4.64% for SVM, compared to Input 2 where we only used two ancillary data layers. Therefore, users who wish to increase the model OA in similar environmental conditions should consider using elevation as ancillary data, along with texture and spectral indices. Of the three GLCM textures, contrast proved to be the most useful, followed by entropy and standard deviation (Figure 4).
Both RF and SVM showed major confusion between RCS, PCS and NST classes. RCS and PCS have a variety of tree species of similar types, which can cause confusion in the classification. NST is another community class which shares similar vegetation types with RCS and PCS, and also showed lower UA and PA (Figure 9a,b). Due to the close spectral similarities between these classes, they have a lower UA and PA overall compared to other classes (Figure 9a,b). Classes with higher spectral dissimilarity showed maximum accuracy (i.e., OL, EM, MNF and IL) and vice versa. Considering the overall UA and PA for the community classes, we see that in general, RF (Figure 9a) performed better than SVM (Figure 9b). For example, all five natural habitat community classes (EM, RCS, PCS, NST and MNF) showed better or similar UA and PA values compared to SVM (Figure 9a,b).
The final classified maps from the RF ( Figure 6) and SVM (Figure 7) algorithms show that RF delineated the natural community boundaries better than SVM. Figure 8 shows the classification difference map between RF and SVM. Areas of disagreement are shown in red. Most of the confusion occurs between Rich Conifer Swamp, Poor Conifer Swamp and Northern Shrub Thicket, and there are errors of both commission and omission in the accuracy assessment matrix (Table 5). These areas occur along the glacial lake shoreline (west side of study area), in the riparian area of the Sturgeon River (center of study area) and on moraines (southeast corner of study area). Areas such as Open Land, Open Water, Impervious Surface and Emergent Marsh show agreement across the landscape for both MLAs. Higher confusion was observed in the SVM accuracy assessment matrix, resulting in lower PA and UA compared to RF. It is important also to visually assess the final classifications, not just the matrices, to fully understand the MLA performance.
RCS and PCS, and also showed lower UA and PA (Figure 9a,b). Due to the close spectral similarities between these classes, they have a lower UA and PA overall compared to other classes (Figure 9a,b). Classes with higher spectral dissimilarity showed maximum accuracy (i.e., OL, EM, MNF and IL) and vice versa. Considering the overall UA and PA for the community classes, we see that in general, RF (Figure 9a) performed better than SVM (Figure 9b). For example, all five natural habitat community classes (EM, RCS, PCS, NST and MNF) showed better or similar UA and PA values compared to SVM (Figure 9a,b). The final classified maps from the RF ( Figure 6) and SVM (Figure 7) algorithms show that RF delineated the natural community boundaries better than SVM. Figure 8 shows the classification difference map between RF and SVM. Areas of disagreement are shown in red. Most of the confusion occurs between Rich Conifer Swamp, Poor Conifer Swamp and Northern Shrub Thicket, and there are errors of both commission and omission in the accuracy assessment matrix (Table 5). These areas occur along the glacial lake shoreline (west side of study area), in the riparian area of the Sturgeon River (center of study area) and on moraines (southeast corner of study area). Areas such as Open Land, Open Water,

Overall Performance of MLAs for Natural Habitat Communities Classification
The classification results are similar to the outcomes of previous studies where RF outperformed the SVM model. The SVM model tends to work better with a smaller number of classes, whereas RF can work with a larger number of classes [39]. In the past, Rodriguz-Galiano et al. [22,23] successfully used RF to map and classify 14 classes using Landsat TM and ancillary datasets, with a high OA. Berhane et al. [47] discriminated 22 wetland classes and achieved the highest OA using RF with WorldView-2 multispectral data, along with various ancillary data. Higher accuracies with RF classifications compared to SVM classifications were also observed by Adam et al. [122] for land-use/cover classification. Hayes et al. [44] used the RF classifier to successfully classify nine landcover classes using NAIP bands and additional ancillary data such as spectral indices, elevation data, texture, etc. Land cover classification is common in the remote sensing community, but this is the first time the natural communities of Michigan classification system [12] has been used to classify a complex Laurentian Mixed Forest system at the natural habitat community level.

Importance of Reference Vegetation Map
The color infrared (CIR) combination (R:4, G:3, B:2) of high-resolution multispectral data and high spectral contrast from the feature extraction techniques (PCA, ICA) helped differentiate the natural habitat community types. Ancillary data such as the soils map (soil moisture, pH, drainage), as well as prior fieldwork and knowledge of the study site contributed important information. Figure 3 shows the reference natural habitat commu-nity map, where nine different classes were delineated, including five community types ( Table 2). The component combination (R:4, G:3, B:2) from PCA and ICA ( Figure 10) transformation showed outline boundaries between forest, marsh, swamps, thickets, open land and water groups. Figure 10 shows how the differences in texture from the five natural habitat communities were enhanced and visualized using the PCA and ICA components, compared to the original NAIP imagery. This distinction between different natural habitat communities permits more accurate boundary delineation for better training data, compared with traditional maps, which are drawn at larger scale and have lower spatial resolution (i.e., NWI and NLCD maps). Both RF and SVM models require good training data in order to perform well [39].

Impact of Number of Training Samples and Quality of Sample Data on MLAs
The reference vegetation map was used to generate random training points for the nine classes within delineated polygons. Data from field validation, expert knowledge and a limited, very high spatial resolution UAS dataset [55] for the area provided a reliable and accurate reference map. The training points (total number = 23,214) were divided into groups of 75% and 25%, respectively, for the training and testing datasets ( Table 2). Potentially, an increase in the number of training and testing points could achieve higher classification accuracies. For example, this study used a minimum distance of 5 m between the randomly generated points for each community class. If the distance was decreased to 3 m, it would allow a greater number of points for training and testing. Researchers have shown in the past that training sample size can play a crucial role in the classification accuracies for supervised machine learning algorithms [39,96]. The required number of training data points also depends on the classification algorithm, number of ancillary datasets and the size and complexity of the study area [96].

Validation of Classified Community Vegetation Map Using Field Data and Expert Observation
The study area is in a remote rural location where land use and cover changes are minimal. Parts of the study area are protected from exploitation and development. There is limited access, which constrains field verification due to a lack of roads and a complex network of land ownership where most private lands cannot be traversed, and many of the natural communities also prohibit ground truthing for safety reasons. Where possible, ground surveys were conducted during the summers of 2018, 2019 and 2020. However, field work in 2020 and 2021 was severely curtailed due to the COVID-19 pandemic. Additional reference data were collected by observations made by an expert interpreter of the NAIP imagery outside of the reference area polygons, as well as using United States Forest Service (USFS) stand compartment maps [53]. This was the first time the Laurentian

Impact of Number of Training Samples and Quality of Sample Data on MLAs
The reference vegetation map was used to generate random training points for the nine classes within delineated polygons. Data from field validation, expert knowledge and a limited, very high spatial resolution UAS dataset [55] for the area provided a reliable and accurate reference map. The training points (total number = 23,214) were divided into groups of 75% and 25%, respectively, for the training and testing datasets (Table 2). Potentially, an increase in the number of training and testing points could achieve higher classification accuracies. For example, this study used a minimum distance of 5 m between the randomly generated points for each community class. If the distance was decreased to 3 m, it would allow a greater number of points for training and testing. Researchers have shown in the past that training sample size can play a crucial role in the classification accuracies for supervised machine learning algorithms [39,96]. The required number of training data points also depends on the classification algorithm, number of ancillary datasets and the size and complexity of the study area [96].

Validation of Classified Community Vegetation Map Using Field Data and Expert Observation
The study area is in a remote rural location where land use and cover changes are minimal. Parts of the study area are protected from exploitation and development. There is limited access, which constrains field verification due to a lack of roads and a complex network of land ownership where most private lands cannot be traversed, and many of the natural communities also prohibit ground truthing for safety reasons. Where possible, ground surveys were conducted during the summers of 2018, 2019 and 2020. However, field work in 2020 and 2021 was severely curtailed due to the COVID-19 pandemic. Additional reference data were collected by observations made by an expert interpreter of the NAIP imagery outside of the reference area polygons, as well as using United States Forest Service (USFS) stand compartment maps [53]. This was the first time the Laurentian Mixed Forest of the Hiawatha National Forest (HNF) has been classified using the natural communities classification system [12]; as a result it was not possible to directly compare it to any previously available maps.

Future Work
Future research will involve assessing the robustness of this classification approach using other study sites with different natural communities, varying landforms and soil conditions. Consideration will be given to mapping spatially larger areas such as an entire watershed, as well as looking at smaller areas with limited natural communities such as fens. In Michigan, there are five different fen communities [12], and they can exist adjacent to each other. Being able to accurately map natural communities at different scales is important, in order to understand, describe, document and restore natural habitat community diversity.
Consideration must also be given to what classes should be incorporated into the classification for areas influenced by modern anthropogenic distributions such as agriculture, mining and development. Non-natural habitat community classes were added as needed in this study. This is not a robust approach. These classes should be well defined, mutually exclusive and hierarchical in structure.

Conclusions
Community classification for Laurentian Mixed Forest is challenging, due to the complexity of the landscape. In this paper, for the first time, a natural habitat community level classification using an integrated approach of spectral transformation and enhancement techniques, field data, ancillary datasets and MLAs was implemented. Feature selection methods such as JMIM and varImp were used to evaluate the utility of a wide variety of ancillary data including elevation, various measures of texture, and vegetation and soil moisture indices and to guide the selection of the best-performing ancillary data. Highspatial resolution data and machine learning algorithms contributed to a successful and accurate classification.
Five complex natural habitat communities and four non-natural habitat communities were successfully classified. Due to the spectral limitation of the four NAIP bands, the classification showed confusion between similar natural habitat communities (e.g., Rich Conifer Swamp vs. Poor Conifer Swamp vs. Northern Shrub Thicket), with accuracies ranging from 72.55% down to 64.70% (Table 5). Discrimination between Mesic Northern Forest, Emergent Marsh, Impervious, Open Land and Water (Open Water and Inland Lake) had higher accuracies (100% to 89.36%) ( Table 5).
RF and SVM both showed promising performance for classifying a complex Laurentian Mixed Forest community. RF outperformed SVM in the final classification results but SVM performed well when there were fewer ancillary datasets. The choice of MLA may vary for users, depending on the site, type of communities being mapped, number of ancillary datasets and quality of the training data. In R, parameter optimization is allowed and can help provide better performance, but the use of optimization parameters may increase the processing times of the classifiers. In this study, we used the default parameters of the classifiers, as they were accurate enough and achieved the desired results. We also believe that using more spectral bands might improve the classification and could help overcome the complexity of the vegetation classes.