Wetlands are a vital natural resource providing habitat for a variety of wildlife and plants, flood and storm surge protection, water quality improvement through treatment of runoff, and recharge of aquifers [1]. However, a significant number of wetlands in the U.S. have been destroyed or repurposed for agriculture or development [2]. The need to protect wetlands is widely recognized and required by federal law and regulations, specifically through Section 404 of the Clean Water Act [3]. Section 404 sets forth a goal of maintaining the nation’s remaining wetland base by avoiding adverse impacts to these ecosystems. To comply with regulations, entities including state departments of transportation (DOTs) must consider potential impacts to wetlands in their infrastructure development projects. DOTs in particular, as well as other organizations, must sufficiently demonstrate that a selected construction plan is the Least Environmentally Damaging Practicable Alternative (LEDPA) by, among other tasks, providing wetland delineations [4]. The U.S. Army Corps of Engineers (USACE), as the governing authority in wetland permitting, evaluates the proposed project corridors.
Although there are a variety of wetland types, all wetlands share common environmental characteristics based on the interaction of hydrology, vegetation, and soil [5]. USACE guidelines for wetland delineation use these common features; they are based on the presence of hydrologic conditions that inundate the area, vegetation adapted for life in saturated soil conditions, and hydric soils [6]. Field verification is the most accurate method to confirm these diagnostic environmental characteristics; however, performing detailed field delineations for large regions can be costly in terms of resources and time. A screening tool that leverages nationally available georeferenced datasets and modern classification algorithms could aid the impact assessment process by allowing agencies to target field mapping efforts at the smaller areas within a larger region that the screening identifies as potential wetlands.
The U.S. Fish and Wildlife Service (USFWS) National Wetland Inventory (NWI) is the best example of a national-scale inventory of wetland locations in the United States. Initiated in 1974, the NWI is one of the earliest and most commonly used sources of wetland data in the U.S. [7]. NWI maps were intended to provide biologists and others with information on the distribution and type of wetlands to aid in conservation efforts [8]. However, these data were never intended to map federally regulated wetlands [6], and research has shown that relying solely on the NWI may fail to protect a significant fraction of wetlands in the U.S. [10]. Limitations of the NWI can be attributed to reliance on manual photointerpretation, which is subjective and may fail to identify certain types of wetlands [11]. Furthermore, the NWI is not funded at a level that would be necessary to conform to the federal wetland mapping standard [8].
Coupling nationally available geospatial data with machine learning offers the opportunity to identify, in an automated and repeatable way, areas within larger regions that have a high likelihood of including wetlands. Remote sensing is recognized as one of the most useful information sources for wetland identification by the USACE [6], and it has been widely used for wetland studies over the past 50 years [13]. The most commonly used wetland remote sensing data are Landsat multispectral imagery, which as of Landsat 8 is 30 m in resolution for most bands, repeats its cycle every 16 days, includes 11 bands, and is freely available [13]. Researchers have achieved accurate wetland identification results by incorporating Landsat imagery, specifically from the Landsat 8 Operational Land Imager (OLI) sensor (e.g., [14]). At this spatial resolution, however, such approaches are unlikely to reproduce the exact wetland boundaries obtained through field delineations by experts, but the data could rule out large areas as unlikely to include wetlands.
Machine learning techniques commonly applied in wetland studies include traditional techniques such as Maximum Likelihood classification (e.g., [14]) and newer techniques such as random forest classification. Random forest is an ensemble classifier that produces many classification and regression-like trees. Each tree is generated from a different bootstrapped sample of the training data, and input variables are randomly selected for generating trees [20]. Random forest has become a widely used method for its ability to handle high-dimensional data, incorporate both continuous and categorical data, and produce descriptive variable importance measures (e.g., [21]). Researchers have used random forest to integrate multispectral imagery, topography, and other ancillary geospatial data for wetland identification (e.g., [12]). Furthermore, studies show that random forest can produce higher classification accuracies than traditional techniques for land cover classification (e.g., [17]).
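The two sources of randomness described above (bootstrapped training samples and randomly selected input variables) can be illustrated with a minimal pure-Python sketch. It is not the implementation used in this study: it uses one-split trees (stumps) instead of full trees, and the feature values, labels, and all function names are hypothetical. Because each tree has a single split, drawing the feature subset once per tree is equivalent here to drawing it at each split.

```python
import random
from collections import Counter

def fit_stump(rows, labels, feature_ids):
    """Pick the (feature, threshold, direction) with the fewest training errors."""
    best = None
    for f in feature_ids:
        for t in sorted({r[f] for r in rows}):
            # 'above' is the class predicted when the feature value exceeds t.
            for above in (0, 1):
                errors = sum((above if r[f] > t else 1 - above) != y
                             for r, y in zip(rows, labels))
                if best is None or errors < best[0]:
                    best = (errors, f, t, above)
    _, f, t, above = best
    return f, t, above

def predict_stump(stump, row):
    f, t, above = stump
    return above if row[f] > t else 1 - above

def fit_forest(rows, labels, n_trees=25, n_features=1, seed=0):
    rng = random.Random(seed)
    n = len(rows)
    forest = []
    for _ in range(n_trees):
        # Bootstrap: draw n training rows with replacement.
        idx = [rng.randrange(n) for _ in range(n)]
        # Random subset of input variables available to this tree.
        feats = rng.sample(range(len(rows[0])), n_features)
        forest.append(fit_stump([rows[i] for i in idx],
                                [labels[i] for i in idx], feats))
    return forest

def predict_forest(forest, row):
    # Majority vote across all trees.
    votes = Counter(predict_stump(s, row) for s in forest)
    return votes.most_common(1)[0][0]

# Hypothetical cells: (wetness-like index, vegetation-like index); 1 = wetland.
rows = [(0.1, 0.2), (0.2, 0.1), (0.3, 0.3), (0.15, 0.25),
        (0.7, 0.8), (0.8, 0.7), (0.9, 0.9), (0.75, 0.85)]
labels = [0, 0, 0, 0, 1, 1, 1, 1]
forest = fit_forest(rows, labels)
dry_prediction = predict_forest(forest, (0.05, 0.05))
wet_prediction = predict_forest(forest, (0.95, 0.95))
```

A production workflow would more likely rely on an established library implementation with full-depth trees and variable importance measures; the sketch only shows where bootstrapping, random variable selection, and voting enter the algorithm.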
While past studies have demonstrated the potential for remote sensing and machine learning frameworks to identify wetlands, they share common elements that limit implementation of their proposed algorithms as a national-scale tool to support environmental planning efforts like those needed by DOTs in the LEDPA process. These limitations include (i) failure to automate workflows, making the classification task time-consuming and difficult to replicate, (ii) failure to leverage freely available geospatial data beyond remote sensing imagery, (iii) inclusion of costly remote sensing data not always available to support environmental planning, and (iv) reliance on software not typically available to or used by state DOTs. The objective of this study was to design a methodology that addresses these shortcomings and to implement the methodology as a wetland screening tool in a widely used commercial geographic information system (GIS). As a wetland screening tool, the methodology emphasizes minimizing false negative predictions (i.e., cases of wetland omission) while also maintaining a reasonable overall accuracy. We obtained verified wetland delineations created by experts for a large region (33 km²) in the coastal plain of Virginia to evaluate the methodology. We trained a classification model on one subset of the verified wetland delineation dataset and then tested the model on a separate subset to evaluate its accuracy. Finally, we compared the performance of the wetland tool against the NWI, given that this is the standard wetland inventory available for supporting early environmental planning efforts in the absence of more detailed, costly, and time-consuming field surveys.
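The text does not state how the emphasis on false negatives is achieved. One generic possibility, shown here purely as an assumption, is to lower the score threshold at which a cell is labeled wetland until the false negative rate on verified cells falls within a target. The scores, labels, target rate, and function names below are all hypothetical.

```python
def false_negative_rate(scores, labels, threshold):
    """Share of true wetland cells (label 1) scored below the threshold."""
    wetland_scores = [s for s, y in zip(scores, labels) if y == 1]
    missed = sum(s < threshold for s in wetland_scores)
    return missed / len(wetland_scores)

def pick_threshold(scores, labels, max_fnr):
    """Return the highest observed score usable as a threshold within max_fnr."""
    for t in sorted(set(scores), reverse=True):
        if false_negative_rate(scores, labels, t) <= max_fnr:
            return t
    return min(scores)  # fallback; labels every cell as wetland

# Hypothetical wetland scores for eight verified cells (1 = wetland).
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.3, 0.2, 0.1]
labels = [1, 1, 1, 1, 0, 0, 0, 0]
threshold = pick_threshold(scores, labels, max_fnr=0.25)
```

In this toy case the threshold drops to 0.6, which misses one of four wetland cells but also flags one non-wetland cell (score 0.7), the same trade-off between omission and commission that a screening tool accepts.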
This study presents a methodology designed to screen for wetlands over a large geographic region using nationally available geospatial data as input and random forest classification. The methodology was motivated by the desire to streamline environmental permitting for transportation corridor projects over large regions. By using a tool to screen for potential wetland areas, field surveying efforts could be focused on areas that are likely to contain wetlands. The methodology was implemented as an automated workflow in commercially available geographic information system (GIS) software commonly used by DOTs. The tool was applied to identify potential wetland locations for a region in the coastal plain of Virginia, USA. The preliminary implementation of this workflow was evaluated against professionally conducted field surveys, and results were compared to the commonly used NWI dataset as a benchmark for accuracy.
Results showed that, when compared to the NWI, the wetland screening methodology produced a significantly lower false negative rate (22.6% vs. 69.3%) and a higher kappa statistic (0.46 vs. 0.34). From this, we conclude that the methodology was able to capture many wetlands missed by the NWI. However, this improvement in false negative predictions did result in a higher false positive rate (24.3% vs. 1.3%) and, because the study area has more non-wetland area, a slightly lower overall accuracy (76.1% vs. 80.5%). Thus, while the method identifies significantly more true wetlands than the NWI, it does so at the cost of a slight reduction in overall prediction accuracy. This was largely by design, however, as the method purposely avoids false negatives because such errors would result in missing wetlands and be costly in environmental planning. False positive errors are less costly because they could be field verified by targeted, on-the-ground surveys by experts to fine-tune the wetland delineation. With additional wetland areas mapped and verified by experts, the method could be tested for other regions given that, by design, it relies only on nationally available input datasets.
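To make the relationship among these measures concrete, the sketch below computes the false negative rate, false positive rate, overall accuracy, and Cohen's kappa from a two-class confusion matrix. The cell counts are hypothetical and are not the study's data; they are chosen only to show how a classifier can have lower overall accuracy yet a higher kappa when non-wetland cells dominate.

```python
def screening_metrics(tp, fn, fp, tn):
    """Return FNR, FPR, overall accuracy, and Cohen's kappa for a 2x2 table."""
    total = tp + fn + fp + tn
    fnr = fn / (tp + fn)          # wetlands omitted / all true wetlands
    fpr = fp / (fp + tn)          # non-wetlands flagged / all true non-wetlands
    accuracy = (tp + tn) / total
    # Agreement expected by chance from the row and column marginals.
    p_wet = ((tp + fn) / total) * ((tp + fp) / total)
    p_dry = ((fp + tn) / total) * ((fn + tn) / total)
    expected = p_wet + p_dry
    kappa = (accuracy - expected) / (1 - expected)
    return {"fnr": fnr, "fpr": fpr, "accuracy": accuracy, "kappa": kappa}

# Hypothetical counts: 100 wetland cells and 300 non-wetland cells.
screening = screening_metrics(tp=80, fn=20, fp=60, tn=240)
inventory = screening_metrics(tp=30, fn=70, fp=5, tn=295)

for name, m in (("screening", screening), ("inventory", inventory)):
    print(name, {k: round(v, 3) for k, v in m.items()})
```

With these counts the hypothetical inventory is slightly more accurate overall (0.8125 vs. 0.80) because it rarely mislabels the abundant non-wetland cells, but it omits 70% of wetlands and has a lower kappa, the same qualitative pattern as in the reported comparison.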
While the approach is successful as a screening tool, the ultimate goal should be to achieve the highest possible overall classification accuracy at a spatial scale relevant for environmental planning purposes. Doing so would move the approach from being a wetland screening tool to a wetland mapping tool, opening up additional potential uses. For example, because the approach is now largely automated, future work could investigate the potential for deriving time-varying wetland maps using Landsat imagery as a dynamic input to capture change in wetland patterns across regions. Making this transition will likely require much higher resolution data and more data for training classification algorithms. The wetland detection rate produced by the current screening algorithm, which makes use only of nationally available geospatial data in order to be widely applicable, is encouraging for the development of approaches able to identify wetlands at a high resolution. Future work should investigate the role of higher resolution input data, like LiDAR, and alternative parameterizations of the classification algorithm to improve wetland predictions.