Development of a Hydrologic Connectivity Dataset for SWAT Assessments in the US

Abstract: Model-based water quality assessments are an important source of information for conservation and environmental policy in the U.S. The recently completed national scale Conservation Effects Assessment Project (CEAP) is being replicated using an improved model populated with new and higher resolution data. National assessments are particularly difficult because models must operate over a very large spatial extent (the contiguous U.S.) while maintaining the granularity required to capture important small-scale processes. In this research, we developed datasets describing hydrologic connectivity at the U.S. Geological Survey (USGS) 12-digit Hydrologic Unit Code (HUC-12) level. Connectivity among the 86,000 HUC-12s provided by the Watershed Boundary Dataset (WBD) was evaluated and corrected. We also detail a method to resolve the highly detailed National Hydrography Dataset (NHD) stream segments within each HUC-12 into greatly simplified representative channel schemes suitable for use in the recently developed Soil and Water Assessment Tool + (SWAT+) model. This representative channel approach strikes a balance between computational complexity and accurate representation of the hydrologic system. These data will be tested in the upcoming CEAP II national assessment. Until then, all the WBD corrections and NHDPlus representative channel data are provided via the web for other researchers to evaluate and use.


Introduction
Model-based assessments of water quality and quantity are important tools in watershed and conservation planning. At the local level, water quality models are often used to develop watershed plans and Total Maximum Daily Loads (TMDLs) for the U.S. Environmental Protection Agency (USEPA), state agencies, and local cooperators. These watershed-based programs seek to identify mitigation strategies to improve impaired waterbodies. Models are also used by the U.S. Department of Agriculture at the field and national levels to evaluate non-point source loads associated with agriculture and how existing or additional conservation practices or policies may alter those loads. The Congressionally mandated Conservation Effects Assessment Project (CEAP) is an example of a current national effort in which model-based assessments are useful.
In 2002, conservation spending under the farm bill increased by 80% [1], and it reached $3.7 billion per year by 2008 [2]. U.S. Department of Agriculture (USDA) scientists recently completed a CEAP national scale non-point source pollution and policy assessment [3,4]. The CEAP framework includes a system of two models to evaluate the effect of cropland conservation on sediment and nutrient loads to U.S. waters. The Agricultural Policy/Environmental eXtender (APEX) model [5] is used to simulate landscape processes on cropland and to predict edge-of-field pollutant loads. The Soil and Water Assessment Tool (SWAT) [6] is used to predict loads from other land uses such as urban, forest, grassland, and rangeland. SWAT is also used to predict the fate and transport of pollutants from all sources through streams, rivers, and reservoirs [3,4].
SWAT is a distributed hydrologic model that allows a basin to be subdivided into multiple subbasins to incorporate spatial detail. Previous national assessments associated with the CEAP effort used SWAT models with subbasins matching existing 8-digit Hydrologic Unit Codes (HUC-8). The Hydrologic Unit Model for the U.S. (HUMUS) [7], an earlier national scale modeling effort using SWAT, was also conducted at this scale. For CEAP, all non-cultivated areas and all streams and reservoirs within each HUC-8 were simulated using SWAT. Edge-of-field APEX predictions for cropland were statistically weighted by an "expansion factor" (the number of hectares each sample was taken to represent) and aggregated into a single pollutant load prediction for each HUC-8. This load was then reduced by a simple delivery ratio calculation and passed to the SWAT model for routing. Additional information on this integration is given in [8,9].
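To make the lumped CEAP linkage concrete, the sketch below expands per-hectare APEX edge-of-field loads by their expansion factors and multiplies the HUC-8 total by a single delivery ratio. It is a simplified illustration, not the CEAP implementation: the `ApexSample` record, the units, and the use of one constant delivery ratio between 0 and 1 per HUC-8 are assumptions, and the actual statistical weighting is described in [8,9].

```python
from dataclasses import dataclass


@dataclass
class ApexSample:
    """One sampled cropland point simulated with APEX (hypothetical structure)."""
    load_kg_per_ha: float       # predicted edge-of-field load per hectare
    expansion_factor_ha: float  # hectares the sample is taken to represent


def huc8_cropland_load(samples, delivery_ratio):
    """Expand sampled edge-of-field loads to one HUC-8 total, then apply a lumped delivery ratio."""
    if not 0.0 <= delivery_ratio <= 1.0:
        raise ValueError("delivery_ratio must be between 0 and 1")
    edge_of_field_kg = sum(s.load_kg_per_ha * s.expansion_factor_ha for s in samples)
    return edge_of_field_kg * delivery_ratio


samples = [ApexSample(2.0, 1500.0), ApexSample(0.5, 4000.0)]
print(huc8_cropland_load(samples, delivery_ratio=0.6))  # load passed to SWAT for routing, kg
```

A single ratio applied to the whole HUC-8 is exactly what makes this approach "lumped": every hectare of cropland receives the same in-stream attenuation regardless of where it sits in the drainage network.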
CEAP II is a next-generation effort currently under way using the improved SWAT+ model populated with more recent and higher resolution data. SWAT+ is a completely revised version of SWAT in which hydrologic response units, aquifers, channels, and reservoirs are spatial objects, allowing greater flexibility in the spatial representation and connectivity of processes within a watershed. CEAP II is being conducted at a finer spatial resolution than CEAP, with subbasins defined as 12-digit HUCs rather than the 8-digit HUCs used previously. Twelve-digit HUCs are available as part of the national seamless Watershed Boundary Dataset (WBD). HUC-12s provide approximately 40 times more detail than the previously used HUC-8s and allow more accurate representation of land uses, water sources, and their interfaces. However, these data are still under active development and may contain connectivity errors that must be evaluated before use. APEX field scale simulations are also being incorporated at a finer spatial scale. Virtually all data used in CEAP II are being redeveloped at a finer scale, including management scheduling [10][11][12] and climate [13]. Soils, topography, and land use data were derived from updated sources, aggregated, and assessed at a finer level of detail.
One weakness of the previous national assessment was the use of a lumped delivery ratio to account for sediment and nutrient processing between the edge-of-field agricultural loads predicted by APEX and the HUC-8 streams simulated by SWAT. Low-order streams make up the majority of all streams and significantly modify the loads they transport. Nadeau et al. [14] found that headwater streams make up 53% of the total stream length in the U.S. and concluded that headwater and downstream streams are best described as individual elements within an integrated hydrologic system. Alexander et al. [15] found that headwater streams contributed 70% of the flow and 65% of the nitrogen load to downstream segments. Explicitly including low-order streams in SWAT+ is therefore important and is preferable to the lumped delivery ratio previously used in CEAP.
The SWAT model has recently been enhanced with new capabilities that allow the landscape to be modeled with greater fidelity. Previous SWAT models represented only a single main channel within each subbasin, regardless of subbasin size. The enhanced version, SWAT+ [16], allows multiple channels within each subbasin, each with differing characteristics and connectivity. Given the finer spatial scale of the CEAP II national assessment, the importance of headwater streams, and the new capabilities of SWAT+, low-order streams need to be included in a national scale SWAT+ modeling framework.
In this research we detail the development of connectivity datasets necessary to create a national 12-digit model which includes low order streams. The primary model discussed in the present research is the SWAT+ model. The APEX model is extensively used in CEAP II, but as a field scale model it does not require the connectivity data discussed herein. The first objective of this research was to evaluate the connectivity between individual HUC-12s in the WBD and make corrections as necessary. The second objective was to examine the National Hydrography Dataset (NHD) and summarize it to develop representative channels and their related connections within each HUC-12.

Hydrologic Unit Code (HUC)-12 Connectivity
The latest Watershed Boundary Dataset (WBD) as of March 2015 was downloaded and used for this analysis. Although WBD data are periodically updated with new or corrected information, it is critical that these data be held as a static, permanent snapshot of watershed boundaries for use in national assessments. All other model input data (i.e., soils, climate, land use, and management) are processed using these boundaries and would require reprocessing if HUC-12 boundaries were to change. Examination of these WBD data revealed errors in the connectivity between HUC-12s in many areas. The Hydrologic and Water Quality System (HAWQS) [17], another national scale SWAT modeling framework, also used these data and noted similar errors. The WBD data for the contiguous U.S. contain more than 86,000 HUC-12s, far too many to examine individually, so a set of automated programs was developed to identify potential errors and flag the HUC-12s in which they occur. Each flagged HUC-12 was then examined manually using NHDPlus streams to identify and correct its hydrologic connections.
Several types of errors in the WBD were noted. Drainage loops, which are rare in natural systems, were found in many areas of low topographic relief, especially along river floodplains. In these loops, a HUC-12 eventually drains back to itself after passing through other HUC-12s. All drainage loops were corrected because SWAT+ cannot process them. In the western U.S., where closed basins are common, many HUC-12s have no outside drainage and may have no delineated streams. Many closed basins receive contributions from upstream HUC-12s and terminate in dry lakebeds. Closed basins receiving flow from more than one other HUC-12 were examined manually and corrected as needed. Isolated closed basins in areas where such basins should be rare were also examined and corrected. HUC-12s that drain to the ocean or to terminal drainages were also examined manually; the vast majority required no correction. HUC-12s with an incorrectly specified downstream drainage were screened by identifying HUC-12s that drain to nonadjacent HUC-12s. Most often, only a few HUC-12s were skipped before the flow path rejoined the correct drainage downstream, so the impact of this kind of error is minimal at the national scale. In several cases, however, the specified downstream HUC-12 was very distant (hundreds of km) or did not exist.
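The automated screening described above can be sketched as a few graph checks over the HUC-12 downstream table. This is an illustrative sketch, not the authors' programs: the dictionary input, the `TERMINAL` codes, and the precomputed `adjacent` set of boundary-sharing pairs (which in practice would come from a spatial query on the WBD polygons) are assumptions. Flagged HUC-12s would still be reviewed manually against NHDPlus streams.

```python
from collections import Counter

# Assumed terminal codes for the WBD downstream ("to HUC") field.
TERMINAL = {"OCEAN", "CLOSED BASIN"}


def flag_connectivity_errors(to_huc, adjacent):
    """Flag HUC-12s whose downstream connection looks suspect.

    to_huc:   dict mapping each HUC-12 code to its downstream HUC-12 or a terminal code.
    adjacent: set of frozenset({a, b}) pairs of HUC-12s that share a boundary.
    Returns a dict mapping each error type to a sorted list of flagged HUC-12s.
    """
    flags = {"missing_downstream": set(), "nonadjacent_downstream": set(),
             "loop": set(), "closed_basin_multi_inflow": set()}

    for huc, down in to_huc.items():
        if down in TERMINAL:
            continue
        if down not in to_huc:
            flags["missing_downstream"].add(huc)       # downstream code does not exist
        elif frozenset((huc, down)) not in adjacent:
            flags["nonadjacent_downstream"].add(huc)   # skips over intervening HUC-12s

    # Drainage loops: walk downstream from every HUC-12; reaching a HUC-12 already on
    # the current walk means the walk has closed a cycle.
    state = {}  # "active" while on the current walk, "done" once fully explored
    for start in to_huc:
        path, node = [], start
        while node in to_huc and node not in state:
            state[node] = "active"
            path.append(node)
            node = to_huc[node]
        if state.get(node) == "active":
            flags["loop"].update(path[path.index(node):])
        for visited in path:
            state[visited] = "done"

    # Closed basins that receive flow from more than one HUC-12.
    inflow = Counter(down for down in to_huc.values() if down in to_huc)
    for huc, down in to_huc.items():
        if down == "CLOSED BASIN" and inflow[huc] > 1:
            flags["closed_basin_multi_inflow"].add(huc)

    return {kind: sorted(hucs) for kind, hucs in flags.items()}


to_huc = {
    "A": "B", "B": "C", "C": "OCEAN",          # valid chain to the ocean
    "D": "E", "E": "D",                        # two-HUC drainage loop
    "F": "C",                                  # F does not border C
    "G": "Z",                                  # Z does not exist
    "H": "CLOSED BASIN", "I": "H", "J": "H",   # closed basin with two inflows
}
adjacent = {frozenset(p) for p in [("A", "B"), ("B", "C"), ("D", "E"), ("I", "H"), ("J", "H")]}
print(flag_connectivity_errors(to_huc, adjacent))
```

Because every HUC-12 is marked once it has been walked, the loop check visits each HUC-12 a bounded number of times, so a screen of this kind scales roughly linearly with the number of HUC-12s.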
Connectivity errors were clustered in several locations (Figure 1). Two clusters, in Florida and Mississippi, were in areas of low topographic relief, where drainage is difficult to establish. Another cluster, in Michigan, was in an area with clearly defined drainage, and the cause of these errors is unknown. Unclustered, apparently random errors also occurred throughout the dataset and were slightly more common in the West and Southeast. Researchers associated with the HAWQS effort had previously identified 273 of these issues; in this research, an additional 410 corrections were made. These corrections have been provided to the WBD team for verification and potential inclusion in future releases.

HUC-12 Internal Channels and Connections
Connectivity between HUC-12s was established as previously described, but the connectivity between hydrologic features within each HUC-12 must also be configured. SWAT applications typically use digital elevation models (DEMs) to delineate subbasin boundaries and a single main channel per subbasin. All water yield from the subbasin, together with flow from upstream subbasins, is routed through the main channel to the subbasin outlet. With SWAT+, it is now possible to have multiple channels within each subbasin, and these can be connected as needed to portray the flow path from the edge of the field to the subbasin outlet more accurately. The most current national dataset describing low-order streams in the U.S. is the National Hydrography Dataset Plus (NHDPlus) Version 2. NHDPlus V2 is a multiagency product, released in 2012, that is based on a snapshot of the NHD and adds a variety of valuable attributes, including connectivity, accumulated drainage area, and channel slope. It is a vast dataset containing more than 3 million stream segments in the contiguous U.S., which makes it difficult to process and apply. It also contains random topological errors [18], which must be considered. Huang et al. [18] identified pseudo nodes as the most common error. A pseudo node is a point where a single stream is needlessly broken into multiple segments without topological cause (i.e., not at a confluence or an intersection with another hydrologic feature). Huang et al. [18] also noted disconnected segments (pseudo outlets), overlapping features, and crossing features. These errors make it more difficult to extract meaningful physical channel parameters for use in SWAT+.
Despite these errors, NHDPlus V2 data are still suitable for assessing the connectivity of hydrologic objects within a HUC-12. Connectivity between HUC-12s is taken from the corrected WBD, as described previously, not from NHDPlus V2, so the effect of any single NHDPlus V2 error is limited to the HUC-12 in which it occurs. Random, isolated errors in headwater and low-order channel properties are likely preferable to model configurations that omit these channels and instead rely on simple delivery ratio estimates.
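The pseudo node definition translates into a simple check on segment connectivity: a node is a pseudo node when exactly one segment flows into it, exactly one flows out, and it does not coincide with another hydrologic feature. The sketch below assumes a list of (from, to) comID pairs loosely modeled on an NHDPlus flow table, with 0 standing for "no segment"; the `protected` set is a hypothetical stand-in for whatever test identifies intersections with other features.

```python
from collections import defaultdict


def find_pseudo_nodes(flow_pairs, protected=frozenset()):
    """Return (upstream, downstream) comID pairs joined by a pseudo node.

    flow_pairs: iterable of (from_comid, to_comid) connections; 0 means "no segment".
    protected:  comIDs whose downstream end meets another hydrologic feature (for
                example a waterbody), so the break there is topologically meaningful.
    """
    downstream, upstream = defaultdict(set), defaultdict(set)
    for frm, to in flow_pairs:
        if frm and to:  # skip headwater (0 -> x) and terminal (x -> 0) records
            downstream[frm].add(to)
            upstream[to].add(frm)

    pseudo = []
    for frm, tos in downstream.items():
        if len(tos) != 1 or frm in protected:
            continue                 # divergence, or a break at another feature
        (to,) = tos
        if len(upstream[to]) == 1:   # no confluence: one segment in, one segment out
            pseudo.append((frm, to))
    return sorted(pseudo)


# 1 -> 2 -> 3 <- 4, then 3 -> 5 -> outlet
pairs = [(0, 1), (1, 2), (2, 3), (0, 4), (4, 3), (3, 5), (5, 0)]
print(find_pseudo_nodes(pairs))
print(find_pseudo_nodes(pairs, protected={3}))
```

The junction between segments 2, 4, and 3 is a true confluence and is never reported; the breaks 1→2 and 3→5 are reported unless the junction is marked as meeting another feature.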

National Hydrography Dataset (NHD) Preprocessing
The NHDPlus V2 data are publicly available and include a variety of useful information, including channel length and slope, which are important attributes for SWAT+. Each stream segment (NHD flowline) has a unique common identifier (comID) that can be used to link attributes across tables. The data also include the connectivity between stream segments, the cumulative drainage area of all upstream segments, and the incremental drainage area of each segment.
The data are organized by regions that approximate two-digit HUC boundaries. Tabular data for each region were imported into a single set of master tables in Microsoft SQL Server 2014. The following tables were imported: CumulativeArea, DivFracMP, elevslope, flowline, HeadwaterNodeArea, MegaDiv, PlusARPointEvent, PlusFlow, PlusFlowAR, and PlusFlowlineVAA. A single set of tables was necessary, rather than regional tables, because tabular data for a stream segment in one region were sometimes erroneously found in the tables of a different region. A total of 3,003,072 stream segments were imported.
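Why a single set of master tables avoids lost records can be shown with a small sketch: when attribute rows from every region are merged and keyed only by comID, a segment whose row was filed under a neighboring region is still found. The authors did this in SQL Server; the Python dictionaries, comIDs, and field names below are hypothetical.

```python
def build_master_table(regional_tables):
    """Merge per-region attribute rows into one table keyed by comID.

    regional_tables: dict mapping a region code to a list of row dicts with a "comid" key.
    A row may sit in the "wrong" region; keying on comID alone makes that harmless.
    Identical duplicates are tolerated; conflicting duplicates are reported.
    """
    master = {}
    for region, rows in regional_tables.items():
        for row in rows:
            comid = row["comid"]
            if comid in master and master[comid] != row:
                raise ValueError(f"conflicting records for comID {comid} in region {region}")
            master[comid] = row
    return master


regional = {
    "05": [{"comid": 101, "slope": 0.002}, {"comid": 205, "slope": 0.010}],  # 205 belongs to 06
    "06": [{"comid": 201, "slope": 0.004}],
}
flowlines_06 = [201, 205]
master = build_master_table(regional)
print([master[c]["slope"] for c in flowlines_06])  # both found despite the misplaced row
```

A per-region lookup for region "06" would have missed comID 205; the merged table finds it.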

Main Channel Definition
In previous versions of SWAT, each subbasin has a single main channel that receives all water yield from that subbasin and any streamflow from upstream subbasins. SWAT+ is more flexible: it allows multiple channels, each with unique properties, and does not require channels to be classified by type. For the purposes of this research, we nevertheless elected to classify channels as main or tributary and to process each class differently. Each of the 3 million NHDPlus V2 segments was classified. In headwater HUC-12s, a segment was classified as main if its cumulative drainage area was at least 50% of the area of the HUC-12 in which it resided. In non-headwater HUC-12s, a segment was classified as main if its cumulative drainage area was larger than the area of the HUC-12 in which it resided. A minimum main channel length was also enforced within each HUC-12, equal to 10% of the side length of a square with the same area as the HUC-12, and additional segments were attributed as main where necessary to meet this criterion. All other segments were considered tributaries. An example of this classification is given in Figure 2.
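The classification rules above can be written as a short function. This is a sketch under stated assumptions: areas are in km² and lengths in km, the `Segment` record is hypothetical, and because the method for choosing additional segments to satisfy the minimum length is not specified, the extension step here simply promotes the remaining segments with the largest cumulative drainage area.

```python
import math
from dataclasses import dataclass


@dataclass
class Segment:
    comid: int
    cum_area_km2: float  # cumulative drainage area at the segment outlet
    length_km: float


def classify_main(segments, huc_area_km2, is_headwater_huc):
    """Return the comIDs attributed as main channel within one HUC-12."""
    if is_headwater_huc:
        # Headwater HUC-12: at least 50% of the HUC-12 area drains through the segment.
        main = {s.comid for s in segments if s.cum_area_km2 >= 0.5 * huc_area_km2}
    else:
        # Non-headwater HUC-12: the segment carries more area than the HUC-12 itself.
        main = {s.comid for s in segments if s.cum_area_km2 > huc_area_km2}

    # Minimum main length: 10% of the side of a square with the HUC-12's area.
    min_length_km = 0.1 * math.sqrt(huc_area_km2)
    main_length_km = sum(s.length_km for s in segments if s.comid in main)

    # Hypothetical extension rule: promote the largest remaining segments until the
    # minimum length is met (a real implementation would follow connectivity upstream).
    for s in sorted(segments, key=lambda seg: seg.cum_area_km2, reverse=True):
        if main_length_km >= min_length_km:
            break
        if s.comid not in main:
            main.add(s.comid)
            main_length_km += s.length_km
    return main


segs = [Segment(1, 80.0, 0.5), Segment(2, 55.0, 0.4), Segment(3, 30.0, 2.0), Segment(4, 10.0, 1.5)]
print(sorted(classify_main(segs, huc_area_km2=100.0, is_headwater_huc=True)))
```

In this toy 100 km² headwater HUC-12, segments 1 and 2 pass the 50% rule but total only 0.9 km, short of the 1.0 km minimum, so segment 3 is promoted as well.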

Pseudo Node Removal
Tributary channels within each HUC-12 were categorized, parametrized, and ultimately summarized by stream order. To develop the stream parameters needed by SWAT+, such as width, depth, length, and slope, it was necessary to process NHDPlus V2 data in a way that ignores pseudo nodes. Figure 3 illustrates three HUC-12s and all of the individual NHDPlus V2 stream segments they contain; pseudo nodes are visible throughout the stream network. A program was developed using Microsoft SQL Server and Visual Basic that traces contiguous stream segments (referred to herein as a trace) in each HUC-12 by stream order while ignoring these pseudo nodes. The program began at a first-order headwater segment and traced downstream until it intersected a higher-order stream or a segment attributed as main channel. All segments in the trace were treated as a single channel. The drainage area of the trace and the channel it drains to were also recorded. The slope of the trace was calculated as the length-weighted average slope of its segments. The effective length of first-order traces was calculated from the cumulative drainage area of each segment, the segment length, and the total drainage area of the entire trace; it represents the average distance that runoff from the trace's drainage area travels through the trace. The properties of each trace were written to a trace table in SQL Server for later summary; an excerpt of this table is given in Table 1. After all first-order streams were traced and parametrized, the process was repeated for second-order streams, beginning at the most upstream second-order segments, and each subsequent stream order was processed in the same fashion. The three HUC-12s shown in Figure 3 and Table 1 contained 73 NHDPlus V2 segments, which were reduced to 21 traced tributary channels and three main channels.
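A minimal sketch of the tracing step follows, assuming each HUC-12's segments are held in a dictionary keyed by comID with stream order, length, slope, cumulative drainage area, and downstream comID. The original program was written in Visual Basic against SQL Server; this Python version and the `Seg` record are illustrative. The effective length is described above only in terms of its inputs, so the formula used here (segment lengths weighted by cumulative drainage area over the trace's total drainage area) is an assumption; it equals the area-weighted mean distance runoff travels along the trace if each segment's local runoff enters at its upstream end.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Seg:
    order: int            # stream order of the segment
    length_km: float
    slope: float          # m/m
    cum_area_km2: float   # cumulative drainage area at the segment outlet
    to: Optional[int]     # downstream comID inside the HUC-12, None if it leaves


def trace(start, segs, main):
    """Follow one stream order downstream from `start`, stepping over pseudo nodes."""
    path, order = [start], segs[start].order
    nxt = segs[start].to
    while nxt is not None and nxt not in main and segs[nxt].order == order:
        path.append(nxt)
        nxt = segs[nxt].to
    return path, nxt  # nxt is the receiving channel (higher order or main), or None


def trace_parameters(path, segs):
    total_len = sum(segs[c].length_km for c in path)
    outlet_area = segs[path[-1]].cum_area_km2
    params = {
        "length_km": total_len,
        "drainage_km2": outlet_area,
        # Length-weighted average slope of the segments in the trace.
        "slope": sum(segs[c].slope * segs[c].length_km for c in path) / total_len,
        "effective_length_km": None,
    }
    if segs[path[0]].order == 1:
        # Assumed formula: each segment length weighted by the share of the trace's
        # drainage area that has accumulated by the end of that segment.
        params["effective_length_km"] = sum(
            segs[c].length_km * segs[c].cum_area_km2 / outlet_area for c in path)
    return params


def trace_huc(segs, main):
    """Trace all tributaries in one HUC-12, lowest stream order first."""
    upstream = {}
    for comid, s in segs.items():
        if s.to is not None:
            upstream.setdefault(s.to, []).append(comid)
    traces = []
    for order in sorted({s.order for c, s in segs.items() if c not in main}):
        for comid, s in segs.items():
            if comid in main or s.order != order:
                continue
            # A trace starts where no upstream segment has the same order.
            if all(segs[u].order != order for u in upstream.get(comid, [])):
                path, receiver = trace(comid, segs, main)
                traces.append({"path": path, "to": receiver, **trace_parameters(path, segs)})
    return traces


segs = {
    1: Seg(1, 1.0, 0.020, 2.0, 2),   # 1 -> 2 is a pseudo node
    2: Seg(1, 1.0, 0.010, 4.0, 5),
    3: Seg(1, 2.0, 0.030, 3.0, 5),
    4: Seg(1, 0.5, 0.040, 1.0, 9),   # drains straight to the main channel
    5: Seg(2, 1.5, 0.005, 9.0, 9),
    9: Seg(3, 3.0, 0.002, 50.0, None),
}
for t in trace_huc(segs, main={9}):
    print(t["path"], "->", t["to"], round(t["slope"], 4), t["effective_length_km"])
```

Here six segments collapse into four traces, and the pseudo node between segments 1 and 2 disappears because both share the same stream order.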

Representative Channels
There are two potential ways to use traced channel data in SWAT+. First, each traced tributary could be simulated as a separate channel; the HUC-12s in Table 1 contain on average eight channels each. This yields specific predictions of flow and nutrient loads for each channel, although with increased uncertainty. Alternatively, representative channels can be developed. In this research, channels of the same class (stream order or main) within each HUC-12 were grouped into a single representative channel typical of the class from which it was derived. This simplification reduces the computational cost of simulating channel processes in SWAT+. Table 2 contains the representative channels for the traces listed in Table 1. Using this method, only two to three representative channels per HUC-12 were required for these HUC-12s instead of eight. The number of channels simulated in any given HUC-12 was thus limited to a main channel, a first-order channel, and sometimes a second-, third-, or fourth-order channel. Channel width, depth, and slope for each representative channel were calculated as drainage-area-weighted averages over all channels of that class in the HUC-12. While segment slopes are included in NHDPlus V2, channel width and depth were calculated using regression equations [19] developed from field data at more than 1300 sites across the U.S. These equations, which relate bankfull width and depth to drainage area, were integrated into the software used to process the NHDPlus V2 data. Other SWAT+ parameters, including Manning's n and the hydraulic conductivity of the streambed, are not provided and must be assumed when the data are incorporated into a model.
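The aggregation of traces into one representative channel per class can be sketched as follows. The regression equations of [19] are stood in for by a generic power law relating bankfull width and depth to drainage area; that functional form and the values in `placeholder_coeffs` are assumptions for illustration only, and published coefficients should be substituted. Width, depth, and slope are averaged using each trace's drainage area as the weight.

```python
def bankfull_geometry(drainage_km2, a_w, b_w, a_d, b_d):
    """Bankfull width and depth (m) from assumed power laws W = a_w*A**b_w and D = a_d*A**b_d."""
    return a_w * drainage_km2 ** b_w, a_d * drainage_km2 ** b_d


def representative_channel(traces, coeffs):
    """Collapse all traces of one class (a stream order, or main) in a HUC-12 into one channel.

    traces: list of dicts with "drainage_km2" and "slope".
    coeffs: (a_w, b_w, a_d, b_d) regression coefficients; placeholders, not published values.
    """
    total_area = sum(t["drainage_km2"] for t in traces)

    def area_weighted(values):
        return sum(v * t["drainage_km2"] for v, t in zip(values, traces)) / total_area

    geometry = [bankfull_geometry(t["drainage_km2"], *coeffs) for t in traces]
    return {
        "width_m": area_weighted([w for w, _ in geometry]),
        "depth_m": area_weighted([d for _, d in geometry]),
        "slope": area_weighted([t["slope"] for t in traces]),
        "n_traces": len(traces),
    }


first_order = [{"drainage_km2": 4.0, "slope": 0.02}, {"drainage_km2": 16.0, "slope": 0.01}]
placeholder_coeffs = (1.0, 0.5, 0.1, 0.5)  # illustrative only
print(representative_channel(first_order, placeholder_coeffs))
```

Area weighting lets the larger trace dominate: the 16 km² trace pulls the representative width toward its own 4 m value.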
Each traced tributary channel drains to a different location: some drain to a higher-order stream and some drain directly to the main channel. It is important that the total area drained by each channel type be preserved in the representative channel configuration. It would be inappropriate to force runoff from the entire HUC-12 through the representative first-order stream when much of the drainage area contributes directly to the main channel or to a higher-order stream. For example, in the enhanced configuration depicted in Figure 4, 20-30% of the drainage area contributes directly to the main channel. For each representative channel described in Table 2, we therefore included the incremental drainage area, that is, the portion of the HUC-12 that drains directly to that representative channel rather than through a lower-order stream. In addition, we determined the fraction of each representative channel's flow that should be passed to each higher-order representative channel (Table 2). For example, in HUC-12 0604 (Table 2), flow from the representative first-order channel was divided between the representative second-order channel (66%) and the main channel (34%). All processing for the development of representative channels was also done using Microsoft SQL Server and Visual Basic. These data were summarized for stream orders up to fifth, although this is not shown in Table 2. These summaries provide all the critical data needed to parametrize channels and connections in the SWAT+ model.
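The bookkeeping described above (preserving the area that drains directly to each representative channel and splitting each channel's outflow among the channels it feeds) can be sketched as below. The sketch assumes drainage areas measured within the HUC-12, so for a non-headwater HUC-12 the main channel's inflow from upstream HUC-12s would be subtracted first, and it splits flow in proportion to the drainage area each class delivers to each receiving class. The trace records and class labels are hypothetical.

```python
from collections import defaultdict


def summarize_classes(traces):
    """Incremental drainage area per class and flow-split fractions between classes.

    traces: dict trace_id -> {"cls": class label, "drainage_km2": area at the trace outlet,
            "to": receiving trace_id, or None if the trace leaves the HUC-12}.
    """
    inflow = defaultdict(float)  # area delivered to each trace by the traces draining into it
    for t in traces.values():
        if t["to"] is not None:
            inflow[t["to"]] += t["drainage_km2"]

    incremental = defaultdict(float)
    outflow = defaultdict(lambda: defaultdict(float))
    for tid, t in traces.items():
        # Area draining directly to this trace, not routed through another trace.
        incremental[t["cls"]] += t["drainage_km2"] - inflow[tid]
        if t["to"] is not None:
            outflow[t["cls"]][traces[t["to"]]["cls"]] += t["drainage_km2"]

    # Split each class's outflow in proportion to the drainage area sent to each receiver.
    fractions = {cls: {dest: area / sum(dests.values()) for dest, area in dests.items()}
                 for cls, dests in outflow.items()}
    return dict(incremental), fractions


traces = {
    "t1": {"cls": "1", "drainage_km2": 4.0, "to": "t3"},
    "t2": {"cls": "1", "drainage_km2": 2.0, "to": "t3"},
    "t4": {"cls": "1", "drainage_km2": 3.0, "to": "m"},
    "t3": {"cls": "2", "drainage_km2": 10.0, "to": "m"},
    "m":  {"cls": "main", "drainage_km2": 20.0, "to": None},
}
incremental, fractions = summarize_classes(traces)
print(incremental)
print(fractions)
```

In this toy 20 km² headwater HUC-12, the incremental areas (9, 4, and 7 km²) sum to the HUC-12 area, and the first-order class sends two-thirds of its flow to the second-order channel and one-third directly to the main channel.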
Table 2. Representative channel characteristics for enhanced SWAT+ configuration as depicted in Figure 4.

Summary and Conclusions
Models used at a national scale must operate over a very large spatial extent (the contiguous U.S.) while maintaining the granularity required to capture important small-scale processes. Previous versions of SWAT required a very fine subbasin delineation to simulate processes occurring in headwater streams, and such a fine delineation would make a national scale model infeasible. A detailed national assessment is now possible because of the enhanced connectivity of spatial objects in SWAT+. The hydrologic connectivity both within and between HUC-12s was analyzed to support the upcoming CEAP II national assessment. Connectivity errors in the WBD were corrected as needed (410 corrections in addition to 273 issues previously identified by HAWQS researchers), improving the dataset's value for SWAT+ assessments. The NHDPlus V2 dataset was simplified using a representative channel approach, yielding a level of complexity adequate for SWAT+ input.
The representative channel approach detailed herein strikes a balance between computational complexity and accurate representation of the hydrologic system. Models are simplifications of reality and will always contain uncertainty. Although it is possible to use all 3 million NHDPlus V2 channel segments, there may be little advantage in doing so given the uncertainty in the properties of each segment. Simulating many small channels rather than fewer, longer channels also increases computational requirements. SWAT+ has not been tested with very short (<30 m) channels, and such channels may be problematic because SWAT channel routines were not developed to operate as short finite elements. Additional model testing could be used to refine the optimal level of channel aggregation. This representative channel approach will be tested in the upcoming CEAP II national assessment. Until then, all the WBD corrections and NHDPlus V2 representative channel data are provided via the web for other researchers to evaluate and use. Like other datasets developed for national assessments, these data may continue to evolve as deficiencies are identified and new information becomes available. A more detailed NHDPlus High Resolution (HR) dataset is currently being developed by the USGS and partners; these data may allow even smaller headwater channels to be parameterized for SWAT+ assessments. The representative channel approach is also being considered for future CEAP II simulation of gullies and smaller wet-weather streams. By considering the entire flow path from the edge of the field through headwater streams, main channels, rivers, and reservoirs to the ocean, agricultural loads at the national scale can be assessed more accurately.