1. Introduction and Related Work
Stream width and stream cross-sectional area are among the most important metrics for hydrologic and sediment transport modeling because they reflect flow magnitude and sediment load capacity [1, 2, 3, 4, 5]. Global hydrological maps do not yet include complete hydrological datasets for the unglaciated regions of Antarctica, such as the McMurdo Dry Valleys (MDVs), because the ephemeral nature of their streams makes them difficult to detect. In the MDVs, channel runoff and sediment load capacity reflect glacial ablation magnitudes; it is therefore important to extract stream extents to better understand spatial patterns of glacial ablation [6]. Patterns of channel dimension and occurrence across the MDV landscape will provide information on regional glacial runoff and hence on ablation magnitudes. The MDVs were once considered stable, but due to an observed increase and leveling off of solar radiation, they are now shifting toward Arctic and alpine-style permafrost and glacial melt [7]. This has increased surficial warming on the glaciers and permafrost, leading to an increase in flood frequency and magnitude as well as the thawing of permafrost soils [7]. The thawing of permafrost decreases the consolidation of sediments, allowing for easier mobilization of sediments and landscape adjustments [8]. To obtain glacial runoff estimates and fluvial geomorphic changes across a large scale, the detection of stream boundaries and their extents is necessary. The detection of streams and their boundaries in other available temporal datasets in the MDVs will aid in the detection of fluvial geomorphic shifts spatially and supplement global hydrological datasets.
Detection of stream boundaries in hyper-arid regions is a difficult task, especially over large-scale regions, because stream beds are periodically dry. Bankfull events typically recur every 1.5 to 2 years, but are even less frequent in arid climates than in more humid regions [4, 9]. Therefore, streams with very little activity may be narrow and may not exhibit a clear distinction between active channel (normal high water lines) and bankfull extents (where the stream meets the floodplain); as a result, we categorize these channel boundaries in the same class in this paper (Figure 1) [2, 3]. Collecting sufficient stream boundary data in the field is arduous even over smaller, more hydrologically active areas; therefore, remote sensing methods are required. Elevation and slope derived from lidar have been used for stream location and/or stream boundary detection in the past [1, 3, 10, 11, 12, 13, 14, 15, 16, 17, 18]. Digital elevation models (DEMs) have been commonly used to create flow accumulation models for both channel detection and centerline extraction for runoff simulation [10, 11]. Stream boundaries have previously been estimated with topographic cues from lidar returns and/or imagery [3, 12, 17, 18, 19]. Stream centerline extraction and detection of the area of stream inundation and saturated soils have also been accomplished using both elevation and intensity returns [14, 15, 16, 20, 21]. In this study, we consider an alternative stream boundary detection method using topographic and surface reflectance indicators as features from lidar-derived rasters for hyper-arid regions such as the MDVs. These topographic and surface reflectance features are utilized in a U-Net architecture for the detection of stream boundaries in the 770 km² of glacier-free regions within Taylor Valley, one of the centrally located valleys of the MDVs. Based on a literature review, the use of deep learning for detection of small stream (∼10–150 m wide) extents in hyper-arid regions on a multi-basin scale has not been reported.
Existing remote sensing techniques are not suitable for large-scale, hyper-arid, polar regions such as the MDVs. MDV streams are ephemeral, and runoff is controlled by local climatic changes that can vary spatially and on an hourly basis; a stream may therefore go undetected if it is not flowing during data collection. Considering this, conventional methods that rely on waterbody extraction are not sufficient for our purposes. Existing remote sensing methods for channel extraction require a DEM and/or imagery where topographic cues are used to delineate channel extents [3, 12, 13, 17].
Current stream boundary extraction methods utilize cross sections that are equally distanced and tangent to the stream centerline and/or require additional precipitation data or imagery [3, 12, 13, 17, 18]. In Li et al. [22], small rivers (>30 m wide) were extracted using spectral indices from Sentinel-2 imagery and DEM data. In hyper-arid regions such as the MDVs, even visual inspection of imagery may not be sufficient for manual detection of streams because stream flow, wetted soil, and ice change temporally and spatially. Runoff in Taylor Valley is mainly glacially sourced, and runoff magnitudes vary widely across the landscape, making stream boundary detection methods that rely on precipitation data unsuitable. Cross section-based methods must extrapolate the boundaries between cross sections, which can introduce large errors if the spacing between cross sections is too large. Accurately identifying stream boundaries for large regions with multiple complex watersheds is considered computationally expensive, as cross sections must be sufficiently close together and wide enough along the channel length to accurately represent stream extents.
When cross section-based methods are used on large-scale regions with a wide range of stream geometries, they may require extensive editing. Cross section-based stream boundary detection methods typically are better suited for straight stream geometries compared to meandering and braided geometries. These algorithms may require manual editing of cross sections when it comes to streams that are meandering, multithreaded, at an intersection, or in close proximity to another stream (
Figure 2b–d) [
3,
17]. Braided streams and streams that are close in proximity or near to a junction will exhibit multiple terraces and inflection points along the cross section, causing the algorithm to possibly misclassify stream boundaries by selecting the incorrect pair of inflection points (
Figure 2b,d) [
3]. Similarly, very tightly meandering streams at inner bends may have regions where the tangent to the stream centerline may cause a single cross section to intersect upstream and downstream rather than intersect the stream boundaries tangentially (
Figure 2c) [
3]. Instances such as these must be manually edited, which can become increasingly time consuming for extensive networks on a multi-basin scale [
3]. Ideally, a stream boundary segmentation algorithm would be able to classify stream boundaries even along tightly meandering stream channels without manual editing of cross sections. Stream boundary extraction in Taylor Valley would require extensive manual editing with current methods because the valley has a wide range of stream geometries. In this study, we delineate stream boundaries across a wide array of stream geometries over the full extent of Taylor Valley using airborne laser scanning (ALS) lidar data collected in 2014. This study explores a stream boundary detection method that does not require cross sections, precipitation data, or imagery and can handle multiple basins with streams that are ephemeral in nature, such as those in Taylor Valley.
Convolutional neural networks (CNNs) have demonstrated excellent results in pattern recognition across scales in remote sensing [23, 24, 25]. A CNN is a type of neural network that can assign importance, in the form of weights and biases, to different objects in an image in order to distinguish two or more classes [26]. Analogous to a human’s visual cortex, a CNN uses hidden layers to pass results successively to build a mathematical model for object identification [27]. CNNs have the ability to learn and detect objects at different localities and scales within an image, making them suitable for land surface segmentation [28, 29, 30]. Because of this ability, we use a CNN to detect dry stream networks and their stream boundaries in Taylor Valley, eliminating the need for cross sections and therefore for interpolation between cross sections at set distances. U-Net has been shown to be highly successful in pattern recognition and land surface segmentation using fewer training samples than other CNN algorithms [31, 32]. Additionally, studies within remote sensing have shown that U-Net models create smoother and more connected features, which is important for hydrologic applications [21]. For these reasons, U-Net was selected for semi-automatic detection of streams and their stream boundaries in Taylor Valley.
The aim of this study is to define a semi-automatic method for stream boundary detection along stream channels in Taylor Valley using high-resolution, lidar-derived rasters as features in a U-Net CNN architecture. Our study only aims to detect stream boundaries that are visually distinguishable. Furthermore, this study does not try to discern stream boundary turning points from those caused by undercutting of the underlying strata. Given that stream flow in the MDVs is unconventionally periodic and that field data are limited, we are constrained to work with some level of ambiguity in the correct labeling of stream boundaries. Manual segmentation and prediction of stream boundaries are particularly difficult for less active, unmonitored streams. Given the extremely ephemeral nature of streams in the MDVs, we propose a new method for detection of stream boundaries. Topographic, geometric, and lidar surface reflectivity indicators derived from airborne lidar were utilized for model training and prediction. To our knowledge, a semi-automatic stream boundary detection method adapted for hyper-arid regions like the MDVs has not been attempted before on a multi-basin scale. This method will aid future studies that seek to automatically classify stream boundaries in hydrologically similar regions for simulation of fluvial processes and estimation of geomorphic rates of change. In Taylor Valley specifically, detection of stream boundaries on available, temporally spaced DEMs will allow researchers to detect and simulate changes within the different climatic zones and to observe correlations between stream channel morphology and climatic perturbations.
2. Study Area
The study area is located in Taylor Valley in the MDVs in East Antarctica (Figure 3). Taylor Valley is centrally located in the MDVs and bounded by the Asgard Mountains, the Kukri Hills, and the Ross Sea. Taylor Valley has one terminal glacier (Taylor Glacier) that flows from the Polar Plateau and 15 smaller alpine glaciers that flow from the side valleys. The Transantarctic Mountains to the west obstruct ice flow from the East Antarctic Ice Sheet to the valley, producing a severe rain shadow. The valley is approximately 70 km long and 12 km wide. Elevation ranges from 0 to 2000 m, and slopes are generally gentle to moderate in regions where stream flow exists. The valley primarily consists of glacial tills and fluviolacustrine deposits, with exposed bedrock at higher elevations. The ground is underlain by permafrost with active layers that, during the summer months, are 45–70 cm thick near the coastal regions, 20–45 cm thick 60 km inland, and <20 cm thick along the Polar Plateau [33].
The MDVs are one of the most arid regions of the world, with sublimation (∼35 cm/year) exceeding what little precipitation (<10 cm/year) the valleys receive [34, 35, 36]. It is continuously dark during the winter months, and temperatures can get as low as −60 °C [37]. During the 6–12 weeks of summer, temperatures can reach 5 °C [37]. Temperatures typically oscillate above and below the melting point of water throughout the summer days.
The MDVs have three geomorphic zones called microclimate zones. These zones are the upper stable zone (USZ), the inland mixing zone (IMZ), and the coastal thaw zone (CTZ), and they are defined by summer temperature, soil moisture, and relative humidity. The sporadic runoff is almost entirely glacially sourced and flows to the perennially frozen closed-basin lakes [38, 39]. The lack of vegetation allows streams with unconsolidated sediment to transport large sediment loads during times of flooding [38]. Due to the streams’ ephemeral nature, runoff may or may not exist at a given point in time. This makes the detection of the streams and their boundaries very difficult with existing methods. Several streams in Taylor Valley have already experienced extensive incision and bank undercutting due to climatic changes [6, 7]. Although stream runoff is quite sporadic, it is predicted to become more frequent and will in turn further shape the valleys [40].
3. Methods
The creation of the training data, labeling, U-Net training, and prediction are depicted in Figure 4 and described in the subsections below.
3.1. Data Preparation
Airborne lidar data collected in 2014 by the National Center for Airborne Laser Mapping (NCALM) over the ice-free regions of Taylor Valley (∼560 km²) with a point density of 2.7 pts/m² were utilized for stream boundary detection [41]. Digital imagery with 5–20 cm resolution collected simultaneously with the lidar survey, as well as elevation and slope rasters, was used for manual segmentation of stream boundaries to compile training data. The lidar surveys include the collection of ground point locations in space and the raw intensity of lidar returns. Natural neighbor interpolation was used to convert the lidar dataset into a digital elevation model (DEM) and an intensity raster, each with a spatial resolution of 1 × 1 m². A slope raster and a flow accumulation model were calculated from the DEM using the slope and multi-flow direction (MFD) algorithms implemented in the ArcGIS software [42]. The DEM, intensity, slope, and flow accumulation rasters were then utilized as input features into the U-Net algorithm for training to find the best model for stream boundary prediction. Only the top-performing models are discussed here, but all performance metrics are available in Table A1.
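As a rough illustration of how a slope raster can be derived from a gridded DEM, the sketch below uses central differences from NumPy. It is not the ArcGIS slope algorithm used in this study; the function name `slope_degrees` and the synthetic DEM are assumptions made only for the example.

```python
import numpy as np

def slope_degrees(dem: np.ndarray, cell_size: float = 1.0) -> np.ndarray:
    """Slope in degrees from a DEM on a square grid (finite-difference sketch)."""
    # np.gradient returns the derivative along rows (y) first, then columns (x).
    dz_dy, dz_dx = np.gradient(dem.astype(np.float64), cell_size)
    # Slope is the arctangent of the magnitude of the elevation gradient.
    return np.degrees(np.arctan(np.hypot(dz_dx, dz_dy)))

# Hypothetical 5 x 5 DEM rising 1 m per 1 m cell in the x direction.
dem = np.tile(np.arange(5.0), (5, 1))
slope = slope_degrees(dem, cell_size=1.0)
```

A plane that rises one metre per one-metre cell should yield a uniform 45° slope, which gives a quick sanity check before applying such a function to real tiles.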
The test set locations were manually selected, divided into raster tiles, normalized, and then manually digitized for ground truth. The selected feature rasters were broken up into 217 well-distributed 300 × 300 m raster tiles (covering ∼1% of the study area) throughout Taylor Valley. The test set includes regions with different channel geometries (straight, sinuous, meandering, and braided/intersecting streams) and stream sizes, as well as regions that lack streams and have varied topography. Stream test set locations were identified and selected based on evidence of stream existence where there were visibly distinguishable boundaries. Stream existence was determined using stream centerlines of major streams mapped by the Long-Term Ecological Research (LTER) program and imagery exhibiting linear patterns of water, ice, or wetted ground downstream of glaciers that were overlapped by stream centerlines extracted from MFD flow accumulation. Evidence of stream existence and imagery were used to avoid misclassifying turning points associated with other topographic features such as underlying strata.
The minimum and maximum values within each tile were calculated for elevation, slope, intensity, and flow accumulation. To normalize the data, the tile minimum was subtracted from each pixel value in the tile, and the result was divided by the tile's pixel value range (maximum minus minimum). Normalization of slope and intensity by the minimum and maximum values helps in regions where indicators are not as noticeable. This includes regions where stream boundaries exist but are more easily detected when the slope is accentuated. Flow accumulation normalization was helpful for regions that had subtle patterns of ice or wetted ground. After normalization, the output bit depth was 8-bit unsigned. The resulting normalized elevation, slope, intensity, and flow accumulation rasters, together with airborne digital imagery and topographic profiles, were used for manually locating streams and digitizing their stream boundaries. Two classes were defined in the raster: “stream areas” and “non-stream areas”.
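The per-tile normalization described above can be sketched as follows. This is a minimal example rather than the processing code used in the study: the function name `normalize_tile`, the rescaling of the [0, 1] result to 0–255 to obtain 8-bit values, and the handling of flat tiles are assumptions, and the tile is assumed to contain no nodata cells.

```python
import numpy as np

def normalize_tile(tile: np.ndarray) -> np.ndarray:
    """Min-max scale one raster tile and store it as 8-bit unsigned values."""
    tile = tile.astype(np.float64)
    t_min = tile.min()
    t_range = tile.max() - t_min
    if t_range == 0:
        # A perfectly flat tile has no contrast; map it to zeros (assumption).
        return np.zeros(tile.shape, dtype=np.uint8)
    scaled = (tile - t_min) / t_range  # values now lie in [0, 1]
    # Stretch to the 8-bit range; the factor 255 is an assumed convention.
    return np.round(scaled * 255).astype(np.uint8)

# Hypothetical 3 x 3 elevation tile in metres.
demo = np.array([[10.0, 12.0, 14.0],
                 [11.0, 13.0, 15.0],
                 [12.0, 14.0, 20.0]])
normalized = normalize_tile(demo)
```

Because the minimum and range are computed per tile, a subtle 2 m bank in a nearly flat tile is stretched across much of the 0–255 range, which is the accentuation effect the text refers to.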
Stream boundary indicators include inflection points, a change in sediment texture, and staining of the floodplain, as diagrammed in Figure 1 [2]. Other stream indicators in the MDVs are linear patterns of snow, ice, water, or wetted ground. Regions with concave profiles and topographic and visual indicators such as these were utilized to digitize stream boundaries. Topographic profiles, elevation, slope, and flow accumulation were used for manual detection of the stream boundaries using topographic cues such as inflection points and linear depressions (Figure 5a–c). The digital imagery, intensity, flow accumulation, and a hillshade were supplementary for manual stream identification (Figure 5d–f). Digital imagery, intensity, and flow accumulation can only provide visual information about stream location if water, snow, or wetted soil exists in the channel, and they cannot provide enough information for stream boundary detection.
It should be noted that CNN training relies on correctly labeled stream boundaries in the training data. Stream boundary segmentation is limited by the ability to manually differentiate stream boundary turning points from other topographic turning points such as underlying strata or outcrops. Any inflection points not distinguishable from stream boundaries such as underlying changes in strata in close proximity to stream boundaries can cause manual misclassification. Therefore, only samples with distinguishable stream boundaries such as breaks in slope were included. Prediction in turn will be focused on correctly identifying stream networks with distinguishable stream boundaries.
3.2. U-Net
U-Net is commonly used for raster segmentation. Normally, an RGB or greyscale image and its user-classified ground truth are used as inputs; the output is an image where each pixel is assigned to a class [43]. This is accomplished by designing an architecture that incorporates a stacked convolutional layer and padding framework to ensure preservation of spatial resolution [44]. Furthermore, U-Net takes advantage of an encoder-decoder structure where the encoder learns abstract low-level features while the decoder develops high-level features through upsampling [45]. Essentially, the encoder decreases the spatial resolution of the image in order to increase computational efficiency and more easily differentiate classes [46]. The decoder restores the image to its full resolution to recover spatial information [46].
3.3. Training
The open source Landcover Dronedeploy tool was used for training and prediction of stream boundaries in Taylor Valley [47]. This code is implemented with fastai and PyTorch version 1.1.0 and uses a pre-trained ResNet-18 encoder model within the U-Net semantic segmentation framework. The various features and feature combinations, with their corresponding ground truth, were input into U-Net. A 50/20/30 split of the 217 samples was chosen for training, validation, and testing, respectively, and was used throughout the training. This split was chosen because the test set had to be large enough to reveal any trends in stream boundary segmentation accuracy (spatial trends and trends by stream geometry type) without compromising the amount of training data. The test set was manually selected to fairly represent the accuracies of each stream geometry and to achieve an appropriate class balance. The training data were augmented with three 90° rotations. Prediction and fitting were carried out using the freely available Graphics Processing Units (GPUs) within Google Colaboratory (Nvidia K80s, T4s, P4s, and P100s) [48]. Depending on the available GPUs in Google Colaboratory, the total runtime for training and testing the algorithm was between four and six hours. The final tuned hyper-parameters were 200 epochs, a batch size of 16, a learning rate of 1 × 10⁻⁵, and a weight decay of 1 × 10⁻⁴. These parameters were found empirically based on desirable loss curve characteristics, namely a loss that decreases smoothly with epoch towards a low plateau, and on the speed of training. The top four performing models were saved for segmentation of the entire valley. A performance assessment of stream boundary segmentation was carried out on various combinations of the input features. Prediction performance was measured based on the proportion of correctly classified pixels. Performance metrics include recall, precision, and F1 score, as they are commonly used metrics for evaluating a binary segmentation [49, 50].
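$$P = \frac{TP}{TP + FP}, \qquad R = \frac{TP}{TP + FN}, \qquad F_1 = \frac{2PR}{P + R}$$
where:
$TP$ represents true positives,
$FP$ represents false positives,
$FN$ represents false negatives,
$P$ is the precision,
$R$ is the recall and
$F_1$ is the F1 score.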
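For a binary stream/non-stream raster, these three metrics can be computed from pixel counts as in the sketch below. It uses the standard definitions of precision, recall, and F1 score; the function name `segmentation_scores`, the zero-division handling, and the small example masks are assumptions introduced for illustration and are not taken from the study's code.

```python
import numpy as np

def segmentation_scores(truth, pred):
    """Precision, recall, and F1 for binary masks where 1 marks stream pixels."""
    truth = np.asarray(truth, dtype=bool)
    pred = np.asarray(pred, dtype=bool)
    tp = int(np.sum(truth & pred))    # stream pixels predicted as stream
    fp = int(np.sum(~truth & pred))   # non-stream pixels predicted as stream
    fn = int(np.sum(truth & ~pred))   # stream pixels that were missed
    # Return 0.0 when a denominator is empty (an assumed convention).
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1

# Hypothetical 2 x 4 ground truth and prediction masks.
truth = np.array([[1, 1, 0, 0],
                  [1, 1, 0, 0]])
pred = np.array([[1, 0, 0, 0],
                 [1, 1, 1, 0]])
p, r, f1 = segmentation_scores(truth, pred)
```

In this toy case there are three true positives, one false positive, and one false negative, so all three scores equal 0.75.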
Precision is the fraction of pixels predicted as stream that are truly stream, so low precision indicates overprediction; recall is the fraction of true stream pixels that the model recovers, so low recall indicates underprediction. Performance was analyzed based on spatial location within Taylor Valley and by stream geometry (straight, sinuous, meandering, and those with braided or intersecting streams).
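Returning to the data handling described earlier in this subsection, the 50/20/30 split and the rotation augmentation might be sketched as below. This is only an illustration under stated assumptions: the study selected its test set manually, whereas the hypothetical `split_indices` helper here shuffles at random, and the names `rotations` and `split_indices` as well as the seed are invented for the example.

```python
import numpy as np

def rotations(feature: np.ndarray, label: np.ndarray):
    """Return the original tile pair plus its 90, 180, and 270 degree rotations."""
    # The same rotation is applied to the feature and its ground truth mask.
    return [(np.rot90(feature, k), np.rot90(label, k)) for k in range(4)]

def split_indices(n: int, fractions=(0.5, 0.2, 0.3), seed: int = 0):
    """Randomly split n sample indices into training, validation, and test sets."""
    order = np.random.default_rng(seed).permutation(n)
    n_train = int(round(n * fractions[0]))
    n_val = int(round(n * fractions[1]))
    # Whatever remains after training and validation goes to the test set.
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]

train_idx, val_idx, test_idx = split_indices(217)

# Hypothetical 4 x 4 feature tile and matching label mask.
feature = np.arange(16).reshape(4, 4)
label = (feature > 7).astype(np.uint8)
augmented = rotations(feature, label)
```

Only the training tiles would be passed through `rotations`, so each training sample yields four feature and label pairs while the validation and test tiles stay unrotated.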
3.4. Prediction
Following model training, the full valley was broken up into 852 overlapping tiles of 1500 × 1500 m² for prediction using the models with the four highest-performing feature inputs: elevation, slope, intensity, and flow accumulation. Tiles were overlapped because pixels near the edges are susceptible to more error and may not include enough information to accurately predict stream boundaries. With tile sizes smaller than 1500 × 1500 m², the model often classified the outer boundaries as “stream areas” and misclassified stream bottoms as “non-stream”. This occurred because such tiles included only a section of a stream and therefore did not provide enough information for the algorithm to predict accurately. Increasing the tile size from 300 × 300 m² to 1500 × 1500 m² gave the model a larger picture, so it could predict bankfull boundaries with very little noise across the full valley. However, normalization across a larger area may produce a wider spread of pixel values, which could cause less active streams to be misclassified as ground, because a larger tile exhibits gentler gradient changes after normalization than a smaller tile does. The total runtime for prediction across the full valley after the model was trained was ∼15 min. A subset of the data was selected and digitized for a visual comparison of the predictions from the top two performing models.
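One way to lay out overlapping prediction tiles along a single raster axis is sketched below. The amount of overlap is not stated in the text, so the 250-pixel overlap, the 4000-pixel extent, and the function name `tile_origins` are hypothetical values chosen only to show the idea at 1 m resolution.

```python
def tile_origins(extent: int, tile: int, overlap: int) -> list:
    """Start offsets of overlapping tiles that together cover [0, extent)."""
    if not 0 <= overlap < tile:
        raise ValueError("overlap must be non-negative and smaller than the tile")
    if tile >= extent:
        return [0]  # a single tile already spans the whole axis
    step = tile - overlap
    origins = list(range(0, extent - tile, step))
    # Place the final tile flush with the far edge so no pixels are left out.
    origins.append(extent - tile)
    return origins

# Hypothetical axis of 4000 pixels, 1500-pixel tiles, 250-pixel overlap.
origins_x = tile_origins(4000, 1500, 250)
```

Applying the same function to both axes gives the grid of tile corners; when the predictions are mosaicked, the less reliable edge pixels of each tile can be discarded in favor of the interior pixels of its neighbor.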
4. Results
Here, we discuss the U-Net segmentation performance of stream boundaries on a valley-wide scale in Taylor Valley. Performance results from the different feature combinations are shown in Table 1. Models trained on a single feature outperformed models trained on two or more features. All four single-feature models performed well on the test set; the ranges of precision, recall, and F1 score are reported in Table 1. Elevation and slope had the highest performance in stream boundary prediction over the test set and had similar accuracy metrics.
Spatial prediction accuracy on the test set was nearly identical for elevation and slope; therefore, only the elevation-based model is depicted. The prediction of stream boundaries on the test set across Taylor Valley shows that coastal regions typically have higher performance, with F1 scores of 0.81–0.99 (Figure 6). Other regions across the valley had mixed F1 scores of 0.70–0.98, with scores decreasing slightly inland. The average for true positives was 0.71, while for true negatives it was 0.93. Figure 6a–f shows the general trend in Taylor Valley, where underprediction is more common than overprediction. Regions that did not perform as well had less distinct breaks in slope at the stream boundaries.
When prediction accuracy was compared across the three microclimate zones (CTZ, IMZ, and USZ), the CTZ achieved the most confident results (Figure 7). However, upon inspecting outcrop geology and stream boundary prediction accuracy, there seemed to be some correlation between stream boundary prediction accuracy and outcrop geology. The glacial till region near the coast received the highest prediction accuracies on average. The test regions that had lower stream boundary prediction accuracy were typically located at or near bedrock (in grey and green) (Figure 7).
The performance of stream boundary prediction was assessed across the four feature classes and over different stream types (straight, sinuous, meandering, and braided or intersecting streams) (Figure 8). The elevation and slope features received the highest F1 scores and had a narrower range of F1 scores across the different stream types (Figure 8). Across all of the features, straight streams performed the worst, while meandering streams typically performed the best, followed by intersecting and sinuous streams (Figure 8). In contrast, cross section-based methods perform better on single-channel streams than on braided or tightly meandering channels. The lidar return intensity and flow accumulation features received the lowest precision, recall, and F1 scores; therefore, they are not discussed at length here, and elevation and slope are our main focus.
The manually labeled test images and the stream boundary predictions agree with each other (Figure 6). The prediction over Commonwealth Stream, located between Commonwealth Glacier and the Ross Sea, shows that the elevation and slope results are comparable to the manually digitized ground truth (Figure 9). The elevation-based prediction was slightly less noisy and, unlike the slope-based prediction, did not overpredict the stream boundaries (Figure 9). Tributaries were more difficult to detect accurately than the main branch (Figure 9).
5. Discussion
Overall, the elevation and slope features had the highest performance for stream boundary segmentation. The intensity and flow accumulation features do not provide the topographic information, such as inflection points, needed to accurately identify stream boundaries. These two features were meant to be supplementary support features to slope and elevation; however, increasing the dimensionality of attributes with a very small test set increases the difficulty of convergence [51]. In more hydrologically active regions, models including intensity may have higher performance in stream boundary prediction because of the high reflectance of water. Our study area is a hyper-arid region with typically dry or wetted soil; therefore, intensity values may be more homogeneous than those of a hydrologically active region.
The elevation and slope models performed similarly overall and across different stream geometries (Figure 8). Either elevation or slope alone is sufficient for prediction of stream boundaries, which reduces computational time by lowering the dimensionality of the training data and removing the need to compute the other lidar-derived features. Coastal regions had the highest stream boundary prediction performance. This could be due to the warmer, wetter climate on the coast and the higher concentration of sediments, which mean more hydrological activity and more highly evolved stream channels (Figure 7). Predictions across the valley tend towards underprediction rather than overprediction (Figure 7). This is likely because stream boundary breaks are gently sloping, and therefore smaller channels or tributaries are much more difficult to detect. As shown in Figure 9 (Commonwealth Stream), the slope-based model overpredicted on its tributary boundaries, while the elevation-based model did not.
Across all of the models, straight streams performed the worst, while meandering streams performed the best, followed by sinuous and intersecting streams, in contrast to current methods. This is very likely due to the shallowness of straight streams, which makes their breaks in slope less distinguishable [52]. Meandering streams have had time to develop more distinguishable stream boundaries [52]. Intersecting or braided streams have more stream flow at the junction due to water accumulation there and therefore also have more detectable boundaries.
Some of the limitations of this study are its reliance on data correctly labeled by the user and its use of training data that only include streams with well-established stream boundaries. Inflection points due to underlying strata were not accounted for in this study due to a lack of field data. Straight channels and newly formed streams are just a few examples of streams that will likely not be identified as well as meandering streams because of their undeveloped stream boundaries. Even to the human observer, stream boundaries may not be apparent on shallower streams with small gradient changes at their boundaries. Unlike other methods, the proposed method is not limited to less sinuous or non-bifurcating channels and can detect highly ephemeral streams.
While this study focuses on stream boundaries, future research in the MDVs should attempt to identify the boundaries of water, wetted soil, and/or snow and ice in order to identify streams that are more active, which would entail more rapid deglaciation. This method could also be tested on other hyper-arid or hydrologically active regions of the world. This study has only explored prediction performance for stream boundaries using the U-Net algorithm. Future studies could compare different algorithms or feature classes with additional training data. Segmentation results could be used to isolate fluvial regions for change detection within the stream extent and to simulate fluvial processes. These new data will supplement global hydrography datasets, as complete hydrological datasets are not yet available for the MDVs. A more complete hydrological dataset will lead to a more complete representation of fluvial responses to climate change over a multi-basin scale.