CO-RIP: A Riparian Vegetation and Corridor Extent Dataset for Colorado River Basin Streams and Rivers

Here we present “CO-RIP”, a novel spatial dataset delineating riparian corridors and riparian vegetation along large streams and rivers in the United States (US) portion of the Colorado River Basin. The consistent delineation of riparian areas across large areas using remote sensing has historically been complicated, partly because the scientific and management communities differ in their definitions of what a “riparian corridor” or “riparian vegetation” represents. We use valley bottoms to define the riparian corridor and establish a riparian vegetation definition interpretable from aerial imagery for efficient, consistent, and broad-scale mapping. Riparian vegetation presence and absence data were collected using a systematic, flexible image interpretation process applicable wherever high resolution imagery is available. We implemented a two-step approach using existing valley bottom delineation methods and random forests classification models that integrate Landsat spectral information to delineate riparian corridors and vegetation across the 12 ecoregions of the Colorado River Basin. Riparian vegetation model accuracy was generally strong (median kappa of 0.80); however, it varied across ecoregions (kappa range of 0.42–0.90). We offer suggestions for improvement in our current image interpretation and modelling frameworks, particularly encouraging additional research in mapping riparian vegetation in moist coniferous forest and deep canyon environments. The CO-RIP dataset created through this research is publicly available and can be utilized in a wide range of ecological applications.


Introduction
The Colorado River is one of the most important water resources for people, wildlife, and agriculture in the southwestern United States (US). Its headwaters begin at 2743 m a.s.l. at La Poudre Pass in Colorado and, under natural flow regimes, the river empties into the Gulf of California 2333 km downstream. Ephemeral, seasonal, and persistent riparian habitats are found throughout the Colorado River Basin, which hosts some of the most important vegetative communities for wildlife species in the predominantly arid landscape [1]. Riparian corridors are associated with numerous ecosystem services, including water and sediment flow dynamics [2,3], water quality [3], habitat, forage and fiber for wildlife, maintenance of biodiversity [3,4], and recreation and aesthetics [4]. Riparian areas have rich floral and faunal diversity [2], with many terrestrial animal species reliant on riparian corridors [4] for the provisioning of food, water, shelter, and corridors for species migration [4–7]. Still, riparian areas represent a relatively small percentage of the total land cover of the western US [4], are intensively disturbed [2,8], and are often considered to be at-risk ecosystems in semi-arid regions [4,9] due to the effects of urbanization [10], agriculture [11], and modified water flow regimes [12]. The spatial distribution of riparian areas is often inconsistently reported, because how to define the extent of a riparian corridor, and what composes riparian versus non-riparian vegetation, remain subjects of ongoing debate. Currently, there is no universal definition of "riparian vegetation" or a "riparian corridor." Riparian vegetation is only vaguely represented at the landscape scale in existing US land cover datasets, such as the National Land Cover Database [13,14] and LANDFIRE existing vegetation layers [15].
The need to create and improve riparian area datasets is a recognized scientific and management priority, as shown by initiatives to improve the mapping of riparian vegetation types, such as the Southwest ReGAP Program [16] in the United States and the Copernicus Program's "Riparian Zones" [17] layer in Europe. In the United States, a paucity of explicit information on the location and extent of riparian corridors remains, partially because existing datasets do not include hydro-geomorphological data, such as the extent of valley bottoms, to delineate riparian cover [4,15]. A detailed map of riparian corridors is expressly needed in the Colorado River Basin, where the social-ecological significance and the increasing vulnerability of riparian systems have led to several management and conservation initiatives. Two examples include The Nature Conservancy's Colorado River Program, which manages 15 restoration and conservation projects in the basin, and the Bureau of Reclamation's Lower Colorado River Multi-Species Conservation Program (LCR-MSCP), which manages riparian habitat for the recovery of listed species including the endangered Southwestern willow flycatcher (Empidonax traillii extimus) [18–20]. Comprehensive information concerning the location and extent of riparian vegetation can support these and similar habitat conservation and management actions across the basin.
At local and regional scales, riparian corridors are delineated in numerous ways, as their definition is usually dependent on research approach or agency targets. Generally, they are the transition between terrestrial and freshwater ecosystems [2,21,22] and include components of topography, vegetation, and soils. Riparian corridors are dynamic regions with complex, heterogeneous landscapes formed by frequent disturbances [23], and are therefore challenging to delineate and map across large spatial scales [4,24–26]. Fixed buffers along streams have also been broadly employed in delineating riparian corridors. The total potential maximum extent of riparian corridors can be captured based on geomorphology [27], and within this area, temporal fluctuations in riparian corridor vegetation can be evaluated with spectral imagery [24,27]. In many areas, riparian vegetation is spectrally distinct because of its more persistent greenness when compared to upland areas and other landcover classes [28]. Here, we use topography to define the boundaries of the riparian corridor and use spectral remote sensing to differentiate non-riparian vegetation and landscape features from riparian vegetation within these corridors.
The goal of this study was to create "CO-RIP", a publicly available dataset of riparian vegetation and riparian corridors along the US portion of the Colorado River and its major tributaries. To achieve this goal, this study had the following objectives: (1) delineate the maximum extent of the riparian corridor as defined by topographic features, (2) use remote sensing to detect riparian vegetation, and (3) integrate these products to create a cumulative riparian extent and vegetation dataset for the major streams and rivers in the Colorado River Basin. In addition to producing the highest spatial resolution riparian corridor and vegetation dataset available for the Colorado River Basin, this research provides a new framework for defining riparian vegetation in the context of remote sensing and integrates two existing methodologies (valley bottom delineation and spectral remote sensing of vegetation) into a single, process-based framework that can be employed to delineate riparian corridors and vegetation in regions across the globe.

Study Area
The study area encompassed the entirety of the US portion of the Colorado River Basin, an extensive and ecologically diverse watershed that covers 637,000 km² over seven states: Colorado, Wyoming, New Mexico, Arizona, Utah, Nevada, and California (Figure 1). The Colorado River Basin stretches longitudinally over 2300 km and is characterized by its intense topographic diversity, with elevations ranging from sea level at the Gulf of California to over 4000 m in the highest areas of the Rocky Mountains of Colorado. The basin's complex network of streams and rivers drains into the Colorado River, one of the largest rivers by length and drainage area in the United States [29]. The river and its basin span twelve ecologically distinct regions ([30], Appendix A) and are dominated by four (Figure 1). Coniferous forests and alpine meadow environments characterize the river's mountainous headwaters, where the Colorado River is relatively small and primarily fed by seasonal snowpack. Further southward, the Arizona and New Mexico Plateau and Mountains are characterized by dense coniferous forests at high elevations, pinyon (Pinus spp.) and juniper (Juniperus spp.) forests at mid-elevations, and sagebrush (Artemisia spp.) and chaparral in lower elevation areas where precipitation is low, resulting in an arid environment. The Colorado River flows through the Grand Canyon in Arizona, into the dry, sparsely vegetated semi-desert and desert environments flanking much of the southern basin. In its final stretch, the river travels through the Sonoran Basin and Range, encompassing a portion of northern Mexico, passing through arid desert environments where saguaro (Carnegiea gigantea), cholla cactus (Cylindropuntia spp.), and creosote bush (Larrea tridentata) dominate the landscape [30]. A detailed description of each ecoregion, modified from [30], is provided in Appendix A.

Mapping Valley Bottoms as a Proxy for the "Riparian Corridor Extent"
Riparian areas are shaped and bounded by the same topographic and hydrologic processes that define the topography of valley bottoms. Valley bottoms are therefore representative of the "maximum riparian corridor extent" [31–33], an area that separates the vegetative, topographic, and environmental characteristics of riparian areas from those of the upland (hereafter, the "maximum riparian corridor extent" will be referred to as "riparian corridor"). We used the Valley Bottom Extraction Tool (V-BET) [33], a freely available ArcMap Toolbox [34], to map valley bottoms as a proxy for the riparian corridor across the Colorado River Basin. The V-BET tool allows for efficient delineation of valley bottoms across larger spatial extents and at higher resolutions than has typically been possible with publicly available datasets in the past [33]. We applied the V-BET algorithm to all streams and rivers within the Colorado River Basin of Strahler stream order 3 or greater [35], thereby capturing the extent of riparian corridors along the majority of large streams and rivers within the basin.
The V-BET tool's derivative inputs can be obtained from two primary data sources: a high resolution (10 m² or less) digital elevation model (DEM) and hydrologic flowline data with high cartographic precision. We obtained a 10 m² digital elevation model from the National Elevation Dataset [36] and hydrologic flowline data from the National Hydrography Dataset (NHD) [37]. Flowlines representing ephemeral streams, storm water infrastructure, pipelines, canals, and aqueducts were removed, as were the NHD dataset's 'artificial paths' within lakes and waterbodies with an area greater than 0.0005 km². Sinks present within the DEM were filled to remove local depressions, and then slope, aspect, flow direction, flow accumulation, and drainage area rasters were created from the DEM as derivative inputs to V-BET. To maintain processing efficiency, we separated the study area using hydrological unit codes (HUCs) at the HUC-6 level [38] and processed each HUC independently within the V-BET toolbox in ArcMap v10.3 [34]. Resulting valley bottom extents, which are output from the tool as polygon files, were qualitatively verified by trained interpreters and edited manually in a geographic information system (GIS) to remove any superfluous channels or over/under estimations of extent using the refinement and editing process detailed in reference [33]. Due to the absence of field measurements, the large size of the study area, and the lack of a standardized measure of valley bottomness, an accuracy assessment of the resulting valley bottom extents was conducted qualitatively using ancillary data sources [33], including high resolution (1 m²) National Agriculture Imagery Program (NAIP) imagery, hill shade, and slope rasters in conjunction with ocular validation.
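The stream-order threshold described above can be illustrated with a short Python sketch that assigns Strahler orders to a small, tree-shaped stream network and keeps segments of order 3 or greater. This is not part of V-BET or of the workflow used in this study; the `downstream` mapping, the function names, and the example network are hypothetical and serve only to show how the ordering rule works.

```python
from collections import defaultdict


def strahler_orders(downstream):
    """Return {segment: Strahler order} for a tree-shaped stream network.

    `downstream` (hypothetical encoding) maps each segment id to the id of
    the segment it flows into, or None for the outlet.
    """
    # Invert the mapping so each segment knows its direct tributaries.
    upstream = defaultdict(list)
    for segment, target in downstream.items():
        if target is not None:
            upstream[target].append(segment)

    orders = {}

    def order(segment):
        if segment in orders:
            return orders[segment]
        tributary_orders = [order(t) for t in upstream[segment]]
        if not tributary_orders:
            result = 1  # headwater segment
        else:
            top = max(tributary_orders)
            # The order increases only where two or more tributaries
            # of the highest order meet; otherwise it is inherited.
            result = top + 1 if tributary_orders.count(top) >= 2 else top
        orders[segment] = result
        return result

    for segment in downstream:
        order(segment)
    return orders


def segments_at_or_above(downstream, min_order=3):
    """Segments whose Strahler order is at least `min_order`."""
    orders = strahler_orders(downstream)
    return sorted(seg for seg, value in orders.items() if value >= min_order)


# Hypothetical network: two order-2 streams (c, f) join to form g.
network = {
    "a": "c", "b": "c", "d": "f", "e": "f",
    "c": "g", "f": "g", "g": "j", "h": "j", "j": None,
}
selected = segments_at_or_above(network, min_order=3)
```

The recursive formulation is adequate for small examples; a basin-scale flowline network would likely need an iterative traversal to avoid Python's recursion limit.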
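As a rough illustration of one of the DEM derivatives listed above, the sketch below estimates slope from a gridded DEM using central differences. It is a simplified stand-in, not the slope algorithm used by ArcMap or V-BET; the function name, the list-of-lists DEM layout, and the treatment of edge cells (left as `None`) are assumptions made for this example.

```python
import math


def slope_degrees(dem, cell_size=10.0):
    """Slope in degrees for interior cells of a DEM given as a list of rows.

    `cell_size` is the grid spacing in the same units as elevation
    (10 m here, matching the DEM resolution described in the text).
    Edge cells are left as None because they lack neighbours on one side.
    """
    rows, cols = len(dem), len(dem[0])
    slope = [[None] * cols for _ in range(rows)]
    for r in range(1, rows - 1):
        for c in range(1, cols - 1):
            # Central-difference elevation gradients in x and y.
            dz_dx = (dem[r][c + 1] - dem[r][c - 1]) / (2 * cell_size)
            dz_dy = (dem[r + 1][c] - dem[r - 1][c]) / (2 * cell_size)
            slope[r][c] = math.degrees(math.atan(math.hypot(dz_dx, dz_dy)))
    return slope


# Hypothetical 3 x 3 DEM rising 10 m per 10 m cell from west to east.
ramp = [[0.0, 10.0, 20.0] for _ in range(3)]
ramp_slope = slope_degrees(ramp, cell_size=10.0)
```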

Mapping Riparian Vegetation
Our valley bottom analyses delineated the extent of the riparian corridor by land area, but a secondary analysis was necessary to determine the extent and cover of riparian vegetation within delineated valley bottoms. Due to the large spatial extent of the study area and the need to collect historical training samples for classification models, digital image sampling was used to collect environmental response data. Trained image interpreters used high resolution NAIP imagery for 2016 to ocularly interpret the presence and absence of riparian vegetation within the edited valley bottom extents. NAIP imagery is collected either biennially or on a 3 year rotating cycle, with some states in the Colorado River Basin collected in even years and others during odd years, so imagery was selected as close to the focal year of 2016 as possible (±1 year). Where 4-band NAIP imagery was available, interpreters also used false color band combinations that helped to highlight the presence of vegetation.
We separated our digital image sampling into 12 distinct ecoregions using the Environmental Protection Agency (EPA) Level III ecoregion product [39] to reduce vegetative and environmental variability within the training data. Using a scripted Google Earth Engine image visualization and data collection interface [34], riparian vegetation presence and absence training points were collected in each ecoregion in a purposive fashion [40]. We used purposive sampling to integrate the expert opinion of the interpreter in the selection and distribution of our sampling locations. This helped to ensure that sampled locations represented the full range of environmental and spectral variability present on the landscape. Riparian vegetation presence points had to meet multiple criteria (Figure 2) to be classified as riparian vegetation: (1) the location sampled had vegetation present and (2) the vegetation at that location was either visually assessed to be influenced by the corresponding water feature [41,42], or the vegetation was noticeably unique from that of the upland with regard to species composition, structure, or vegetative vigor. Riparian vegetation absence points were defined as locations within the valley bottom where riparian vegetation was not present (e.g., bare ground, rock, structures) or locations where vegetation was present in the valley bottom but was not characteristic of riparian vegetation (e.g., agricultural areas or vegetation located in the valley bottom that could not be differentiated from that observed in upland areas). Sampling points were placed in areas where the classification type dominated the immediate area, with care taken so that points were not placed on the edge of multiple classification types (e.g., points were placed in the middle of a riparian vegetation clump instead of on the edge of a vegetation-bare soil gradient). We collected hundreds of training samples in each ecoregion so that a range of vegetative densities would be represented (Appendix B).
Interpreters intentionally placed points to represent the diversity of riparian and non-riparian vegetation, including in areas where agriculture had multiple spectral signals/gradients. Training data locations were spaced sufficiently to prevent double sampling of a pixel in Landsat imagery, which has a spatial resolution of 30 × 30 m. In total, we collected 6672 presence samples and 7839 absence samples across the Colorado River Basin. These samples are further summarized by each ecoregion in Appendix B.
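The one-sample-per-pixel constraint can be checked after the fact with a few lines of Python. The sketch below is only an illustration under stated assumptions: in this study the spacing was handled by the interpreters during sampling, whereas the hypothetical function here thins an existing list of projected coordinates so that at most one point remains in each cell of a 30 m grid. The grid origin and the tuple layout `(x, y, label)` are assumptions.

```python
import math


def thin_to_one_per_pixel(points, pixel_size=30.0, origin=(0.0, 0.0)):
    """Keep the first point falling in each cell of a regular grid.

    `points` is an iterable of (x, y, label) tuples in projected metres;
    `origin` is the assumed corner of the pixel grid.
    """
    seen_cells = set()
    kept = []
    for x, y, label in points:
        # Integer column/row index of the pixel containing the point.
        cell = (
            math.floor((x - origin[0]) / pixel_size),
            math.floor((y - origin[1]) / pixel_size),
        )
        if cell not in seen_cells:
            seen_cells.add(cell)
            kept.append((x, y, label))
    return kept


# Hypothetical samples: the first two fall inside the same 30 m pixel.
samples = [(5.0, 5.0, 1), (20.0, 10.0, 0), (35.0, 5.0, 1), (-5.0, 5.0, 0)]
thinned = thin_to_one_per_pixel(samples)
```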
To create spectral predictors of riparian vegetation, we composited Landsat 8 Operational Land Imager (OLI) imagery for 2016 using the median pixel values of all available cloud-free image collections from a two month period in the growing season (1 July–31 August). Median pixel value images from the growing season were selected to represent leaf-on riparian vegetation in a pre-senescent state. Additional spectral indices were then created using these composited image products, including the tasseled cap transformation [43] specifically created for Landsat 8 OLI [44], the Normalized Difference Vegetation Index (NDVI) [45], the Soil Adjusted Vegetation Index (SAVI) [46], and the Modified Normalized Difference Water Index (mNDWI) [47]. In total, we selected eight Landsat bands corrected to top of atmosphere reflectance, four spectral indices, and two topographic surface rasters derived from the DEM as potential predictor variables (Table 1). We extracted predictor variable values from all rasters that corresponded with each of the recorded riparian presence/absence locations in preparation for our classification analyses. We fit a unique random forests classification model [48] for each of the twelve ecoregions using the riparian vegetation training samples as the response and the extracted values as predictors. Random forests is a decision tree based classification algorithm that is beneficial for use in remote sensing analyses, as it is non-parametric, simple to evaluate, and difficult to overfit [49]. A detailed description of the operation and evaluation of random forests is presented in reference [48], and a detailed description of the application of the algorithm for classification of landcover is offered in reference [50]. In an effort to balance computational power and efficiency, each random forests model was built with 1000 trees, with all other algorithmic tuning parameters left at the default settings of the R package randomForest. Riparian vegetation was modelled at the EPA ecoregion level [39]. While there remains environmental heterogeneity within EPA ecoregions, this delineation was carried out to ensure models performed suitably across all riparian environments in the Colorado River Basin, which may have substantially varied spectral signatures from region to region.
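The compositing and index calculations described above can be sketched for a single pixel in plain Python. This is a minimal illustration, not the Google Earth Engine workflow used in this study: the dictionary layout, band names, and reflectance values are hypothetical, and the SAVI soil factor of 0.5 is the commonly used default rather than a value reported here. The tasseled cap transformation is omitted because its coefficients are not given in the text.

```python
from statistics import median


def median_composite(observations):
    """Per-band median of the cloud-free observations for one pixel.

    `observations` is a list of dicts mapping band name to reflectance;
    None marks a cloud-masked value (hypothetical encoding).
    """
    composite = {}
    for band in observations[0]:
        values = [obs[band] for obs in observations if obs[band] is not None]
        composite[band] = median(values) if values else None
    return composite


def ndvi(nir, red):
    """Normalized Difference Vegetation Index."""
    return (nir - red) / (nir + red)


def savi(nir, red, soil_factor=0.5):
    """Soil Adjusted Vegetation Index with soil brightness factor L."""
    return (1 + soil_factor) * (nir - red) / (nir + red + soil_factor)


def mndwi(green, swir1):
    """Modified Normalized Difference Water Index."""
    return (green - swir1) / (green + swir1)


# Hypothetical growing-season observations for one pixel; the last is cloudy.
pixel_series = [
    {"green": 0.07, "red": 0.04, "nir": 0.38, "swir1": 0.18},
    {"green": 0.09, "red": 0.06, "nir": 0.42, "swir1": 0.22},
    {"green": 0.08, "red": 0.05, "nir": 0.40, "swir1": 0.20},
    {"green": None, "red": None, "nir": None, "swir1": None},
]
pixel = median_composite(pixel_series)
indices = {
    "ndvi": ndvi(pixel["nir"], pixel["red"]),
    "savi": savi(pixel["nir"], pixel["red"]),
    "mndwi": mndwi(pixel["green"], pixel["swir1"]),
}
```

For Landsat 8 OLI, green, red, near-infrared, and the first shortwave-infrared band correspond to bands 3, 4, 5, and 6, respectively.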
We selected final combinations of model predictors using rfUtilities [51], a model selection package available in Program R v3.3 [52] that was created specifically for random forests. The rfUtilities "rf.modelSel" function implements a model selection approach created in reference [53], which identifies the most parsimonious model by selecting the smallest combination of predictors with the highest pseudo-R² and lowest mean squared error [53,54]. A covariate correlation plot was generated to identify correlated predictors, and rfUtilities was used to generate variable importance. These two procedures were used together to determine the final set of predictors. We removed variables with less than 5% importance in the initial model selection. In the remaining set of covariates, if two variables were correlated above 0.8, only the variable with greater relative importance was kept. Removing correlated predictor variables enables clearer interpretation of variable importance values and can improve model performance [55]. We subsequently evaluated each model individually using out-of-bag metrics [48] and a 10-fold cross validation approach [56] to calculate error rates and kappa values. We additionally reported the percent contribution of each predictor in each final model, as well as out-of-bag and cross-validated error, confusion matrices, class errors, and user's and producer's accuracies for each individual model (Appendix C). Given the size of our study area, on-the-ground field validation of our results was not possible, but we believe these standardly employed model evaluation approaches offer an accurate representation of model performance. Our final models were applied using our predictive raster surfaces to generate spatial predictions of the presence and absence of riparian vegetation across the Colorado River Basin, which were clipped to the V-BET extent.
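The two filtering rules described above (dropping predictors below 5% importance, then keeping only the more important member of any pair correlated above 0.8) can be sketched as follows. This is one possible reading implemented in plain Python, not the rfUtilities code: it assumes importance is expressed as a fraction of the total, uses the absolute Pearson correlation, and resolves correlated groups greedily from most to least important. All names and values are hypothetical.

```python
import math


def pearson(x, y):
    """Pearson correlation of two equal-length numeric sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)


def filter_predictors(values, importance, min_importance=0.05, max_corr=0.8):
    """Return predictor names that pass the importance and correlation rules.

    `values` maps predictor name to its sampled values; `importance` maps
    predictor name to relative importance (assumed to be a fraction).
    """
    # Rule 1: drop predictors below the importance threshold.
    candidates = [name for name in importance if importance[name] >= min_importance]
    # Rule 2: visit predictors from most to least important and keep one
    # only if it is not highly correlated with any predictor already kept.
    candidates.sort(key=lambda name: importance[name], reverse=True)
    kept = []
    for name in candidates:
        if all(abs(pearson(values[name], values[k])) <= max_corr for k in kept):
            kept.append(name)
    return kept


# Hypothetical extracted values at five training points.
sample_values = {
    "ndvi": [0.1, 0.3, 0.5, 0.7, 0.9],
    "savi": [0.08, 0.2, 0.35, 0.5, 0.62],
    "elevation": [1200, 300, 2500, 800, 1500],
    "aspect": [10, 200, 90, 310, 45],
}
sample_importance = {"ndvi": 0.40, "elevation": 0.30, "savi": 0.27, "aspect": 0.03}
final_predictors = filter_predictors(sample_values, sample_importance)
```

In this example SAVI is dropped because it is almost perfectly correlated with the more important NDVI, and aspect is dropped for low importance.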

Valley Bottom/Maximum Riparian Corridor Extent Delineation
Our final valley bottom extent results produced using V-BET are available as polygons (Figure 3). The final results from the V-BET algorithm serve as our definition of the riparian corridor, but are not limited to this application and are distributed independently of our riparian vegetation layer for use in related analyses. Substantial visual validation and manual editing were carried out by trained interpreters to correct any V-BET extent errors that were observed in the results. Manual editing and validation were extensive and took multiple interpreters a total of 320 h to complete. The majority of the errors occurred within two main environments: (1) flatter floodplain areas and (2) high-elevation or high-relief headwater regions. The flatter floodplain errors were most evident in the southern portion of the study area, primarily the Sonoran Basin and Mojave Basin. In contrast, the model errors observed in northern ecoregions, such as the Southern Rockies, primarily occurred in locations displaying steep slopes. Valley bottom delineation errors likely occurred as a result of the 10 m² digital elevation model used to delineate valley bottoms. While a 10 m² digital elevation model is considered to be high resolution in many research settings, limitations remain in areas with steep slopes or narrow channels, where an even higher spatial resolution DEM would be required to improve delineation accuracy. Depending on the width of the valley bottom and slope of surrounding features, valley bottoms could be either over- or underestimated. Our errors are consistent with the observations in reference [33], which describes the reasoning for errors in V-BET outputs and provides detailed examples.

Riparian Vegetation
Each of the vegetation classification models was evaluated independently, separated by EPA Level III ecoregion. Overall, models performed reasonably well in the classification of riparian vegetation, with models in some ecoregions outperforming those in others. Each final model was built using a slightly different combination of environmental predictors, which was expected based on the high degree of environmental heterogeneity between EPA ecoregions (Figure 4). Near-infrared reflectance and thermal values (band 5 and band 11 of Landsat 8), indices including NDVI, SAVI, mNDWI, and tasseled cap brightness and wetness, and topography (slope and elevation) were retained most often in our final models across ecoregions. The random forests classification models had a median out-of-bag (OOB) error of 10.2% (range 4.2–28.8%), a median kappa of 0.80 (range 0.42–0.90), and class errors ranging from 5–30% (median 10%) (Figure 5). Appendix C shows out-of-bag and cross-validated error, confusion matrices, class errors, as well as user's and producer's accuracies for each individual model. Representative examples of riparian vegetation model predictions across four major ecotones are provided in Figure 6. Most models performed well. However, models located primarily in coniferous forest environments (e.g., Wasatch Uinta ecoregions) performed poorly in comparison to those in arid riparian systems (e.g., Sonoran Basin; Figure 5).
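For readers less familiar with the accuracy statistics reported above, the following Python sketch shows how overall error, Cohen's kappa, and producer's and user's accuracies are derived from a confusion matrix. The matrix values are invented for illustration and are not results from any CO-RIP model; the function name and the row/column convention (rows are reference classes, columns are predicted classes) are assumptions of this example.

```python
def accuracy_metrics(matrix):
    """Accuracy statistics from a square confusion matrix.

    matrix[i][j] is the number of samples of reference class i that were
    predicted as class j.
    """
    n = len(matrix)
    total = sum(sum(row) for row in matrix)
    row_totals = [sum(row) for row in matrix]
    col_totals = [sum(matrix[i][j] for i in range(n)) for j in range(n)]

    # Observed agreement and the agreement expected by chance.
    observed = sum(matrix[i][i] for i in range(n)) / total
    expected = sum(row_totals[i] * col_totals[i] for i in range(n)) / total ** 2

    return {
        "error_rate": 1 - observed,
        "kappa": (observed - expected) / (1 - expected),
        # Producer's accuracy: correct / reference total (1 - omission error).
        "producers": [matrix[i][i] / row_totals[i] for i in range(n)],
        # User's accuracy: correct / predicted total (1 - commission error).
        "users": [matrix[i][i] / col_totals[i] for i in range(n)],
    }


# Hypothetical matrix; class 0 = absence, class 1 = presence.
example_matrix = [[85, 15],
                  [5, 95]]
example_metrics = accuracy_metrics(example_matrix)
```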

Dataset Availability
Our final valley bottom extent polygons and riparian vegetation maps are freely available online through the Dryad Digital Repository at https://doi.org/10.5061/dryad.3g55sv8. The riparian vegetation dataset is distributed in a raster format, with a value of "0" corresponding to riparian vegetation absence and a value of "100" corresponding to riparian vegetation presence for a particular pixel (Figure 6). Valley bottom extents, which we have defined as a proxy for the spatial extent of the riparian corridor, are distributed as polygons in shapefile format. Data files can be obtained for the entire Colorado River Basin or for each individual ecoregion based upon the user's preference.
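As a simple example of working with the 0/100 raster coding, the sketch below totals the mapped riparian vegetation area from an in-memory grid. It assumes a 30 m pixel size (the Landsat resolution noted earlier; the resolution of the distributed files should be confirmed from their metadata) and represents no-data cells as `None`; both choices, along with the function name, are assumptions. Reading the actual raster files would require a geospatial library and is not shown.

```python
def riparian_area_km2(raster, pixel_size_m=30.0, presence_value=100):
    """Area in km² of pixels coded as riparian vegetation presence.

    `raster` is a list of rows; cells hold 0 (absence), 100 (presence),
    or None for no data (hypothetical encoding).
    """
    presence_pixels = sum(row.count(presence_value) for row in raster)
    # Each pixel covers pixel_size_m squared metres; 1 km² = 1e6 m².
    return presence_pixels * pixel_size_m ** 2 / 1e6


# Hypothetical 2 x 3 tile with three presence pixels.
tile = [[0, 100, 100],
        [100, 0, None]]
tile_area = riparian_area_km2(tile)
```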

Discussion
We created CO-RIP, a basin-wide riparian corridor extent and riparian vegetation distribution dataset that has broad applications in riparian area management, wildlife surveys and conservation, invasive species management, and ecohydrological research. The spatial distribution and resolution of this dataset will allow researchers to conduct landscape scale analyses that would not have been possible for the Colorado River Basin in the past. Our methods, which combine topographic delineation of valley bottoms with vegetation classification modelling, allowed for a consistent definition of riparian corridors and vegetation across an extremely ecologically diverse study area. This process also emphasized the challenges associated with creating a dataset of this scale in a setting where no conclusive definition of "riparian corridors" or "riparian vegetation" exists.
The dataset produced through these analyses was created to match our definition of "riparian corridors" and "riparian vegetation", and as such, should not be construed to fully represent any of the definitions of riparian corridors put forth in past research or in governmental reports. Instead, we integrated specific concepts from past research and governmental guidance to craft a definition (Figure 2) and corresponding dataset that was possible to accurately produce using only guided image interpretation, and topographic, GIS, and remote sensing based analyses. While our definition fit well within a remote sensing based modelling framework, modelling riparian vegetation within specific habitats presented some challenges.
We found our definition and associated analyses fit and performed well in arid regions of the Colorado River Basin where riparian vegetation was spectrally discernible from non-riparian vegetation. In these areas, the availability of water provided by riparian corridors creates a strikingly different composition of plant species and vibrancy of vegetation that is clearly interpretable and discernible using high resolution imagery to conduct sampling, coupled with Landsat spectral information (Figure 6). However, in high elevation areas with dense coniferous vegetation, particularly in the Wasatch Uinta Mountains and Southern Rockies ecoregions, the consistent presence of moist coniferous forests made it difficult to interpret differences between riparian and upland vegetation, as the vegetative gradient was largely consistent from the upland to the water feature's bank. Because we only sampled riparian vegetation presence locations where vegetation was clearly unique from the upland, models in these areas are likely underestimating riparian vegetation within the valley bottom (Figure 6). This confusion also resulted in poorer model statistics in these ecoregions. Users must employ caution, especially in these ecoregions, by ensuring that their particular application of this dataset fits the definition set forth by the authors (Figure 2), which requires riparian vegetation to be different in either observed species composition or vegetative vigor from that of the upland to be classified as such. A definition that is more inclusive in the classification of coniferous vegetation as riparian vegetation, even if it is not easily distinguishable from upland vegetation, may be more appropriate in these particular areas, but was outside the scope of this research. We also conducted image based sampling for riparian vegetation throughout many canyons (Figure 6). Issues presented by shadows, pixel size, canyon depth, and occlusion complicate vegetation mapping with remotely sensed imagery. While these limitations are certainly important considerations in the application of our dataset in subsequent analyses, these issues manifest in a relatively small portion of the study area overall, and we are confident the dataset is broadly applicable to a wide variety of research.
This dataset can support novel research on the potential effects of riparian habitat fragmentation, and inform future land-use and riparian network connectivity planning in the basin [57]. The riparian vegetation map could be immediately applied to prioritize potential areas for restoration and conservation activities in the Colorado River Basin [58]. For example, in the western US, there are more than 70 avian species considered to be riparian habitat obligate or dependent [59,60]. While many studies describe the importance and use of riparian habitat by such species, few have quantified habitat use or population densities at landscape scales [20,61,62], in part because of a previous lack of a spatially explicit riparian dataset. Thus, there are many potential applications of the dataset, and these ideas represent only a subset of the research made possible through CO-RIP.
Recent advancements in access to remote sensing data and the computational power provided by cloud computing enabled this project to be successfully carried out in a relatively short period of time (~1 yr). The ability to access, process, and mosaic high resolution imagery for interpretation, as well as to process, mosaic, and composite Landsat images across a growing season in seconds, would have been impossible for a standard user just years ago. We suggest that users interested in utilizing these types of geo-information across large spatial extents, or in implementing our riparian vegetation modelling framework, strongly consider the use of cloud computing platforms [34] to expedite data processing, modelling, and image interpretation.
The Valley Bottom Extraction Tool [33] was a valuable and approachable tool for delineating valley bottoms across vast river networks. It was time-efficient and, in a qualitative sense, produced an accurate representation of valley bottoms and riparian corridors, especially considering the extensive region of interest. The flexibility to visually validate and manually edit valley bottom results allows for an enhanced product that characterizes the diverse nature of valley bottoms. We do, however, recommend manual editing be conducted by scientists with an expert understanding of hydrological and ecological characteristics of the study region; erroneous manual editing could significantly degrade the V-BET algorithm results.
The use of EPA ecoregions to subset a study area for modelling is a standard practice [63,64], and we believe parsing the study area by ecoregion likely improved model performance, as these boundaries helped create a more spectrally consistent definition of riparian vegetation within specific regions of our expansive study region. While these ecoregion boundaries are not perfect delineations of separate habitat types, in broad scale analyses like those employed in this study, they can serve as a useful tool to collect largely spectrally consistent training samples. While we used composited Landsat imagery to ensure the availability of cloud free imagery in our study area, with additional time and resources a multi-temporal approach could be employed to refine these models. Including imagery that represents spring, summer, and fall seasons in each of these models (a "multi-temporal modelling framework") [65,66] could help capture vegetation with short leaf-on periods and improve discrimination between riparian vegetation and irrigated agriculture.
Finally, we took care to collect a sufficient number of training samples in each ecoregion to effectively capture the full range of cover that could be present within the boundaries of a Landsat pixel. However, this method could likely be improved by estimating the area of a pixel covered by riparian vegetation, instead of simply the presence and absence of vegetation, and then working in a regression based modelling framework instead of a presence/absence classification. This could likely improve area estimates and would make samples more representative of what is present within the boundaries of a specific pixel. While this could complicate sampling and modelling, and require additional time and resources, we believe it should be considered in future digital sampling designs created for mapping of riparian vegetation.

Conclusions
The research and analyses conducted to produce the "CO-RIP" datasets provide a valuable methodological contribution for delineating riparian areas and corridor extent networks at the landscape scale, and could be further applied in new areas or across larger spatial extents. Satellite image acquisition, image interpretation and modelling using newly available interfaces, advancements in and availability of higher resolution topographic models, and the availability of peer-reviewed, open source valley bottom network models allow for regional-scale analyses that would have been prohibitively time consuming and costly just years ago. The "riparian corridor" and "riparian vegetation" definitions adopted in this study provide a standardized, remote sensing and GIS-friendly framework for mapping riparian areas in the future. Still, moist coniferous forest environments remain a challenging setting in which to separate riparian vegetation from upland vegetation because of the spectral consistency across these ecotones. Further research is warranted on the differentiation of upland coniferous vegetation from that present in the riparian corridor, and may require on-the-ground sampling to improve future landscape-scale models. Finally, this research resulted in a novel dataset for the Colorado River Basin, titled "CO-RIP", that provides ecologists, conservationists, and land managers with the highest spatial resolution delineation of riparian areas and valley bottom networks available for the ecologically diverse and topographically complex basin environment.

Appendix B
Number of riparian vegetation presence and absence points collected within each ecoregion.

Figure 1. The study area extent, which encompasses the entire US portion of the Colorado River Basin, classified by EPA Level 3 ecoregion.

Figure 2. An overview of the process used to delineate riparian vegetation. First, a decision tree process was used by trained image interpreters to evaluate whether a sampled location should be classified as "riparian vegetation presence" or "riparian vegetation absence". These training points were then used to build classification models of riparian vegetation presence and riparian vegetation absence, which were applied as a prediction across each ecoregion. In post-processing, vegetation classified as riparian vegetation that was outside of the valley bottom was then excluded using the Valley Bottom Extraction Tool (V-BET) GIS output.
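The post-processing step described in Figure 2 amounts to a raster mask: predicted riparian vegetation is retained only where it falls inside the valley bottom. The following is a minimal sketch of that logic, assuming both layers have already been rasterized to the same grid; the function name and the small example grids are hypothetical and do not reproduce the GIS operations used in this study.

```python
# Illustrative sketch only: keep predicted riparian cells inside the valley bottom.
# Assumes two aligned binary grids (1 = yes, 0 = no); names are hypothetical.

def mask_to_valley_bottom(riparian_prediction, valley_bottom):
    """Return the prediction grid with cells outside the valley bottom set to 0."""
    if len(riparian_prediction) != len(valley_bottom):
        raise ValueError("grids have different numbers of rows")
    masked = []
    for pred_row, vb_row in zip(riparian_prediction, valley_bottom):
        if len(pred_row) != len(vb_row):
            raise ValueError("grids have different numbers of columns")
        masked.append([p if vb == 1 else 0 for p, vb in zip(pred_row, vb_row)])
    return masked


# Hypothetical 3 x 4 grids: the valley bottom occupies the two middle columns.
prediction = [
    [1, 1, 0, 1],
    [0, 1, 1, 0],
    [1, 0, 1, 1],
]
valley = [
    [0, 1, 1, 0],
    [0, 1, 1, 0],
    [0, 1, 1, 0],
]

riparian_in_corridor = mask_to_valley_bottom(prediction, valley)
```

Cells predicted as riparian vegetation in the outer columns are dropped because they lie outside the valley bottom, while predictions inside the corridor are left unchanged.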

Figure 3. Example subsets of valley bottom extent results across three different environments. Green lines represent the external boundaries of the valley bottom, which we define as the riparian corridor. Vegetation outside of valley bottoms is considered to be outside of the riparian corridor. Within the valley bottom, riparian vegetation was separated from non-riparian vegetation such as agriculture (example in middle image) using riparian vegetation classification.

Figure 4. Final predictors included in each random forests classification model. Each row represents a separate model for the specified ecoregion. The numbers in each row are the relative percent contribution of each predictor in each individual model. White boxes with numbers represent the lowest relative contribution, light blue a medium relative contribution, and darker blue a high relative contribution. White boxes with no numbers correspond to predictors that were not retained in the final models, and therefore did not contribute to the final model. Predictors were derived from Landsat 8. B2: Blue band; B3: Green band; B4: Red band; B5: Near infrared band; B6: Shortwave infrared 1 band; B7: Shortwave infrared 2 band; B10: Thermal infrared 1 band; B11: Thermal infrared 2 band; NDVI: Normalized Difference Vegetation Index; SAVI: Soil Adjusted Vegetation Index; mNDWI: Modified Normalized Difference Water Index; TCAP B, G, W: Tasseled cap brightness, greenness, and wetness.
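For readers unfamiliar with the spectral indices listed in Figure 4, the sketch below computes NDVI, SAVI, and mNDWI from the corresponding Landsat 8 bands using their standard published formulas. It is illustrative only: the function names and reflectance values are hypothetical, and the soil adjustment factor of 0.5 is the commonly used default rather than a value confirmed for this study.

```python
# Illustrative sketch only: standard formulas for three indices named in Figure 4.
# Inputs are surface reflectance values; example values are hypothetical.

def _normalized_difference(a, b):
    """Return (a - b) / (a + b), guarding against a zero denominator."""
    total = a + b
    if total == 0:
        raise ZeroDivisionError("band sum is zero; index is undefined")
    return (a - b) / total


def ndvi(nir, red):
    """NDVI from the near infrared (B5) and red (B4) bands."""
    return _normalized_difference(nir, red)


def mndwi(green, swir1):
    """mNDWI from the green (B3) and shortwave infrared 1 (B6) bands."""
    return _normalized_difference(green, swir1)


def savi(nir, red, soil_factor=0.5):
    """SAVI from B5 and B4; soil_factor=0.5 is an assumed common default."""
    denominator = nir + red + soil_factor
    if denominator == 0:
        raise ZeroDivisionError("denominator is zero; index is undefined")
    return (1 + soil_factor) * (nir - red) / denominator


# Hypothetical reflectance values for a single vegetated pixel.
example_ndvi = ndvi(nir=0.5, red=0.1)
example_savi = savi(nir=0.5, red=0.1)
example_mndwi = mndwi(green=0.1, swir1=0.3)
```

For the hypothetical pixel, NDVI and SAVI are both positive and mNDWI is negative, the pattern typically expected for green vegetation rather than open water.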

Figure 5. Radar charts displaying cross-validated error rates (A) (0-100%) and kappa (B) (0-1) for each of the final random forests classification models. Model performance is best when error rates (A) are closer to the center and when kappa values (B) are closer to the perimeter of the chart.
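The two metrics plotted in Figure 5 can both be derived from a presence/absence confusion matrix. The sketch below shows the standard calculation of the overall error rate and Cohen's kappa for a two-class case; the function name and the counts are hypothetical and are not taken from any model in this study.

```python
# Illustrative sketch only: error rate and Cohen's kappa for a 2 x 2 confusion matrix.
# tp/fp/fn/tn are counts of presence/absence predictions against reference labels.

def error_rate_and_kappa(tp, fp, fn, tn):
    """Return (error rate, kappa) for a presence/absence classification."""
    n = tp + fp + fn + tn
    if n == 0:
        raise ValueError("confusion matrix is empty")
    observed = (tp + tn) / n
    # Chance agreement from the predicted and reference class totals.
    expected = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / (n * n)
    if expected == 1.0:
        raise ValueError("kappa is undefined when chance agreement is 1")
    kappa = (observed - expected) / (1 - expected)
    return 1 - observed, kappa


# Hypothetical validation counts: 100 points, 90 classified correctly.
example_error, example_kappa = error_rate_and_kappa(tp=45, fp=5, fn=5, tn=45)
```

With these made-up counts the error rate is approximately 0.10 and kappa is approximately 0.80. Because kappa discounts the agreement expected by chance, it is lower than the overall accuracy of 0.90.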

Figure 6. Representative examples of our final results in four different ecoregions. (A) displays the differentiation of agriculture from riparian vegetation and the delineation of the riparian corridor; (B) displays delineation of riparian vegetation in narrow, sinuous canyon environments, which can be challenging using remotely sensed datasets; (C) displays an underprediction of riparian vegetation that is characteristic of results seen in moist coniferous forest environments; (D) displays the delineation of riparian vegetation in arid environments, where models performed extremely well in delineating riparian vegetation from non-riparian vegetation and landscape characteristics.

Table 1. The environmental predictors that were employed in each random forests classification model. Predictors were subsequently pared down through variable selection. The following bands from Landsat 8 OLI/TIRS were applied: red, green, blue, near infrared, shortwave infrared, and thermal bands. The panchromatic, cirrus, and coastal/aerosol bands were excluded from all random forests classification models. *