Developing an Effective Model for Predicting Spatially and Temporally Continuous Stream Temperatures from Remotely Sensed Land Surface Temperatures

Although water temperature is important to stream biota, it is difficult to collect in a spatially and temporally continuous fashion. We used remotely-sensed Land Surface Temperature (LST) data to estimate mean daily stream temperature for every confluence-to-confluence reach in the John Day River, OR, USA for a ten year period. Models were built at three spatial scales: site-specific, subwatershed, and basin-wide. Model quality was assessed using jackknife and cross-validation. Model metrics for linear regressions of the predicted vs. observed data across all sites and years: site-specific r2 = 0.95, Root Mean Squared Error (RMSE) = 1.25  ̋C; subwatershed r2 = 0.88, RMSE = 2.02  ̋C; and basin-wide r2 = 0.87, RMSE = 2.12  ̋C. Similar analyses were conducted using 2012 eight-day composite LST and eight-day mean stream temperature in five watersheds in the interior Columbia River basin. Mean model metrics across all basins: r2 = 0.91, RMSE = 1.29  ̋C. Sensitivity analyses indicated accurate basin-wide models can be parameterized using data from as few as four temperature logger sites. This approach generates robust estimates of stream temperature through time for broad spatial regions for which there is only spatially and temporally patchy observational data, and may be useful for managers and researchers interested in stream biota.


Introduction
Stream water temperature is one of the most important factors influencing productivity, species composition, community structure, and life history expression of stream organisms.Timing of significant ecological events like invertebrate and fry emergence, onset and duration of photosynthesis, spawning, and migration are all determined in part by temperature (e.g., [1,2]).For example, fish growth and survival are optimized at particular temperatures (e.g., [3]) and these responses can be used in life cycle and bioenergetics modeling frameworks to incorporate fish activity and metabolic response to varied environmental conditions, such as climate change scenarios [4].Therefore, the ability to track temperatures across multiple seasons and life stages becomes critical to accurately predicting fish response to changing conditions.
Measures of daily, seasonal, and annual variation of stream temperature are ubiquitous in stream ecology research.Annual mean and maximum temperatures, such as seven-day running means and maximums, are commonly used as indicators of stream "health" or habitat quality for aquatic species [5] and are also used as criteria for evaluation of adherence to the Clean Water Act [6,7].Changes in the thermal regime of a given stream through human or natural alterations of the surrounding landscape (e.g., changes in riparian structure shading capacity, large-scale fire effects, physical changes to the stream channel either directly or through changes in other factors like sedimentation) are of particular concern to local, regional, and federal land managers [8].
Many aquatic organisms live for years and travel far during their life cycle.It is important for a complete understanding to be able to characterize stream temperatures not just at the local scale, but also across broad spatial and temporal scales, from reach to watershed to region [9].Vagile species like salmon experience the temperature of hundreds to thousands of stream kilometers across years, and therefore population-level models need stream temperature estimates across regions.Similarly, life cycle models have benefited from inclusion of life-stage specific temperature estimates at these expansive spatial scales as well as through time [10][11][12].
Generating useful and accurate measures of water temperature requires monitoring.Digital thermographs (temperature loggers) are often placed in streams and can provide high temporal resolution data for a site, but only for the duration of their placement in the stream.There are also spatial limitations of temperature logger use, as the spatial structure of temperatures in streams can vary over short distances (e.g., within 10 m), such as when loggers are located close to tributary junctions [13].Although digital thermographs are relatively inexpensive, the effort required to place and maintain the logger and retrieve the data is not.High implementation costs therefore can inadvertently affect logger placement within a watershed and even limit spatial extent, as some parts of many stream systems are almost entirely inaccessible to field crews.Conversely, spatially continuous surface water temperatures collected with remote sensing can illuminate spatial patterns at very fine resolutions over broad extents [14][15][16].However, data are typically only available for snapshots in time at a handful of locations.Remote sensing based approaches can also be expensive due to the cost of flights and sensors and required expertise in image processing.
Researchers have addressed the mismatch between the need for spatially and temporally continuous stream temperature data and the reality of the limited nature of almost all stream temperature datasets through a variety of estimation approaches with varying degrees of success.One of the most common methods for estimating temperature at large spatial scale is to develop models based on continuously available attributes.Air temperature is a relatively easy dataset to acquire for many localities and it has similar daily, seasonal, and annual variations to stream temperature, making it an obvious covariate choice in stream temperature models.However, the spatial patterns of variability in air and stream temperature are different and operate at different scales [17].Thus, air temperature is not always a good predictor of stream temperature particularly when modeling across long time periods [18][19][20].Complex heat budget models that include thermodynamic processes are able to improve temperature estimations in specific spatial contexts [21,22].Other approaches use explicitly spatial methods of stream temperature estimation, such as inverse distance weighting and kriging, directly addressing spatial variability [23][24][25].However, streams have unique spatial properties, like network connectivity and directionality of information flow, that are hard to capture using traditional spatial interpolation methods.In these models, predictions across time require new interpolations for each time step, making estimates into unmonitored time periods difficult [26].
Peterson et al. [26] recent model innovations that explicitly include the unique spatial properties of stream networks have been particularly successful in regions where spatially dense logger datasets are available.However, they are still limited by the temporal resolution and duration of observations used to parameterize models.Vatland et al. [16] were able to develop robust temperature estimates at high spatial (~100 m) and temporal (hourly) resolution over ~100 km of stream using thermal infrared imagery from a fixed wing aircraft mounted imaging system.
Remotely sensed data from Earth orbiting satellites offer a potential data source of both temporally and spatially abundant temperature data and are often freely available [27].For example, Water 2015, 7,[6827][6828][6829][6830][6831][6832][6833][6834][6835][6836][6837][6838][6839][6840][6841][6842][6843][6844][6845][6846] daily data are available from the U.S. National Aeronautics and Space Administration's (NASA) Moderate Resolution Imaging Spectroradiometer (MODIS) sensors for 36 spectral bands and derivative products.Two MODIS platforms, Terra and Aqua, are in sun-synchronous, near-polar, circular orbits with 10:30 a.m.ascending and 1:30 p.m. descending nodes, respectively.One of the available MODIS products, Land Surface Temperature (LST), is produced using the generalized split-window LST algorithm [28,29].It is available daily at 1 km 2 spatial resolution for most of the planet and is a measure of thermal conditions at the Earth's surface.These conditions are influenced by air temperature, climate, surface geology, vegetation, elevation, and physiography, among other factors, and these are the same factors that influence the temporal and spatial variation of temperature in many stream systems.
In this paper, we provide examples of using public LST data to estimate stream water temperature in a spatio-temporally continuous manner in watersheds where we have a set of point-wise stream temperature observations from in-water sensors.We are not attempting to directly estimate water temperature from LST (like, for example, direct estimates of sea surface temperature [30]).Instead, we use LST as a covariate in linear models, first in the John Day River (JDR; a tributary to the Columbia River, USA) and then in other interior Columbia River (ICR) basins (the Lemhi (LRB), Tucannon (TRB), Upper Grande Ronde (UGR) and Wenatchee (WRB) Rivers (Figure 1)), at spatial and temporal scales of reaches and days.We chose the JDR basin for several reasons.First, it is one of the longest undammed rivers in the continental United States [31], which makes it an ideal candidate for elucidating spatially variable relationships between land surface and water temperature.Second, the John Day River provides habitat for Endangered Species Act (ESA) listed steelhead (Oncorhynchus mykiss) and bull trout (Salvelinus confluentus), as well as Chinook salmon (O.tshawytscha) and several trout species (cutthroat (O.clarkii) and redband (O.mykiss gairdnerii)).The basin is unique in that there are no hatchery salmonids released into the JDR, and thus it provides important habitat for wild fish populations.Moreover, the size, topology, and geographic extent of the John Day basin makes deployment of temperature loggers challenging, therefore modeling stream temperature using remote data sources is an appealing option for managers.Other ICR basins, while not as large as the JDR, also provide important fish habitat in a variety of landscape contexts.Our specific objectives were to: (1) estimate stream temperature using remotely-sensed MODIS land surface data; (2) evaluate the predictive capability of models developed at site, subwatershed, and basin scales; and (3) assess how many sites in a basin are necessary to develop robust temperature estimates.
Water 2015, 7, page-page 3 split-window LST algorithm [28,29].It is available daily at 1 km 2 spatial resolution for most of the planet and is a measure of thermal conditions at the Earth's surface.These conditions are influenced by air temperature, climate, surface geology, vegetation, elevation, and physiography, among other factors, and these are the same factors that influence the temporal and spatial variation of temperature in many stream systems.
In this paper, we provide examples of using public LST data to estimate stream water temperature in a spatio-temporally continuous manner in watersheds where we have a set of point-wise stream temperature observations from in-water sensors.We are not attempting to directly estimate water temperature from LST (like, for example, direct estimates of sea surface temperature [30]).Instead, we use LST as a covariate in linear models, first in the John Day River (JDR; a tributary to the Columbia River, USA) and then in other interior Columbia River (ICR) basins (the Lemhi (LRB), Tucannon (TRB), Upper Grande Ronde (UGR) and Wenatchee (WRB) Rivers (Figure 1)), at spatial and temporal scales of reaches and days.We chose the JDR basin for several reasons.First, it is one of the longest undammed rivers in the continental United States [31], which makes it an ideal candidate for elucidating spatially variable relationships between land surface and water temperature.Second, the John Day River provides habitat for Endangered Species Act (ESA) listed steelhead (Oncorhynchus mykiss) and bull trout (Salvelinus confluentus), as well as Chinook salmon (O.tshawytscha) and several trout species (cutthroat (O.clarkii) and redband (O.mykiss gairdnerii)).The basin is unique in that there are no hatchery salmonids released into the JDR, and thus it provides important habitat for wild fish populations.Moreover, the size, topology, and geographic extent of the John Day basin makes deployment of temperature loggers challenging, therefore modeling stream temperature using remote data sources is an appealing option for managers.Other ICR basins, while not as large as the JDR, also provide important fish habitat in a variety of landscape contexts.Our specific objectives were to: (1) estimate stream temperature using remotely-sensed MODIS land surface data; (2) evaluate the predictive capability of models developed at site, subwatershed, and basin scales; and (3) assess how many sites in a basin are necessary to develop robust temperature estimates.

Study Area
The Columbia River basin covers ~668,000 km 2 in the Pacific Northwest of North America.The John Day River basin includes over 15,000 stream kilometers covering ~20,000 km 2 in north-central Oregon, USA (Figure 2).Elevations in the John Day River basin range from 82 to over 2700 m, and it drains parts of the Ochoco, Blue, Elkhorn, and Strawberry mountain ranges, ultimately ending at the Columbia River.Notable natural features within the watershed include the John Day Fossil Beds National Monument, parts of the Ochoco, Umatilla, and Malheur National Forests, and 237.4 km of Wild and Scenic designated rivers [32].The flora ranges from sage-steppe near the mouth to high desert grasslands and pine and fir forests in the mountains.There is a single natural barrier to anadromous fish passage on the south fork of the JDR, and otherwise the mainstem, middle, and north forks are entirely undammed and winter and spring flows fluctuate based on precipitation and snowmelt.Stream structure and temperature in the JDR has been altered through historic and contemporary mining, logging, grazing, and water withdrawal activities throughout the basin [33].

Study Area
The Columbia River basin covers ~668,000 km 2 in the Pacific Northwest of North America.The John Day River basin includes over 15,000 stream kilometers covering ~20,000 km 2 in north-central Oregon, USA (Figure 2).Elevations in the John Day River basin range from 82 to over 2700 m, and it drains parts of the Ochoco, Blue, Elkhorn, and Strawberry mountain ranges, ultimately ending at the Columbia River.Notable natural features within the watershed include the John Day Fossil Beds National Monument, parts of the Ochoco, Umatilla, and Malheur National Forests, and 237.4 km of Wild and Scenic designated rivers [32].The flora ranges from sage-steppe near the mouth to high desert grasslands and pine and fir forests in the mountains.There is a single natural barrier to anadromous fish passage on the south fork of the JDR, and otherwise the mainstem, middle, and north forks are entirely undammed and winter and spring flows fluctuate based on precipitation and snowmelt.Stream structure and temperature in the JDR has been altered through historic and contemporary mining, logging, grazing, and water withdrawal activities throughout the basin [33].The Lemhi River (LRB) is a tertiary tributary to the Columbia River located in SW Idaho, USA.It is ~100 km long and drains parts of the Lemhi and Bitterroot ranges.The Tucannon River is a secondary tributary to the Columbia River in SE Washington, USA.It flows for ~80 km, draining ~1300 km 2 of Oregon Butte in the Blue Mountain range [34].The Upper Grande Ronde River watershed is part of the Grande Ronde River subbasin located in the NE corner of Oregon, USA.It drains ~4248 km 2 of the Blue and Wallowa Mountains [35].The Wenatchee River basin is the most northern basin included in this analysis and is ~3500 km 2 and is located in central Washington, USA.The Lemhi River (LRB) is a tertiary tributary to the Columbia River located in SW Idaho, USA.It is ~100 km long and drains parts of the Lemhi and Bitterroot ranges.The Tucannon River is a secondary tributary to the Columbia River in SE Washington, USA.It flows for ~80 km, draining ~1300 km 2 of Oregon Butte in the Blue Mountain range [34].The Upper Grande Ronde River watershed is part of the Grande Ronde River subbasin located in the NE corner of Oregon, USA.It drains ~4248 km 2 of the Blue and Wallowa Mountains [35].The Wenatchee River basin is the most northern basin included in this analysis and is ~3500 km 2 and is located in central Washington, USA.It drains portions of the eastern Cascade Crest.It has the broadest elevational range of the included watersheds (185-2870 m).

Geospatial Data
Vector coverages of stream segments in the included basins were generated using a 10 m DEM [36] with the Functional Linkage of Waterbasins and Streams (FLoWs) toolbox in ArcGIS 9.3 [37,38].FLoWs builds a flow-directed network from a DEM and partitions the landscape into land surface drainage areas.For example, the derived network for the John Day included 5532 stream reaches, defined as each confluence-to-confluence stream extent.The FLoWS toolbox was also used to generate reach-contributing area (RCA) polygons representing the non-overlapping land surface area draining into each stream segment.These RCAs were used as the working spatial resolution of analysis and prediction, and averaged in area about 3.7 km 2 (standard deviation 3.28 km 2 ) within the JDR.Elevation for each RCA was taken from a 10 m digital elevation model for the downstream boundary of the defining stream segment [39].Elevation at each logger site was also determined.

Land Surface Temperature Data and Handling Data Gaps
Daily Land Surface Temperature (LST) for 2000-2009 (MOD11A1 v005, 1 km 2 spatial resolution rasters), and 8-day LST (MOD11A2 v005) for 2012 (also 1 km 2 ), from the MODIS sensor were downloaded from NASA's Earth Observing System Data and Information System [40].The native Hierarchical Data Format (HDF) files are in a NASA-defined global sinusoidal projection.We extracted the "LST_Day_1 km" Scientific Data Sets, and converted them to Environmental Systems Research Institute (ESRI) raster format [41].We reprojected the data to a user-defined Albers equal-area conic projection using a NAD83 datum.We clipped the daily data to the JDR extent, and the 8-day data to each basin extent, using the Marine Geospatial Ecology Tools (MGET) in ArcGIS [42].Error estimates for MODIS v005 LST are <0.2-0.7 K in most cases [43,44].Pixels with error flags of >1 K were excluded from the analysis.
Time series gaps sometimes occur in LST due to atmospheric conditions and satellite location and function.In the JDR, multi-day gaps in image availability often occur from dense cloud cover, obscuring sensing for hours to days, particularly in the winter months.Rarely, multi-day gaps due to sensor malfunction can also occur.The daily data was more likely than the 8-day data to have gaps, though occasional gaps did occur in the 8-day data.As a result, it was necessary to develop a gap-filling method such that predictions of water temperature would be spatially and temporally continuous throughout the basins.Gap-filling methods were developed on the daily LST data.Three approaches to gap-filling were considered: use of an alternate data source, temporal interpolation and spatial interpolation.
We first tested an alternative remotely-sensed data source as a gap filler for the daily data: the Advanced Microwave Scanning Radiometer-Earth Observing System (AMSR-E) data, a passive microwave sensor that collects brightness temperature measurements (radiance of microwave radiation coming from the Earth's surface).Microwave radiation is cloud penetrating, so this sensor offers a potential supplemental data source when MODIS LST is unavailable due to weather conditions [45,46].Conveniently, the AMSR-E and MODIS sensors are both aboard the same satellite platform and therefore have a common spatial and temporal domain.We downloaded daily daytime descending pass AMSR-E brightness temperature for bands at 18, 23, 36, and 89 GHz (EASE grid format, 25 km 2 spatial resolution) from the National Snow and Ice Data Center (NSIDC) [47].Unfortunately, an artifact of the AMSR-E swath geometry and location of the JDR limits data availability in parts of the JDR every 5th day, so not all gaps could be filled using this method.Following Mao et al. [45], we parameterized and used the following polynomial structure to fill LST gaps: LST " α `β 89v `β2 p36v ´23vq `β3 p36v ´23vq 2 `β4 p36v ´18vq `β5 p36v ´18vq 2 (1) Water 2015, 7,[6827][6828][6829][6830][6831][6832][6833][6834][6835][6836][6837][6838][6839][6840][6841][6842][6843][6844][6845][6846] We also used a temporal linear interpolation to estimate missing LST values using the timeSeries package interpNA function in R [48] to fill gaps within the LST data.This function generates a linear interpolation across any missing values in a time series and is independent of spatial location.Linear interpolations were done on each year's daily LST data independently on a pixel-by-pixel basis.As this approach requires values on either side of data gaps in the time series, if the Julian day 1 or 365(6) in any year had a missing value, an estimate for that value was made using a 4th order polynomial built on the entire year's data.
Since spatial interpolation method are commonly used to generate spatially continuous data from patchy data [25], we tried filling gaps with universal kriging and inverse distance weighting (IDW) methods.However, after the initial review of these models, kriging and IDW were poor interpolators for LST, likely because cloud cover tended to be spatially continuous but temporally patchy, leaving little data to interpolate from [49,50].We therefore did not continue to pursue these methods.
To assess the utility of the two remaining gap-filling methods we compared the predictive power of unfilled vs. gap-filled daily LST for sites in each year that had a complete annual time series of Daily Mean Water Temperature (DMWT) data.We used root mean squared error (RMSE) and the predicted residual sum of squares (PRESS) statistics as measures of predictive power.Once time-series gaps were filled using the linear method, area-weighted means of daily LST were calculated for each RCA by intersecting the RCA and 1 km grid polygons in ArcGIS (Figure 3).
Water 2015, 7, page-page 6 We also used a temporal linear interpolation to estimate missing LST values using the timeSeries package interpNA function in R [48] to fill gaps within the LST data.This function generates a linear interpolation across any missing values in a time series and is independent of spatial location.Linear interpolations were done on each year's daily LST data independently on a pixel-by-pixel basis.As this approach requires values on either side of data gaps in the time series, if the Julian day 1 or 365(6) in any year had a missing value, an estimate for that value was made using a 4th order polynomial built on the entire year's data.
Since spatial interpolation method are commonly used to generate spatially continuous data from patchy data [25], we tried filling gaps with universal kriging and inverse distance weighting (IDW) methods.However, after the initial review of these models, kriging and IDW were poor interpolators for LST, likely because cloud cover tended to be spatially continuous but temporally patchy, leaving little data to interpolate from [49,50].We therefore did not continue to pursue these methods.
To assess the utility of the two remaining gap-filling methods we compared the predictive power of unfilled vs. gap-filled daily LST for sites in each year that had a complete annual time series of Daily Mean Water Temperature (DMWT) data.We used root mean squared error (RMSE) and the predicted residual sum of squares (PRESS) statistics as measures of predictive power.Once time-series gaps were filled using the linear method, area-weighted means of daily LST were calculated for each RCA by intersecting the RCA and 1 km grid polygons in ArcGIS (Figure 3).

In-Stream Thermal Logger Data
Daily summaries of hourly water temperature data from stream temperature loggers in the JDR were used as daily model response metrics and for model parameterization and validation.We utilized publically available logger data for sites where data had been collected between 2000 and 2009 by tribal, state, and federal organizations for unique and opportunistic monitoring purposes.These data had been compiled as part of the Integrated Status and Effectiveness Monitoring Program [51] and were reviewed and filtered prior to generating summary metrics of daily mean and maximum temperature (Figure 2).We utilized both automated data quality checks and a visual review of all data to remove common anomalies in water temperature data, such as leading and tailing air temperature readings, drying during low flow periods and burial in sediments [52].Stream temperature logger data sets were patchy in both space and time, as loggers were deployed by various agencies in different locations for less-than-one to several concurrent years.Most commonly, loggers were deployed from late spring through early fall.
Logger data from 510 spatially unique JDR sites across all years were included within models (Figure 2).The number of sites varied across years ranging from 13 in 2005 to 215 in 2001 (Table 1).Each site with at least 40 days of stream temperature measurements within a year was included in

In-Stream Thermal Logger Data
Daily summaries of hourly water temperature data from stream temperature loggers in the JDR were used as daily model response metrics and for model parameterization and validation.We utilized publically available logger data for sites where data had been collected between 2000 and 2009 by tribal, state, and federal organizations for unique and opportunistic monitoring purposes.These data had been compiled as part of the Integrated Status and Effectiveness Monitoring Program [51] and were reviewed and filtered prior to generating summary metrics of daily mean and maximum temperature (Figure 2).We utilized both automated data quality checks and a visual review of all data to remove common anomalies in water temperature data, such as leading and tailing air temperature readings, drying during low flow periods and burial in sediments [52].Stream temperature logger data sets were patchy in both space and time, as loggers were deployed by various agencies in different locations for less-than-one to several concurrent years.Most commonly, loggers were deployed from late spring through early fall.
Logger data from 510 spatially unique JDR sites across all years were included within models (Figure 2).The number of sites varied across years ranging from 13 in 2005 to 215 in 2001 (Table 1).Each site with at least 40 days of stream temperature measurements within a year was included in the dataset (the days did not have to be consecutive).The data set included 520 site-years (79,790 logger days) across all years (many sites had data from multiple years).Analyses were conducted in other ICR basins to assess 3 things: model sensitivity to data density, the relationship between LST and stream temperature in basins other than the JDR, and model quality effects of coarser temporal partitioning.Four other ICR basins were selected (the Lemhi, Tucannon, Upper Grande Ronde, and Wenatchee), along with the JDR, to represent the geomorphic and topological variability across the ICR.These basins are included in the Columbia Habitat Monitoring Program (CHaMP), and have temperature data collected following CHaMP protocols as part of a systematic, cross-agency effort to standardize fish habitat characterization methods across the interior Columbia River basin (champmonitoring.org).We utilized publically available 2012 logger data from CHaMP.2012 had the most data available across all the basins.
Eight-day summaries of hourly water temperature data from stream temperature loggers in the included basins were used as 8-day model response metrics and for model validation.Logger data were reviewed and filtered in a similar fashion as the daily data.CHaMP program data collection utilizes a spatially balanced survey design [53] and loggers are left instream year-round, with some sites visited annually and others on a 3-year rotating cycle.Sites are visited and logger data downloaded mid-year, so some sites only had logger data from either the first or second halves of the year, but those time-series were complete.While the 2000-2009 daily data were temporally patchy, the 2012 CHaMP temperature logger data were temporally continuous within each half of the year with few data gaps.

Model Development
We developed simple fixed-effects linear models for a 10-year period (2000)(2001)(2002)(2003)(2004)(2005)(2006)(2007)(2008)(2009) in the JDR.Mean elevation, stream gradient, and upstream catchment area are spatially continuous variables that have been shown to correlate with stream temperature (e.g., [54]) and were also considered as model predictor variables.Because there is spatial structure in the LST data, and the spatially continuous variables are correlated with each other, we used a simple linear regression of each candidate predictor variable on LST for the entire basin for all years to elucidate predictor variable correlation, and reviewed residuals before considering predictor inclusion in the LST model.Annual, 4th order polynomial models were briefly considered to model the annual temperature curve, but they yielded no better results than simple linear models and had less spatial predictive power due to over-fitting effects and were therefore excluded from the model development process.
We used a cross-correlation function (CCF) (Time Series Analysis (TSA) package in R) [55] to test for any significant time lags in the correlative relationships between LST and water temperature which would occur if land surface warming or cooling events detected in the LST were not reflected in concurrent water temperatures but instead lagged, and vice versa.We developed a CCF for data from 10 randomly selected sites around the basin.Daily mean water temperature and LST have shared time series structure, so a time series model for LST was parameterized using autoregressive [AR(1)] models.Residuals from this model were used to filter DMWT.This created a "white noise" series for input into the CCF from which significant lags can be identified.This process is known as prewhitening and differencing [56].This was done to each time series included and significant lags were identified through significant cross-correlations.
It was necessary to review the possible effects of using a land-surface based temperature with no limit in range to predict water temperature data, which have a floor of possible values.For example, no water temperatures below ´0.5 ˝C were observed in our data while land surface temperatures do fall below ´0.5 ˝C in winter months.We used censored regression ("tobit" models [57,58]) which sets a limit on the possible values of the dependent variable in one or both directions: in this case, a floor of ´0.5 ˝C was set (VGAM R package) [59].We compared these results to Ordinary Least Squares (OLS) models that we truncated a posteriori, so any predicted water temperature values below ´0.5 ˝C were truncated to ´0.5 ˝C.The tobit models may result in less prediction bias if the censored results are meaningful.However, because the tobit models include a latent variable which can complicate interpretation, the OLS approach would be preferred if censoring bias is not present.
Both spatial and temporal factors were tested during model development.Models were built at three spatial scales: individual models for each logger site ("site-specific models"); four models for each year built on aggregate datasets for each of the four 6th-field Hydrologic Units [60] in the basin ("HUC" models, for "Hydrologic Unit Code"); and a basin-wide model using all available logger data within the JDR ("global model").We used the site-specific models to evaluate the utility of LST as a model covariate to estimate stream temperature.The HUC and global models were developed as broad spatial scale models that might be more attractive products to conservation managers, but may have lower predictive accuracy.Models were developed at three temporal scales since the relationship between model predictor variables and LST may vary annually and seasonally.We developed a global model using all data across all years, and a global model including year and seasonal terms.Initial modeling efforts indicated that the model parameter estimates differed between the first and second halves of the year, so semi-annual models should be considered.So, we developed models at each spatial scale for each year, splitting data into two sets based on the date of the maximum DMWT and developed separate models for each half of the year.
We predicted daily mean stream temperature for every reach in the JDR for the 10 year period and then compared estimates to observed stream temperature logger data for validation.We assessed model quality and predictive power using two data-subsetting techniques to parse out data used for validation: (1) comparing Predicted Residual Sums of Squares (PRESS) produced from a leave-one-out (LOO) cross-validation on the daily data and (2) goodness of fit metrics from a leave-one-out jackknife by site.In each jackknife iteration (n = 100), data were sampled without replacement, such that data from N-1 sites were used to parameterize the models.The parameterized models in each iteration were used to estimate daily mean stream temperature across the basin, and data from the excluded site ("validation data") was used to test predictive accuracy.Predicted Root Mean Squared Errors (RMSEPs) were calculated for the validation data and used as model quality metrics.Root Mean Squared Errors (RMSEs) were also calculated for models at all spatial scales to assess goodness-of-fit and predictive power.Final model structure selection within years was made using Akaike information criterion (AIC) [61].
Eight-day models were developed as fixed-effects linear models for each basin with no spatial partitioning using aggregated logger data for that basin.Similar to the daily models, the data were split into first and second halves of the year.LST, LST 2 , Julian day, and elevation were included as candidate variables.Potential model structures that were considered included: Final model structure for each half of the year in each basin was determined using PRESS statistic and AIC.Area under the curve (AUC) was calculated for some sites for estimated annual time series to provide a metric to compare inter-annual variation at a site.

Sensitivity Analysis
Once final model structure was determined, the 8-day data were subjected to a jackknife approach to generate test statistics of model quality with decreasing sample size.In each jackknife iteration (n = 100), data were sampled without replacement, such that data from i sites were used to parameterize the models.The data were also bootstrapped (n = 100) to generate standard errors of the parameter estimates.The parameterized models in each iteration were used to estimate 8-day stream temperature across the basin, and data from N-i excluded sites ("validation data") were used to test predictive accuracy.RMSEPs and p 2 s (r 2 analogue from the validation data) were calculated for the validation data and used as model quality metrics.This process was repeated for each basin for each i of 2 . . .N-(N/2) where "N" is the total number of sites for each half year within a basin.

Estimating Missing Land Surface Temperature Data
Linear interpolation and AMSR-E gap filling methods both produced adequate proxies for daily LST data as RMSE and PRESS statistics were similar yet low, though the linear interpolation performed marginally better.The AMSR-E approach also still left some gaps in the data that required filling (Figure 3).Although the AMSR-E alternative data source approach was successful, the data processing, computational time, and management overhead necessary were intensive and much greater than the simple linear approach.We therefore concluded a simple temporal linear interpolation was most effective for filling in cloudy-day gaps in the LST for use in our models.

Predictor Variable Selection and Temporal Lag Analyses
We reviewed the residuals from linear regressions of potential predictor variables and stream temperatures to determine the most informative and least correlated physiographic variables to include in the LST model.Gradient, catchment area, stream order, an estimate of mean annual flow, and elevation were all considered.Though all the physiographic variables were significantly correlated with stream temperature and LST, they were also significantly correlated with each other.There was a slight though significant negative relationship between the residuals from a linear model and elevation, so elevation was included as the representative physiographic candidate variable.The final models used LST, Julian day, and elevation as predictor variables and water temperature as the response variable.
A cross-correlation analysis on prewhitened and differenced annual time series of DMWT and LST showed a zero lag with the highest cross-correlation function coefficient for all included sites indicating that LST and water temperature data were aligned temporally (Figure 4).Therefore, we used a direct temporal alignment for model building.
Water 2015, 7, page-page 9 iteration (n = 100), data were sampled without replacement, such that data from i sites were used to parameterize the models.The data were also bootstrapped (n = 100) to generate standard errors of the parameter estimates.The parameterized models in each iteration were used to estimate 8-day stream temperature across the basin, and data from N-i excluded sites ("validation data") were used to test predictive accuracy.RMSEPs and p 2 s (r 2 analogue from the validation data) were calculated for the validation data and used as model quality metrics.This process was repeated for each basin for each i of 2…N-(N/2) where "N" is the total number of sites for each half year within a basin.

Estimating Missing Land Surface Temperature Data
Linear interpolation and AMSR-E gap filling methods both produced adequate proxies for daily LST data as RMSE and PRESS statistics were similar yet low, though the linear interpolation performed marginally better.The AMSR-E approach also still left some gaps in the data that required filling (Figure 3).Although the AMSR-E alternative data source approach was successful, the data processing, computational time, and management overhead necessary were intensive and much greater than the simple linear approach.We therefore concluded a simple temporal linear interpolation was most effective for filling in cloudy-day gaps in the LST for use in our models.

Predictor Variable Selection and Temporal Lag Analyses
We reviewed the residuals from linear regressions of potential predictor variables and stream temperatures to determine the most informative and least correlated physiographic variables to include in the LST model.Gradient, catchment area, stream order, an estimate of mean annual flow, and elevation were all considered.Though all the physiographic variables were significantly correlated with stream temperature and LST, they were also significantly correlated with each other.There was a slight though significant negative relationship between the residuals from a linear model and elevation, so elevation was included as the representative physiographic candidate variable.The final models used LST, Julian day, and elevation as predictor variables and water temperature as the response variable.
A cross-correlation analysis on prewhitened and differenced annual time series of DMWT and LST showed a zero lag with the highest cross-correlation function coefficient for all included sites indicating that LST and water temperature data were aligned temporally (Figure 4).Therefore, we used a direct temporal alignment for model building.

Daily Model Performance Across Temporal Scales in the JDR
There is a strong correlative relationship between DMWT and LST across all sites and years (Figure 5).

Daily Model Performance Across Temporal Scales in the JDR
There is a strong correlative relationship between DMWT and LST across all sites and years (Figure 5).There was no significant difference in global models including all data across years from a tobit or OLS model (Table 2).This is likely due to the fact that the censored aspect of the water data is not a sampling error, but actually representative of the true population of water temperatures, so there is no bias for which to correct.Truncating the DMWT predictions from the OLS models a posteriori allowed for direct interpretation of the model coefficients without biasing the results.All models generated for spatial scale comparisons were built using simple linear regressions.
Table 2. RMSEs (in °C) and r 2 from OLS and Tobit models built using all data for the JDR across years.Models using LST only and LST + year and seasonal terms are included.

Model
Fixed The basin-wide model including all variables across all sites and all years (2000-2009) had reasonable goodness-of-fit metrics (r 2 = 0.77, RMSE 2.85) (Table 2).Global models across all sites and all years including a year term, a "season" term (i.e., first and second halves of the year split by the highest DMWT in the year), and both a year and season term, performed no better than global models with no temporal terms.However, separate individual global models for each season's data performed significantly better ("Spring" r 2 = 0.82, RMSE = 2.63; "Fall" r 2 = 0.82, RMSE = 2.33) (Table 3).Lumping the seasonal data together negates the utility of including Julian day in the models, as Julian day is positively correlated with temperature (both water and land surface) in the first half of the year, and negatively correlated in the second.Model quality improved even more when models were built on each year's data using the seasonal split (mean across years r 2 = 0.83, RMSE = 2.24).All subsequent modeling included models by year and the seasonal split in the data.There was no significant difference in global models including all data across years from a tobit or OLS model (Table 2).This is likely due to the fact that the censored aspect of the water data is not a sampling error, but actually representative of the true population of water temperatures, so there is no bias for which to correct.Truncating the DMWT predictions from the OLS models a posteriori allowed for direct interpretation of the model coefficients without biasing the results.All models generated for spatial scale comparisons were built using simple linear regressions.The basin-wide model including all variables across all sites and all years (2000-2009) had reasonable goodness-of-fit metrics (r 2 = 0.77, RMSE 2.85) (Table 2).Global models across all sites and all years including a year term, a "season" term (i.e., first and second halves of the year split by the highest DMWT in the year), and both a year and season term, performed no better than global models with no temporal terms.However, separate individual global models for each season's data performed significantly better ("Spring" r 2 = 0.82, RMSE = 2.63; "Fall" r 2 = 0.82, RMSE = 2.33) (Table 3).Lumping the seasonal data together negates the utility of including Julian day in the models, as Julian day is positively correlated with temperature (both water and land surface) in the first half of the year, and negatively correlated in the second.Model quality improved even more when models were built on each year's data using the seasonal split (mean across years r 2 = 0.83, RMSE = 2.24).All subsequent modeling included models by year and the seasonal split in the data.Table 3. Model comparison of global models built including all sites across all years, those including year and season terms, and those split into two datasets based on the highest DMWT across each year ("Spring" and "Fall")."Year" models are the mean metrics for models built on each year's data (including a seasonal split).Best fit model structure as determined using AIC for each year and across spatial partitions most often took the form of DMWT ~LST + LST 2 + Julian Day + Elevation.Elevation was not significant in the global models in 2003.As elevation does not vary at a site, site-specific models took the form DMWT ~LST + LST 2 + Julian Day.Site-specific models had a mean RMSE across all sites and all years of 1.28 ˝C (sd: 0.11) (Figure 6).

Model
The p 2 across all sites and all years for the leave-one-out validation data for the site-specific models had a mean of 0.86.The site-specific models showed almost no bias in prediction, while the global and HUC models showed slight bias toward under-prediction at high temperatures (Figure 7).This bias was also evident when the validation data were compared to estimated values.
Water 2015, 7, page-page 11 Table 3. Model comparison of global models built including all sites across all years, those including year and season terms, and those split into two datasets based on the highest DMWT across each year ("Spring" and "Fall")."Year" models are the mean metrics for models built on each year's data (including a seasonal split).

Annual, Daily Model Performance across Spatial Scales in the John Day River Basin
Best fit model structure as determined using AIC for each year and across spatial partitions most often took the form of DMWT ~ LST + LST 2 + Julian Day + Elevation.Elevation was not significant in the global models in 2003.As elevation does not vary at a site, site-specific models took the form DMWT ~ LST + LST 2 + Julian Day.Site-specific models had a mean RMSE across all sites and all years of 1.28 °C (sd: 0.11) (Figure 6).
The p 2 across all sites and all years for the leave-one-out validation data for the site-specific models had a mean of 0.86.The site-specific models showed almost no bias in prediction, while the global and HUC models showed slight bias toward under-prediction at high temperatures (Figure 7).This bias was also evident when the validation data were compared to estimated values.HUC model quality varied from HUC to HUC and year to year, as did the number of sites and days of data in each year.HUC and global models performed similarly across years.HUC models had a mean RMSE across all HUCs and all years of 2.3 °C (sd: 0.45).Global models had a mean RMSE across all years of 2.5 °C (standard deviation 0.26).HUC models performed slightly better than global models in years with sufficient data in each HUC, but global models performed better than HUC models in years with little data in one or more HUC.Of course, global models were necessary when there was no data in a HUC in a year.
An example of ten years of temperature estimates for one site is shown in Figure 8.The site is in the North Fork John Day River above the confluence with Camas Creek (elevation 826 m).HUC model quality varied from HUC to HUC and year to year, as did the number of sites and days of data in each year.HUC and global models performed similarly across years.HUC models had a mean RMSE across all HUCs and all years of 2.3 ˝C (sd: 0.45).Global models had a mean RMSE across all years of 2.5 ˝C (standard deviation 0.26).HUC models performed slightly better than global models in years with sufficient data in each HUC, but global models performed better than HUC models in years with little data in one or more HUC.Of course, global models were necessary when there was no data in a HUC in a year.
An example of ten years of temperature estimates for one site is shown in Figure 8.The site is in the North Fork John Day River above the confluence with Camas Creek (elevation 826 m).Temperature logger data at this site were available for about half the years included in this analysis, Water 2015, 7, 6827-6846 but estimates were generated for the complete time series.There is considerable variation between years in both the maximum temperature and the shape of each annual curve.Though years 2003, 2006, and 2007 have very similar annual maximum daily means (AMDM) for the year (24.3, 24.0, 24.4 ˝C, respectively), the AUCs for each year's temperature profile differ: 2919, 2849, and 3024 ˝C, respectively (Table 4).Though 2007 was cumulatively warmer across the entire year, 2003 was both warmer sooner in the first half of the year and cooler sooner in the second half of the year than both 2006 and 2007.Similarly, though 2004 had a lower AMDM and lower annual AUC at this site than either 2006 or 2007, the AUC for 1 January-15 April is higher for 2004 (Table 4).
Water 2015, 7, page-page 12 Temperature logger data at this site were available for about half the years included in this analysis, but estimates were generated for the complete time series.There is considerable variation between years in both the maximum temperature and the shape of each annual curve.Though years 2003, 2006, and 2007 have very similar annual maximum daily means (AMDM) for the year (24.3, 24.0, 24.4 °C, respectively), the AUCs for each year's temperature profile differ: 2919, 2849, and 3024 °C, respectively (Table 4).Though 2007 was cumulatively warmer across the entire year, 2003 was both warmer sooner in the first half of the year and cooler sooner in the second half of the year than both 2006 and 2007.Similarly, though 2004 had a lower AMDM and lower annual AUC at this site than either 2006 or 2007, the AUC for 1 January-15 April is higher for 2004 (Table 4).The validation data plot is the jackknife-by-site, leave-one-out validation data for all years vs. the predicted temperatures for those sites from the global models for each year.The solid black line is the simple linear regression between the predicted and observed temperatures (with r 2 for that relationship).The dashed line is the one-to-one relationship.
Water 2015, 7, page-page 12 Temperature logger data at this site were available for about half the years included in this analysis, but estimates were generated for the complete time series.There is considerable variation between years in both the maximum temperature and the shape of each annual curve.Though years 2003, 2006, and 2007 have very similar annual maximum daily means (AMDM) for the year (24.3, 24.0, 24.4 °C, respectively), the AUCs for each year's temperature profile differ: 2919, 2849, and 3024 °C, respectively (Table 4).Though 2007 was cumulatively warmer across the entire year, 2003 was both warmer sooner in the first half of the year and cooler sooner in the second half of the year than both 2006 and 2007.Similarly, though 2004 had a lower AMDM and lower annual AUC at this site than either 2006 or 2007, the AUC for 1 January-15 April is higher for 2004 (Table 4).4. Annual summary metrics for a site located in the North Fork John Day above the confluence with Camas Creek.Annual maximum daily mean (AMDM), total Area Under the Curve (AUC) in ˝C, the AUC for 1 January-15 April, January-June, and July-December are given.

Eight-Day Model Performance and Sensitivity Analysis across Basins
The number of sites per season per year varied across basins, ranging from 0 in the spring in the Lemhi to 52 in the fall in the Upper Grande Ronde.Final model structure for the eight-day models for all basins took the same form as the daily models: 8DMWT ~LST + LST 2 + Julian Day + Elevation.Eight-day model quality metrics varied from basin to basin, with mean model metrics across all basins of: r 2 = 0.91 (sd: 0.02), RMSE 1.29 ˝C (sd: 0.45) (Figure 9).Root mean squared prediction errors ranged from 1.2 ˝C in the Wenatchee to 2.74 ˝C in the Upper Grande Ronde (Table 5).
Water 2015, 7, page-page 13 Table 4. Annual summary metrics for a site located in the North Fork John Day above the confluence with Camas Creek.Annual maximum daily mean (AMDM), total Area Under the Curve (AUC) in °C, the AUC for 1 January-15 April, January-June, and July-December are given.

Eight-Day Model Performance and Sensitivity Analysis across Basins
The number of sites per season per year varied across basins, ranging from 0 in the spring in the Lemhi to 52 in the fall in the Upper Grande Ronde.Final model structure for the eight-day models for all basins took the same form as the daily models: 8DMWT ~ LST + LST 2 + Julian Day + Elevation.Eight-day model quality metrics varied from basin to basin, with mean model metrics across all basins of: r 2 = 0.91 (sd: 0.02), RMSE 1.29 °C (sd: 0.45) (Figure 9).Root mean squared prediction errors ranged from 1.2 °C in the Wenatchee to 2.74 °C in the Upper Grande Ronde (Table 5).Results of the sensitivity analysis were remarkably consistent across basins.Model quality metrics for models parameterized using only two sites were not good for any basin, but improved significantly with increasing sample size, stabilizing at approximately N = 4 (Figure 10).Results were similar for either spring or fall models within a basin.

Basin
Water 2015, 7, page-page 14 Results of the sensitivity analysis were remarkably consistent across basins.Model quality metrics for models parameterized using only two sites were not good for any basin, but improved significantly with increasing sample size, stabilizing at approximately N = 4 (Figure 10).Results were similar for either spring or fall models within a basin.

Discussion
We have demonstrated that remotely-sensed LST can be used in linear regression models to generate robust, spatially and temporally continuous estimates of stream temperature.The strongest relationship was between DMWT and LST, though the inclusion of Julian day did improve predictive power.Elevation was not included in the site-specific models, and it was not always significant in the HUC and global models.This approach does not attempt to explicitly model spatial variation or network structure.Instead, we leverage the spatial structure inherent in the LST datasets to facilitate the development of estimates over large geographic regions and across years.We found that other commonly used spatial variables often related to stream temperature did not increase model quality when included along with LST.The spatial structure inherent in the LST seems to captures that

Discussion
We have demonstrated that remotely-sensed LST can be used in linear regression models to generate robust, spatially and temporally continuous estimates of stream temperature.The strongest relationship was between DMWT and LST, though the inclusion of Julian day did improve predictive power.Elevation was not included in the site-specific models, and it was not always significant in the HUC and global models.This approach does not attempt to explicitly model spatial variation or network structure.Instead, we leverage the spatial structure inherent in the LST datasets to facilitate the development of estimates over large geographic regions and across years.We found that other commonly used spatial variables often related to stream temperature did not increase model quality when included along with LST.The spatial structure inherent in the LST seems to captures that spatial variation at the RCA scale; changes in the thermal profile on the landscape mirror changes in the thermal profile of the water.This is not to say that only near-stream local factors influence local temperatures, only that the patterns are co-variable.
Model performance of the daily JDR models was relatively consistent across spatial partitions.Model quality did improve when the data were partitioned into annual and seasonal subsets.This could be the result of a number of factors related to climate variation, or inter-annual variation in the data.For example, the number and location of sites with data available in each year varied considerably.A consistent dataset from the same sites across years could be used to tease this out.This inter-annual variation could also be indicative of the broad climate difference between years (e.g., cold and dry, warm and wet) and bears further investigation.If broad climate categories can be identified, models could potentially be generalized for future predictions based on expected changes in climate.
The final site-specific, HUC, and global models generally performed well regardless of site or year, except in a few notable cases.HUC-level models for the Middle Fork John Day did a poor job of estimating the summer high, fall cool-off, or winter lows in some areas, though the post-snowmelt spring increase is well represented (an example is given in Figure 11).The John Day River is almost entirely rain and snowmelt fed, with few known springs in the basin.However, sites located at Phipps Meadow on the Middle Fork indicate a nearby spring influencing in-stream temperature in that segment.In every year the winter daily mean temperature hovers around 10 ˝C, dropping to as low as 2 ˝C for a few weeks during snowmelt, then rising to a summer high slightly cooler than other sites in the area, and dropping throughout the fall back to winter lows (Figure 11).
HUC and global models did not estimate this annual profile; however, site-specific models did produce good temperature estimates (Figure 12).
Water 2015, 7, page-page 15 spatial variation at the RCA scale; changes in the thermal profile on the landscape mirror changes in the thermal profile of the water.This is not to say that only near-stream local factors influence local temperatures, only that the patterns are co-variable.
Model performance of the daily JDR models was relatively consistent across spatial partitions.Model quality did improve when the data were partitioned into annual and seasonal subsets.This could be the result of a number of factors related to climate variation, or inter-annual variation in the data.For example, the number and location of sites with data available in each year varied considerably.A consistent dataset from the same sites across years could be used to tease this out.This inter-annual variation could also be indicative of the broad climate difference between years (e.g., cold and dry, warm and wet) and bears further investigation.If broad climate categories can be identified, models could potentially be generalized for future predictions based on expected changes in climate.
The final site-specific, HUC, and global models generally performed well regardless of site or year, except in a few notable cases.HUC-level models for the Middle Fork John Day did a poor job of estimating the summer high, fall cool-off, or winter lows in some areas, though the post-snowmelt spring increase is well represented (an example is given in Figure 11).The John Day River is almost entirely rain and snowmelt fed, with few known springs in the basin.However, sites located at Phipps Meadow on the Middle Fork indicate a nearby spring influencing in-stream temperature in that segment.In every year the winter daily mean temperature hovers around 10 °C, dropping to as low as 2 °C for a few weeks during snowmelt, then rising to a summer high slightly cooler than other sites in the area, and dropping throughout the fall back to winter lows (Figure 11).
HUC and global models did not estimate this annual profile; however, site-specific models did produce good temperature estimates (Figure 12).The JDR models built using data from typical sites can be applied generally in the basin, but this suggests that LST modeling methods could produce robust temperature estimates for other types of stream systems.In addition, systems with mixed inputs would require more spatial partitioning for optimal model results.If some subset of reaches are known to be spring fed, data from spring-fed sites can be used to parameterize models used to estimate temperature for those reaches selectively.
Our approach generates temporally robust, continuous estimates of temperature, allowing inter-annual variation evaluation at single sites or across multiple sites or watersheds.These estimates also allow the development of summary metrics that characterize the in-stream thermal regime for each stream reach such as: seasonal metrics, growing season thermal inputs, counts of days above minima and maxima, and the timing of thermal milestones [62].Aquatic organisms experience the complete thermal regime in their habitat, not just the extreme seasonal events.Extreme metrics have their utility, but the cumulative effects of temperature on growth and development are also potentially important.Terrestrial biologists use Growing Degree Days (GDD) to anticipate insect The JDR models built using data from typical sites can be applied generally in the basin, but this suggests that LST modeling methods could produce robust temperature estimates for other types of stream systems.In addition, systems with mixed inputs would require more spatial partitioning for optimal model results.If some subset of reaches are known to be spring fed, data from spring-fed sites can be used to parameterize models used to estimate temperature for those reaches selectively.
Our approach generates temporally robust, continuous estimates of temperature, allowing inter-annual variation evaluation at single sites or across multiple sites or watersheds.These estimates also allow the development of summary metrics that characterize the in-stream thermal regime for each stream reach such as: seasonal metrics, growing season thermal inputs, counts of Water 2015, 7, 6827-6846 days above minima and maxima, and the timing of thermal milestones [62].Aquatic organisms experience the complete thermal regime in their habitat, not just the extreme seasonal events.Extreme metrics have their utility, but the cumulative effects of temperature on growth and development are also potentially important.Terrestrial biologists use Growing Degree Days (GDD) to anticipate insect emergence or bud break [63,64] and continuous stream temperature estimates could be used to develop similar aquatic-based metrics to estimate the timing of significant events for in-stream biota.For example, daily temperature estimates for a single site (above Camas Creek in the North Fork John Day River) are shown in Figure 8.Not only do annual maxima vary, but there are different rates of warming and cooling across years.For example, the earlier spring warm-up suggests temperature-dependent events (like invertebrate emergence) would occur earlier in 2004 even though it was cooler than 2006 and 2007 in terms of summer maxima.
Water 2015, 7, page-page 16 emergence or bud break [63,64] and continuous stream temperature estimates could be used to develop similar aquatic-based metrics to estimate the timing of significant events for in-stream biota.For example, daily temperature estimates for a single site (above Camas Creek in the North Fork John Day River) are shown in Figure 8.Not only do annual maxima vary, but there are different rates of warming and cooling across years.For example, the earlier spring warm-up suggests temperature-dependent events (like invertebrate emergence) would occur earlier in 2004 even though it was cooler than 2006 and 2007 in terms of summer maxima.Many aquatic managers and biologists work with weekly summaries of stream temperature.Estimates from eight-day models in this analysis were more accurate than daily model estimates.For example, r 2 and p 2 for eight-day models for 2012 in the JDR were >0.9 and RMSEs were below 1.8 °C (Table 5).Part of the improvement in model quality may be due to having longer time series in 2012 for the eight-day models, but likely also because the eight-day models are averaging across the daily fluctuations in both water and land surface temperature.These fluctuations may just be noise and thus uninformative over long time series, where the eight-day means better represent the actual trend through time.Model quality across basins varied, with the highest error in the UGR models (mRMSE = 1.7 °C) and the lowest in the Wenatchee (mRMSE = 0.86 °C).This variation in model quality is likely due to factors that vary across the basin but are uncoupled from surface temperature dynamics.For example, there may be groundwater influx in parts of the UGR but not others.
The results of the sensitivity analysis were surprising.Models parameterize with data from only two sites were not good, but model quality quickly improved with the inclusion of 3-4 sites and then stabilized at full dataset levels.There was still variability in model quality across basins, but the models for a basin got as good as they were going to get when including data for 4-5 sites, and did not improve with the inclusion of more sites (Figure 10).The same pattern was seen in models for both halves of the year ("spring" and "fall" models for the JDR are shown in Figure 10).These results indicates that sampling effort in terms of logger placement can shift from putting as many temperature loggers as feasible in a basin to having fewer loggers in place for longer time series, and then focusing logger placement in specific streams that may not be well-characterized by the basin-wide model (like streams fed by springs, groundwater influx, etc.).
We did not examine temporal model sensitivity in these analyses.It would be informative to know how many (or few) observations in a time series are necessary for robust estimates, and also if when loggers are in place effects annual model quality.That is, there is a strong bias toward short (1-3 month) summer time series in extant logger datasets, perhaps these models would be better served by having shorter time series but in the spring and fall instead, or as well.These questions Many aquatic managers and biologists work with weekly summaries of stream temperature.Estimates from eight-day models in this analysis were more accurate than daily model estimates.For example, r 2 and p 2 for eight-day models for 2012 in the JDR were >0.9 and RMSEs were below 1.8 ˝C (Table 5).Part of the improvement in model quality may be due to having longer time series in 2012 for the eight-day models, but likely also because the eight-day models are averaging across the daily fluctuations in both water and land surface temperature.These fluctuations may just be noise and thus uninformative over long time series, where the eight-day means better represent the actual trend through time.Model quality across basins varied, with the highest error in the UGR models (mRMSE = 1.7 ˝C) and the lowest in the Wenatchee (mRMSE = 0.86 ˝C).This variation in model quality is likely due to factors that vary across the basin but are uncoupled from surface temperature dynamics.For example, there may be groundwater influx in parts of the UGR but not others.
The results of the sensitivity analysis were surprising.Models parameterize with data from only two sites were not good, but model quality quickly improved with the inclusion of 3-4 sites and then stabilized at full dataset levels.There was still variability in model quality across basins, but the models for a basin got as good as they were going to get when including data for 4-5 sites, and did not improve with the inclusion of more sites (Figure 10).The same pattern was seen in models for both halves of the year ("spring" and "fall" models for the JDR are shown in Figure 10).These results indicates that sampling effort in terms of logger placement can shift from putting as many temperature loggers as feasible in a basin to having fewer loggers in place for longer time series, and then focusing logger placement in specific streams that may not be well-characterized by the basin-wide model (like streams fed by springs, groundwater influx, etc.).
We did not examine temporal model sensitivity in these analyses.It would be informative to know how many (or few) observations in a time series are necessary for robust estimates, and also if when loggers are in place effects annual model quality.That is, there is a strong bias toward short (1-3 month) summer time series in extant logger datasets, perhaps these models would be better served by having shorter time series but in the spring and fall instead, or as well.These questions remain open to future investigations.There is a trade-off with this modeling approach between resolution (both spatial and temporal) and generality.The 1 km 2 spatial resolution may be too coarse for some applications, such as locating thermal refugia, for example.However, the ability to generate reasonable estimates of daily or weekly stream temperature across entire basins or regions should prove useful in many research and management applications.

Figure 2 .
Figure 2. John Day River basin.Green dots represent the location of all temperature logger sites across all years.The yellow star is the site upstream of Camas Creek.

Figure 2 .
Figure 2. John Day River basin.Green dots represent the location of all temperature logger sites across all years.The yellow star is the site upstream of Camas Creek.

Figure 3 .
Figure 3. Example of LST gap-filling for one site in the JDR in 2003.Blue highlight is the original unfilled LST data: pink lines and points are gap-filled using the AMSR-E method and the black line is gap-filling using a simple linear interpolation.

Figure 3 .
Figure 3. Example of LST gap-filling for one site in the JDR in 2003.Blue highlight is the original unfilled LST data: pink lines and points are gap-filled using the AMSR-E method and the black line is gap-filling using a simple linear interpolation.

Figure 4 .
Figure 4. Cross-correlation function for pre-whitened and differenced time series of DMWT and LST for one site in the JDR showing the highest correlation at lag 0. Blue dotted line indicates significance threshold of p < 0.05.

Figure 4 .
Figure 4. Cross-correlation function for pre-whitened and differenced time series of DMWT and LST for one site in the JDR showing the highest correlation at lag 0. Blue dotted line indicates significance threshold of p < 0.05.

Figure 5 .
Figure 5. Relationship between Daily Mean Water Temperature (°C) and Land Surface Temperature (K) for all sites, 2000-2009 in the John Day River basin.The black line and r 2 are from a simple linear regression (DMWT ~ LST).

Figure 5 .
Figure 5. Relationship between Daily Mean Water Temperature ( ˝C) and Land Surface Temperature (K) for all sites, 2000-2009 in the John Day River basin.The black line and r 2 are from a simple linear regression (DMWT ~LST).

Figure 6 .
Figure 6.Root Mean Squared Errors from site-specific models for all years.Green dots are median values for the year; blue boxes are the first-third quartiles and the whiskers are the min and max values.

Figure 6 .
Figure 6.Root Mean Squared Errors from site-specific models for all years.Green dots are median values for the year; blue boxes are the first-third quartiles and the whiskers are the min and max values.

Figure 7 .
Figure 7. Stream temperature predicted from the global, site-specific, and HUC models for 2000-2009 vs. the observed values.The validation data plot is the jackknife-by-site, leave-one-out validation data for all years vs. the predicted temperatures for those sites from the global models for each year.The solid black line is the simple linear regression between the predicted and observed temperatures (with r 2 for that relationship).The dashed line is the one-to-one relationship.

Figure 8 .
Figure 8. Example of observed and predicted daily mean stream temperature from one site across 10 years (2000-2009).The site is located on the North Fork John Day River above the confluence with Camas Creek.Blue dots are observed daily mean stream temperature and the gray line is estimated daily stream temperature from site-specific models in years where data are available, and the black line is from HUC-based models.

Figure 7 .
Figure 7. Stream temperature predicted from the global, site-specific, and HUC models for 2000-2009 vs. the observed values.The validation data plot is the jackknife-by-site, leave-one-out validation data for all years vs. the predicted temperatures for those sites from the global models for each year.The solid black line is the simple linear regression between the predicted and observed temperatures (with r 2 for that relationship).The dashed line is the one-to-one relationship.

Figure 7 .
Figure 7. Stream temperature predicted from the global, site-specific, and HUC models for 2000-2009 vs. the observed values.The validation data plot is the jackknife-by-site, leave-one-out validation data for all years vs. the predicted temperatures for those sites from the global models for each year.The solid black line is the simple linear regression between the predicted and observed temperatures (with r 2 for that relationship).The dashed line is the one-to-one relationship.

Figure 8 .
Figure 8. Example of observed and predicted daily mean stream temperature from one site across 10 years (2000-2009).The site is located on the North Fork John Day River above the confluence with Camas Creek.Blue dots are observed daily mean stream temperature and the gray line is estimated daily stream temperature from site-specific models in years where data are available, and the black line is from HUC-based models.

Figure 8 .
Figure 8. Example of observed and predicted daily mean stream temperature from one site across 10 years (2000-2009).The site is located on the North Fork John Day River above the confluence with Camas Creek.Blue dots are observed daily mean stream temperature and the gray line is estimated daily stream temperature from site-specific models in years where data are available, and the black line is from HUC-based models.

Figure 9 .Table 5 .
Figure 9. Eight-day mean stream temperature predicted vs. observed for the leave-one-out validation data for all basins.The solid black line is the simple linear regression between the predicted and observed temperatures (with r 2 for that relationship).The dashed line is the one-to-one relationship.RMSE = 1.29 °C.

Figure 9 .
Figure 9. Eight-day mean stream temperature predicted vs. observed for the leave-one-out validation data for all basins.The solid black line is the simple linear regression between the predicted and observed temperatures (with r 2 for that relationship).The dashed line is the one-to-one relationship.RMSE = 1.29 ˝C.

Figure 10 .
Figure 10.Sensitivity analysis mean root mean squared errors (and standard deviation of those means) from the N-i site jackknife for all basins.Mean Root Mean Squared Errors on the Y-axis, and the number of included sites on the X-axis for all graphs."Spring" indicates data from the first half of the year (and "Fall" the second half) as determined by the peak mean temperature across the year.(a) Tucannon River basin; (b) Upper Grande Ronde River basin; (c) Wenatchee River basin; (d) Lemhi River basin;(e) & (f) John Day River basin.

Figure 10 .
Figure 10.Sensitivity analysis mean root mean squared errors (and standard deviation of those means) from the N-i site jackknife for all basins.Mean Root Mean Squared Errors on the Y-axis, and the number of included sites on the X-axis for all graphs."Spring" indicates data from the first half of the year (and "Fall" the second half) as determined by the peak mean temperature across the year.(a) Tucannon River basin; (b) Upper Grande Ronde River basin; (c) Wenatchee River basin; (d) Lemhi River basin; (e,f) John Day River basin.

Figure 11 .
Figure 11.Observed (blue dots) and estimated (black lines) daily mean stream temperature from a site in the Middle Fork John Day River near Phipps Meadow.Estimated stream temperature is from models built using all the data for the HUC 4 encompassing the Middle Fork.

Figure 11 .
Figure 11.Observed (blue dots) and estimated (black lines) daily mean stream temperature from a site in the Middle Fork John Day River near Phipps Meadow.Estimated stream temperature is from models built using all the data for the HUC 4 encompassing the Middle Fork.

Figure 12 .
Figure 12.Observed (blue dots) and estimated (black lines) daily mean stream temperature from a site in the Middle Fork John Day River near Phipps Meadow.Estimated stream temperature is from models built using site-specific data.

Figure 12 .
Figure 12.Observed (blue dots) and estimated (black lines) daily mean stream temperature from a site in the Middle Fork John Day River near Phipps Meadow.Estimated stream temperature is from models built using site-specific data.

Table 1 .
Number of temperature logger sites and days of data per year for the John Day River daily datasets included in these analyses.

Table 2 .
RMSEs (in ˝C) and r 2 from OLS and Tobit models built using all data for the JDR across years.Models using LST only and LST + year and seasonal terms are included.

Table 5 .
Model metrics for eight-day models for all basins.RMSEP and p 2 are from leave-one-out cross validation.JDR = John Day River, LRB = Lemhi River, TRB = Tucannon River, UGR = Upper Grande Ronde, WRB = Wenatchee River.