A Call for More Snow Sampling

: The snowpack is important for water resources, tourism, ecology, and the global energy budget. Over the past century, we have gone from point measurements of snow water equivalent (SWE) to estimate spring and summer runoff volumes, to remote sensing of various snowpack properties at continuously ﬁner spatial and temporal resolutions, to various complexities of snowpack and hydrological modeling, to the current fusion of ﬁeld data with remote sensing and modeling, all to improve our estimates of the snowpack and the subsequent runoff. However, we are still limited by the uncertainty induced by scaling from point ﬁeld measurements to the area represented by remote sensing and modeling. This paper uses several examples of ﬁne-resolution sampling to issue a call to snow hydrologists and other earth scientists to collect more data, or at least to thoroughly evaluate their sampling strategy for collecting ground-truth measurements. Recommendations are provided for different approaches to have more representative sampling, when at all possible, to collect at least a few more samples or data points.


Introduction-Estimating the Amount of Snow in the Pack
At the turn of the 20th century, James E. Church began measuring snow water equivalent (SWE) in the mountains above Reno Nevada in an attempt to estimate the runoff volumes in the Truckee River [1]. We continue to sample snowpack properties to forecast water resources in various parts of the world [2][3][4]. We also use these snow data to understand hydrological processes [5], climate dynamics [6,7], and the impacts that snow has on ecosystems [8,9]. Snow has a huge financial impact on the global economy, estimated in the order of trillions of dollars annually [10]. Thus, especially in the context of climate variability and change, there is an urgent need for accurate estimates of snow distribution, especially for SWE [11].
Various review papers provide in-depth summaries of snow research, including snow research in general [12] and representing SWE in hydrologic models [13]. Dozier et al. [11] present different methods to estimate SWE across a mountain range: spatial interpolation for station observations, remote sensing, numerical models, and combinations thereof. This combination is presented in a schematic by Sturm [12] and is the recommended way forward to understand what is actually occurring with the snowpack, especially at different scales [14]. We typically use field measurements to provide ground truth for remotely sensed estimates and to evaluate models. Data collected on the ground often represent a point, such as individual field measurements or station monitoring. Conversely, the resolution within large extent hydrological models, such as those covering 1000 or more km 2 (e.g., Precipitation Runoff Modeling System or PRMS [15,16], Variable Infiltration Capacity or VIC [17,18]) or for daily resolution remote sensing (e.g., Moderate Resolution Imaging Spectroradiometer or MODIS optical imagery [19,20], passive microwave imagery [21,22]) is from 0.5 to 25 km. Yet, the question: "How good or representative are the ground-truth measurements?" always arises. The purpose of this paper is to issue a call The simplest way to compare different techniques is sampling the same location, such as remote sensing versus field measurements, or adjacent to one another for field measurements, assuming limited spatial variability [24]. Similarly, comparing measurements using the same sampling method at various distances can assess scales of variability [14]. This further introduces the question of how representative a specific sample is for an area [25][26][27]. There are also errors associated with sampling [24,28] that may be considered random or systematic [29]. Random errors, such as natural variability, landscape elements, or variability in meteorological conditions, are not be easily identified while systematic errors are a bias, such as due to instrumentation [24] or the influence of anthropogenic elements on meteorology. Some of the impacts of each error can be evaluated [29], such as using a Monte Carlo approach [30], which has been used to assess the influence of unknown and instrumentation uncertainty [31].
Semi-variograms are a tool to evaluate the spatial structure of an earth system property [32]. In the simplest form, it is a plot of semi-variance among all samples or measurements over a specific range of distances versus this distance. An example in linear space is the variance between all samples that are up to 2 m apart, then the semi-variance between all samples from 2 to 4 m apart, etc. Important properties of a semi-variogram are the distance at which variance no longer increases, called the correlation length or scale break, and the magnitude of the semi-variance beyond the correlation length, called the sill. For snow data, the slope of the variogram in log-log space up to the correlation length is a measure of the fractal dimension [14,33,34].
The shape of the semi-variograms derived from concurrent lidar and snow depth measurements are compared in the first example. The second example of snow accumulation illustrates random snowfall variability over the meter scale and likely systematic error of the 10 s of meters, while the third example illustrates mostly systematic shade-induced snowmelt differences over the centimeter scale. The differences shown in the fourth and fifth examples are mostly random, while the variations in the sixth examples are likely systematic due to differences in the composition of the measurement equipment.

Airborne LiDAR Snow Depth
Light distance and ranging, or LiDAR, uses light in the short-wave portion of the spectrum to estimate distance by sending out a beam of light and measuring the travel time [35]. Different wavelengths are used (532, 1064, 1550 nm) based on the measurement distance, power availability, and other requirements [36]. LiDAR can be in the form of a single beam of light or as a waveform that provides multiple returns and thus information on canopy structure as well as snow depth [36]. LiDAR measurements can be made at millimeter to meter horizontal resolution with terrestrial units [36,37], at 1 to 3 m resolution for airborne systems [36,38], and at 10 s of meters for satellite systems, specifically ICESat [39] and ICESat2 [40]. The stated vertical resolution of the LiDAR is millimeters but is typically in the range of 10-20 cm [36].
Since Hopkinson et al. [35] used the difference between snow-on and snow-off scans to estimate snow depth from airborne LiDAR, LiDAR has become an effective tool to measure snow depth [36]. It was added for the last intensive observation period (IOP4) of the NASA Cold Land Process Experiment (CLPX) in Northern Colorado in the western U.S. [41], and that dataset has been studied extensively [34,42,43]. Various studies have compared LiDAR-based snow depth estimates to manual depth probe measurements (see [44] for a summary). The errors associated with snow depth estimates from LiDAR are a function of the slope of the terrain, look angle (i.e., horizontal scanning terrestrial or vertical scanning from airborne and satellite), the horizontal resolution, and the nature of the snow-off surface [36]. A common problem is the presence of shrubs below the snowpack, which does not allow for an adequate measurement of the ground surface [36].

Data Used to Compare LiDAR Versus Probe Snow Depth Measurements
The CLPX LiDAR data collected in early April 2003 were used to measure snow depth in the first example. These measurements were compared to manual snow depth measurements for six 1 km 2 intensive study areas (ISA) with > 300,000 LiDAR snow depth estimates [41] (see Figure 1a). Note that the data collected at three additional sites (ISAs) in North Park were not assessed due to the shallow snow depth (typically less than 50 cm) and known errors for shallow snowpack [25], and mixed LiDAR returns among shrub cover (e.g., Artemisia tridentata [45]). The spacing of the LiDAR data was about 1.5 m. The snow depth was estimated to the nearest centimeter [34], with a vertical accuracy of about 5 cm [41], depending on the slope [36].

Snow Depth Probe Measurements
Manual measurements of snow depth are easier and quicker than SWE measurements. A snow depth probe is essentially a ruler pointed at the end that is inserted vertically into the snowpack. These depth probes can be compact, such as for backcountry use, i.e., avalanche probes, or what is commonly used is an anodized aluminum probe with 1 cm increments [46]. Snow depth measurements for the first, fourth, and fifth datasets used the latter probe.

Fresh and Shallow Snow Depth, SWE and Density Measurements
Bulk SWE over the snowpack is manually measured by extracting a snow core and determining the mass or depth of water [1]. Many different samplers have been created to extract a core and measure the mass [24,47,48]. Almost any tube can be used for such sampling with proper care and acknowledging its limitations; this includes a stove pipe [49] or a plastic drainage pipe [50]. For sampling individual snowpack layers, a short metal tube sharpened at one end, often called a SIPRE (Snow, Ice, and Permafrost Research Establishment) tube [51], is often used, while for shallow fresh, snow (snow depth < 30 cm), a plastic SnowMetrics snow tube http://www.snowmetrics.com (accessed on 30 July 2021) can be used. Here, the latter was used to measure fresh snow. The 5.4 cm diameter SnowMetrics corer is rotated as it is inserted vertically into the snowpack. The depth is measured on the outside of the tube. Using a 10 cm wide scraper, the snow on one side of the snow tube is removed, and the scraper is inserted between the soil/ground surface and the tube. The tube is inverted and hung from a spring scale. The mass in millimeters of SWE is estimated from the spring scale. Snowpack density, compared to the density of water, is computed as the ratio of SWE to depth. The snow tube was used in examples two and three. If obvious layers can be identified in each of the snow cores extracted from the snow tube, the density of each of those layers can be estimated by isolating them from the remainder of the core. Snow surface roughness can be measured with a snow roughness board and converted into data using various methods [52,53]. If the snow roughness board is inserted through to flat ground, as was the case in example three, snow depth data can be estimated at sub-millimeter resolution. A photograph of the snow surface contrasting the black roughness board was converted into X-Y coordinates as per the method described in Fassnacht et al. [53]. In the third example, a 123 cm long snow roughness board was used to estimate snow depth at a~0.25 mm resolution SWE can be estimated by extracting and weighing any area of snow; early efforts to evaluate snow pillows used this technique by removing all the snow on top of a snow pillow [54]. In the third example, a second roughness board was inserted parallel to the first at a 30 cm offset. All the snow was excavated in 8 to 14.5 cm increments based on approximate uniform snow depth along the trench.

Local-Scale Accumulation and Melt Measurements
If one goes outside and measures the snow, that measurement may not be representative [25,27], depending on where we go and what the meteorological conditions are. Typically, measurements are made at meteorological stations that have specific citing requirements or in an assumed natural or undisturbed setting. However, this does not stop us from going outside of our homes and measuring the snow, even if it is while we are shoveling the sidewalk. As such, one simple experiment was performed to evaluate 1 to 10 m differences in accumulation versus the station measurements across town (a kilometer away). Another simple experiment evaluates differences in melt over the centimeter scale. While these two examples may not be realistic, they are meant to illustrate random and systematic errors.
Fresh snow accumulation was measured after a 20 to 30 cm snowfall event on 26 October 2020 in Fort Collins Colorado USA. Using the SnowMetrics tube on a grass surface, snow depth and SWE were measured, from which density was estimated, with samples in close proximity of one another; replicates were taken about 1 m apart and repeated about 10 m from one another on each of the north, east, south, and west side of the author's house. These were compared to measurements 2.4 km away at two National Weather Service (NWS) stations https://www.ncdc.noaa.gov (accessed on 30 July 2021). The two NWS stations were about 10 m from one another.
A different snowfall event on 16 April 2020 produced 20 to 30 cm of fresh snow depth in Fort Collins. The next day was sunny with a maximum incoming short-wave radiation of almost 1000 W/m 2 , measured at FTC http://ccc.atmos.colostate.edu/~autowx/ (accessed on 30 July 2021). Just after solar noon on the following day (18 April), the author noticed irregular melt patterns due to partial shading from an overhead obstruction. Using the SnowMetrics tube, the author extracted cores at 11 locations and dug a trench between two snow roughness boards to depth and SWE. Density was computed per core or increment along the trench. Obvious layers were identified in each of the snow cores extracted from the snow tube; within the core, as each layer was removed, the remaining depth was estimated and remaining mass or SWE was measured. The density was estimated for each layer from SWE divided by depth by subtracting the measurements after removal of the layer from the measurements prior to removal of the layer. A photograph of the full inserted snow roughness board provided depth estimates to the nearest 0.25 mm [53].  [41]) and manual snow depth probe measurements (data from [46]) across six 1 km 2 intensive study areas (ISA) in early April 2003 from the NASA Cold Land Process Experiment. The LiDAR-based depth estimates are from differentiating snow-on versus snow-off scans. The "all-LiDAR" dataset includes all depth estimates within each ISA (> 300,000 samples). The mean difference between the snow depth probe and LiDAR estimates are presented above the bars for all matching points (teal) and the first points only (orange in italics). (b) There are 680 depth probe measurements per ISA with 100 being the first depth probe point (see Elder et al. [46]).

Operational Measurement
The operational snow course data include depth, SWE, and density. Such measurements are made using a variety of samplers e.g., [24] in countries that have seasonal snowpacks. In the United States, these data are collected by the Natural Resources Conservation Service (NRCS) https://www.wcc.nrcs.usda.gov/ (accessed on 20 October 2021) with a Federal Sampler [1,55], and the data are available through the National Water and Climate Center of the U.S. Department of Agriculture.
Globally, automated monitoring has been less extensive, with automated weather stations including sonic snow depth sensors [56]. More recently, SWE has been measured with a snow scale https://sommer.at/ (accessed on 20 October 2021) or through the attenuation of gamma radiation by the snowpack emitted from potassium and thallium in the earth <https://campbellsci.com/>. Snow pillows are used in some locations e.g., [57], but are only used operationally in the United States. There, the NRCS has been operating the automated snow telemetry (SNOTEL) network since the late 1970s [58]. These stations originally monitored SWE and cumulative precipitation and now also monitor air temperature and snow depth. SWE is measured using a snow pillow [59,60], while depth is measured using an ultrasonic depth sensor [56]. Snow course and SNOTEL depth measurements are used in examples four and five, and SNOTEL SWE data are used in example six. In the last example, the accumulation and melt patterns were compared for different materials used in the construction of the snow pillow. Currently, the standard snow pillow is black with a surface mesh; a tan/grey snow pillow and a black snow pillow each with a chain-link fence covering were also tested.

Airborne LiDAR Versus Snow Depth Probe Data
There are up to 680 manual snow depth probe measurements [46]. All 680 measurements were only taken at the Buffalo Pass locations, with nine missing at Spring Creek ( Figure 1). The random stratified sampling survey design was composed of ninety-six 100 × 100 m squares with five depth measurements randomly spaced 5, 10, 15, 20 or 25 m apart  [41]) and manual snow depth probe measurements (data from [46]) across six 1 km 2 intensive study areas (ISA) in early April 2003 from the NASA Cold Land Process Experiment. The LiDAR-based depth estimates are from differentiating snow-on versus snow-off scans. The "all-LiDAR" dataset includes all depth estimates within each ISA (>300,000 samples). The mean difference between the snow depth probe and LiDAR estimates are presented above the bars for all matching points (teal) and the first points only (orange in italics). (b) There are 680 depth probe measurements per ISA with 100 being the first depth probe point (see Elder et al. [46]).

Operational Measurement
The operational snow course data include depth, SWE, and density. Such measurements are made using a variety of samplers e.g., [24] in countries that have seasonal snowpacks. In the United States, these data are collected by the Natural Resources Conservation Service (NRCS) https://www.wcc.nrcs.usda.gov/ (accessed on 30 July 2021) with a Federal Sampler [1,55], and the data are available through the National Water and Climate Center of the U.S. Department of Agriculture.
Globally, automated monitoring has been less extensive, with automated weather stations including sonic snow depth sensors [56]. More recently, SWE has been measured with a snow scale https://sommer.at/ (accessed on 30 July 2021) or through the attenuation of gamma radiation by the snowpack emitted from potassium and thallium in the earth <https://campbellsci.com/>. Snow pillows are used in some locations e.g., [57], but are only used operationally in the United States. There, the NRCS has been operating the automated snow telemetry (SNOTEL) network since the late 1970s [58]. These stations originally monitored SWE and cumulative precipitation and now also monitor air temperature and snow depth. SWE is measured using a snow pillow [59,60], while depth is measured using an ultrasonic depth sensor [56]. Snow course and SNOTEL depth measurements are used in examples four and five, and SNOTEL SWE data are used in example six. In the last example, the accumulation and melt patterns were compared for different materials used in the construction of the snow pillow. Currently, the standard snow pillow is black with a surface mesh; a tan/grey snow pillow and a black snow pillow each with a chain-link fence covering were also tested.

Airborne LiDAR versus Snow Depth Probe Data
There are up to 680 manual snow depth probe measurements [46]. All 680 measurements were only taken at the Buffalo Pass locations, with nine missing at Spring Creek ( Figure 1). The random stratified sampling survey design was composed of ninety-six 100 × 100 m squares with five depth measurements randomly spaced 5, 10, 15, 20 or 25 m apart in an "L" and four squares with 50 measurements in an "L" spaced 1 m apart ( Figure 1b) [46]. In the field, the initial manual depth probe measurement location (dark blue diamond in Figure 1b) was identified with a semi-permanent PVC pole. This pole was located with a hand-held GPS unit. Occasionally, the pole could not be found, and its location was estimated by GPS; the stated GPS error was typically 3 to 5 m, and the author found 5 to 10% of the poles to be missing during the IOP4 survey. The location of the other four measurements was based on the implementation by each snow surveyor, i.e., what the snow surveyor did in the field. The author was in the field for this survey and used the depth probe to measure the horizontal distance between measurement locations with the pre-specified direction being taken using a compass. It is assumed that the probes were inserted vertically into the snowpack just to the ground.
A direct comparison of point snow depth measurements to LiDAR-based depth estimates (Figure 2a) is not recommended [34,61,62] due to the inaccuracy of locating each measurement [36] and the scale difference [14]. The two issues with locating the measurements are the reported and the actual location differences. Relating to the former, the average difference in the location between the reported manual measurement and center of the LiDAR return was 83 cm, ranging from 0 to 395 cm. However, the locations of the manual measurements were reported to the nearest meter and the LiDAR to the nearest centimeter. Relating to the latter, an assessment of concurrent aerial photography illustrated that the location of these four points could deviate from their survey design [63]. From trigonometry, the depth probe point is 5% off from the assumed location for every degree that the bearing is off from true north (east, south, or west, from Figure 1b). For example, if the declination was not considered,~10 • E for CLPX sites in northern Colorado, then with a measurement interval of 25 m, the final depth probe point would be about 12.3 m from the assumed location. With a LiDAR resolution of~1.5 m, then the depth probe measurement could be eight LiDAR pixels from where it should be. However, even comparing only the first point, which should be within one LiDAR pixel, the mean snow depth had a similar difference between the two datasets ( Figure 1a). In fact, at the three Fraser MSA sites, the difference of probe minus LiDAR snow depth was 0.03 to 0.06 m greater when only comparing the first points.
The mean snow depth from all hundreds of thousands of LiDAR estimates was less than the manual snow depth probe measurements (blue-vs. gold-colored bars in Figure 1a). With the exception of Spring Creek (RS), the mean snow depth for all depth probe points was greater than the coincident LiDAR estimate (gold-vs. lime-colored bars in Figure 1a). The spatial structure of the distributed snow depth measurements can be seen from semi-variograms; the three ISAs (Alpine, Buffalo Pass, Walton Creek) evaluated by Deems et al. [34,64] are presented here (Figure 2b). The semi-variance across the range of lag distances was the same for the three datasets at Buffalo Pass (Figure 2bii), similar for both LiDAR datasets at Alpine (Figure 2bi), but different for the three datasets at Walton Creek ( Figure 2biii). The Fool Creek, St. Louis Creek, and Spring Creek variograms (not shown) had similar patterns to Alpine with all LiDAR and LiDAR-matching points being essentially the same, with larger semi-variance for the point measurements at lag distances greater than the scale break.
There is noise, illustrated as variability with lag distance, in the depth probe semivariograms, as also seen by Erxleben et al. [65] for Fool Creek. The semi-variograms for the LiDAR matching point data are less noisy than the depth probe semi-variograms, but not as smooth as the all LiDAR data (Figure 2b). This is due in part to the number of data points; there are 100 s of points collected from the depth probe versus 100,000 s of points collected from the airborne LiDAR survey. Similar scale breaks could be extracted from the three datasets for the Alpine and Buffalo Pass ISAs, since the datasets with fewer points still represent the snowpack, in comparison to the complete LiDAR dataset [66]. However, for the other for ISAs (Walton Creek shown in Figure 2biii), it is difficult to determine the scale break for the depth probe semi-variograms. The horizontal uncertainty and lack of data make it more difficult to assess spatial structure using depth probe measurements. determine the scale break for the depth probe semi-variograms. The horizontal uncertainty and lack of data make it more difficult to assess spatial structure using depth probe measurements.  [34], and for all matching LiDAR estimates (green triangles) and probe depth measurements (gold circles).
Since the average difference is between −0.03 and 0.48 m (Figure 1a), over-probing, while known to over-measure (over-probe) snow depth in the order of 5 cm due to penetration into the soil below the snow [28], is likely not the cause of these differences. The largest differences of 0.42 (all) and 0.48 (first point only) are at Fool Creek and St. Louis Creek, which had the densest forest canopy [34,43]. The Alpine ISA is divided into forest in the northwest and alpine in the southeast; however, there, the results were mixed with mean depth probe LiDAR difference for all points (first points only) was 0.26 (0.30) m for the open alpine terrain and 0.20 (0.36) m in the forest.

Fine Resolution Accumulation Differences
The two NWS stations reported 25 to 28 cm of accumulation over the 26 October 2020 Fort Collins snow storm (Figure 3a) based on twice-daily snowpack measurements [56]. While the two NWS measurements were only 10 m apart, there was a 10% difference in snow depth (Figure 3a). The around-the-house samples varied by 30% (north vs. south)  [34], and for all matching LiDAR estimates (green triangles) and probe depth measurements (gold circles).
Since the average difference is between −0.03 and 0.48 m (Figure 1a), over-probing, while known to over-measure (over-probe) snow depth in the order of 5 cm due to penetration into the soil below the snow [28], is likely not the cause of these differences. The largest differences of 0.42 (all) and 0.48 (first point only) are at Fool Creek and St. Louis Creek, which had the densest forest canopy [34,43]. The Alpine ISA is divided into forest in the northwest and alpine in the southeast; however, there, the results were mixed with mean depth probe LiDAR difference for all points (first points only) was 0.26 (0.30) m for the open alpine terrain and 0.20 (0.36) m in the forest.

Fine Resolution Accumulation Differences
The two NWS stations reported 25 to 28 cm of accumulation over the 26 October 2020 Fort Collins snow storm (Figure 3a) based on twice-daily snowpack measurements [56]. While the two NWS measurements were only 10 m apart, there was a 10% difference in snow depth (Figure 3a). The around-the-house samples varied by 30% (north vs. south) (Figure 3a), while difference between locations (e.g., N1 vs. N2) was 1 cm or less. Around the house, spatial variability in SWE (Figure 3b) showed the same patterns as snow depth (Table S1), but the NWS SWE estimates were only 1% different from one another. The maximum difference in bulk density was about 10% around the house, and also 10% between the two NWS stations (Figure 3c). The four locations around the house are within 10 m of the house, and W, S, and E are within 5 m of a fence or hedge, and it is assumed that there is similar shelter from wind. Over the course of the snowfall event, the wind blew from almost all direction (Figure 3d). Wind caused limited to no redistribution as the maximum 10 min (gust) wind speed at NWS station FTC was 2.64 (3.89) m/s with a mean 10 min (gust) wind speed o 1.13 (2.23) m/s (Figure 3e), but likely influenced accumulation patterns. Snow depth meas urements during other snowfall events have not shown the same north-to-south differ ence (Figure 3).
One difference between the two NWS station snowpack estimates is due to precision and methodology. FTC is the official NWS COOP station, while FCL is the CoCoRaH (Community Collaborative Rain, Hail and Snow Network https://www.cocorahs.org/ (ac cessed on 20 October 2021) station that is at the FTC weather station, so the measurement are within 10 m of one another. The FTC snow depth precision is 2.54 cm (1"), based on the COOP data entry system, while the FCL precision is 0.254 cm (0.1") based on the Co CoRaHS data entry system. NWS snow depth was taken as snow on the ground (SNWD rather than snowfall (SNOW) [Ryan et al., 56;, in order to compare the NWS meas urements to the around-the-house snow cores. At the FCL station, SWE was measured a SWE on the ground (WESD) to the nearest 2.54 mm (0.1"). However, for the FTC station there were no such measurements, and cumulative precipitation was used to estimat SWE, as per Meinhardt and Fassnacht [67]. The four locations around the house are within 10 m of the house, and W, S, and E are within 5 m of a fence or hedge, and it is assumed that there is similar shelter from wind. Over the course of the snowfall event, the wind blew from almost all directions (Figure 3d). Wind caused limited to no redistribution as the maximum 10 min (gust) wind speed at NWS station FTC was 2.64 (3.89) m/s with a mean 10 min (gust) wind speed of 1.13 (2.23) m/s (Figure 3e), but likely influenced accumulation patterns. Snow depth measurements during other snowfall events have not shown the same north-to-south difference ( Figure 3).
One difference between the two NWS station snowpack estimates is due to precision and methodology. FTC is the official NWS COOP station, while FCL is the CoCoRaHS (Community Collaborative Rain, Hail and Snow Network https://www.cocorahs.org/ (accessed on 30 July 2021) station that is at the FTC weather station, so the measurements are within 10 m of one another. The FTC snow depth precision is 2.54 cm (1 ), based on the COOP data entry system, while the FCL precision is 0.254 cm (0.1 ) based on the CoCoRaHS data entry system. NWS snow depth was taken as snow on the ground (SNWD) rather than snowfall (SNOW) [Ryan et al., 56;, in order to compare the NWS measurements to the around-the-house snow cores. At the FCL station, SWE was measured as SWE on the ground (WESD) to the nearest 2.54 mm (0.1 ). However, for the FTC station, there were no such measurements, and cumulative precipitation was used to estimate SWE, as per Meinhardt and Fassnacht [67].

Fine Resolution Snowmelt Differences
From 00:00 to about 12:00 on 16 April 2020, an average of 26.3 cm of fresh snow fell at the author's house, as measured at five locations on the north side using the SnowMetrics snowboard tube. This snowfall yielded about 29.6 mm of SWE and an average density of 112 kg/m 3 . Two days later, there was differential snowpack compaction and melt along the 180 cm transect (Figure 4). A snowboard (Figure 4a) was used to estimate continuous (~0.25 mm resolution) snow depth (Figure 4b and cyan-colored lines in Figure 4c,d). Snow cores (orange-colored squares in Figure 4b-d and Table S2a,b) were extracted at 11 locations, and the trench (blue-colored lines in Figure 4b-d) was dug to estimate depth (Figure 4b), SWE (Figure 4c), and density (Figure 4d). Obvious layers were identified in each of the snow cores extracted from the snow tube, and the density of each of those layers was estimated (Figure 4e).

Additional Snow Depth Measurements around Operational Stations
To evaluate snow depth variability over short distances, five additional snow depth measurements were taken near each of the 10 individual Federal Sampler measurement locations at the NRCS Cameron Pass (station 05J01) snow course in Northern Colorado The surface illustrated the areas with preferential melt and thus decreased SWE (distances > 50 cm; Figures 4c and S1), as well as increased densification where the depth was the shallowest (distances from 70 to 90 cm and beyond 140 cm; Figure 4d-e). The snowpack changes are driven by short-wave radiation, with solar-noon shading (Figure 4b) over the areas with the least changes in the snowpack (Figure 4e). In each of the 11 cores, the bottom 1 to 1.5 cm was a refrozen layer (mean density of 332 kg/m 3 ) and was thus easy to identify. The four cores from 50 to 90 cm and the three cores beyond 140 cm all received more solar loading and had meltwater at the top~2 cm with a mean density of 372 kg/m 3 (Figure 4e). The rest of the snowpack had a density of about 150 kg/m 3 .

Additional Snow Depth Measurements around Operational Stations
To evaluate snow depth variability over short distances, five additional snow depth measurements were taken near each of the 10 individual Federal Sampler measurement locations at the NRCS Cameron Pass (station 05J01) snow course in Northern Colorado ( Figure 5). There is variability among the individual snow course measurements themselves [68,69], and around each individual measurement ( Figure 5). The mean snow depth along the snow course was only 1.2 cm different for the two methods (177.4 vs. 176.2 cm for the snowcourse vs. additional measurements, respectively), but the variability from the depth probe measurements was greater ( Figure 5).  Twenty-four additional snow depth measurements were taken on t June 2011) at 1, 2, and 3 m intervals in eight directions around the N SNOTEL snow pillow in Northern Colorado to compare to the Judd depth sensor measurements of depth above the snow pillow [56] (Figure was melting, so the depth measurements taken on 2 June left holes in t cept for one of the 24 locations; Figure 6) that remained on 4 June, an measured in the exact same location for the two dates. Snow depth varie atically with direction or distance from the edge of the snow pillow (Fi decreased at all but two locations, and on average it matched what wa SNOTEL station (Figure 6c). However, the decrease was not uniform. Twenty-four additional snow depth measurements were taken on two dates (2 and 4 June 2011) at 1, 2, and 3 m intervals in eight directions around the NRCS Joe Wright SNOTEL snow pillow in Northern Colorado to compare to the Judd ultrasonic snow depth sensor measurements of depth above the snow pillow [56] (Figure 6). The snowpack was melting, so the depth measurements taken on 2 June left holes in the snowpack (except for one of the 24 locations; Figure 6) that remained on 4 June, and thus depth was measured in the exact same location for the two dates. Snow depth varied but not systematically with direction or distance from the edge of the snow pillow (Figure 6a,b). Depth decreased at all but two locations, and on average it matched what was observed at the SNOTEL station ( Figure 6c). However, the decrease was not uniform. measured in the exact same location for the two dates. Snow d atically with direction or distance from the edge of the snow decreased at all but two locations, and on average it matche SNOTEL station (Figure 6c). However, the decrease was not  (Table S4). (c) Illustrates the snow depth difference for the 24 individual points between the two sampling dates and the best fit line. Note that 23 of the 24 points were taken in the exact same hole from 2 to 4-June; the location of the 2 m away SW hole was estimated as midway between the 1 and 3 m away holes (Table S4).

Sensor Comparisons
Sensors need to be evaluated; the difference between each measurement is a function of the sensor and spatial variability. At four locations across the domain of the NRCS Colorado Snow Survey office, additional snow pillows (and other sensors) have been installed to evaluate how snow pillow characteristics influence the measurement of SWE [70]. Four winters of data https://www.wcc.nrcs.usda.gov/ (accessed on 30 July 2021) from the Berthoud Summit SNOTEL station (station 05K14S/335) illustrate that the three pillows do not measure exactly the same SWE (Figure 7a), yet the differences are consistent from year to year (Figure 7b). Accumulation patterns are different among pillows; specifically, pillow 2 begins to accumulate more SWE than the standard pillow starting in February, while pillow 3 accumulates less snow by up to 30 mm (Figure 7b) after there is 50 mm on the ground but then in March starts to accumulate more than the standard pillow. Peak SWE at pillow 2 (pillow 3) is 33 to 66 (15 to 46) mm greater than the standard pillow. Pillow 3 melts out more slowly than the other two pillows (highest lines in Figure 7b), with a maximum difference of 125 mm in 2017 (Figure 7bi). However, while pillows 2 and 3 have more snow to melt than the standard pillow, the melt-out date is only two days earlier, except in 2018, when it is five days earlier.
February, while pillow 3 accumulates less snow by up to 30 mm (Figure 7b) after there is 50 mm on the ground but then in March starts to accumulate more than the standard pillow. Peak SWE at pillow 2 (pillow 3) is 33 to 66 (15 to 46) mm greater than the standard pillow. Pillow 3 melts out more slowly than the other two pillows (highest lines in Figure  7b), with a maximum difference of 125 mm in 2017 (Figure 7bi). However, while pillows 2 and 3 have more snow to melt than the standard pillow, the melt-out date is only two days earlier, except in 2018, when it is five days earlier. SNOTEL stations are typically situated in forest clearings [71], in order to not be influenced by canopy interception and to decrease wind. The pillows are about 3.5 m in diameter and were placed about 2 m apart (Figure 8a,b). There are no trees from the southwest to the southeast for at least 20 m. At this site (Figure 8a), the forest does not throw shade differentially on any one of the snow pillows. SNOTEL stations are typically situated in forest clearings [71], in order to not be influenced by canopy interception and to decrease wind. The pillows are about 3.5 m in diameter and were placed about 2 m apart (Figure 8a,b). There are no trees from the southwest to the southeast for at least 20 m. At this site (Figure 8a), the forest does not throw shade differentially on any one of the snow pillows.
The expectation is that differences in snow pillow materials caused a difference in accumulation and melt (Domonkos 2016 [70]). The standard snow pillow is now black with a surface mesh (pillow 1 in Figure 2b). A tan/grey pillow is also available (pillow 2 in Figure 8b), and a chain-link fence covering yields a different surface roughness (pillows 2 and 3 in Figure 8b). It is assumed that the energy balance of the three snow pillows is different when the snowpack is shallow due to their color, i.e., albedo, and thus shortwave radiation associated with light penetration. Thus, material-related SWE variation should only be expected during the initial accumulation (depth less than 50 cm and SWE < 100 mm) and then during melt (likely shallower depth due to a decrease in light penetration from increased density).
The properties of the surface material could also influence the sensible and latent heat fluxes during initial accumulation, and these could cause differential melt. We assume that this would result in differences in SWE. However, we do not see differences in SWE until a certain amount of accumulation. Then, the differences can be 5% or greater (Figure 8b). Snow depth is not measured. It is possible that SWE is not initially different, but metamorphism rates may be different. This could cause densification at the non-standard pillows, yielding a lower snow depth. The "relative" depression at these other snow pillows can subsequently be filled in by redistributed snow. fluxes during initial accumulation, and these could cause differential melt. We assume that this would result in differences in SWE. However, we do not see differences in SWE until a certain amount of accumulation. Then, the differences can be 5% or greater ( Figure  8b). Snow depth is not measured. It is possible that SWE is not initially different, but metamorphism rates may be different. This could cause densification at the non-standard pillows, yielding a lower snow depth. The "relative" depression at these other snow pillows can subsequently be filled in by redistributed snow.

Discussion and Recommendations for Sampling
Developing an effective, comprehensive sampling strategy requires knowledge of the processes that dictate the distribution of snow [13]. It also requires consideration of the purpose of the sampling, i.e., understanding processes, model input, model and remote sensing calibration, and evaluation. These considerations are also necessary when considering the temporal component of sampling, and there, the scales of variability are different. While the distribution of snow tends to be consistent from year to year [72][73][74], sampling later in the winter must consider that some of the domain may still be accumulating snow when other areas have already begun to melt [75,76]. These areas may be in close proximity to one another, or the degree of melt may vary (Figure 4). At a single location, the date of peak accumulation can vary from year to year [77]. Sampling should consider the intraand interannual variability in snowpack properties [78].
The snowpack varies over a range of scales [13,34,[42][43][44] (Figures 2b and 3-6). The resolution of sampling should consider what processes dictate the variability [13,14] and the scale of their variability [79]. Usually, it is not feasible to perform measurements such as those around the house (Figure 3), nor is it realistic or perhaps even useful to measure very fine-scale melt processes (Figure 4). Further, it is not ideal to call a measurement near the house representative (Figure 3), but the differences highlight the relevance of local micrometeorological variability; one must be cautious to randomly measure the snowpack and call that measurement characteristic of a larger area. Snow depth is the easiest measurement to perform on the ground. To estimate snow depth for a specific location for a pixel, various snow depth sampling strategies have been employed (see also [13] for a summary). Numerous sampling efforts use a variant of the procedure employed by Elder et al. [80], where each location was randomly selected from a 25 m DEM using a random stratified sampling strategy, with snow depth recorded at five points, each 4 m away in the four cardinal directions, and then the mean of five depths is used to consider local variability. A nested approach is an alternative to the random stratified sampling strategy [81]. Clark et al. [13] also collected five snow depths, but they were taken within a 0.25 m 2 area, at 10 m intervals along the contour to create a transect.
To collect data at more locations, Meromy et al. [26] used the mean of three snow depths 5 m apart in a row with sampling location spaced 50 m apart along the transect and 100 m between transects. To evaluate the number of samples necessary to represent a location, the sampling strategy of Meromy et al. [26] was expanded to 11, 17, or 21 samples at 1 m intervals to represent each location [27]; on average, five snow depth samples tend to represent the mean within a 5% tolerance, but this varied based on the terrain and canopy. The 11 × 11 depth probe samples over a 100 m 2 area used by López-Moreno et al. [25] is overkill, but such exercises determine how many samples must be actually measured to be representative.
A compromise is often necessary between having a random distribution of sampling locations (e.g., Figure 1b) [46] and being efficient in moving across the terrain to collect more data [26,27]. For manual measurements, the SnowHydro GPS Magnaprobe http://www.snowhydro.com/ (accessed on 30 July 2021) can help expedite data collection by integrating snow depth (up to 140 cm) and location [28]. We also need to consider other semi-automated data collection tools on our phone or tablets, such as ArcGIS Sur-vey123 https://survey123.arcgis.com/ (accessed on 30 July 2021), together with GPS tools, such as maps.me https://maps.me/ (accessed on 30 July 2021), or GAIA GPS https://www.gaiagps.com/ (accessed on 30 July 2021).
We need to continue to evaluate both remote sensing and model data or model output [26]. We also need to be strategic in how we do these evaluations. For example, the point-to-area comparison shown in Figure 2a is not very useful, fror the reasons mentioned above. However, a more structural comparison (Figure 2b) can provide more insight into if processes and properties are actually being measured [66,82]. This also includes the identification of scale breaks to inform sampling [34,42,83].
The snow sampling tools also need to be compared, as has been done in several contexts for extracting snow cores to measure SWE [24,37,38]. López-Moreno et al. [24] also assessed the bias induced by the observations, but even over small distances, there can be substantial spatial variability (Figures 4-7). We should continue to evaluate operational sensors ( Figure 7) and ensure that any sensor inconsistencies due to changing hardware and other factors [84][85][86] are thoroughly assessed.
The NRCS occasionally collects additional SWE measurements about snow pillows to evaluate performance, especially in years of deep snow [70]. Such additional measurements could be added to the NRCS snowpack database https://www.wcc.nrcs.usda.gov/ (accessed on 30 July 2021). Additional snow depth measurements are simple (Figure 6), as long they do not disturb the snow pillow. Similarly, additional measurements around the individual snow course locations provide insight into variability along the snow course transect (Figure 5). At a minimum, an evaluation of variability from the individual snow course measurements is easily attainable [68,69].
Terrain and land cover/canopy data are often used as surrogates for meteorological forcing variables [26,27,65,80,87]. Thus, the sampling needs to consider the spatial resolution of the data, specifically the digital elevation model or DEM and associated terrain variables (e.g., slope, aspect), canopy, and land cover data, as well as variability within each pixel. Recent research [26,27] used 30 m DEM (e.g., https://nationalmap.gov, (accessed on 30 July 2021) and land cover/canopy density [88] data. Biomass can be derived from LiDAR measurements at a 1 to 3 m resolution [89], and in a snow hydrology context, these data can be used to determine evergreen forest versus other land cover types [90]. Yet, the accuracy of the location of the measurement, as discussed above related to Figures 1 and 2, needs to be considered in the context of which DEM, canopy, and/or land cover pixel a sample is being collected in. The use of sub-meter accuracy Global Positioning System (GPS) units and location post-processing can reduce the uncertainty of the location, i.e., which pixel the measurement is in.
Going beyond the manual and in situ measurements, an assessment of LiDAR-based snow depth data can provide insight into optimal sampling locations, especially for the deployment of sensors [91]. Advances have been made with structure from motion and photogrammetry to map snow depth [92], including on drones or unmanned aerial vehicles (UAV) [93]. Along transects, ground-penetrating radar (GPR) has been used in various capacities [94,95] and is comparable on different platforms across scales [96,97]. Ground-based techniques such as GPS [98] and neutron probes [99] infer SWE through the signal attenuation by the SWE itself. Further, the new prevalence of do-it-yourself sensors [100,101] with inexpensive memory now allows for the construction and deployment of arrays of homemade instruments [65]. In 2016, Dozier et al. [11] highlighted the newest technology as the combined airborne lidar and multi-spectral data, such as those collected by the airborne snow observatory [27]. We should be using multiple tools from the various remote sensing technologies, with ground-truth from manual measurements and do-it-yourself sensors.
Supplementary Materials: The following are available online at https://www.mdpi.com/article/ 10.3390/geosciences11110435/s1, Table S1: Fresh snow core data (depth and SWE) collected around the author's house after the 26 October 2020 snowfall event in Fort Collins Colorado USA and used for Figure 3, Table S2. (a) Preferential snowmelt snow depth and SWE snow core data for several lay-ers/intervals at 11 locations shown in Figure 4e. These data were used to estimate the bulk measurements shown in Figure 4b Funding: This research received no direct, external funding. The image used for snow depth extracted from the roughness board in Figure 4a was collected as part of the NSF Division of Mathematical Sciences grant DMS-1615909. The additional snow course depth data presented in Figure 5 were collected as part of the NOAA Office of Hydrologic Development grant NA07NWS4620016. The additional SNOTEL snow pillow depth data presented in Figure 6 were collected as part of the NASA Terrestrial Hydrology Program award NNX11AQ66G.
Institutional Review Board Statement: Not applicable.

Informed Consent Statement: Not applicable.
Data Availability Statement: Data for Figures 1 and 2 are available from the national Snow and Ice Data Center https://nsidc.org/ (accessed on 30 July 2021); for the lidar data refer to citation [41] and for the field (snow depth probe) data refer to citation [46]. Stations data for Figure 3 are available from http://ccc.atmos.colostate.edu/~autowx/ (accessed on 30 July 2021). Snowcourse data for Figure 5