Seasonal Estimates and Uncertainties of Snow Accumulation from CloudSat Precipitation Retrievals

: CloudSat is often the only measurement of snowfall rate available at high latitudes, making it a valuable tool for understanding snow climatology. The capability of CloudSat to provide information on seasonal and subseasonal time scales, however, has yet to be explored. In this study, we use subsampled reanalysis estimates to predict the uncertainties of CloudSat snow water equivalent (SWE) accumulation measurements at various space and time resolutions. An idealized/simulated subsampling model predicts that CloudSat may provide seasonal SWE estimates with median percent errors below 50% at spatial scales as small as 2 ◦ × 2 ◦ . By converting these predictions to percent differences, we can evaluate CloudSat snowfall accumulations against a blend of gridded SWE measurements during frozen time periods. Our predictions are in good agreement with results. The 25th, 50th, and 75th percentiles of the percent differences between the two measurements all match predicted values within eight percentage points. We interpret these results to suggest that CloudSat snowfall estimates are in sufﬁcient agreement with other, thoroughly vetted, gridded SWE products. This implies that CloudSat may provide useful estimates of snow accumulation over remote regions within seasonal time scales.


Introduction
Most snow that falls on the planet lands over remote and unpopulated terrain where few in situ measurements are available [1,2]. Understanding the amount of snow that falls in these regions is important since the Snow Water Equivalent (SWE) stored in remote snowpack can determine the freshwater supply for whole ecosystems and societies. Estimates of snow accumulation in remote regions are difficult to come by, however, and products that exist often carry large uncertainties. Reanalysis models or interpolated hydrological products rely on in situ SWE observations thousands of km apart, and SWE estimates between different methods can vary significantly [3]. Satellite measurements do not have a sampling discrepancy between inhabited and uninhabited terrain, so they are a logical choice for measuring snow in remote locations. Indeed, recent observations from the Gravity Recovery and Climate Experiment (GRACE-FO) have successfully been used to provide novel information on total snowpack in mountainous regions and high latitudes [4]. This measurement inherently combines all mass-change that occurs during a time period, however. If one is interested in total deposited snowfall, that information will be absorbed alongside any snowmelt, rain, evaporation, or blowing that occurs.
Integrated satellite precipitation estimates can theoretically be used to provide a purer measurement of the amount of snow that falls in a region over time. Passive microwave sensors provide the most frequent measurements of precipitation. These measurements have historically had errors over snow-covered surfaces, though recent algorithms are

CloudSat 2C-SNOW-PROFILE
CloudSat was launched on 28 April 2006. Only data from 2006-2015 are used in this study to be conterminous with the gridded SWE datasets (see Section 2.2). CloudSat has an orbital apogee of 81 degrees and repeats its orbital track roughly every 16 days. Cloud-Sat carries the Cloud Profiling Radar (CPR), a W band (3.2 mm) radar with a minimum detectable reflectivity of −30 dBZ. CPR reflectivities are converted to snowfall rates via the 2C-SNOW-PROFILE [21]. This product relies on a Bayesian optimal estimation algorithm. The algorithm uses measurements of reflectivity and ECMWF analysis to update prior assumptions of microphysical properties and retrieve snowfall rates. The CloudSat snowfall rates are presented as lognormal snowfall probability distributions. The standard deviations of these distributions, herein referred to as the Bayesian uncertainty, tend to be between 150% and 250% of the mean value. CloudSat is still operating today, but observations have not been continuous. There are numerous gaps in the data record, particularly during 2009 and 2011 for winter snowfall. In 2012, CloudSat experienced a battery failure, and after the battery was repaired it operated in a "Daylight Only" mode that reduces the frequency of retrievals as a function of distance to the northern orbital peak. The loss of CloudSat snowfall measurements results in a bias of annual snowfall measurements from CloudSat after 2012 [22].
The Bayesian uncertainty is not the only source of error that can impact 2C-SNOW-PROFILE retrievals. If the ice crystals in a size distribution are not sufficiently small compared to the incident wavelength, non-Rayleigh scattering effects will dampen the relationship between W band reflectivity and snowfall rate, adding increasingly large uncertainties for snowfall rates at high reflectivity. Atmospheric gasses can attenuate W band reflectivity, but this effect is typically accounted for in the 2C-SNOW-PROFILE algorithm and can be safely ignored [23]. Attenuation due to liquid water may also be substantial in mixed phase clouds, and no attempt is made to identify or offset this attenuation. If there are an excessive number of snowflakes in a particle size distribution (PSD), multiple scattering effects may artificially increase the reflectivity of a PSD by introducing cross sections from snowflakes outside of the radar footprint. High snowflake concentration also increases W band attenuation in a fashion that tends to cancel out the increase due to multiple scattering [24]. Finally, the 2C-SNOW-PROFILE defines "surface precipitation rate" as the precipitation rate 1.25 km above the surface. This is done to remove the "blind zone" of spurious radar measurements that obfuscate precipitation reflectivity near the ground, but it will also remove information on any sublimation [25] or low-level precipitation development [26] that may occur within the boundary layer. The surface clutter will often spread beyond the 1.25 km threshold over mountains and cliffsides, and in these cases the 2C-SNOW-PROFILE flags a retrieval as unreliable.

Gridded SWE Products
Gridded SWE products are generated from forward models of SWE that are constrained by collocated observations at set time intervals. In this study, we use the Blended-4 (B4) gridded SWE product developed by the Canadian Sea Ice and Snow Evolution (Can-SISE) network [3]. B4 contains daily northern hemisphere (NH) SWE on ground estimates from 2006 to 2015 at 0.5 • resolution. The B4 product is the average (or blend) of four gridded SWE products: the Canadian Meteorological Center (CMC) gridded SWE product, CROCUS, GlobSnow (GS) and MERRA-2 (M2) datasets. The CMC dataset (updated by Ross Brown) provides daily estimates of gridded snow depth and SWE at 1 ⁄ 3 degree resolution [27]. Background field estimates are derived using a simple snowpack model as described in Brasnett [28] which is forced with six-hourly ECMWF two-meter air temperature and precipitation fields. The data are then interpolated to the grid using an optimal interpolation data assimilation scheme that ingests in situ data from approximately 8000 survey sites across North America (NA). CROCUS is a complex, physically based, energy and mass balance snow model developed by the National Centre for Meteorological Research (NCMR) at 1 • resolution [29]. The CROCUS model operates as a replacement snow model in the Interactions between Soil, Biosphere, and Atmosphere (ISBA) land model, driven by ERA-Interim meteorology and provides daily estimates of SWE for multiple snow layers. The GS (version two) product developed by the European Space Agency (ESA) is the only satellite-based component of the B4 which provides daily estimates of SWE through a combination of in situ data and satellite temperature brightness (TB) measurements [30]. In situ snow depth observations are used to force a single layer snow emission model to simulate passive microwave estimates across the NH. Snow grain size is then estimated through the optimized agreement between simulated and observed satellite TB estimates, which are then kriged across the NH to produce a spatially contiguous map of assimilated SWE at 25 km resolution. This product is masked over alpine regions due to uncertainties related to microwave estimates over complex terrain. M2 is a reanalysis product developed by the National Aeronautics and Space Administration's (NASA) Global Modeling and Assimilation Office (GMAO) [31]. M2 uses the Catchment land surface model and an intermediate complexity snow scheme to provide SWE estimates at 0.5 • by 0.6 • resolution. In situ data are ingested using a 3DVAR assimilation scheme of nearby surface observations, which is then used as forcing data for the Goddard Earth Observing System (GEOS) atmospheric model to derive gridded SWE estimates.
There are several sources of error in SWE data products. SWE can be underestimated at in situ stations due to "gauge undercatch", whereby falling snow is not captured by the gauge opening during strong winds [32]. SWE products can have errors resulting from errors in modelled precipitation, especially for grid cells that are positioned further away from observation sites, since these have less information to constrain model estimates [29]. Various sources of error propagating from assumptions in forward models may also add uncertainty to final gridded SWE estimates [30]. The blending approach has been shown to reduce SWE uncertainty, however, as well as diminishing any overall bias from these effects [3,[33][34][35][36][37][38].

Weather Observations and Reanlaysis
An hourly simulation of global precipitation is drawn from the NCAR Global Climate Four-Dimensional Data Assimilation (CFDDA) Hourly 40 km Reanalysis [39]. This Reanalysis provides estimates of total (combined water and liquid) precipitation at 0.4 • increments. CFDDA incorporates hourly and six-hourly surface and upper air measurements. Land surface conditions such as surface temperature and soil moisture are initialized by the NASA Global Land Data Assimilation System and compiled using the Noah Land Surface Model (LSM). Sea surface temperatures are provided by the National Center for Environmental Prediction (NCEP) version 2.0 global and daily dataset. At the time of its creation, it was the highest temporally resolved reanalysis model. While newer reanalysis models such as ERA5 can now provide hourly reanalysis estimates with similar or higher resolution, the purpose of this reanalysis model is more to provide a realistic simulation of weather patterns than to account for the best approximation of real weather at any particular period in Earth's history. CFDDA is sufficient for this purpose.
Daily maximum temperatures come from the Climate Prediction Center (CP) Global Daily Temperature dataset. This dataset is generated by National Oceanic and Atmospheric Administration/Ocean Atmospheric Research/Earth Science Research Laboratory (NOAA/OAR/ESRL) Physical Science Laboratory (PSL) in Boulder, Colorado. These temperatures represent a combination of weather station and satellite observations.

Method
As long as temperatures remain below freezing, the total snow that falls from the atmosphere during a time period should be comparable to the change in surface snow as measured at the beginning and end of that time period. We refer to both of these measurements as ∆SWE, assuming herein that ∆SWE ≥ 0. A B4 measurement of ∆SWE (∆SWE B ) is defined as the difference of SWE between any two consecutive days over the same grid cell, with SWE B f and SWE B i referring to the final and initial B4 measurement of SWE. A Cloudsat measurement of ∆SWE (∆SWE C ) is defined as the average measured snowfall rate for a grid cell multiplied by the sample collection time, with S representing a single snowfall rate retrieval, N representing the number of retrievals, and t representing the time period. Calculating ∆SWE C and ∆SWE B adds new uncertainties to the data sources. ∆SWE B will have a Gaussian random error representing the different estimates from the constituent SWE products. In some measurements, ∆SWE B had negative values. This could be due to sublimation/melting or random errors from the ∆SWE blending process. Either way, −∆SWE cannot be evaluated against CloudSat, so these measurements are rejected from the study. The accuracy of ∆SWE C relies on CloudSat gathering a representative average precipitation rate over a measurement time period. At finer spatiotemporal resolutions CloudSat snowfall estimates have larger disagreements with surface evaluations, since ∆SWE C may be disproportionately impacted by non-precipitating periods or snowfall rates during individual storms that will often be stronger than average snowfall rates [34]. ∆SWE C and ∆SWE B are compared against each other at 20 • × 5 • , 10 • × 5 • , and 5 • × 5 • spatial resolutions, with the 10 • × 5 • and 5 • × 5 • nested inside of larger 20 • × 5 • grid cells, at 10-day, 20-day, and 30-day time periods. We draw comparison grid cells over relatively flat regions to prevent SWE and CloudSat errors that are linked with complex topography (Figure 1). In all of these regions, the average surface clutter within the range gates that determine CloudSat-retrieved surface snowfall rates are negligible.
flat regions to prevent SWE and CloudSat errors that are linked with complex topography (Figure 1). In all of these regions, the average surface clutter within the range gates that determine CloudSat-retrieved surface snowfall rates are negligible.

Estimated Accuracy of CloudSat Snow Accumulation Retrievals
Before evaluating CloudSat measurements using the method described in the previous section, we first generate predictions of CloudSat seasonal and subseasonal snow accumulation accuracy at varying resolutions. The process used to generate this prediction is visualized in Figure 2. CloudSat snowfall accumulation errors can be divided into two categories: sampling errors and retrieval errors. Retrieval errors can be further subdivided into Bayesian retrieval errors, which can be described by the probability distribution function accompanying retrievals in the 2C-SNOW-PROFILE, and non-Bayesian retrieval errors, which represent all other retrieval errors discussed in Section 2. Sampling errors are simulated by subsampling CloudSat orbit tracks from an arbitrary three-month period (January, February, and March of 2000) of hourly precipitation generated by CFDDA precipitation estimates (Figure 2a,b). Bayesian uncertainty is simulated by multiplying the measurements from each simulated satellite orbit with a random coefficient generated from a lognormal probability distribution function with a mean of 1 and a standard deviation of 200%. This corresponds to a typical Bayesian uncertainty from the 2C-SNOW-PROFILE. No attempt is made to simulate non-Bayesian retrieval errors, such as non-Rayleigh scattering or blind zone biases. This introduces some uncertainty, though such errors are expected to be infrequent over the sample grids used in this study-away from the mountains and water bodies typically responsible for snow storms with heavy precipitation or shallow cloud depths [2].

Estimated Accuracy of CloudSat Snow Accumulation Retrievals
Before evaluating CloudSat measurements using the method described in the previous section, we first generate predictions of CloudSat seasonal and subseasonal snow accumulation accuracy at varying resolutions. The process used to generate this prediction is visualized in Figure 2. CloudSat snowfall accumulation errors can be divided into two categories: sampling errors and retrieval errors. Retrieval errors can be further subdivided into Bayesian retrieval errors, which can be described by the probability distribution function accompanying retrievals in the 2C-SNOW-PROFILE, and non-Bayesian retrieval errors, which represent all other retrieval errors discussed in Section 2. Sampling errors are simulated by subsampling CloudSat orbit tracks from an arbitrary three-month period (January, February, and March of 2000) of hourly precipitation generated by CFDDA precipitation estimates (Figure 2a,b). Bayesian uncertainty is simulated by multiplying the measurements from each simulated satellite orbit with a random coefficient generated from a lognormal probability distribution function with a mean of 1 and a standard deviation of 200%. This corresponds to a typical Bayesian uncertainty from the 2C-SNOW-PROFILE. No attempt is made to simulate non-Bayesian retrieval errors, such as non-Rayleigh scattering or blind zone biases. This introduces some uncertainty, though such errors are expected to be infrequent over the sample grids used in this study-away from the mountains and water bodies typically responsible for snow storms with heavy precipitation or shallow cloud depths [2].
Atmosphere 2021, 12, x FOR PEER REVIEW 5 of 15 flat regions to prevent SWE and CloudSat errors that are linked with complex topography (Figure 1). In all of these regions, the average surface clutter within the range gates that determine CloudSat-retrieved surface snowfall rates are negligible.

Estimated Accuracy of CloudSat Snow Accumulation Retrievals
Before evaluating CloudSat measurements using the method described in the previous section, we first generate predictions of CloudSat seasonal and subseasonal snow accumulation accuracy at varying resolutions. The process used to generate this prediction is visualized in Figure 2. CloudSat snowfall accumulation errors can be divided into two categories: sampling errors and retrieval errors. Retrieval errors can be further subdivided into Bayesian retrieval errors, which can be described by the probability distribution function accompanying retrievals in the 2C-SNOW-PROFILE, and non-Bayesian retrieval errors, which represent all other retrieval errors discussed in Section 2. Sampling errors are simulated by subsampling CloudSat orbit tracks from an arbitrary three-month period (January, February, and March of 2000) of hourly precipitation generated by CFDDA precipitation estimates (Figure 2a,b). Bayesian uncertainty is simulated by multiplying the measurements from each simulated satellite orbit with a random coefficient generated from a lognormal probability distribution function with a mean of 1 and a standard deviation of 200%. This corresponds to a typical Bayesian uncertainty from the 2C-SNOW-PROFILE. No attempt is made to simulate non-Bayesian retrieval errors, such as non-Rayleigh scattering or blind zone biases. This introduces some uncertainty, though such errors are expected to be infrequent over the sample grids used in this study-away from the mountains and water bodies typically responsible for snow storms with heavy precipitation or shallow cloud depths [2].  In order to generate distributions of retrieval uncertainty, the CFDDA product and the simulated satellite retrievals are averaged at the same spatiotemporal resolutions (30 days and 10 • × 5 • as an example, in Figure 2c,d). The accuracy of ∆SWE C can be estimated by with ∆SWE true and ∆SWE estimate referencing the full and subsampled grid cells from Figure 2c,d, respectively. These distributions are visualized as box-plot distributions for some example spatiotemporal resolutions in Figure 3. Note that the distributions in Figure 3 are presented without fliers (Figure 3 with fliers is presented in the Appendix A). This is done for visualization's sake, as some fliers approach 10000%. The presence of these rare errors may be concerning, and they should be considered if CloudSat is to be used for seasonal or subseasonal measurements. However, these extreme errors did not exhibit any latitude dependence, indicating that they are more likely a result of the simulated Bayesian uncertainty rather than the sampling error. Since the simulated lognormal PDF is unbounded, it allows for the potential of extremely high errors that do not occur in CloudSat retrievals. This is an admitted flaw of the simulation. We do not believe it will strongly impact the quantitative results of this study, however, since it primarily affects outliers, and the results of this section will be evaluated in terms of percentiles. PE distributions cannot be directly evaluated, however, since the gridded B4 product carries too much uncertainty to be considered as a ∆ . An alternative measurement of uncertainty that can be evaluated against measured data is the percentage differences ( ), On seasonal time scales, ∆SWE C is predicted to have median PE near 50% at the smallest spatial scale of 2 • × 2 • , and near 20% at the largest spatial scale of 20 • × 5 • . On monthly time scales, estimated median PE are still near 50% at spatial scales as small as 5 • × 2 • . These estimates are not weighted by sampling frequency, so they are likely larger or smaller depending on latitude. Whisker ranges are typically several times larger between the 75th and 95th percentiles than the 5th and 25th percentile, indicating again that errors may often be considerable even if median errors are relatively low. The overall picture is that theoretical accuracy improves, and the variation of errors decreases, with increasing spatial and temporal scale.
PE distributions cannot be directly evaluated, however, since the gridded B4 product carries too much uncertainty to be considered as a ∆SWE true . An alternative measurement of uncertainty that can be evaluated against measured data is the percentage differences (PD), with ∆SWE 1 and ∆SWE 2 here representing the full and subsampled grid cells from Figure 2c,d, respectively. Box plots of predicted PD distributions are provided in Figure 4. The 90-day PD distributions are not calculated since there are no 90-day frozen time periods that can be used to generate ∆SWE cells for evaluation. As an example, if a CloudSat measurement could retrieve a snowfall accumulation of 20 mm with a median percent difference error of 50%, the true snow accumulation would be between 12 and 33 mm half of the time.
Atmosphere 2021, 12, x FOR PEER REVIEW It should be noted that the reanalysis model may not be capturing a natural di tion of snowfall rates. In particular, the heaviest snowfall rates that occur at the resolutions may not be well represented in an hourly gridded model. This discre between natural and re-analysis estimated precipitation rates may adversely influen simulations of CloudSat snowfall retrieval error distributions. Since the purpose simulation is just to recreate sampling and relative errors, it is also possible that h snowfall events would have no noticeable effect on results. Regardless, such extrem cipitation would be rare events, and they are not expected to impact the central di tion of the median, first, and third percentile in the displayed plots.

Evaluation of CloudSat Snow Accumulation Estimates
We evaluate our predicted CloudSat accuracy by comparing collocated ∆SWE ∆ . Evaluations are presented at all combinations of 5° × 5°, 10° × 5°, and 20° × 10-day, 20-day, and 30-day resolutions in Figure 5. ∆ can vary by two orders o nitude and the differences between ∆ and ∆ are heteroscedastic, so r are presented on a log-log scale. There is a visible agreement between the two me ments of ∆ in all panels, though the correlation decreases with finer space an resolutions. The strongest correlation coefficient of 0.8 occurs for 20° × 5° grid cells and 30 day time periods. The next highest correlation coefficient (0.7) comes for 1 grid cells on 30 day time periods. The lowest observed correlation coefficient is 0.4 5° grid cells over 10 day time periods.
At resolutions that provided relatively well correlated datasets, ∆ value It should be noted that the reanalysis model may not be capturing a natural distribution of snowfall rates. In particular, the heaviest snowfall rates that occur at the finest resolutions may not be well represented in an hourly gridded model. This discrepancy between natural and re-analysis estimated precipitation rates may adversely influence our simulations of CloudSat snowfall retrieval error distributions. Since the purpose of this simulation is just to recreate sampling and relative errors, it is also possible that heavier snowfall events would have no noticeable effect on results. Regardless, such extreme precipitation would be rare events, and they are not expected to impact the central distribution of the median, first, and third percentile in the displayed plots.

Evaluation of CloudSat Snow Accumulation Estimates
We evaluate our predicted CloudSat accuracy by comparing collocated ∆SWE C with ∆SWE C . Evaluations are presented at all combinations of 5 • × 5 • , 10 • × 5 • , and 20 • × 5 • and 10-day, 20-day, and 30-day resolutions in Figure 5. ∆SWE can vary by two orders of magnitude and the differences between ∆SWE C and ∆SWE B are heteroscedastic, so results are presented on a log-log scale. There is a visible agreement between the two measurements of ∆SWE in all panels, though the correlation decreases with finer space and time resolutions. The strongest correlation coefficient of 0.8 occurs for 20 • × 5 • grid Subplots (c,f,i) have 30 day time periods, Subplots (a-c) have 20° × 5° spatial resolution. Subplots (d-f) have 10° × 5° spatial resolution. Subplots (g-i) have 5° × 5° spatial resolution.
The differences between biases and correlations in different datasets may also reflect conditional biases on CloudSat or B4 ∆ estimates. For example, one might expect a low-bias from CloudSat at warmer temperatures when retrieved snowfall rates would be largest [40]. To investigate this possibility, we consider three sensitivities in Figure 6: A regional sensitivity, a sampling sensitivity, and a temperature sensitivity. Regions represent different longitude spans. NA (North America) refers to any central longitude less than 0, EA (Eurasia) refers to any central longitude less between 0° and 60°, ES (East Siberia) refers to the region between 60° and 90°, and WS (West Siberia) refers to any longitude greater than 90°. Sampling sensitivity is measured in terms of an overpass ratio, representing the relative number of days during a sampling period that a Cloudsat Measurement was conducted. Overpass ratio is mostly a function of latitude, but it will also be impacted by the 2012 battery anomaly. Temperature is represented by the maximum temperature during a grid cell measurement. At resolutions that provided relatively well correlated datasets, ∆SWE values tend to be larger when measured by B4. For 20 • × 5 • resolutions, this bias is fairly consistent along the ∆SWE domain. At finer resolutions, the relationship between ∆SWE and ∆SWE C appears to be skewed; the maximum ∆SWE B tends to be lower than the maximum ∆SWE C while ∆SWE B becomes disproportionately larger at low ∆SWE C . This behavior is likely related to undersampled CloudSat measurements. If a grid cell is not adequately capturing the average precipitation rate during a sample period, a chance observation of a few storms may lead to an anomalously high ∆SWE C . If significant storms are missed, it may lead to an anomalously low ∆SWE B . A simpler interpretation may also be that ∆SWE C becomes uncorrelated with ∆SWE B when CloudSat measurements are too sparse to provide meaningful snow accumulation information. Indeed, within the finest spatiotemporal resolutions tested (10 days, 5 • × 5 • ), the correlation coefficient increases with increasing sampling frequency at higher latitudes; the correlation coefficient between log-transformed ∆SWE B and ∆SWE C increases from 0.4, 0.5, and 0.6 as the along central grid latitudes increases from 62.5, 67.5, and 72.5.
The differences between biases and correlations in different datasets may also reflect conditional biases on CloudSat or B4 ∆SWE estimates. For example, one might expect a low-bias from CloudSat at warmer temperatures when retrieved snowfall rates would be largest [40]. To investigate this possibility, we consider three sensitivities in Figure 6: A regional sensitivity, a sampling sensitivity, and a temperature sensitivity. Regions represent different longitude spans. NA (North America) refers to any central longitude less than 0, EA (Eurasia) refers to any central longitude less between 0 • and 60 • , ES (East Siberia) refers to the region between 60 • and 90 • , and WS (West Siberia) refers to any longitude greater than 90 • . Sampling sensitivity is measured in terms of an overpass ratio, representing the relative number of days during a sampling period that a Cloudsat Measurement was conducted. Overpass ratio is mostly a function of latitude, but it will also be impacted by the 2012 battery anomaly. Temperature is represented by the maximum temperature during a grid cell measurement.
Atmosphere 2021, 12, x FOR PEER REVIEW 10 of 15 Figure 6. Collocated ∆ measurements for 10° × 5° grid cells at one month resolution, as in Figure 5, but colored with respect to region (a), overpass ratio (b), and maximum temperature (c).
These sensitivity tests do not reveal any obvious biases that are not better described through decreased correlation. Warmer temperatures lead to larger snowfall accumulations, though this could be expected without looking at any data. From a climate standpoint, it is interesting that EA contains the largest wide-scale accumulations of snow and the warmest frozen 30-day time periods in our North-Hemispheric dataset. Looking at Figure 6b, it appears that grid cells with a higher overpass frequency also have a stronger correlation. This is predictable for the same reason that correlations generally increased with increasing spatiotemporal resolution, though it does motivate a new presentation of results that accounts for differences in spatiotemporal resolution and latitude. Figure 7 presents the correlation coefficient of the nine resolutions from Figure 5, with datasets subset by central latitude.
The correlation coefficients paint an expected picture: correlation increases with decreasing spatiotemporal resolution and higher latitudes. Of course, the two are not totally independent, since grid cells will decrease in physical area with increasing area. Note that there are great variations in the number of measurements and range of ∆ within datasets, and while data are only presented if they provide a p-value less than 0.01, these differences likely still influence the results for data subsets with particularly low volumes. Additionally, note that the different latitudes represent different numbers of grid cells that are providing data. The 70-75°N grid is entirely comprised of one 20° × 5° cell in northern Siberia ( Figure 1) and there were no 20° × 5° grid cells that provided consistently freezing time periods below 60°N. Figure 7 does demonstrate, however, that the results from Figure 5 are likely more intricate than they appear. The 5° × 5° data 10-day data, which has the most data points and can be expected to be the most statistically robust measurement at all central latitudes, has a stronger correlation at 75°N than any represented dataset (5 × 5° 10 days, 10° × 5° 10 days, 5° × 5° to days) below 60°N, for example. Therefore, one can expect higher accuracy for finer spatiotemporal resolutions closer to the poles [38]. These sensitivity tests do not reveal any obvious biases that are not better described through decreased correlation. Warmer temperatures lead to larger snowfall accumulations, though this could be expected without looking at any data. From a climate standpoint, it is interesting that EA contains the largest wide-scale accumulations of snow and the warmest frozen 30-day time periods in our North-Hemispheric dataset. Looking at Figure 6b, it appears that grid cells with a higher overpass frequency also have a stronger correlation. This is predictable for the same reason that correlations generally increased with increasing spatiotemporal resolution, though it does motivate a new presentation of results that accounts for differences in spatiotemporal resolution and latitude. Figure 7 presents the correlation coefficient of the nine resolutions from Figure 5, with datasets subset by central latitude.
The correlation coefficients paint an expected picture: correlation increases with decreasing spatiotemporal resolution and higher latitudes. Of course, the two are not totally independent, since grid cells will decrease in physical area with increasing area. Note that there are great variations in the number of measurements and range of ∆SWE within datasets, and while data are only presented if they provide a p-value less than 0.01, these differences likely still influence the results for data subsets with particularly low volumes. Additionally, note that the different latitudes represent different numbers of grid cells that are providing data. The 70-75 • N grid is entirely comprised of one 20 • × 5 • cell in northern Siberia ( Figure 1) and there were no 20 • × 5 • grid cells that provided consistently freezing time periods below 60 • N. Figure 7 does demonstrate, however, that the results from Figure 5 are likely more intricate than they appear. The 5 • × 5 • data 10-day data, which has the most data points and can be expected to be the most statistically robust measurement at all central latitudes, has a stronger correlation at 75 • N than any represented dataset (5 × 5 • 10 days, 10 • × 5 • 10 days, 5 • × 5 • to days) below 60 • N, for example. Therefore, one can expect higher accuracy for finer spatiotemporal resolutions closer to the poles [38]. Finally, the collocated ∆ and ∆ are converted into percentage differences (Equation (4)) so that they can be used to evaluate the predicted CloudSat accuracy from Section 4. Measured distributions are presented alongside the simulated distributions from Section 4 in Figure 8. The measured accuracy (orange boxes) is in very good agreement with the simulated accuracy (blue boxes). Differences between the 25th, 50th, and 75th percentiles of measured and simulated distributions at all resolutions are less than 8%. This agreement is particularly remarkable considering that the initial predictions were based off two simulated precipitation products generated from the same model data, whereas the ∆ and ∆ products represent two independent datasets with their own biases and uncertainties.
Since the PE predictions in Figure 2 are a reconfiguration of the same data in Figure  3, we interpret Figure 8 to suggest our subsampling method provides confident estimates of ∆ uncertainty for seasonal and subseasonal time scales. In other words, we conclude that CloudSat can provide measurements of snow accumulation over wide areas with a predictable accuracy. Finally, the collocated ∆SWE C and ∆SWE B are converted into percentage differences (Equation (4)) so that they can be used to evaluate the predicted CloudSat accuracy from Section 4. Measured PD distributions are presented alongside the simulated PD distributions from Section 4 in Figure 8. The measured accuracy (orange boxes) is in very good agreement with the simulated accuracy (blue boxes). Differences between the 25th, 50th, and 75th percentiles of measured and simulated PD distributions at all resolutions are less than 8%. This agreement is particularly remarkable considering that the initial predictions were based off two simulated precipitation products generated from the same model data, whereas the ∆SWE C and ∆SWE B products represent two independent datasets with their own biases and uncertainties. Finally, the collocated ∆ and ∆ are converted into percentage differences (Equation (4)) so that they can be used to evaluate the predicted CloudSat accuracy from Section 4. Measured distributions are presented alongside the simulated distributions from Section 4 in Figure 8. The measured accuracy (orange boxes) is in very good agreement with the simulated accuracy (blue boxes). Differences between the 25th, 50th, and 75th percentiles of measured and simulated distributions at all resolutions are less than 8%. This agreement is particularly remarkable considering that the initial predictions were based off two simulated precipitation products generated from the same model data, whereas the ∆ and ∆ products represent two independent datasets with their own biases and uncertainties.
Since the PE predictions in Figure 2 are a reconfiguration of the same data in Figure  3, we interpret Figure 8 to suggest our subsampling method provides confident estimates of ∆ uncertainty for seasonal and subseasonal time scales. In other words, we conclude that CloudSat can provide measurements of snow accumulation over wide areas with a predictable accuracy. Since the PE predictions in Figure 2 are a reconfiguration of the same data in Figure 3, we interpret Figure 8 to suggest our subsampling method provides confident estimates of ∆SWE C uncertainty for seasonal and subseasonal time scales. In other words, we conclude that CloudSat can provide measurements of snow accumulation over wide areas with a predictable accuracy.

Summary
Active satellite radars such as CloudSat's CPR have a unique skill in providing radar profiles that can perform with similar accuracy over level ocean, land, ice, and snowcovered terrain. In this paper, we evaluated the potential of CloudSat to measure seasonal snowfall accumulation.
Using a simulation of global CloudSat snowfall rate retrievals, we generated a distribution of percentage errors at different spatial and temporal scales. We found that CloudSat seasonal snowfall accumulation measurements could have reasonable accuracy at different spatiotemporal resolutions, with median percent errors within 50% at 2 • × 2 • spatial scales as one example. To evaluate these predictions, we recast the uncertainty estimates into percentage differences so that CloudSat estimates could be compared against a best estimate of surface snowfall accumulation from the B4 gridded SWE model dataset. We also defined a common variable (∆SWE) that could be interpreted as the integration of the average snowfall rate from CloudSat or as the difference between gridded SWE at the beginning and end of a time period as long as temperatures remained below freezing.
We found 84 to 2550 collocated ∆SWE observations that were usable for evaluation, depending on resolution constraints. Correlation coefficients between CloudSat and B4 measured ∆SWE range between 0.4-0.8 from the finest (10 day, 5 • × 5 • ) to the coarsest (30 day, 20 • × 5 • ) resolution. Overall, the distribution of errors between CloudSat and B4 agreed very well with the predicted values; the 25th, 50th, and 75th percentile of percentdifferences between measured ∆SWE C and ∆SWE B were within several percentage points of the predicted values at all resolutions. Based on the success of our percentage difference predictions, we infer that the predictions of percentage error from the same data are similarly reliable. While a direct evaluation of CloudSat accumulation estimates over large spatial scales is not possible with our method, we believe this study strongly supports the capability of CloudSat to measure snow accumulation with a quantifiable accuracy at arbitrary space and time scales. We aim to find ways to decrease the uncertainty of ∆SWE C and increase the data available for ∆SWE C evaluations in follow-up studies.
The results of this study can only be considered applicable in regions that are similar to our sample grids-flat terrain (land or ocean). This study makes no predictions on the ability of CloudSat to measure accumulations over mountainous terrain, which would must be properly validated before examining CloudSat-derived snow information over areas of high elevation, such as Antarctica. However, we note that Antarctica, in particular, presents a host of other issues for snow-accumulation measurements, including a greater effect of post-battery-anomaly CloudSat measurement losses [21,40] and a greater impact of blowing snow on regional accumulation totals [41].  Data Availability Statement: NOAA temperature data is available from https://psl.noaa.gov. CFDDA reanalysis is available from https://rda.ucar.edu/datasets/ds604.0. CloudSat data is available from http://www.cloudsat.cira.colostate.edu/. All SWE products from the B4 dataset can be found in their respective citations.

Conflicts of Interest:
The authors declare no conflict of interest.