Description of the UCAR/CU Soil Moisture Product

Currently, the ability to use remotely sensed soil moisture to investigate linkages between the water and energy cycles and for use in data assimilation studies is limited to passive microwave data whose temporal revisit time is 2–3 days or active microwave products with a much longer (>10 days) revisit time. This paper describes a dataset that provides soil moisture retrievals, which are gridded to 36 km, for the upper 5 cm of the soil surface at sparsely sampled 6-hour intervals for +/− 38 degrees latitude for 2017–present. Retrievals are derived from the Cyclone Global Navigation Satellite System (CYGNSS) constellation, which uses GNSS-Reflectometry to obtain L-band reflectivity observations over the Earth’s surface. The product was developed by calibrating CYGNSS reflectivity observations to soil moisture retrievals from NASA’s Soil Moisture Active Passive (SMAP) mission. Retrievals were validated against observations from 171 in-situ soil moisture probes, with a median unbiased root-mean-square error (ubRMSE) of 0.049 cm3 cm−3 (standard deviation = 0.026 cm3 cm−3) and median correlation coefficient of 0.4 (standard deviation = 0.27). For the same stations, the median ubRMSE between SMAP and in-situ observations was 0.045 cm3 cm−3 (standard deviation = 0.025 cm3 cm−3) and median correlation coefficient was 0.69 (standard deviation = 0.27). The UCAR/CU Soil Moisture Product is thus complementary to SMAP, albeit with a larger random noise component, providing soil moisture retrievals for applications that require faster revisit times than passive microwave remote sensing currently provides.


Introduction
Near-surface (0-5 cm) soil moisture data are used for a variety of applications, such as drought monitoring [1][2][3], flood forecasting [4,5], initializing climate models [6], numerical weather prediction [7], quantifying the linkages between the surface and atmospheric boundary layers [8][9][10], optimizing irrigation strategies [11], and predicting infectious disease transmission [12]. All of these applications require soil moisture data at different spatial and temporal scales. Because it can retrieve soil moisture over large parts of the globe, satellite remote sensing is often used to provide these data. Most often, microwave instruments are used due to their ability to see through clouds and penetrate at least some amount of overlying vegetation. However, even the microwave sensor with the fastest revisit time (the passive radiometer) is constrained to providing data every 2-3 days [13]. While a 2-3 day repeat period is more than sufficient for many soil moisture applications, such as climate model initialization, other applications require a higher density of observations. For example, the quantification of soil moisture memory is limited to the time scale of observations [10]. Several ongoing efforts are also using soil moisture as a proxy for other quantities, such as rainfall [14,15] or evaporation [16,17], and having daily or sub-daily data would increase the success of these techniques.
A constellation of microwave instruments, the Cyclone Global Navigation Satellite System (CYGNSS), is currently on orbit over the tropics, and because it is a constellation with numerous satellites, it can provide data more frequently than is currently possible with a single instrument [18]. Although CYGNSS was not designed for soil moisture remote sensing, this paper describes a dataset [19] developed from the CYGNSS constellation that provides soil moisture retrievals for the majority of the tropics (+/− 38 degrees latitude) at daily and sub-daily time steps. Here, we describe the algorithm, its assumptions, and validation of the retrievals using in-situ soil moisture observations. In regions where the algorithm performance is acceptable, the CYGNSS soil moisture retrievals could be used to augment existing soil moisture satellite data for the aforementioned applications that require data at a higher temporal resolution.

CYGNSS
The CYGNSS mission was launched in December 2016. A NASA Earth Ventures Mission, CYGNSS consists of eight small satellites that orbit the tropics with an inclination of 35 degrees [18]. CYGNSS was designed to retrieve ocean surface wind speed during hurricane intensification events, and, as such, the receivers and software were optimized for ocean surface remote sensing. Each of the CYGNSS satellites carries with it two downward-looking antennas and a Global Navigation Satellite System-Reflectometry (GNSS-R) receiver.
GNSS-R is a form of L-band bistatic radar that utilizes transmitted navigation signals as the signal source. GNSS is an umbrella term that encompasses constellations like the United States' GPS, but also the EU's Galileo, Russia's GLONASS, China's BeiDou, India's IRNSS, and Japan's QZSS. In total, there are over 80 GNSS satellites currently in orbit (32 of which are GPS satellites), with more being planned in the coming years. To date, GNSS-R most commonly utilizes right hand circular polarized (RHCP) L-band signals transmitted from GPS satellites, which upon reflection are predominantly left hand circular polarized (LHCP).
Because L-band signals are commonly used for near-surface (0-5 cm) soil moisture remote sensing (e.g., NASA's Soil Moisture Active Passive (SMAP) or the European Space Agency's Soil Moisture and Ocean Salinity (SMOS) missions [13,20]), there is significant interest in quantifying the sensitivity of spaceborne GNSS-R data to changes in soil moisture. There had been evidence of sensitivity during previous GNSS-R flight campaigns (e.g., [21][22][23][24][25]), though the first analyses investigating the sensitivity of spaceborne GNSS-R data to soil moisture were only done after the launch of TechDemoSat-1 (TDS-1) in 2014. TDS-1 carried the same GNSS-R receiver as is used by CYGNSS, and these preliminary investigations did show that there appeared to be some sensitivity of GNSS-R to soil moisture [26,27]. Due to the fact that TDS-1 only collected GNSS-R data for one out of every eight days, however, developing a full soil moisture retrieval algorithm was not attempted.
The sheer amount of data collected by CYGNSS has allowed for an empirical investigation into the sensitivity of spaceborne GNSS-R data to soil moisture. The dataset that we describe here is the result of a modified version of the methodology presented in [28]. The full dataset is available online [19]. The results presented here are the outcome of just one of several ongoing efforts investigating the potential of CYGNSS to retrieve soil moisture. Several of these efforts are also using SMAP retrievals as a reference data set, though differences arise in assumptions regarding gridding, open water masking, and vegetation considerations. For example, [29] assumed that the vast majority of CYGNSS observations have a coarse (>25 km) spatial resolution and spatially averaged the data under this assumption, whereas in the algorithm described here we assume a 3 km spatial resolution. The authors of [30] also used SMAP data for retrieval algorithm development, though also made considerations for changes in vegetation, and [31] incorporated surface roughness data from ICESat-2, neither of which do we do here. Both [29,32] scale their soil moisture retrievals by knowing a priori the maximum and minimum values of soil moisture for a particular region, which we also do not do here. Several other researchers are exploring machine learning techniques for CYGNSS soil moisture retrieval (e.g., [33,34]). Given the differences in validation data sets and time periods used for validation, it is difficult to
Figure 1. Schematic of the GNSS-R technique. A GNSS satellite transmits (Tx) a signal towards the Earth's surface. Part of this signal reflects in the forward (specular) direction and back into space. A GNSS-R receiver (Rx) onboard a low-Earth-orbiting satellite, with a downward-looking antenna, records this signal. The point on the Earth's surface where the signal reflects depends upon the positions of the transmitting and receiving satellites. The roughness of the surface at the reflection point determines the spatial resolution of the signal, with rougher surfaces producing larger spatial footprints. Nearly always, the receiver integrates the reflected signal over a period of time, which elongates the spatial footprint in the along-track direction.
The point of reflection on the Earth's surface is determined by the positions of the transmitting and receiving satellites. Because these positions are constantly changing, the reflection points are pseudo-randomly distributed on the Earth's surface (see Figure 2a for examples), which differs from traditional remote-sensing techniques, which collect data in repeatable swaths. This means that, for a given point on the Earth's surface, observations could be recorded one hour apart, and then there could be no observations for the next several hours, for example [35]. Observations are recorded at all times of day, again unlike traditional remote-sensing techniques, which tend to observe a particular location at a particular time of day. Over time, the pseudo-randomly distributed observations aggregate such that complete maps of the reflected signal can be made (Figure 2b).
Figure 2. (a) Ellipses are approximately 7 × 0.5 km in size, which is the expected footprint if the surface has little surface or topographic roughness. Dots are the locations of the specular reflection points recorded by CYGNSS. This example shows coverage for one typical 24-hr period. Grid cells are the size of the EASE2 36-km grid cells. (b) Over time, observations made by CYGNSS completely cover the land surface, producing maps such as this (CYGNSS observations overlaid on DEM). Here, higher values could indicate a wet surface or a relatively flat surface.
The spatial resolution of the reflecting signal depends on the roughness of the surface at and near the reflection point [36]. Here, roughness includes possible contributions from many factors and not just the micro-scale roughness of the soil, such as roughness introduced from vegetation canopies, wind-roughened water, and macro-scale roughness from topography. If the surface is relatively rough, then the reflected signal is incoherent and comes from an area called the "glistening zone," which is on the order of several kilometers (~25 km in the case of the ocean surface), though recent research is beginning to show that where incoherent scattering originates from over the land surface is highly dependent on the local topography and may not be definable with a simple 25-km radius [37]. If the surface is relatively smooth, then the reflected signal is coherent and comes from an area defined by the first Fresnel zone. For a low-Earth-orbiting GNSS-R satellite, this area is on the order of 0.5 km, though this also depends on incidence angle (0.33 km at 0 deg incidence, 1.3 km at 60 deg incidence) [38]. In practical terms, determining whether the reflecting surface is smooth enough to produce a coherent reflection is challenging, as surface roughness is an extremely difficult parameter to measure. Existing measurements show that surface roughness tends to be on the order of 2-3 cm [39][40][41], though surface roughness likely varies considerably on scales as large as the first Fresnel zone. There are also likely complications from macro-scale surface roughness due to topographic variations that are not well understood. Modelling efforts are beginning to shed light into the role of topography in GNSS-R signal scattering (e.g., [42,43]), with some studies showing that the sensitivity of the GNSS-R signal to soil moisture is relatively unaffected by complex topography [42].
In all likelihood, most signals are probably a combination of incoherent and coherent scattering. In the algorithm presented here, we ignore contributions from incoherent scattering. Assuming that the reflected signal is always coherent may lead to increased uncertainties in final soil moisture retrievals, though currently these uncertainties are not able to be quantified without a better knowledge of the precise conditions that lead to incoherence.
Due to the fact that CYGNSS was designed to be an ocean sensor, where the reflected signal is relatively weak, the processing software integrates the signal over a period of 1 s for each "observation." During that time, the spacecraft has moved approximately 7 km, which means that the smallest along-track spatial resolution possible over land is 7 km, though the across track resolution could still be the theoretical 0.5 km. This results in the spatial footprint having a minimum size of 7 × 0.5 km, Remote Sens. 2020, 12, 1558 5 of 26 with the signal being smeared out along track (Figure 2a). In mid-2019, the CYGNSS integration time was decreased from 1 to 0.5 s, which means the minimum spatial footprint is currently 3.5 × 0.5 km.
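The footprint arithmetic above can be written out as a short sketch. The function name and its default arguments are illustrative assumptions: the ground speed of roughly 7 km/s is inferred from the statement that the spacecraft moves about 7 km in 1 s, and the 0.5 km across-track width is the approximate first-Fresnel-zone size quoted in the text.

```python
def min_footprint_km(integration_time_s, ground_speed_km_s=7.0, fresnel_width_km=0.5):
    """Return (along_track_km, across_track_km) for a fully coherent reflection.

    Assumes the along-track extent is set only by the distance travelled
    during the integration time, and the across-track extent only by the
    first Fresnel zone. Both defaults are approximate values from the text.
    """
    along_track_km = ground_speed_km_s * integration_time_s
    return (along_track_km, fresnel_width_km)


# Before mid-2019 (1 s integration) and after (0.5 s integration)
print(min_footprint_km(1.0))
print(min_footprint_km(0.5))
```

This reproduces only the minimum (no roughness) footprint; any incoherent contribution would enlarge it.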
The reflected GNSS signal is recorded by the receiver in the form of what is called a delay-Doppler map (DDM). A DDM is created by cross-correlating the received signal with a locally generated replica that has been modified considering different path delays (resulting from the path distance between the transmitter, reflecting surface, and receiver, as shown in Figure 1) and Doppler shifts (resulting from the relative motions of the transmitter, reflecting surface, and the receiver). Two examples of DDMs are shown in Figure 3. Figure 3a is an example of a DDM recorded by CYGNSS over the land surface, and Figure 3b is an example of a DDM recorded over the ocean surface. The horseshoe shape of the ocean DDM is an indication that the reflection is incoherent and comes from a large, rough area. The absence of a horseshoe shape in Figure 3a indicates that the reflection is mostly coherent, and comes from a smaller, smoother area. The maximum power of each DDM is affected by surface roughness, the dielectric constant of the surface, and the vegetation overlying the surface, which is explained further below.
DDMs are most commonly used by summarizing them into one metric or observable. The observables that are commonly used for soil moisture estimation are the peak cross-correlation of each DDM or the peak divided by the noise floor (signal-to-noise ratio, SNR). The value of the peak cross-correlation of each DDM is related to surface characteristics at the specular reflection point of the GNSS signal, including the roughness of the surface, the surface dielectric constant, and properties related to any overlying vegetation such as vegetation water content and structure of the canopy. However, the peak of each DDM is also affected by variables unrelated to the reflecting surface, such as antenna gain and range. We describe our procedure to correct for these effects in the next section.
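As a minimal sketch of how a DDM can be reduced to the two observables named above, the following computes the peak value and the SNR in dB. The function name, the list-of-rows layout (rows as delay bins, columns as Doppler bins), and the choice to estimate the noise floor from the first few delay rows are assumptions made for illustration, not details taken from the CYGNSS processing chain.

```python
import math


def ddm_observables(ddm, noise_rows=2):
    """Return (peak, snr_db) for a DDM given as a list of delay rows.

    Assumption: the first `noise_rows` delay rows precede the reflected
    signal and therefore contain only noise.
    """
    noise_values = [value for row in ddm[:noise_rows] for value in row]
    noise_floor = sum(noise_values) / len(noise_values)
    peak = max(value for row in ddm for value in row)
    snr_db = 10.0 * math.log10(peak / noise_floor)
    return peak, snr_db


# Toy 4 x 3 DDM: two noise-only rows followed by the reflected signal
toy_ddm = [
    [1.0, 1.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 100.0, 1.0],
    [1.0, 10.0, 1.0],
]
peak, snr_db = ddm_observables(toy_ddm)
print(peak, snr_db)
```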

Introduction to the Algorithm
The algorithm presented here uses collocated soil moisture retrievals from the Soil Moisture Active Passive (SMAP) mission to calibrate concurrent (same calendar day) CYGNSS observations throughout a calibration period [44]. For a given location, a linear relationship between SMAP soil moisture and CYGNSS surface reflectivity observations is determined, and the relationship is used to transform all CYGNSS observations into soil moisture, even at times when there are no corresponding SMAP data points. Once the calibration is performed, it is applied to data outside the calibration period such that SMAP data are no longer required for ongoing CYGNSS soil moisture retrievals.
Using SMAP data for calibration of course comes with many drawbacks, the major one being that SMAP soil moisture retrievals are not soil moisture observations and have their own error and uncertainties. One must be careful when using CYGNSS data in areas where it is known that SMAP performs poorly. In addition, SMAP's 40 km spatial resolution is likely coarser than that of CYGNSS. Intelligent upscaling of CYGNSS data to the 36-km EASE-2 grid that SMAP uses is necessary. If the resolution of CYGNSS is finer than 36 km, then this upscaling effectively degrades the CYGNSS data and does not use them to their full potential. Despite the drawbacks associated with calibrating CYGNSS data with SMAP retrievals, SMAP data are considered to be one of, if not the, most accurate of the existing soil moisture products [45,46].

Derivation of Pr,eff
The first step in the retrieval algorithm is to calculate the effective surface reflectivity, which is the peak value of each delay-Doppler map (DDM) corrected for gain, range, and incidence angle effects. We call the uncorrected peak value of each DDM Pr. Pr is affected by surface characteristics, such as the dielectric constant, roughness, and vegetation, as well as the gain of the receiving antenna, the bistatic range, and the power transmitted by each GPS satellite. We correct Pr for antenna gain, range, and GPS transmit power assuming a coherent reflection:

$$P_r = \frac{P_t G_t}{4\pi \left(R_{ts} + R_{sr}\right)^2} \, \frac{G_r \lambda^2}{4\pi} \, \Gamma_{rl}$$

where Pt is the transmitted RHCP power, Gt is the gain of the transmitting antenna, Rts is the distance between the transmitter and the specular reflection point, Rsr is the distance between the specular reflection point and the receiver, Gr is the gain of the receiving antenna, λ is the GPS wavelength (0.19 m), and Γrl is the effective surface reflectivity. We then solve for Γrl, which is the term affected by the surface roughness, dielectric constant of the soil, and vegetation, by first converting all terms to dB:

$$\Gamma_{rl} = 10\log P_r - 10\log P_t - 10\log G_t - 10\log G_r - 20\log \lambda + 20\log\left(R_{ts} + R_{sr}\right) + 20\log\left(4\pi\right)$$

Incidence angle is also expected to affect a coherent reflection, though this effect is only significant when the incidence angle is above 40 or 50 degrees. We correct for incidence angle in a similar way as in [29] by modelling how the effective surface reflectivity should change as a function of incidence angle, using established relationships (e.g., [47]). We call Γrl observations that have been corrected for incidence angle "Pr,eff," which stands for the effective surface reflectivity.
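The gain, range, and transmit-power correction can be sketched numerically as follows. This assumes the standard coherent bistatic radar relationship between received power and reflectivity; the function names, argument names, and the example link-budget values are hypothetical and chosen only to show that the dB-domain inversion recovers a known reflectivity. The incidence angle correction is not included.

```python
import math

GPS_WAVELENGTH_M = 0.19  # GPS wavelength quoted in the text


def coherent_received_power_w(pt_w, gt, gr, r_ts_m, r_sr_m, gamma,
                              wavelength_m=GPS_WAVELENGTH_M):
    """Forward model: received power for a coherent reflection (linear units)."""
    path_m = r_ts_m + r_sr_m
    return (pt_w * gt * gr * wavelength_m ** 2 * gamma) / ((4.0 * math.pi) ** 2 * path_m ** 2)


def reflectivity_db(pr_w, pt_w, gt, gr, r_ts_m, r_sr_m,
                    wavelength_m=GPS_WAVELENGTH_M):
    """Invert the forward model for the surface reflectivity, in dB."""
    def db(x):
        return 10.0 * math.log10(x)

    return (db(pr_w) - db(pt_w) - db(gt) - db(gr)
            - 2.0 * db(wavelength_m)
            + 2.0 * db(r_ts_m + r_sr_m)
            + 2.0 * db(4.0 * math.pi))


# Round trip with made-up values: a reflectivity of 0.1 should come back as -10 dB
pr_w = coherent_received_power_w(25.0, 20.0, 25.0, 2.0e7, 6.0e5, 0.1)
print(reflectivity_db(pr_w, 25.0, 20.0, 25.0, 2.0e7, 6.0e5))
```

Gains are passed in linear units here; if they are available in dB, the corresponding `db(...)` terms can be replaced by the dB values directly.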

Outlier Identification
Because CYGNSS was not optimized for remote sensing of the land surface, we remove observations that are flagged with standard quality measures as well as use empirical quality control that we have found to increase the effectiveness of our algorithm. Standard quality flags that we use are the following: "S-band transmitter powered up," "spacecraft attitude error," "black body DDM," "DDM is a test pattern," "direct signal in DDM," and "low confidence in the GPS EIRP estimate." Although it has been recommended that observations from the GPS Block IIF satellites be removed due to larger variations in GPS transmit power and consequently larger uncertainties in Pr [48], we keep these observations, as removing them reduces data volume by more than 30%.
Any data recorded before December 2017 reflecting from surface elevations greater than 600 m are removed. Prior to this time, the satellites did not record DDMs that contained the full surface reflection coming from these elevations.
We perform additional quality control measures that are not standard in the analysis of CYGNSS data over oceans. These measures were developed after a detailed examination of outliers in regions where the surface is relatively constant over time (e.g., deserts). We remove any observation where Pr is less than 2 dB above the noise floor (SNR < 2 dB). We also remove observations with a receiver antenna gain less than 0 dB, observations with an incidence angle greater than 65 degrees, any data with Pr occurring in a delay bin outside of 7-10 pixels (exclusive), and any observations whose SNR exceeds the receiver antenna gain plus 14 dB.
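The empirical quality-control rules above can be collected into a single predicate, sketched below. The function and argument names are hypothetical. The delay-bin rule is read here as keeping only peaks strictly between bins 7 and 10, which is one interpretation of "outside of 7-10 pixels (exclusive)" and should be treated as an assumption.

```python
def passes_quality_control(snr_db, rx_gain_db, incidence_deg, peak_delay_bin):
    """Return True if an observation survives the empirical land QC rules."""
    if snr_db < 2.0:                      # peak less than 2 dB above the noise floor
        return False
    if rx_gain_db < 0.0:                  # receiver antenna gain below 0 dB
        return False
    if incidence_deg > 65.0:              # incidence angle above 65 degrees
        return False
    if not (7 < peak_delay_bin < 10):     # assumed reading of the delay-bin rule
        return False
    if snr_db > rx_gain_db + 14.0:        # SNR implausibly high for the antenna gain
        return False
    return True


print(passes_quality_control(snr_db=5.0, rx_gain_db=3.0, incidence_deg=30.0, peak_delay_bin=8))
print(passes_quality_control(snr_db=20.0, rx_gain_db=3.0, incidence_deg=30.0, peak_delay_bin=8))
```

The standard CYGNSS quality flags listed earlier would be applied separately, before this predicate.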

Removal of Open Water Observations
The removal of specular reflection points that are affected by open water is a critical step before retrieving soil moisture. Even small water bodies ~25 m wide can significantly affect Pr,eff (e.g., Figure 1 from [28]). Our algorithm uses the Global Surface Water Explorer (GSWE) dataset described in [49], which is a 30-m water mask derived from Landsat data. Because it is derived from optical data, it cannot sense water beneath vegetation, though the L-band CYGNSS data are likely sensitive to some amount of water underneath a vegetation canopy. Additionally, because the GSWE is quasi-static in that it describes open water occurrence or when water occurs seasonally, it is not concurrent with the CYGNSS observations. The open water masking effort is thus imperfect, though we have found that it successfully removes a large number of CYGNSS observations that are affected by open water.
The current algorithm removes open water using the "seasonality" data product provided by the GSWE. This product indicates how many months out of a year a pixel is inundated (0-12). For our purposes, we make this product binary by considering any value greater than 1 to be flagged as open water, and anything below this not to be open water. This is done because occasionally permanent water bodies are seasonally covered by vegetation, which makes the GSWE represent them as less than 12 (permanent). For each specular reflection point, we find the amount of water within a 7 × 7 km region surrounding the point. This is a simplification of the actual footprint, but it is computationally more efficient than rotating axes to form actual ellipses, which themselves are simplifications and not well quantified. If the amount of water in the 7 × 7 km region exceeds 1%, we remove that CYGNSS observation from consideration. Changing these thresholds or the size of the 7 × 7 km region does change the results, though never uniformly increasing or decreasing error across regions. It is possible that future versions of the soil moisture product will use a different open water masking procedure or incorporate CYGNSS coherence detectors, such as that described in [37], to aid in the detection of small water bodies.
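A minimal sketch of the masking rule described above, assuming the GSWE "seasonality" pixels covering the 7 × 7 km region around a specular point have already been extracted into a small grid. The function name and the list-of-rows input are illustrative assumptions; reading and windowing the 30-m raster is not shown.

```python
def is_open_water_contaminated(seasonality_window, max_water_fraction=0.01):
    """Flag a specular point whose surrounding window contains too much water.

    seasonality_window: rows of GSWE seasonality values (months inundated, 0-12)
    for the pixels in the 7 x 7 km region around the point.
    """
    pixels = [value for row in seasonality_window for value in row]
    # Binarize: any value greater than 1 counts as open water
    water_pixels = sum(1 for value in pixels if value > 1)
    return water_pixels / len(pixels) > max_water_fraction


# Toy 10 x 10 window: two water pixels out of 100 gives 2% water, so the point is removed
window = [[0] * 10 for _ in range(10)]
window[0][0] = 12
window[1][1] = 5
print(is_open_water_contaminated(window))
```

At 30-m resolution a real 7 × 7 km window holds roughly 233 × 233 pixels; the toy grid only keeps the example readable.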

Conversion of Pr,eff to Soil Moisture
We now describe how Pr,eff is transformed into soil moisture, using SMAP soil moisture retrievals to calibrate CYGNSS observations. Our calibration period was chosen to be 17 March 2017 to 1 October 2018. This is an extended period beyond what was shown in [28]. In our calibration, we use all SMAP retrievals regardless of SMAP quality flags to allow for the retrieval of soil moisture from CYGNSS for the entirety of CYGNSS' observational area. Of course, users should be cautious when using CYGNSS soil moisture retrievals from regions regularly flagged by SMAP, such as the dense forests of South America and Africa, which are indicated in the CYGNSS quality flags.
The algorithm is based on the assumption that Pr,eff is linearly related to SMAP soil moisture. The linear relationship is allowed to vary spatially, but we assume that it does not change over time. For a given location, we calculate the slope of the best-fit linear regression between concurrent (same calendar day) SMAP soil moisture retrievals and CYGNSS Pr,eff, after having removed the mean of each for the entire time series. Before we can describe this in more detail, however, we must explain what "a given location" means in this context.
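The calibration idea can be illustrated with a short least-squares sketch. Several choices here are assumptions rather than details confirmed by the text: Pr,eff is treated as the predictor and SMAP soil moisture as the response, and a retrieval is formed by scaling the Pr,eff anomaly by the slope and adding back the mean SMAP soil moisture. The function names and the sample values are hypothetical.

```python
def calibrate(smap_sm, pr_eff_db):
    """Fit the slope between mean-removed SMAP soil moisture and Pr,eff.

    Both inputs are same-day pairs for one location during the calibration
    period. Returns (slope, mean_sm, mean_pr_eff).
    """
    if len(smap_sm) != len(pr_eff_db) or len(smap_sm) < 2:
        raise ValueError("need at least two matched SMAP/CYGNSS pairs")
    n = len(smap_sm)
    mean_sm = sum(smap_sm) / n
    mean_pr = sum(pr_eff_db) / n
    sxx = sum((p - mean_pr) ** 2 for p in pr_eff_db)
    if sxx == 0.0:
        raise ValueError("Pr,eff has no variability at this location")
    sxy = sum((p - mean_pr) * (s - mean_sm) for p, s in zip(pr_eff_db, smap_sm))
    return sxy / sxx, mean_sm, mean_pr


def retrieve_soil_moisture(pr_eff_db, slope, mean_sm, mean_pr):
    """Convert one Pr,eff observation to soil moisture (assumed form)."""
    return mean_sm + slope * (pr_eff_db - mean_pr)


# Made-up calibration pairs with an exactly linear relationship
slope, mean_sm, mean_pr = calibrate([0.10, 0.12, 0.14, 0.16], [-20.0, -18.0, -16.0, -14.0])
print(slope, retrieve_soil_moisture(-15.0, slope, mean_sm, mean_pr))
```

Because the slope and means are stored per location, the same function can be applied to observations outside the calibration period without further SMAP data.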
In Section 1.2, we described how the smallest theoretical (no roughness) spatial footprint of CYGNSS over land is approximately 7 × 0.5 km, or 3.5 × 0.5 km for data recorded after mid-2019. What the actual spatial resolution is over land is a matter of debate within the GNSS-R community, though our own analysis of how Pr,eff varies with small landcover or topographic features indicates to us that the effective footprint is likely only a few km, much smaller than SMAP's 40 km resolution [28]. If, for every SMAP observation, the CYGNSS observations completely sampled the 36-km EASE-2 grid cell used by SMAP, then a simple averaging could be used to aggregate CYGNSS observations to match with the SMAP retrieval. However, this is not the case. For every SMAP observation, there will likely be several CYGNSS observations within the grid cell, though not enough to completely sample the grid cell, if the spatial footprint is small (<10 km). In this case, simple averaging will lead to variations in the day-to-day signal due to differential sampling of landcover types and topography within the SMAP pixel, which could be mistaken for variations in soil moisture.
In order to avoid this, we first grid our Pr,eff observations to ~3 × 3 km "subcells," retrieve soil moisture from the subcells, and then aggregate the gridded observations to the 36-km SMAP EASE-2 grid resolution (Figure 4a). This subcell approach minimizes the confounding effects of landcover and topography on Pr,eff. The number of points per subcell in the calibration period is shown in Figure 5; subcells with fewer than three observations were not used for calibration.
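The grid-then-retrieve-then-aggregate order can be sketched as below. This is an illustration under stated assumptions, not the product's implementation: observation coordinates are taken to be in kilometers on a projected grid where 3-km subcells nest exactly inside 36-km cells, and `retrieve` is a hypothetical callable that converts a subcell's mean Pr,eff into soil moisture using that subcell's own calibration.

```python
from collections import defaultdict


def aggregate_to_36km(observations, retrieve, subcell_km=3.0, cell_km=36.0):
    """Grid Pr,eff to subcells, retrieve per subcell, then average per 36-km cell.

    observations: iterable of (x_km, y_km, pr_eff_db) tuples.
    retrieve: callable (subcell_index, mean_pr_eff_db) -> soil moisture.
    Returns {cell_index: mean soil moisture over its sampled subcells}.
    """
    # Step 1: bin observations into ~3 x 3 km subcells
    subcells = defaultdict(list)
    for x_km, y_km, pr_eff_db in observations:
        key = (int(x_km // subcell_km), int(y_km // subcell_km))
        subcells[key].append(pr_eff_db)

    # Step 2: retrieve soil moisture per subcell, then group by parent cell
    per_side = int(cell_km // subcell_km)  # 12 subcells per cell side
    cells = defaultdict(list)
    for (i, j), values in subcells.items():
        soil_moisture = retrieve((i, j), sum(values) / len(values))
        cells[(i // per_side, j // per_side)].append(soil_moisture)

    # Step 3: average the subcell retrievals within each 36-km cell
    return {cell: sum(v) / len(v) for cell, v in cells.items()}


# Toy example with a single, location-independent retrieval function
obs = [(1.0, 1.0, -20.0), (2.0, 2.0, -18.0), (4.0, 1.0, -10.0), (40.0, 1.0, -15.0)]
print(aggregate_to_36km(obs, lambda key, pr: 0.5 + 0.02 * pr))
```

Retrieving before averaging means each subcell is interpreted against its own reference, so a day that happens to sample only the brighter subcells is less likely to be mistaken for a wetter day.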
Within each subcell, we calculate the linear regression between SMAP soil moisture and P r,eff match-ups (occurring on the same calendar day), after having removed the mean values of both SMAP soil moisture and P r,eff in that subcell. The slope of the linear regression is β, which is conceptualized in Figure 4b. The correlation coefficients for these relationships are shown in Figure 6. β varies spatially (Figure 7). Some of the spatial variability is likely real and could result from spatial variations in topography or landcover. Some of it, however, likely arises because some arid regions do not have enough soil moisture variability throughout the year to adequately calculate β. In these regions, β is artificially low because any noise in the CYGNSS observations will significantly affect the linear regression [28].
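The per-subcell calibration described above can be sketched as follows. This is a minimal illustration, not the product's actual code: it assumes an ordinary least-squares fit with soil moisture as the dependent variable (consistent with lower β meaning higher sensitivity), and the function name `fit_beta` and its arguments are hypothetical. The minimum of three observations follows the subcell criterion stated earlier.

```python
from statistics import mean


def fit_beta(p_r_eff_db, smap_sm, min_obs=3):
    """Fit the mean-removed linear relationship for one subcell.

    p_r_eff_db: CYGNSS effective reflectivity match-ups (dB).
    smap_sm:    same-day SMAP soil moisture match-ups (cm3 cm-3).
    Returns (beta, mean P r,eff, mean SMAP soil moisture), or None when the
    subcell has too few match-ups or no reflectivity variation.
    Assumption: ordinary least squares with soil moisture as the response.
    """
    if len(p_r_eff_db) != len(smap_sm):
        raise ValueError("match-up lists must have the same length")
    if len(p_r_eff_db) < min_obs:
        return None

    p_mean = mean(p_r_eff_db)
    sm_mean = mean(smap_sm)

    # Sum of squares of the mean-removed reflectivity.
    sxx = sum((p - p_mean) ** 2 for p in p_r_eff_db)
    if sxx == 0:
        return None

    # Cross product of the mean-removed reflectivity and soil moisture.
    sxy = sum((p - p_mean) * (s - sm_mean) for p, s in zip(p_r_eff_db, smap_sm))
    return sxy / sxx, p_mean, sm_mean
```

The two means are returned alongside β because they serve as the reference values when converting later reflectivity observations into absolute soil moisture.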
Figure 7. The slope of the linear regression between CYGNSS effective reflectivity observations and SMAP soil moisture (β). This represents the sensitivity of CYGNSS to soil moisture, with lower values indicating a higher sensitivity, though low values are also found in regions where soil moisture does not vary significantly. Higher values of β mean that CYGNSS observations are not as sensitive to soil moisture. Imperfect open water masking will cause an apparent insensitivity to soil moisture.
β is then used to estimate soil moisture from CYGNSS for data falling outside the calibration period, as well as for data within the calibration period when there are no SMAP match-ups (since SMAP has a 2-3 day overpass period), using the following equation:

$$SM_{\mathrm{CYGNSS},t} = \beta \left( P_{r,\mathrm{eff},t} - \overline{P}_{r,\mathrm{eff},\mathrm{cal}} \right) + \overline{SM}_{\mathrm{SMAP},\mathrm{cal}}$$

where $SM_{\mathrm{CYGNSS},t}$ is the soil moisture derived from one CYGNSS observation at time t, $P_{r,\mathrm{eff},t}$ is one observation of P r,eff at time t, $\overline{P}_{r,\mathrm{eff},\mathrm{cal}}$ is the mean value of P r,eff for a particular subcell during the calibration period, and $\overline{SM}_{\mathrm{SMAP},\mathrm{cal}}$ is the mean SMAP soil moisture during the calibration period. The mean values of both SMAP and CYGNSS during the calibration period serve as our reference values, in order to return an absolute value of soil moisture from CYGNSS. Once retrievals are made for individual subcells, the mean of the retrievals for all subcells within each 36-km grid cell is used as the final CYGNSS soil moisture retrieval. Note that not all subcells within one 36-km grid cell will be sampled for each time step, and there could be times when just one subcell is used for the retrieval. The time steps over which retrievals are averaged are described below.
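As an illustration of the retrieval step, the sketch below applies the linear relationship to individual observations and then averages over the subcells sampled within one 36-km grid cell. The function names and the `calibration` dictionary layout are hypothetical, and averaging per-subcell means (rather than all individual retrievals) is an assumption made for this example.

```python
from statistics import mean


def retrieve_subcell_sm(p_r_eff_t, beta, p_r_eff_cal_mean, smap_sm_cal_mean):
    """Soil moisture from one reflectivity observation in one subcell."""
    return beta * (p_r_eff_t - p_r_eff_cal_mean) + smap_sm_cal_mean


def retrieve_cell_sm(observations, calibration):
    """Average subcell retrievals within one 36-km grid cell.

    observations: list of (subcell_id, P r,eff in dB) for one time step.
    calibration:  dict mapping subcell_id to
                  (beta, mean P r,eff, mean SMAP soil moisture).
    Returns None when no calibrated subcell was sampled.
    """
    per_subcell = {}
    for subcell_id, p_r_eff_t in observations:
        if subcell_id not in calibration:
            continue  # subcell was not calibrated (e.g., too few match-ups)
        beta, p_mean, sm_mean = calibration[subcell_id]
        sm = retrieve_subcell_sm(p_r_eff_t, beta, p_mean, sm_mean)
        per_subcell.setdefault(subcell_id, []).append(sm)

    if not per_subcell:
        return None
    # Assumption: each sampled subcell contributes equally to the cell mean.
    return mean(mean(values) for values in per_subcell.values())
```

Because only the sampled subcells contribute, a cell retrieval may rest on a single subcell at some time steps, as noted above.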

Daily and Sub-Daily Retrievals
Soil moisture retrievals are currently provided on daily and sub-daily (6 hourly) time steps. For the daily retrievals, we average all CYGNSS soil moisture retrievals within a particular 36-km grid cell that fall within the 24-hour time period. For the sub-daily retrievals, we average all retrievals for a grid cell in 6-hour intervals, which are currently midnight-6 am, 6 am-noon, noon-6 pm, and 6 pm-midnight (UTC). Note that, just because retrievals are provided at 6-hour time steps, it does not mean that there will be values everywhere at every time step. There will be missing values in some grid cells even when aggregated to a 24-hour time step.
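The temporal aggregation can be sketched as a simple binning of retrievals by UTC date and hour. This is an illustrative example only; the function name `bin_retrievals` and its input layout are hypothetical, and timestamps are assumed to be timezone-aware.

```python
from collections import defaultdict
from datetime import datetime, timezone
from statistics import mean


def bin_retrievals(retrievals, hours_per_bin=6):
    """Average the retrievals of one grid cell into fixed UTC time bins.

    retrievals: iterable of (timezone-aware datetime, soil moisture).
    hours_per_bin: 6 gives the four sub-daily bins; 24 gives daily means.
    Returns {(utc_date, bin_index): mean soil moisture}. Bins without any
    retrieval are simply absent, mirroring the missing values noted above.
    """
    if 24 % hours_per_bin != 0:
        raise ValueError("hours_per_bin must divide 24")

    bins = defaultdict(list)
    for when, sm in retrievals:
        when_utc = when.astimezone(timezone.utc)
        bins[(when_utc.date(), when_utc.hour // hours_per_bin)].append(sm)
    return {key: mean(values) for key, values in bins.items()}
```

With `hours_per_bin=6`, bin indices 0 to 3 correspond to the midnight-6 am, 6 am-noon, noon-6 pm, and 6 pm-midnight (UTC) intervals.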

Quality Control
There is currently only minimal quality control of the soil moisture retrievals themselves. Soil moisture retrievals that are either less than 0.01 cm3 cm−3 or greater than 0.65 cm3 cm−3 are removed, though users should apply their own additional thresholds when using retrievals in specific regions that may have higher or lower residual or saturated moisture contents. Figure 8 shows the unbiased root-mean-square difference (ubRMSD) between CYGNSS and SMAP soil moisture retrievals for the calibration period (18 March 2017 to 1 October 2018). Semi-transparent regions are those frequently flagged by SMAP as being poor quality. Higher values of ubRMSD tend to cluster in areas that flood seasonally, which indicates imperfect open water masking, or that have greater soil moisture variability throughout the year. It is "easier" to achieve a lower ubRMSD in regions with lower soil moisture variability.
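For readers who wish to reproduce these two steps, a minimal sketch is given below. The function names are hypothetical, and the ubRMSD is written in its standard form (the root-mean-square difference after removing the mean bias between the two series), which we assume matches the calculation used here.

```python
from math import sqrt
from statistics import mean

SM_MIN = 0.01  # cm3 cm-3, lower bound stated in the text
SM_MAX = 0.65  # cm3 cm-3, upper bound stated in the text


def apply_range_qc(values, lower=SM_MIN, upper=SM_MAX):
    """Drop retrievals below `lower` or above `upper`."""
    return [v for v in values if lower <= v <= upper]


def ubrmsd(a, b):
    """Unbiased root-mean-square difference between two paired series.

    The mean bias (mean(a) - mean(b)) is removed before squaring, so a
    constant offset between the series does not contribute.
    """
    if len(a) != len(b) or not a:
        raise ValueError("series must be non-empty and of equal length")
    bias = mean(a) - mean(b)
    return sqrt(mean((x - y - bias) ** 2 for x, y in zip(a, b)))
```

The same function gives the ubRMSE when one of the two series is the in-situ reference.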

Quality Flags
Static quality flags are provided in a separate file from the soil moisture retrievals. Examining the quality flags is a crucial first step before using the CYGNSS soil moisture retrievals or interpreting the empirical sensitivity of CYGNSS to soil moisture (β). The flags were developed to encourage users to exercise caution when using soil moisture from certain regions, or to be careful in the interpretation of β from these regions. The following criteria were used in the development of the flags:

1. Regions where CYGNSS observations were calibrated to SMAP data of which a large portion (>90%) of the soil moisture retrievals were flagged as "not recommended for retrieval." These data tend to be in regions that are forested, have significant topography, or are near coastlines. Although the overall ubRMSE between CYGNSS retrievals and in-situ observations remains largely unchanged for sites located in these regions, there are fewer instances where the ubRMSE is < 0.04 cm3 cm−3.
2. Regions where CYGNSS was calibrated to SMAP data with a small range of soil moisture values (<0.1 cm3 cm−3). This indicates a larger uncertainty in β [28]. The ubRMSE between CYGNSS and in-situ observations in these regions is low (0.0395 cm3 cm−3) because there is only small variability in soil moisture. In these regions, because there is a larger uncertainty in β, we do not want users to make any interpretations about β or attempt to compare it to modelled sensitivity.
3. Regions where the ubRMSD between CYGNSS and SMAP was large for the calibration period (> 0.08 cm3 cm−3). The ubRMSE between CYGNSS and in-situ stations with this condition was higher than average (0.0561 cm3 cm−3). Users are advised to use caution when analysing retrievals from these areas. In-situ stations used for validation are described in the next section.
4. Regions with few observations in the 36-km grid cell for calibration (n < 100), leading to less certain retrievals outside the calibration period. β is also more uncertain in these regions.
5. Regions where the calibration-period mean of P r,eff (P r,eff,cal) is low (<5 dB). There is a higher likelihood that roughness or vegetation effects dominate in these areas. Soil moisture retrievals from these areas are particularly suspect: the ubRMSE between CYGNSS and in-situ observations located in these regions is 0.07 cm3 cm−3. We advise users against using CYGNSS soil moisture retrievals at these locations.
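The five criteria above can be expressed compactly as threshold tests on per-cell calibration statistics. The sketch below is illustrative only: the class, field, and flag names are hypothetical and do not reflect the layout of the distributed quality flag file; only the thresholds are taken from the list.

```python
from dataclasses import dataclass


@dataclass
class CellCalibrationStats:
    """Hypothetical per-cell summary of the calibration period."""
    smap_flagged_fraction: float  # fraction of SMAP retrievals "not recommended"
    smap_sm_range: float          # range of SMAP soil moisture (cm3 cm-3)
    ubrmsd_vs_smap: float         # ubRMSD between CYGNSS and SMAP (cm3 cm-3)
    n_calibration_obs: int        # observations in the 36-km grid cell
    mean_p_r_eff_cal_db: float    # calibration-period mean P r,eff (dB)


def quality_flags(stats):
    """Return one boolean per criterion; True means the flag is raised."""
    return {
        "smap_mostly_flagged": stats.smap_flagged_fraction > 0.90,
        "small_sm_range": stats.smap_sm_range < 0.10,
        "large_ubrmsd": stats.ubrmsd_vs_smap > 0.08,
        "few_observations": stats.n_calibration_obs < 100,
        "low_reflectivity": stats.mean_p_r_eff_cal_db < 5.0,
    }
```

Keeping the flags separate, rather than collapsing them into a single mask, lets users decide which conditions matter for their application, for example ignoring the β-related flags when only the retrievals are of interest.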
Figure 8. Unbiased root-mean-square difference (ubRMSD) between SMAP and CYGNSS soil moisture retrievals. Regions where SMAP always flags the data as being 'poor quality' are semi-transparent, such as the Amazon, Central Africa, Indonesia, Japan, Southeast Asia, and the majority of the Eastern United States. Higher ubRMSD in regions with "good-quality" SMAP data tends to be found in regions that are seasonally flooded or near coastlines. It is possible that in these areas, the seasonal water influence on CYGNSS effective reflectivity may overwhelm the soil moisture signal. Alternatively, it is also possible that the SMAP brightness temperature observations are actually responding to the increase in flooded area instead of soil moisture.

Results
We validated the UCAR/CU CYGNSS Soil Moisture Product at 171 in-situ soil moisture sites from six different networks: COSMOS [50], PBOH2O [51], SCAN [52], SNOTEL [53], USCRN [54], and OzNet [55]. The time period chosen for validation was 2 October 2018 to 31 December 2019, so as to not overlap with the calibration period. However, not all stations had data for the entire validation time period. In particular, sites within the PBOH2O network (a ground-based GNSS reflectometry network) only contained data through the first six weeks of the validation time period. Note that the majority of the validation sites are located in the United States. Acceptable agreement between CYGNSS and in-situ observations at these locations does not mean that CYGNSS is expected to perform as well in environments extremely disparate from those typical of the United States (e.g., tropical rainforests). Before validation, we removed obviously nonsensical soil moisture data from the in-situ records, for example soil moisture observations below 0 cm3 cm−3 or greater than 1 cm3 cm−3.
We calculated the unbiased root-mean-square error (ubRMSE) between daily averaged CYGNSS retrievals and in-situ observations for each individual station (Table A1), as well as the median ubRMSE aggregated by network (Table 1). Note that here we use ubRMSE, whereas before we used ubRMSD to compare SMAP and CYGNSS retrievals. Though we calculated them the same way, we use ubRMSE here to indicate that this is a validation exercise, whereas the comparison between SMAP and CYGNSS was only for the purpose of showing where the two products are dissimilar from one another. For context, we also calculated the ubRMSE between SMAP retrievals and in-situ observations for the same time period. These values are also contained in Tables 1 and A1. Overall, the median ubRMSE between CYGNSS and in situ (0.049 cm3 cm−3) and between SMAP and in situ (0.045 cm3 cm−3) were very similar, given that the standard deviations of each were ~0.025 cm3 cm−3. Similar ubRMSEs are expected since CYGNSS was calibrated to SMAP retrievals.
Table 1. Median unbiased root-mean-square error (ubRMSE) and correlation coefficient (r) between CYGNSS and in-situ soil moisture sites, with those from SMAP shown for context. ubRMSE and r values were calculated for the time period between 2 October 2018 and 31 December 2019, unless an individual station did not have data for the full time period. Note that the distribution of r values for SMAP is significantly non-normal, which limits the interpretation of the standard deviation.
A map of all in-situ stations used in this validation exercise along with their respective ubRMSE values is shown in Figure 9. Note that there is a wide range of ubRMSE values, depending on the site and network. In particular, SNOTEL sites performed poorly (mean ubRMSE ≈ 0.09 cm3 cm−3). SNOTEL sites tend to be in mountainous areas with surrounding trees. P r,eff in these regions tends to be low, which is an indicator that either topographic roughness or dense vegetation is significantly affecting the reflected signal. We caution users who are interested in using the CYGNSS retrievals in such areas, given the high validation ubRMSE at the in-situ sites. The quality flags described in the last section delineate where these effects are likely to be at play.
Table A1 also includes other validation metrics, such as the correlation coefficient (r) between CYGNSS and in-situ sites and the mean bias between in-situ observations and SMAP retrievals, which by extension is also the bias between in-situ observations and CYGNSS retrievals. The median correlation coefficient across sites is somewhat low (r = 0.4) but with a large standard deviation (0.27). Figure 10a shows a histogram of the correlation coefficient (r) between CYGNSS and in-situ observations, with that from SMAP shown for context. Compared to SMAP, CYGNSS has a wider distribution of correlation coefficients, with a lower median value (Table 1). Similar to what was discussed in relation to the correlation between CYGNSS effective reflectivity and SMAP soil moisture, areas with low or no soil moisture variation during the validation time period often result in a low correlation coefficient, as any effects due to random noise are amplified when there is no soil moisture variation. CYGNSS retrievals are expected to have more noise than SMAP retrievals, given the aforementioned imperfections in signal calibration and lack of full grid cell coverage. Figure 10b shows that the distribution of r values does change significantly when only in-situ sites with moderate or large soil moisture variability are considered, which helps mask the effect of noise. In addition to the correlation coefficient, the median absolute bias between SMAP and in-situ observations, which will also be the bias between CYGNSS and in-situ, is 0.05 cm3 cm−3. This dry bias is a known issue in the SMAP retrievals [56].
Validating a satellite remote-sensing product against point measurements, as we have done here, is an imperfect exercise, as the individual soil moisture probe observations may not be representative of the larger areal average. The validation effort conducted by SMAP, for example, averaged in-situ observations from many probes spread out over a large area for each of their validation sites [46]. The sites that comprise OzNet do indeed contain multiple probes at each site, though there are only two sites with data within the calibration time period (see Table A1). In this case, we used the mean of all probe measurements within one site to calculate the ubRMSE.
CYGNSS soil moisture retrievals did agree better with the in-situ observations from these sites (ubRMSE = 0.043 cm3 cm−3) than they did with the point-based observations. Examples of CYGNSS soil moisture time series that agree well with in-situ observations are shown in Figure 11. These sites show that CYGNSS has the ability to retrieve both low and high soil moisture contents. An example of the increased temporal resolution of daily averaged CYGNSS retrievals with respect to SMAP is shown in Figure 12. In this example, the increased resolution of CYGNSS results in the observation of increased soil moisture due to two precipitation events in late June and early July of 2018, which are missed by SMAP. Table A1 shows how many daily averaged CYGNSS soil moisture retrievals were available during the validation time period, which can be compared to the number of SMAP retrievals for the same time period. On average, CYGNSS soil moisture retrievals improve the temporal revisit frequency by 63.4%.
For each in-situ site, we calculated the number of rain events during the validation period, which we defined as an increase in the daily averaged soil moisture greater than 0.02 cm3 cm−3. We then calculated the number of rainy days that had CYGNSS observations and the number that had SMAP observations. From this, we calculated the percentage of rain events "observed" by each of CYGNSS and SMAP and found that CYGNSS observed a median of 87% of the rain events, and SMAP observed a median of 43% (distributions shown in Figure 13). At sites with acceptable CYGNSS performance, CYGNSS would be able to provide more information about soil drying and wetting than SMAP currently can.
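The rain event statistic can be sketched as follows. This is a minimal example with hypothetical function names; it assumes the in-situ record is supplied as a mapping from calendar date to daily averaged soil moisture, and that an event is "observed" when the satellite product has a retrieval on the same day.

```python
from datetime import date, timedelta


def rain_event_days(insitu_daily_sm, threshold=0.02):
    """Days on which daily mean soil moisture rose by more than `threshold`
    (cm3 cm-3) relative to the previous day's value.

    insitu_daily_sm: dict mapping datetime.date to soil moisture.
    Days without a value on the preceding day cannot be classified and are
    skipped.
    """
    events = set()
    for day, sm in insitu_daily_sm.items():
        previous = insitu_daily_sm.get(day - timedelta(days=1))
        if previous is not None and sm - previous > threshold:
            events.add(day)
    return events


def percent_events_observed(event_days, satellite_days):
    """Percentage of rain event days that also have a satellite retrieval."""
    if not event_days:
        return None
    observed = len(set(event_days) & set(satellite_days))
    return 100.0 * observed / len(event_days)
```

Applying `percent_events_observed` once with the CYGNSS retrieval dates and once with the SMAP retrieval dates gives the two per-site percentages whose distributions are compared.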
Figure 13. The percentage of rain events during the validation period observed by SMAP (green bars) and CYGNSS (pink bars). A rain event was defined as the daily averaged soil moisture value increasing by more than 0.02 cm3 cm−3 with respect to the previous day's value.
Figure 14 shows three examples of soil moisture time series from CYGNSS and SMAP where CYGNSS does not agree well with in-situ observations. Generally, if SMAP does not perform well at a site, CYGNSS is likely to perform poorly, too. However, there are other sites where SMAP performance is significantly better than that of CYGNSS (e.g., Figure 14c). Understanding why CYGNSS soil moisture retrievals at some locations do not perform as well as SMAP is one subject of future research, though preliminary analyses indicate that inadequate open water masking could be one contributing factor.
Figure 15a shows a histogram of the ubRMSEs between CYGNSS and in-situ observations (values in Table A1), with that from SMAP shown for context.
Although the median ubRMSE for both CYGNSS and SMAP retrievals is lower than 0.05 cm3 cm−3, it is important to note that ubRMSE statistics are correlated with the variability of soil moisture at a particular site. Figure 15b shows the ubRMSE of CYGNSS and SMAP as a function of the interquartile range (IQR) of soil moisture for each site. As soil moisture variability increases, so does the ubRMSE. This means it may be unreasonable to assume an error of 0.04 cm3 cm−3 in areas that undergo extreme fluctuations in soil moisture throughout the year. For context, the median IQR at validation sites was 0.07 cm3 cm−3, and the IQR of USCRN station Bronte-11-NNE shown in Figures 11 and 12 is 0.071 cm3 cm−3.
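Users who want to place their own site in this context can compute the IQR of a soil moisture record as sketched below. The function name is hypothetical, and the quantile interpolation method ("inclusive" in Python's statistics module) is an assumption; the method used for Figure 15b is not stated here.

```python
from statistics import quantiles


def soil_moisture_iqr(values):
    """Interquartile range (75th minus 25th percentile) of a record."""
    # quantiles(..., n=4) returns the three quartile cut points.
    q1, _median, q3 = quantiles(values, n=4, method="inclusive")
    return q3 - q1
```

A site whose IQR is well above the median of 0.07 cm3 cm−3 reported above should be expected to show a correspondingly higher ubRMSE.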

Discussion
As with all remote-sensing techniques, there are limitations to what information retrieval algorithms can provide. Below are some of the more significant limitations of the UCAR/CU retrieval algorithm. We avoid commenting on limitations of CYGNSS in general and instead only address the specific algorithm presented here.

1. Errors in SMAP retrievals will propagate into CYGNSS soil moisture retrievals. Because CYGNSS is calibrated using SMAP, any systematic errors in the SMAP retrievals (particularly persistent bias) will also be present in CYGNSS retrievals. As discussed above, at validation sites with poor SMAP performance, CYGNSS also performs poorly.

2. As with all empirical approaches, investigation into the "true" sensitivity to soil moisture is difficult. As mentioned above, unless there is enough soil moisture variability, calculation of β is difficult due to noise in the CYGNSS observations. Additionally, if there is variability of P r,eff within a subcell due to, for example, spatial variations in land cover, then β may appear artificially high (i.e., low sensitivity to soil moisture).

3. The relationship between P r,eff and soil moisture may not actually be linear. Although we approximate the relationship as linear, it may only appear so, either because noise overwhelms an otherwise obvious non-linearity or because in many regions soil moisture does not often span the range from 0.02 cm3 cm−3 to 0.5 cm3 cm−3 or higher, which would be necessary to reveal significant non-linear relationships. The empirical linear relationships may thus not match those eventually derived from a model and should not be compared with them.

4. The assumption that the sensitivity of P r,eff to soil moisture does not change over time is likely incorrect. Fluctuations in vegetation water content, particularly in agricultural regions, will likely change β, though we currently ignore that possibility.

5. Aggregating CYGNSS observations to 36 km does not take advantage of the finer spatial resolution. Any advantages CYGNSS might have in providing higher spatial resolution soil moisture retrievals are not realized by the approach that we use here. Other approaches using either machine learning methods or models could be successful, if provided with accurate, high-resolution ancillary data.
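Limitation 1 can be illustrated with a small synthetic example. The sketch below assumes a linear retrieval of the form SM = β (P − mean(P)) + mean(reference SM), which is an illustrative simplification and not necessarily the exact form of the product's algorithm; the function `calibrate_and_retrieve`, the reflectivity values, and the imposed reference bias are all hypothetical. Because such a retrieval is anchored to the mean of the reference data, a persistent bias in the reference appears unchanged in the calibrated retrievals.

```python
import numpy as np

def calibrate_and_retrieve(p_eff, sm_ref):
    """Fit a linear model to a reference soil moisture series and apply it.

    Assumed (illustrative) form:
        sm = beta * (p_eff - mean(p_eff)) + mean(sm_ref)
    """
    p_eff = np.asarray(p_eff, dtype=float)
    sm_ref = np.asarray(sm_ref, dtype=float)
    # Least-squares slope of reference soil moisture against reflectivity.
    beta = np.polyfit(p_eff, sm_ref, 1)[0]
    sm = beta * (p_eff - p_eff.mean()) + sm_ref.mean()
    return beta, sm

# Synthetic data only: "true" soil moisture, a hypothetical reflectivity in dB
# that is linear in soil moisture plus noise, and a reference product that
# carries a persistent bias of 0.05 cm3 cm-3.
rng = np.random.default_rng(1)
n = 500
sm_true = rng.uniform(0.05, 0.35, n)
p_eff = -20.0 + 30.0 * sm_true + rng.normal(0.0, 1.0, n)
ref_bias = 0.05
sm_ref = sm_true + ref_bias + rng.normal(0.0, 0.02, n)

beta, sm_ret = calibrate_and_retrieve(p_eff, sm_ref)
bias_ref = np.mean(sm_ref - sm_true)  # bias of the reference
bias_ret = np.mean(sm_ret - sm_true)  # bias of the calibrated retrieval
```

Under these assumptions `bias_ret` equals `bias_ref` to floating-point precision, so validation against in-situ data cannot distinguish the two products by their bias.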

Conclusions
This paper described the UCAR/CU CYGNSS Soil Moisture Product [19], which uses spaceborne GNSS-R reflections, calibrated to SMAP soil moisture, to produce soil moisture retrievals. Validation of the product at 171 in-situ soil moisture stations resulted in a median ubRMSE of 0.049 cm3 cm−3. This product can be used by hydrologists who are interested in soil moisture data at a higher temporal resolution than is currently available from other products. As was shown in Figures 12 and 13, CYGNSS can observe changes in soil moisture due to precipitation events that may occur too quickly to be captured at the SMAP revisit interval. The quantification of soil moisture memory, observation of moisture conditions giving rise to flooding, and the inverse estimation of precipitation using soil moisture time series all require soil moisture data on short time scales, and a daily soil moisture product may be able to provide more complete information on soil moisture dynamics at the needed time scales.
GNSS-R researchers interested in developing independent CYGNSS soil moisture retrievals using model-based approaches may also benefit from using this product, as it can provide an indication of where simple algorithms like this one can successfully retrieve soil moisture and where more complicated approaches may be warranted.

Table A1. Location information and statistics for each in-situ soil moisture station used for validation. Unbiased root-mean-square errors (ubRMSEs) for CYGNSS are shown as well as those for SMAP, which are shown for context. The correlation coefficient (r) between CYGNSS soil moisture retrievals and in-situ observations is shown. The bias between SMAP retrievals and in-situ observations is the same as for CYGNSS retrievals and in-situ observations, since CYGNSS retrievals are already bias-corrected with respect to SMAP. The numbers of CYGNSS and SMAP observations used for calculation of ubRMSEs, r, and the bias are also shown.