Downscaling SMAP Brightness Temperatures to 3 km Using CYGNSS Reflectivity Observations: Factors That Affect Spatial Heterogeneity

Abstract: NASA’s Soil Moisture Active Passive (SMAP) mission only retrieved ~2.5 months of 3 km near surface soil moisture (NSSM) before its radar transmitter malfunctioned. NSSM remains an important area of study, and multiple applications would benefit from 3 km NSSM data. With the goal of creating a 3 km NSSM product, we developed an algorithm to downscale SMAP brightness temperatures (TBs) using Cyclone Global Navigation Satellite System (CYGNSS) reflectivity data. The purpose of downscaling SMAP TB is to represent the spatial heterogeneity of TB at a finer scale than possible via passive microwave data alone. Our SMAP/CYGNSS TB downscaling algorithm uses β as a scaling factor that adjusts TB based on variations in CYGNSS reflectivity. β is the spatially varying slope of the negative linear relationship between SMAP emissivity (TB divided by surface temperature) and CYGNSS reflectivity. In this paper, we describe the SMAP/CYGNSS TB downscaling algorithm and its uncertainties, and we analyze the factors that affect the spatial patterns of SMAP/CYGNSS β. The 3 km SMAP/CYGNSS TBs are more spatially heterogeneous than 9 km SMAP enhanced TBs. The median root mean square difference (RMSD) between 3 km SMAP/CYGNSS TBs and 9 km SMAP TBs is 3.03 K. Additionally, 3 km SMAP/CYGNSS TBs capture expected NSSM patterns on the landscape. Lower (more negative) β values yield greater spatial heterogeneity in SMAP/CYGNSS TBs and are generally found in areas with low topographic roughness (<350 m), moderate NSSM variance (~0.01–0.0325), low-to-moderate mean annual precipitation (~0.25–1.5 m), and moderate mean Normalized Difference Vegetation Indices (~0.2–0.6). β values are lowest in croplands and grasslands and highest in forested and barren lands.


Introduction
NASA's Soil Moisture Active Passive (SMAP) mission was launched in January of 2015 with the goal of providing 9 km near surface soil moisture (NSSM) retrievals with high accuracy, global coverage, and a brief revisit time. With an effective sensing depth of ~0-5 cm, the SMAP L-band microwave radiometer produces high-sensitivity brightness temperatures (TBs) with spatial resolutions of ~40 km [1]. By merging the ~40 km TBs with 3 km L-band radar data, SMAP retrieved 3 km and 9 km NSSM [2]. Unfortunately, the SMAP radar transmitter malfunctioned in July of 2015, and only ~2.5 months of active-passive NSSM were retrieved. The SMAP team then created an alternative active-passive NSSM product, using Sentinel-1A/Sentinel-1B C-band Synthetic Aperture Radar (SAR) data. While this product has a spatial resolution of 1-3 km, full global coverage is only achieved every 12 days [3].
NSSM measurements with both fine spatial scale and fine temporal scale are important for weather forecasting, hydrologic modeling, drought monitoring, and flood prediction [4].
While a variety of remotely sensed NSSM products are available, they generally have either fine spatial resolution or fine temporal resolution, but not both. Microwave radiometers, like SMAP and Soil Moisture and Ocean Salinity (SMOS) [5], retrieve NSSM at coarse spatial resolution (~40 km) and fine temporal resolution (~2-3 days). Microwave SAR missions, like Sentinel-1 [6] and the upcoming NASA-ISRO SAR (NISAR) mission [7], retrieve NSSM at fine spatial resolution (dozens of meters to 1 km) but coarse temporal resolution (~6-12 days). There is still a need for NSSM products with both fine spatial scale and fine temporal scale.
We developed an algorithm to downscale SMAP TBs using Cyclone Global Navigation Satellite System (CYGNSS) reflectivity data. The purpose of downscaling SMAP TBs is to represent the spatial heterogeneity of TB at a finer scale than possible via passive microwave data alone, with the end goal of developing a 3 km NSSM product. In this paper, we discuss the SMAP/CYGNSS TB downscaling algorithm, including the complexities involved in collocating SMAP and CYGNSS data. We also investigate the factors that affect the spatial variability of downscaled SMAP/CYGNSS TBs and analyze which types of landscapes produce the most and least spatially heterogeneous TBs. Finally, we discuss the primary sources of uncertainty associated with downscaled SMAP/CYGNSS TBs. Deriving NSSM from SMAP/CYGNSS TBs will be addressed in a separate paper.

Remote Sens. 2022, 14, x FOR PEER REVIEW 2 of 23

Background
Each of the 8 CYGNSS observatories in orbit contains a bistatic radar receiver [8]. Bistatic radar receivers collect radar signals sent from separate sources, in this case Global Positioning System (GPS) satellites, using a Signals-of-Opportunity retrieval method. CYGNSS retrievals therefore create a pseudo-random spatial coverage pattern (Figure 1), over a latitudinal range of ±38° [8]. Like the SMAP radiometer, CYGNSS retrieves L-band microwave signals with a sensing depth of ~0-5 cm. Therefore, downscaling SMAP using CYGNSS provides an advantage over downscaling SMAP using Sentinel, as Sentinel retrieves C-band microwave signals with a sensing depth of ~0-2 cm [1].
The spatial footprints of GNSS-R (Global Navigation Satellite System-Reflectometry) retrievals depend on the roughness of the surface. Most GNSS-R observations are a combination of coherent and incoherent reflections, with smooth, flat surfaces producing coherent reflections and rough, highly vegetated, or topographically variable surfaces producing incoherent reflections. For low-Earth orbit GNSS-R satellites, like CYGNSS, coherent reflections have a theoretical footprint of ~0.5 km and incoherent reflections have a theoretical footprint of ~25 km [9]. While it is possible that future research will allow for the inclusion of a combination of incoherent and coherent scattering in our analysis of CYGNSS reflectivity data, for the time being we assume there is a strong coherent component in the reflectivity observations, and we correct the data accordingly. This assumption is based upon observational evidence of a significant degree of spatial heterogeneity at relatively small scales (~few km), even in the absence of inland surface water.
Assuming a coherently reflecting surface over land, the spatial footprint for a CYGNSS retrieval is ~0.5 × 7 km or ~0.5 × 3.5 km. The elongation is due to the time integration of the received signal. Before mid-2019, integration times were one second, which effectively smeared the along-track footprint to 7 km. After mid-2019, integration times were reduced to 0.5 s, which reduced the theoretical along-track footprint to 3.5 km [10]. We chose to use a CYGNSS gridding scheme of 3 × 3 km, based on a study by [11]. They found that R² and unbiased root mean square difference (ubRMSD) between SMAP and CYGNSS NSSM increased and decreased, respectively, as the CYGNSS NSSM gridding size decreased from 18 km to 9 km to 3 km. A 3 km grid also falls within the SMAP EASE-Grid 2.0 gridding scheme [1].
When CYGNSS data are gridded to 3 km (Figure 1), the statistical repeat period is ~8-14 days, varying with latitude [12]. The SMAP repeat period at low-to-mid latitudes is 2-3 days [1]. If we choose a temporal collocation scheme that includes all CYGNSS observations (Section 2.2.1), we can expect the average spatial coverage of 3 km CYGNSS observations within 9 km SMAP observations to range from ~14-38%. We calculated this by dividing the SMAP repeat period (2-3 days for full SMAP coverage) by the CYGNSS repeat period (8-14 days for full CYGNSS coverage). We further discuss SMAP/CYGNSS spatial coverage in Section 4.5.
In the past several years, multiple NSSM algorithms have been developed using CYGNSS reflectivity data [13–17]. However, similar to SAR NSSM data, CYGNSS and other GNSS-R NSSM data have lower accuracy than NSSM retrieved by radiometers, due to the larger surface roughness and vegetation effects of radar at smaller spatial scales [18]. By merging SMAP and CYGNSS data, we aim to create a high accuracy NSSM product with a spatial scale of 3 km. To derive SMAP/CYGNSS NSSM, we first need to downscale SMAP TBs using CYGNSS reflectivity data.
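The ~14-38% range follows from the ratio of the two repeat periods. A minimal sketch of that arithmetic is given below; the function name is ours and purely illustrative.

```python
def coverage_fraction(smap_repeat_days: float, cygnss_repeat_days: float) -> float:
    """Approximate fraction of 3 km cells sampled by CYGNSS during one
    SMAP repeat period, taken as the ratio of the two repeat periods."""
    return smap_repeat_days / cygnss_repeat_days


# Lower bound: shortest SMAP repeat (2 days) against longest CYGNSS repeat (14 days).
low = coverage_fraction(2, 14)
# Upper bound: longest SMAP repeat (3 days) against shortest CYGNSS repeat (8 days).
high = coverage_fraction(3, 8)
print(f"expected coverage: {low:.1%} to {high:.1%}")
```

The bounds pair the extremes of the two ranges, which is why the upper value rounds to 38% and the lower to 14%.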

Materials and Methods
To calculate downscaled SMAP/CYGNSS TB, we first calculated CYGNSS reflectivity. Next, we collocated SMAP and CYGNSS data, calculated the spatially varying slopes of the linear regressions between SMAP emissivity and CYGNSS reflectivity, and used a variation of the SMAP active-passive TB downscaling algorithm to calculate 3 km SMAP/CYGNSS TB.

Calculating CYGNSS Reflectivity
CYGNSS Level 1 data are delay Doppler maps (DDMs), which map the power of the received signal with respect to time delay and Doppler shift [8]. We derived CYGNSS reflectivity values using the peak power value of each DDM, from CYGNSS version 2.1 L1 data [10]. We excluded all CYGNSS data with signal-to-noise ratios lower than 2 dB. This means the peak power values used to calculate reflectivity values were noticeably higher than the noise floors of the DDMs [8].
The peak value of each CYGNSS DDM is the power received from the specular reflection point and must be corrected for antenna gain, the bistatic range, the GPS transmit power, and incidence angle, as described by [14]. Assuming a coherent reflection, the coherent component of the bistatic radar equation can be used:

$$P_r = \frac{P_t \, G_t \, G_r \, \lambda_t^2 \, \Gamma_s}{(4\pi)^2 \left(R_{ts} + R_{sr}\right)^2} \tag{1}$$

where P_r is the uncorrected peak power of the DDM, P_t is the transmitted power, G_t is the transmitting antenna gain, G_r is the receiving antenna gain, λ_t is the transmitted GPS wavelength (0.19 m), R_ts is the distance from the transmitting antenna to the specular reflection point, R_sr is the distance from the specular reflection point to the receiving antenna, and Γ_s is the effective surface reflectivity. We calculated Γ_s by rearranging Equation (1) and converting all terms to dB:

$$\Gamma_s = 10\log P_r - 10\log P_t - 10\log G_t - 10\log G_r - 20\log \lambda_t + 20\log\left(R_{ts} + R_{sr}\right) + 20\log(4\pi) \tag{2}$$

Finally, we corrected for incidence angle using a modeled, theoretical formula of reflectivity with respect to incidence angle [11,15]. Assuming accurate knowledge of the terms in Equation (1), the final CYGNSS reflectivity (Γ) is only dependent on surface properties, including surface roughness, vegetation, and the dielectric constant, which depends on soil moisture.
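The rearrangement into dB can be sketched as follows. This is a minimal illustration and not the authors' processing code. It assumes the standard coherent form of the bistatic radar equation, P_r = P_t G_t G_r λ_t² Γ_s / ((4π)² (R_ts + R_sr)²), with all inputs in linear units, and it omits the incidence angle correction. The function names and the numbers in the round-trip check are hypothetical.

```python
import math


def received_power(p_t, g_t, g_r, r_ts, r_sr, gamma_s, wavelength=0.19):
    """Forward model (assumed coherent form): peak power at the receiver.
    Powers in W, gains as linear ratios, distances and wavelength in m."""
    return (p_t * g_t * g_r * wavelength ** 2 * gamma_s) / (
        (4 * math.pi) ** 2 * (r_ts + r_sr) ** 2
    )


def reflectivity_db(p_r, p_t, g_t, g_r, r_ts, r_sr, wavelength=0.19):
    """Effective surface reflectivity in dB, obtained by solving the forward
    model for gamma_s and converting every term to dB."""
    def db(x):
        return 10.0 * math.log10(x)

    return (
        db(p_r) - db(p_t) - db(g_t) - db(g_r)
        - 2 * db(wavelength)        # 20 log10(wavelength)
        + 2 * db(r_ts + r_sr)       # 20 log10(total path length)
        + 2 * db(4 * math.pi)       # 20 log10(4 pi)
    )


# Round-trip check with made-up geometry: a reflectivity of 0.01 is -20 dB.
p_r = received_power(p_t=100.0, g_t=20.0, g_r=25.0, r_ts=2.0e7, r_sr=5.0e5, gamma_s=0.01)
gamma_db = reflectivity_db(p_r, p_t=100.0, g_t=20.0, g_r=25.0, r_ts=2.0e7, r_sr=5.0e5)
```

Working in dB turns the products and quotients of the radar equation into sums and differences, so each correction (gain, range, transmit power) can be applied as a separate additive term.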

Downscaling SMAP TB
In an effort that follows the SMAP mission's original plan to downscale radiometer TB retrievals using radar retrievals, we downscaled SMAP TBs using CYGNSS reflectivity values. SMAP's original active-passive algorithm includes cross-polarization data, which is not available from CYGNSS retrievals. We therefore used a slightly modified version of the baseline SMAP active-passive TB algorithm (Equation (3)) with no cross-polarization measurements [1]. This algorithm merges coarse-scale (33 km, gridded at 9 km) SMAP TBs with fine-scale (3 km) CYGNSS reflectivity values to create fine-scale (3 km) SMAP/CYGNSS TBs, as depicted in Figure 2.
Here, TB_F is the downscaled, SMAP/CYGNSS fine-scale brightness temperature, and TB_C is the SMAP coarse-scale brightness temperature. Γ_F is fine-scale CYGNSS reflectivity and Γ_C is coarse-scale CYGNSS reflectivity. Γ_F and Γ_C are further discussed in Section 2.2.3. β_C is the coarse-scale, spatially varying slope of the linear regression of SMAP emissivity and CYGNSS reflectivity values. Emissivity is defined as e = TB_C / T_S,C, where T_S,C is coarse-scale effective soil temperature, or surface temperature.
We used vertically polarized SMAP enhanced TBs, gridded at 9 km with a native resolution of approximately 33 km [19], for our TB_C values [20]. We chose to use vertically polarized SMAP TBs because the SMAP NSSM algorithm using vertically polarized TBs outperforms the other NSSM algorithms [19]. We used the GMAO GEOS-5 (Global Modeling and Assimilation Office, Goddard Earth Observing System version 5) effective soil temperature data provided with the SMAP enhanced product for our T_S,C values [20]. An approximate equilibrium in temperature between the air, near-surface soil, and vegetation at 6 a.m. increases the accuracy of derived NSSM [1]. We therefore only used 6 a.m. SMAP enhanced TBs (from descending orbits) in this study. We gridded 3 km TB_F using the EASE-Grid 2.0.
The goal of downscaling SMAP TB is to represent the spatial heterogeneity of TB at a finer scale than possible using passive microwave data alone. Before we used Equation (3) to calculate TB_F, we first collocated SMAP TB and CYGNSS reflectivity data and calculated β.
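Equation (3) is not reproduced in this text. As a sketch only, the function below assumes an additive form, TB_F = TB_C + β_C · T_S,C · (Γ_F − Γ_C). That form is consistent with β being a slope of emissivity per dB of reflectivity, and with TB_F equalling TB_C when β is zero, but it is our assumption and not a formula confirmed by the source. The function name and the TB and temperature values in the example are hypothetical; the β and ∆Γ values are the global medians reported in the Results.

```python
def downscale_tb(tb_c, t_s_c, beta_c, gamma_f_db, gamma_c_db):
    """Fine-scale TB (K) under the ASSUMED additive form described above.

    tb_c        coarse-scale SMAP brightness temperature (K)
    t_s_c       coarse-scale effective soil temperature (K)
    beta_c      slope of emissivity versus reflectivity (1/dB), expected <= 0
    gamma_f_db  fine-scale (3 km) CYGNSS reflectivity (dB)
    gamma_c_db  coarse-scale (33 km) CYGNSS reflectivity (dB)
    """
    delta_emissivity = beta_c * (gamma_f_db - gamma_c_db)
    return tb_c + t_s_c * delta_emissivity


# A 3 km cell that reflects 1.21 dB more strongly than its surroundings
# (wetter or smoother) receives a lower TB when beta is negative.
tb_f = downscale_tb(tb_c=260.0, t_s_c=290.0, beta_c=-0.007,
                    gamma_f_db=-13.79, gamma_c_db=-15.0)
```

Under this assumed form, the size of the adjustment depends on both the reflectivity contrast and the magnitude of β, so a near-zero β suppresses fine-scale structure regardless of how variable the reflectivity is.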

Collocating SMAP TB and CYGNSS Reflectivity Data
We collocated SMAP TBs and CYGNSS reflectivity values in both space and time. We spatially collocated the fine-scale variables using the 3 × 3 km EASE-Grid 2.0. We spatially collocated the coarse-scale variables using 33 × 33 km boxes, centered on the 9 × 9 km EASE-Grid 2.0, as shown in Figure 2. Each 33 km box consists of 11 × 11 grid cells of 3 km each [2,3].
We masked out all 3 × 3 km grid cells containing semi-permanent open water. SMAP enhanced TB data are water body corrected, which results in higher TBs for adjusted grid cells [21]. Consequently, matching SMAP and CYGNSS data over regions of open water will introduce error in the downscaled SMAP/CYGNSS TBs. Using 30 m surface water data derived from Landsat [22], available through the Global Surface Water Explorer, we calculated the percentage of each 3 × 3 km grid cell that contained open surface water for at least six months in 2018. We then implemented a water mask over regions that exceeded 5% semi-permanent open water.
The temporal collocation scheme we used is ± half the time between successive SMAP observations. Since the time between SMAP observations is variable, the temporal collocation period varies from approximately 1-6 days. As can be seen in Figure 3a, the "between-SMAP" temporal collocation scheme captures all CYGNSS data without reusing any CYGNSS data.
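A window of ± half the gap between successive SMAP observations is equivalent to assigning each CYGNSS observation to the nearest SMAP observation in time. The sketch below illustrates that reading. The function name, the use of plain numbers (e.g., days) for times, and the handling of ties and of observations outside the SMAP record are our assumptions.

```python
from bisect import bisect_left


def nearest_smap_index(smap_times, cygnss_time):
    """Return the index of the SMAP observation whose window contains
    cygnss_time. smap_times must be sorted in ascending order. Windows meet
    at the midpoints between successive SMAP times, so every CYGNSS
    observation is used exactly once."""
    i = bisect_left(smap_times, cygnss_time)
    if i == 0:
        return 0                      # before the first SMAP overpass (assumed rule)
    if i == len(smap_times):
        return len(smap_times) - 1    # after the last SMAP overpass (assumed rule)
    before, after = smap_times[i - 1], smap_times[i]
    # Ties at the exact midpoint go to the earlier overpass (assumed rule).
    return i - 1 if cygnss_time - before <= after - cygnss_time else i


smap_days = [0.0, 2.0, 5.0]           # irregular SMAP overpass times, in days
assignments = [nearest_smap_index(smap_days, t) for t in (0.9, 1.1, 3.4, 3.6)]
```

Because the windows tile the time axis without overlap, no CYGNSS observation is dropped and none is matched to two SMAP observations.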

Calculating β
With more than five years of coinciding CYGNSS and SMAP data, there are sufficient collocated time-series data to calculate β statistically. As described above, β is the spatially varying slope determined via linear regression between SMAP emissivity and CYGNSS reflectivity values. We chose to use emissivity, rather than TB, because TB is affected by surface temperature, which fluctuates seasonally.
The β calculation is very important for the success and accuracy of the TB algorithm. β is essentially the scaling factor that adjusts TB_F to be higher or lower than TB_C, so an accurate representation of spatial variability depends on the estimated β value. Since the dielectric constant of soil varies based on water content, which affects the microwave emissivity of the soil [23], SMAP emissivity decreases as NSSM increases and CYGNSS reflectivity increases as NSSM increases. Therefore, β values should theoretically be negative, and positive β values are unrealistic. Lower, or more negative, β values can produce larger differences between TB_F and TB_C, increasing spatial heterogeneity. As we show below, calculating β is challenging in some situations.
We experimented with a variety of options before determining an optimal β calculation method. Due to the ~8-14 day repeat period for CYGNSS at a 3 km scale [12], the CYGNSS spatial coverage within a 33 km coarse-scale box on any given day is sparse. Therefore, daily CYGNSS observations within a 33 km box are not fully representative of its spatial heterogeneity. To account for this, we aggregated SMAP and CYGNSS data over 45-day periods. While longer periods provide slightly better correlation statistics, we chose this aggregation interval as a compromise to balance (1) the need for full CYGNSS coverage within 33 km boxes, (2) the need for ample data pairs for each linear regression, and (3) the ability to capture seasonal variations in soil moisture.
To calculate β, we located all CYGNSS reflectivity values within each 33 km box over 45-day periods and calculated the mean. Using the water mask described above, we excluded all CYGNSS observations within 3 × 3 km grid cells that exceeded 5% semi-permanent open water from our 45-day CYGNSS reflectivity means. We also calculated the mean of all SMAP emissivity values for each 33 km box over 45-day periods. In this way, we calculated a single pair of SMAP and CYGNSS values for each 33 km box for each 45-day period. Using a calibration period of 1 April 2017 through 31 December 2020, this creates up to 29 data pairs for each linear regression.
SMAP emissivity and CYGNSS reflectivity data pairs and their corresponding β values are vastly different from location to location (Figure 4). In the results, we examine how these differences are affected by NSSM variability, mean annual precipitation, vegetation density, and topographic roughness.
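For a single 33 km box, the regression step can be sketched as below. The sketch assumes the 45-day means have already been computed and are passed in as paired sequences. The function name, the use of an ordinary least-squares fit via `numpy.polyfit`, and the minimum of three valid pairs are our assumptions, not details given in the text.

```python
import numpy as np


def estimate_beta(emissivity_means, reflectivity_means_db, min_pairs=3):
    """Slope (beta, per dB) and Pearson correlation of 45-day mean SMAP
    emissivity regressed on 45-day mean CYGNSS reflectivity (dB).
    Returns (nan, nan) when fewer than min_pairs valid pairs exist."""
    e = np.asarray(emissivity_means, dtype=float)
    g = np.asarray(reflectivity_means_db, dtype=float)
    valid = np.isfinite(e) & np.isfinite(g)   # skip periods with missing data
    if valid.sum() < min_pairs:
        return float("nan"), float("nan")
    slope, _intercept = np.polyfit(g[valid], e[valid], 1)
    r = np.corrcoef(g[valid], e[valid])[0, 1]
    return float(slope), float(r)


# Synthetic check: 29 noise-free pairs generated with a slope of -0.007 per dB.
gamma = np.linspace(-20.0, -10.0, 29)
emis = 0.95 - 0.007 * (gamma + 15.0)
beta, corr = estimate_beta(emis, gamma)
```

Returning the correlation alongside the slope makes it straightforward to screen out weakly correlated boxes later, for example with the R < −0.4 cutoff used in the Results.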

Calculating TB_F
An illustration of the steps we used to calculate the 3 km SMAP/CYGNSS TBs (TB_F) associated with a single 9 km SMAP enhanced TB (TB_C) is provided in Figure 3b. First, we calculated Γ_C by finding the median of all CYGNSS reflectivity values with specular points that fell within the appropriate 33 km box within ± half the time between successive SMAP observations. Second, we identified which of these CYGNSS reflectivity values had specular points that fell within the central 9 km grid cell and assigned these Γ_F to their appropriate 3 km grid cells. On the rare occasion that 2 or more CYGNSS observations fell within the same 3 km grid cell within ± half the time between successive SMAP observations, they were averaged. Finally, using Equation (3), we calculated TB_F for each Γ_F. Regardless of whether Γ_F occurs on the same day as TB_C, the output TB_F are for the same day as TB_C.
Figure 5 includes regional maps of all the parameters in the SMAP/CYGNSS TB algorithm (Equation (3)).
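The first two steps (the coarse-scale median and the per-cell fine-scale values) can be sketched as follows. The data layout, the cell identifiers, and the function name are hypothetical. The sketch also averages reflectivity values in dB as given, since the text does not say whether averaging was done in dB or in linear units.

```python
from collections import defaultdict
from statistics import mean, median


def coarse_and_fine_reflectivity(observations, central_cells):
    """observations: iterable of (cell_id, reflectivity_db) for every CYGNSS
    specular point inside the 33 km box and the temporal window.
    central_cells: set of the 3 km cell ids that make up the central 9 km cell.
    Returns (gamma_c, gamma_f), where gamma_f maps cell_id -> reflectivity."""
    observations = list(observations)
    # Coarse-scale value: median over everything in the 33 km box.
    gamma_c = median(refl for _, refl in observations)
    # Fine-scale values: only cells inside the central 9 km cell.
    per_cell = defaultdict(list)
    for cell, refl in observations:
        if cell in central_cells:
            per_cell[cell].append(refl)
    # Multiple observations in the same 3 km cell are averaged.
    gamma_f = {cell: mean(vals) for cell, vals in per_cell.items()}
    return gamma_c, gamma_f


obs = [("a", -12.0), ("a", -14.0), ("b", -10.0), ("z", -20.0), ("y", -16.0)]
gamma_c, gamma_f = coarse_and_fine_reflectivity(obs, central_cells={"a", "b", "c"})
```

Cells in the central 9 km cell with no CYGNSS observation (cell "c" in the example) simply receive no Γ_F, and therefore no downscaled TB, for that SMAP observation.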

Auxiliary Data
To better understand the global spatial distribution of SMAP/CYGNSS β values, we compared β values with mean annual precipitation, mean NDVI, topographic roughness, and landcover class. We describe the data, processing, and upscaling methodology used to calculate each below.
We calculated global mean annual precipitation using monthly 0.1 × 0.1-degree CHIRPS (Climate Hazards Group InfraRed Precipitation with Station data) and RFE2 (National Oceanic and Atmospheric Administration Climate Prediction Center African Rainfall Estimation Algorithm v2) rainfall data from 2001-2020, archived by FLDAS (Famine Early Warning Systems Network Land Data Assimilation System) [24]. We then upscaled the results to match our coarse-scale grid (33 km boxes, gridded at 9 km) using a linear averaging method.
We calculated mean NDVI using monthly 0.05 × 0.05-degree MODIS/Terra (Moderate Resolution Imaging Spectroradiometer) NDVI data from 2001-2020 [25]. We then upscaled the results to match our coarse-scale grid (33 km boxes, gridded at 9 km) using a linear averaging method.
We calculated topographic roughness using the root-mean-square (RMS) surface height, derived from 90 m Shuttle Radar Topographic Mission (SRTM) digital elevation model (DEM) data [26]. We calculated RMS surface height values to match our coarse-scale grid (33 km boxes, gridded at 9 km) using the following equation:

$$\mathrm{RMS} = \sqrt{\frac{1}{N}\sum_{n=1}^{N}\left(x_n - \bar{x}\right)^2} \tag{4}$$

Here, x_n is any given surface height value in the 33 km box, N is the total number of surface height values in the 33 km box, and x̄ is the mean value of surface heights within the 33 km box.
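As an illustration, the RMS surface height for one 33 km box can be computed as below. The sketch assumes the RMS height is the root of the mean squared deviation of heights about their mean, normalized by N (that is, a population standard deviation). The function name and the handling of missing DEM values are our assumptions.

```python
import numpy as np


def rms_height(heights):
    """RMS surface height (same units as the input) of all DEM samples in a
    box: sqrt of the mean squared deviation from the box-mean height."""
    h = np.asarray(heights, dtype=float).ravel()
    h = h[np.isfinite(h)]                 # ignore DEM voids (assumed handling)
    return float(np.sqrt(np.mean((h - h.mean()) ** 2)))


flat = rms_height([100.0, 100.0, 100.0])  # perfectly flat terrain
rough = rms_height([0.0, 2.0])            # heights deviate by 1 m from the mean
```

Because the mean height is removed first, a high plateau and a low plain of equal smoothness yield the same roughness value.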
We used annual 0.05 × 0.05-degree MODIS IGBP (International Geosphere-Biosphere Programme) landcover class data [27]. We upscaled the MODIS IGBP data to match our coarse-scale grids (33 km boxes, gridded at 9 km) using an areal-percent weighted average. For each 33 km box, we calculated the areal-percent of overlap for each 0.05 × 0.05-degree grid that overlapped or fell within the box. We multiplied the areal-percent weight by the percent-coverage of the dominant landcover class for the 0.05 × 0.05-degree grid and added it to a running sum for that landcover class. The largest landcover class running sum was assigned as the dominant landcover class of the 33 km box.
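The running-sum procedure described above can be sketched as follows. The input layout (one tuple per overlapping 0.05-degree grid cell), the class labels, and the function name are hypothetical.

```python
from collections import defaultdict


def dominant_landcover(cells):
    """cells: iterable of (overlap_weight, landcover_class, class_coverage),
    one entry per 0.05-degree grid cell that overlaps the 33 km box.
    overlap_weight  areal fraction of overlap with the box
    class_coverage  fraction of the grid cell covered by its dominant class
    Returns the class with the largest weighted running sum."""
    totals = defaultdict(float)
    for overlap_weight, landcover_class, class_coverage in cells:
        totals[landcover_class] += overlap_weight * class_coverage
    return max(totals, key=totals.get)


box = [
    (0.5, "cropland", 0.6),   # contributes 0.30 to cropland
    (0.3, "forest", 0.9),     # contributes 0.27 to forest
    (0.2, "cropland", 0.5),   # contributes 0.10 to cropland
]
dominant = dominant_landcover(box)
```

Weighting by both overlap and within-cell coverage means a class can win the box even when no single grid cell is strongly dominated by it, as cropland does in this example.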

Results
Using the SMAP/CYGNSS TB downscaling algorithm (Equation (3)), TB spatial heterogeneity depends on the spatial variations in CYGNSS reflectivity (∆Γ) and is scaled by β. We first present SMAP/CYGNSS TB spatial heterogeneity results, then we discuss the spatial patterns in ∆Γ values and β values. Finally, we present 3 km SMAP/CYGNSS TB results.

SMAP/CYGNSS TB Spatial Heterogeneity
As expected, 3 km SMAP/CYGNSS TBs exhibit greater spatial variability than 9 km SMAP enhanced TBs. Comparing the 3 km SMAP/CYGNSS TBs associated with each 9 km SMAP TB allowed us to assess the spatial heterogeneity of 3 km SMAP/CYGNSS TBs. We calculated the RMSD between 9 km SMAP TBs and the corresponding 3 km SMAP/CYGNSS TBs for each SMAP observation with concurrent CYGNSS data during 2020. We then calculated the median RMSD TB for each 9 km grid cell. The median RMSD between 9 km SMAP TBs and 3 km SMAP/CYGNSS TBs ranges from 0.06 to 12.63 K (5th to 95th percentile) with a median of 3.03 K (Figure 6c). Many of the near-zero RMSD TBs are due to likely-erroneous, near-zero β values (discussed below, in Section 3.3). We can more accurately describe the spatial heterogeneity of 3 km SMAP/CYGNSS TBs by eliminating the near-zero RMSD TBs caused by near-zero β values. By only including β values with correlations less than −0.4, we exclude near-zero β values (Figure 6b). We discuss the reasons for choosing a correlation cutoff of −0.4 in Section 4.3. When only considering grid cells with β values with R < −0.4, the median annual RMSD between 9 km SMAP TBs and 3 km SMAP/CYGNSS TBs is 5.16 K (Figure 6c).
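The heterogeneity metric can be sketched as a two-stage calculation: an RMSD per SMAP observation, then a median over the year for each 9 km grid cell. The function names and the sample values are hypothetical.

```python
import numpy as np


def rmsd_fine_vs_coarse(tb_fine, tb_coarse):
    """RMSD (K) between the 3 km TBs of one SMAP observation and the single
    9 km TB they were derived from."""
    diff = np.asarray(tb_fine, dtype=float) - float(tb_coarse)
    return float(np.sqrt(np.mean(diff ** 2)))


def median_rmsd(observations):
    """Median RMSD for one 9 km grid cell.
    observations: iterable of (tb_fine_values, tb_coarse), one per SMAP
    observation that has concurrent CYGNSS data."""
    return float(np.median([rmsd_fine_vs_coarse(f, c) for f, c in observations]))


single = rmsd_fine_vs_coarse([258.0, 262.0], 260.0)
yearly = median_rmsd([
    ([258.0, 262.0], 260.0),   # RMSD 2 K
    ([255.0, 255.0], 255.0),   # RMSD 0 K
    ([250.0, 258.0], 254.0),   # RMSD 4 K
])
```

Since each 3 km TB is an adjustment of its parent 9 km TB, this RMSD measures how much fine-scale structure the downscaling adds and is not an accuracy measure.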

Spatial Patterns in ∆Γ Values
Median ∆Γ values (Γ_F − Γ_C) range from 0.19 to 2.46 dB (5th to 95th percentile) with a median of 1.21 dB (Figure 6a). We calculated the median magnitude of ∆Γ for each SMAP observation in 2020. Then, we calculated the median ∆Γ value for each 9 km grid cell.

The highest median ∆Γ values are found in north-central Mexico, southeast Asia, Japan, and western Australia. ∆Γ values are zero and near-zero in regions with high topographic roughness and in densely forested areas, due to low signal-to-noise ratios in the CYGNSS DDMs. The high median ∆Γ values seen over the Amazon and Central African rainforests are water signals, sometimes from beneath the canopy, and essentially map the regional tributaries [28]. Looking closely at these densely forested regions, ∆Γ values are zero or near-zero everywhere besides the regional tributaries.
We expect ∆Γ values to be consistently low in dry and barren areas, due to limited changes in soil moisture and vegetation. However, median ∆Γ values fluctuate over the Sahara Desert. This might be due to sand dunes, rock outcrops, or other topographic and surface roughness variations throughout the desert.
The global spatial patterns in median ∆Γ (Figure 6a) are not very similar to the spatial patterns in the median RMSD between 9 km SMAP TBs and 3 km SMAP/CYGNSS TBs (Figure 6c). The correlation between median ∆Γ and median RMSD TB is 0.45. This suggests ∆Γ is not the primary source of SMAP/CYGNSS TB spatial heterogeneity, and instead β is the dominant factor.

Spatial Patterns in SMAP/CYGNSS β Values
SMAP/CYGNSS β values (the slopes of the linear regressions of SMAP emissivity and CYGNSS reflectivity) range from −0.021 to 0.002 dB⁻¹ (5th to 95th percentile) with a median of −0.007 dB⁻¹ (Figure 6b). The global spatial patterns in β are similar to the spatial patterns in the median RMSD between 9 km SMAP TBs and 3 km SMAP/CYGNSS TBs (Figure 6c). Regions with low (more negative) β tend to have high RMSD TBs, and regions with near-zero β tend to have near-zero RMSD TBs. The correlation between β values and median RMSD TBs is −0.71. Therefore, β (not ΔΓ) is the primary source of SMAP/CYGNSS TB variations.
SMAP emissivity and CYGNSS reflectivity are strongly correlated in many areas (Figure 7a). The median global correlation is −0.58, while the statistical mode of correlations is about −0.91. Linear regressions with near-zero correlations typically yield near-zero or positive β values (Figure 7a,b). Given the SMAP/CYGNSS TB algorithm (Equation (3)), β values of zero necessitate that TB_F will equal TB_C. Therefore, near-zero β values yield low TB spatial heterogeneity. Additionally, as explained in Section 2.2.2, positive β values are unrealistic. Near-zero and positive β values occur in dry areas (e.g., Sahara Desert), forested areas (e.g., Amazon Rainforest), and areas with high topographic roughness (e.g., Andes Mountains). We further investigate the factors that control the spatial patterns in β values below.
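The role of β in the downscaling step can be illustrated with a minimal sketch. Equation (3) is not reproduced in this section, so the form used here, TB_F = TB_C + β · T_s · (Γ_F − Γ_C), is an assumption inferred from the definition of β as the slope between emissivity (TB divided by surface temperature) and reflectivity. The function name and all input values are hypothetical.

```python
import numpy as np

def downscale_tb(tb_c, t_s, beta, gamma_f, gamma_c):
    """Assumed form of Equation (3): TB_F = TB_C + beta * T_s * (Gamma_F - Gamma_C).

    tb_c    : coarse (9 km) brightness temperature, K
    t_s     : surface temperature, K
    beta    : regression slope, 1/dB
    gamma_f : fine (3 km) CYGNSS reflectivity values, dB
    gamma_c : coarse reflectivity for the 33 km box, dB
    """
    gamma_f = np.asarray(gamma_f, dtype=float)
    return tb_c + beta * t_s * (gamma_f - gamma_c)

# Hypothetical inputs for one 9 km grid cell
gamma_f = np.array([-14.0, -12.0, -10.0])   # dB
gamma_c = float(np.median(gamma_f))         # -12 dB

tb_neg = downscale_tb(250.0, 290.0, -0.007, gamma_f, gamma_c)
tb_zero = downscale_tb(250.0, 290.0, 0.0, gamma_f, gamma_c)

print(tb_neg)   # higher reflectivity maps to lower TB when beta < 0
print(tb_zero)  # beta = 0 reproduces the coarse TB everywhere
```

With β = 0 the fine-scale TBs collapse to the coarse value, which is why near-zero β values produce little spatial heterogeneity regardless of how much the reflectivity varies.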

Relationships between Mean Annual Precipitation, NSSM Variability and β Values
SMAP emissivity and CYGNSS reflectivity are often poorly correlated in dry areas (Figure 7a,d,e).This is because dry regions have little NSSM variability, and therefore little-to-no temporal variation in emissivity or reflectivity (Figure 4f).
The impact of precipitation and NSSM variability on SMAP/CYGNSS β values can be illustrated by examining the seasonal cycle of β for a single 33 km box (Figure 8). β is most negative (and the correlation strongest) during the wet season, when NSSM fluctuates due to rainfall events. In contrast, periods with no precipitation and low NSSM variability produce near-zero β values and poor correlations. We calculated seasonal β values using SMAP/CYGNSS data pairs that were temporally collocated using ± half the time between successive SMAP observations. The 45-day SMAP/CYGNSS β value (Figure 8b), calculated using the methodology described in Section 2.2.2, is closer to the seasonal β values generated during periods with higher precipitation and higher NSSM variability. This indicates that our 45-day β values provide an adequate estimate of the annual range of coarse-scale NSSM variability.
The relationship between SMAP/CYGNSS β values and NSSM variability is shown in Figure 9a. We represented NSSM variability by calculating the variance through time of SMAP NSSM for each 9 km grid cell during 2020. The relationship between SMAP/CYGNSS β values and FLDAS mean annual precipitation (described in Section 2.3) is shown in Figure 9b. As expected, regions with very low NSSM variance (<0.0025) and very low mean annual precipitation (<0.25 m) have near-zero and positive β values. In contrast, the most negative β values are found in regions with moderate NSSM variance (~0.01–0.0325) and low-to-moderate mean annual precipitation (~0.25–1.5 m). The near-zero β values found in regions with high mean annual precipitation might be due to a lack of NSSM variability, if the soils are always wet, or could be due to increasing vegetation density, which is discussed below.
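The quantities compared in this subsection can be sketched as follows: β and its correlation R come from an ordinary least-squares regression of emissivity on reflectivity, and NSSM variability is the variance of the NSSM time series. This is a simplified illustration on synthetic data; it omits the collocation and 45-day aggregation described in Section 2.2.2, and the function name and numbers are hypothetical.

```python
import numpy as np

def fit_beta(emissivity, reflectivity_db):
    """Return (beta, intercept, r) for emissivity regressed on reflectivity.

    beta is the slope in 1/dB; r is the Pearson correlation coefficient.
    Pairs containing NaN are ignored.
    """
    e = np.asarray(emissivity, dtype=float)
    g = np.asarray(reflectivity_db, dtype=float)
    ok = np.isfinite(e) & np.isfinite(g)
    beta, intercept = np.polyfit(g[ok], e[ok], 1)  # highest power first
    r = np.corrcoef(g[ok], e[ok])[0, 1]
    return beta, intercept, r

# Synthetic collocated pairs: emissivity = TB / T_s falls as reflectivity rises
refl = np.array([-18.0, -16.0, -14.0, -12.0, -10.0])  # dB
emis = 0.80 - 0.01 * refl                              # exact line, slope -0.01
beta, intercept, r = fit_beta(emis, refl)

# NSSM variability as the variance through time of a (synthetic) NSSM series
nssm = np.array([0.10, 0.30, 0.20, 0.40, 0.10, 0.30])
nssm_var = float(np.nanvar(nssm))

print(beta, r, nssm_var)
```

In a dry region both series would be nearly constant through time, so the fitted slope would be poorly constrained and the correlation weak, which is the behavior described for near-zero β values.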

Relationship between NDVI and β Values
SMAP emissivity and CYGNSS reflectivity are often poorly correlated in forested areas (Figure 6a,f) due to a lack of significant changes in both SMAP emissivity and CYGNSS reflectivity (Figure 4e). This is because the dense vegetation in forests interferes with microwave sensing of the soil. We used MODIS/Terra mean NDVI (described in Section 2.3) to compare SMAP/CYGNSS β values with vegetation density.
The positive relationship between SMAP/CYGNSS β values and SRTM topographic roughness (described in Section 2.3) is shown in Figure 9d, with β values approaching zero as topographic roughness increases. Since both dry and densely vegetated areas have near-zero β values, regardless of topographic roughness, we did not include any 9 km grid cells with mean annual precipitation less than 0.25 m or mean NDVI greater than 0.8 in our topographic roughness analysis.

Landcover Class and β Values
SMAP/CYGNSS β values also vary with MODIS IGBP landcover class (described in Section 2.3), as shown in Figure 9e. We only included β values in this analysis for 9 km grid cells that did not change landcover class during the period of our linear regression calculations (April 2017 through December 2020) and for 9 km grid cells in which the predominant landcover class covered more than 90% of the area. Since high topographic roughness produces poorly correlated SMAP/CYGNSS linear regressions and is not dependent on landcover class, we did not include any 9 km grid cells with topographic roughness values greater than 350 m in our landcover class analysis. Areas with topographic roughness greater than 350 m have β values of zero or higher more than 25% of the time (Figure 9d). We did not include landcover classes that are masked out in NSSM products (0: water, 11: permanent wetland, 13: urban and built-up, and 15: snow and ice), nor did we include landcover classes without any 9 km grid cells meeting the criteria explained above (3: deciduous needleleaf forest).
β values are lowest in croplands (Figure 4a) and grasslands (Figure 4b) and highest in forests (Figure 4e) and barren lands (Figure 4f). Many of the landcover class and β value relationships can be explained by variations in NSSM variability, mean annual precipitation, and mean NDVI throughout landcover classes (Figure 7d-g). We consider how to untangle these confounding effects in the discussion.
While some spatial disagreements between SMAP/CYGNSS and SMAP active-passive β values exist, mainly in Australia, the United States, and the Middle East, general spatial agreement can be seen across most of the globe. Complete spatial agreement between β values is not expected, as NSSM variability affects β (Section 3.3.1). Since there is no temporal overlap in SMAP active-passive data and CYGNSS data, and the calculation methods are not the same, a direct comparison of β values is not possible. The SMAP active-passive β values were calculated from 15 April to 7 July 2015, and our SMAP/CYGNSS β values were calculated using data from April 2017 to December 2020. The ~2.5 months of time-series data used to calculate SMAP active-passive β values are not representative of annual precipitation and NSSM variability, which may have caused incomplete β estimations, especially in arid regions [2].

3 km SMAP/CYGNSS TBs
3 km SMAP/CYGNSS TBs include significantly more spatial detail than 9 km SMAP TBs and capture expected soil moisture patterns on the landscape. An example near New Delhi, India (Figure 10) shows a decrease in soil moisture over croplands as time progresses from August to December. The drying of the croplands is evident in both 3 km SMAP/CYGNSS TBs (Figure 10g,h) and 9 km SMAP TBs (Figure 10d,e). However, the spatial pattern of the more productive croplands is obvious in 3 km SMAP/CYGNSS TBs but not discernible in 9 km SMAP TBs.
The 3 km SMAP/CYGNSS TB maps shown in Figure 10g,h include data aggregated over 60 days. As such, adjacent 3 km TBs may be derived from SMAP/CYGNSS data observed on the same day, 5 days apart, or as many as 59 days apart. This introduces noise in the maps. We aggregated SMAP/CYGNSS TBs over 60 days due to sparse daily spatial coverage. Options for increasing daily spatial coverage in SMAP/CYGNSS TBs are discussed in Section 4.5.

Spatial Relationships Affecting β Values
We demonstrated the existence of spatial relationships between SMAP/CYGNSS β values and mean annual precipitation, NSSM variability, mean NDVI, and topographic roughness in Section 3.3. However, which of these relationships are most important for determining β values? We synthesize the SMAP/CYGNSS β results below.
High topographic roughness decreases the correlation between emissivity and reflectivity, regardless of mean annual precipitation, NSSM variability, or mean NDVI.This does not mean that topographic roughness is a good indicator of β values, but rather that the quality of SMAP and CYGNSS observations is reduced in areas with high topographic roughness, due to increased scattering of the microwave signal.
While NSSM variability is related to mean annual precipitation, NSSM variability is more indicative of β values for two major reasons. First, both low and high mean annual precipitation can create areas with low NSSM variability. Second, NSSM variability can be affected by other factors, including seasonal flooding, irrigation, and the water and energy balance of the surface.
The two most important factors for determining β values are therefore NSSM variability and mean NDVI. β values generally decrease with increasing NSSM variability (Section 3.3.1). Additionally, β values increase as mean NDVI increases from moderate to high values, due to increased vegetation density (Section 3.3.2). NSSM variability is a better predictor of β values for areas with low-to-moderate mean NDVI, and mean NDVI is a better predictor of β values for areas with high mean NDVI.
The spatial relationships between SMAP/CYGNSS β values and mean annual precipitation, NSSM variability, and mean NDVI are also important when considering the β values produced by different landcover classes. Landcover classes with high mean NDVI (forests) and low mean annual precipitation and NSSM variability (barren areas) tend to produce poorly correlated linear regressions and near-zero β values. Alternatively, landcover classes with low-to-moderate mean annual precipitation, moderate NSSM variability, and moderate mean NDVI (shrublands, savannas, grasslands, and croplands) tend to produce well correlated linear regressions with more negative β values.
Similar findings are presented in [30]. This study on the relationship between SMAP radar backscatter and SMAP radiometer TB found that linear correlations are best in grasslands and croplands and worst in forests and barren areas. The authors also found that linear correlations are best for moderate NDVI, with worse correlations for low and high NDVI, and that linear correlations improve as soil moisture variability increases.

Sources of Temporal Fluctuations in β Values
The temporal fluctuations in NSSM variability within 9 km grid cells significantly impact SMAP/CYGNSS β values (Section 3.3.1). Figure 8 demonstrates that β values vary significantly throughout the year, driven by seasonal changes in precipitation and NSSM variability. While seasonal changes in vegetation and surface roughness might also affect β values [3], these signals would likely be dwarfed by seasonal changes in NSSM variability. Since SMAP/CYGNSS β values erroneously approach zero during the dry season, which could yield less spatial heterogeneity than exists, we chose not to use seasonally fluctuating β values. We therefore did not explore any possible temporal relationships between vegetation or surface roughness and β values.
… values to replace. We chose a preliminary correlation cutoff of −0.4 to ensure the replacement of erroneous β values while minimizing the number of replaced β values. Future work will include testing a variety of correlation cutoffs during NSSM validation.
Unlike the SMAP radar, CYGNSS does not include any cross-polarization data, so we are unable to replicate the regression model between cross-polarization backscatter (representative of vegetation level) and β used to create representative β values for the SMAP active-passive product [2]. Instead, we will use the relationships between SMAP/CYGNSS β values and mean annual precipitation, mean NDVI, and NSSM variability to determine representative β values.
Since the variation in β values produced by different landcover classes can be explained by the relationships between β values and mean annual precipitation, mean NDVI, and NSSM variability (Section 4.1), we will use landcover classes to create representative β values. Unfortunately, the correlation between β values and landcover class is not particularly high, as can be seen by the spread of the box and whisker plots in Figure 9e. We therefore removed β values with correlations greater than −0.4 from the analyses. We calculated the median β value for each landcover class, including only 9 km grid cells with β values with R < −0.
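The replacement strategy outlined above can be sketched in a few lines. This is an illustration under stated assumptions rather than the procedure as implemented: it assumes that β values in grid cells whose correlation is weaker than the preliminary −0.4 cutoff are replaced by the median β of well-correlated cells in the same landcover class. The function name, arrays, and class codes in the example are hypothetical.

```python
import numpy as np

R_CUTOFF = -0.4  # preliminary correlation cutoff quoted in the text

def representative_beta(beta, r, landcover):
    """Replace poorly correlated beta values with a per-class median.

    Cells with r < R_CUTOFF are treated as well correlated and kept.
    All other cells (including NaN correlations) receive the median beta
    of the well-correlated cells in their landcover class.
    """
    beta = np.asarray(beta, dtype=float)
    r = np.asarray(r, dtype=float)
    landcover = np.asarray(landcover)
    good = r < R_CUTOFF
    out = beta.copy()
    for cls in np.unique(landcover):
        in_cls = landcover == cls
        if not np.any(in_cls & good):
            continue  # no well-correlated cells: leave this class unchanged
        out[in_cls & ~good] = np.median(beta[in_cls & good])
    return out

# Hypothetical cells: IGBP class 12 (croplands) and class 16 (barren)
beta = np.array([-0.012, -0.010, 0.001, -0.004, 0.002])
r = np.array([-0.8, -0.6, -0.1, -0.5, 0.1])
landcover = np.array([12, 12, 12, 16, 16])

corrected = representative_beta(beta, r, landcover)
print(corrected)
```

Using a median rather than a mean keeps the representative value insensitive to the few strongly negative β values within a class, which matters given the wide spread seen in the box and whisker plots.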

TB Spatial Heterogeneity Comparisons
The goal of downscaling SMAP TB is to represent the spatial heterogeneity of TB at a finer scale than possible via passive microwave data alone. While we expect to see increased spatial heterogeneity at the 3 km scale compared to the 9 km or 33 km scale, the amount of additional heterogeneity depends on various landscape attributes. To gauge the accuracy of 3 km SMAP/CYGNSS TB spatial heterogeneity, we compare it to other TB datasets with similar spatial resolutions.
SMAP uses Passive Active L-band Sensor (PALS) flights during its calibration and validation campaigns. While no PALS flights are both spatially and temporally coincident with CYGNSS data, we can still use PALS data as a general comparison. During the SMAPVEX15 campaign in southeastern Arizona between 1 and 18 August 2015, the standard deviation of PALS 1 km TBs within 3 separate 36 km pixels during 7 overpasses varied from about 2 to 13 K [31]. We calculated the standard deviation of 3 km SMAP/CYGNSS TBs within 33 km boxes covering approximately the same geographic area in Arizona between 1 and 31 August 2018 and found a range of about 0 to 16.9 K (5th–95th percentile). These TB standard deviations are fairly similar. We do not expect an exact match, as we are comparing two different scales (1 km vs. 3 km TBs) over 2 different years.
We also compared 3 km SMAP/CYGNSS TB spatial heterogeneity to 3 km SMAP/Sentinel TB spatial heterogeneity. Since CYGNSS retrieves L-band microwave signals and Sentinel retrieves C-band microwave signals, and CYGNSS and Sentinel observations are generally not concurrent, we do not expect an exact match. However, the spatial heterogeneity of different downscaled 3 km SMAP TBs should be similar.
We calculated the RMSD between 9 km SMAP TBs and the corresponding 3 km SMAP/CYGNSS or SMAP/Sentinel TBs [32] for each SMAP observation with concurrent CYGNSS or Sentinel data between 1 April and 30 June 2018. We then calculated median RMSD TBs for each 9 km grid cell. We only included 9 km grid cells with both SMAP/CYGNSS and SMAP/Sentinel results in our analysis. The RMSD between 3 km SMAP/Sentinel TBs and 9 km SMAP TBs ranges from 0.8 to 11.7 K (5th–95th percentile) with a median of 3.7 K (Figure 12a). The RMSD between 3 km SMAP/CYGNSS TBs and 9 km SMAP TBs ranges from 0.06 to 12.1 K (5th–95th percentile) with a median of 2.7 K (Figure 12b). The lower median RMSD between SMAP/CYGNSS and SMAP TBs is primarily due to near-zero SMAP/CYGNSS β values. When we recalculate SMAP/CYGNSS TBs using landcover class corrected β values (Section 4.3), the RMSD between 9 km SMAP TBs and 3 km SMAP/CYGNSS TBs ranges from 0.7 to 12.9 K (5th–95th percentile) with a median of 3.7 K (Figure 12b). The similarity between SMAP/CYGNSS and SMAP/Sentinel 3 km TB spatial heterogeneity is promising.
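The RMSD statistic used in this comparison can be sketched as below. The exact pairing of 3 km and 9 km values is not spelled out in this section, so the sketch assumes that each 9 km TB is compared with whichever of its nine enclosed 3 km TBs are available, with gaps stored as NaN. The function name and values are hypothetical.

```python
import numpy as np

def rmsd_fine_vs_coarse(tb_fine, tb_coarse):
    """RMSD (K) between one coarse TB and the valid fine TBs inside it.

    tb_fine   : array of 3 km TBs for one 9 km cell, NaN where no data
    tb_coarse : the 9 km TB for the same cell and observation
    """
    tb_fine = np.asarray(tb_fine, dtype=float)
    valid = np.isfinite(tb_fine)
    if not valid.any():
        return float("nan")
    return float(np.sqrt(np.mean((tb_fine[valid] - tb_coarse) ** 2)))

# Hypothetical 3 x 3 block of 3 km TBs with sparse CYGNSS coverage
tb_fine = np.array([[254.0, np.nan, 250.0],
                    [np.nan, 247.0, np.nan],
                    [np.nan, np.nan, 246.0]])
rmsd = rmsd_fine_vs_coarse(tb_fine, 250.0)
print(round(rmsd, 2))
```

A per-cell median over all observations in the study period could then be taken with `np.nanmedian`, which skips observations that have no valid 3 km data.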

SMAP/CYGNSS TB Spatial Coverage
SMAP/CYGNSS TBs do not provide full spatial coverage every 2–3 days, due to the ~8–14 day repeat period of CYGNSS observations at the 3 km scale. Low latitudes and areas where CYGNSS data have low signal-to-noise (i.e., forests and mountains) have worse spatial coverage. Higher latitudes and areas where CYGNSS data have high signal-to-noise (i.e., grasslands and croplands) have better spatial coverage.
The CYGNSS spatial coverage for each SMAP observation ranges from 3.0 to 28.3% (5th to 95th percentile) with a median of 15.5%. We calculated percent coverage for each 9 km SMAP TB during 2020 within ±35° latitude in non-mountainous and non-forested areas, using the associated 3 km SMAP/CYGNSS TBs. We then calculated the mean percent coverage for each 9 km grid cell.
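A minimal sketch of a percent coverage calculation follows. It assumes that coverage for one 9 km TB is the share of its enclosed 3 km cells that received a SMAP/CYGNSS TB, with gaps stored as NaN; this definition, the function name, and the example block are assumptions made for illustration.

```python
import numpy as np

def percent_coverage(tb_fine_block):
    """Percentage of 3 km cells in a block that contain a valid TB."""
    block = np.asarray(tb_fine_block, dtype=float)
    return 100.0 * np.isfinite(block).sum() / block.size

# Hypothetical 3 x 3 block of 3 km TBs for one SMAP observation
block = np.array([[254.0, np.nan, 250.0],
                  [np.nan, 247.0, np.nan],
                  [np.nan, np.nan, 246.0]])
coverage = percent_coverage(block)
print(round(coverage, 1))
```

Averaging this value over all observations of a grid cell gives a mean percent coverage per 9 km cell.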
There are two options for improving SMAP/CYGNSS TB spatial coverage. (1) We could produce both 3 km and 9 km SMAP/CYGNSS TBs. The 3 km TBs would include greater spatial heterogeneity, but the 9 km TBs would have better spatial coverage. Since the native resolution of SMAP enhanced TBs is ~33 km (gridded at 9 km), 9 km SMAP/CYGNSS TBs would still provide improved spatial heterogeneity. (2) We could use spatial interpolation techniques to fill the gaps. We could either (a) use pre-interpolated CYGNSS data, (b) interpolate 3 km SMAP/CYGNSS TBs, or (c) interpolate 3 km SMAP/CYGNSS NSSM. In each case, we could achieve full spatial coverage every 2–3 days, but precipitation patterns on the landscape would be less precise, which may lead to reduced NSSM accuracy. Future work will include analyzing the pros and cons of these options for improving SMAP/CYGNSS spatial coverage.
A final note: GNSS-R spatial coverage will likely improve in future years, due to larger constellation sizes and a higher number of received signals [33,34].

SMAP/CYGNSS TB Uncertainties
Multiple uncertainties exist related to the SMAP/CYGNSS TB algorithm. Some of these uncertainties are due to the difficulties of collocating SMAP and CYGNSS data, and others are inherent in CYGNSS data and the TB algorithm.

SMAP/CYGNSS Collocation Uncertainties
There are a few uncertainties related to the collocation of SMAP and CYGNSS data. First, since the temporal collocation period varies from 1 to 6 days, the chance of rainfall occurring between collocated SMAP and CYGNSS observations is non-negligible. Rainfall between collocated observations could lead to reduced accuracy in downscaled TBs, since the SMAP and CYGNSS observations being merged would not be representative of the same NSSM. Second, derived Γ_C values are not always representative of the spatial heterogeneity within a 33 km box. Due to the pseudo-random spatial coverage of CYGNSS, some Γ_C values are calculated using a single CYGNSS reflectivity value, while others are calculated using tens of reflectivity values. When Γ_C is calculated using a single reflectivity value, ΔΓ will always equal zero. Third, the moving window averaging method used to calculate Γ_C, based on the coarse-scale gridding scheme, introduces bias into the downscaling algorithm. When the 3 km SMAP/CYGNSS TBs are upscaled, they do not always equal the 9 km SMAP TBs.
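The second collocation issue can be made concrete with a short sketch. It takes Γ_C as the median of the Γ_F values collocated within a 33 km box, as stated in the Figure 3 caption, and shows that a box containing a single observation necessarily yields ΔΓ = 0. The function name and reflectivity values are hypothetical.

```python
import numpy as np

def delta_gamma(gamma_f):
    """Return (gamma_c, delta) where gamma_c is the median of gamma_f (dB)
    and delta = gamma_f - gamma_c for every fine-scale observation."""
    gamma_f = np.asarray(gamma_f, dtype=float)
    gamma_c = float(np.nanmedian(gamma_f))
    return gamma_c, gamma_f - gamma_c

# A well-sampled 33 km box: reflectivity differences are preserved
gc_many, dg_many = delta_gamma([-15.0, -13.0, -12.0, -10.0, -9.0])

# A box with one CYGNSS observation: the median is that value, so delta is 0
gc_one, dg_one = delta_gamma([-11.5])

print(gc_many, dg_many)
print(gc_one, dg_one)
```

In the single-observation case the downscaled TB equals the coarse TB whatever the value of β, so such boxes contribute no additional spatial heterogeneity.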

CYGNSS Reflectivity Uncertainties
Uncertainty in CYGNSS observations also introduces uncertainty into SMAP/CYGNSS TBs. CYGNSS data are calibrated to optimize retrievals over the ocean surface, which does not necessarily translate to an optimization of retrievals over land. In addition, there is uncertainty in some of the ancillary variables in the bistatic radar equation. Most prominently, the GPS effective isotropic radiated power (EIRP) for the v2.1 data is estimated from a lookup table and does not consider variations in transmit power over time or space. Some of the assumptions we made also introduce uncertainty into the reflectivity values: (1) the assumption that CYGNSS retrievals over land are based on completely coherent reflections; and (2) the 3 × 3 km gridding of CYGNSS reflectivity values, which does not account for the elongated footprint of CYGNSS observations. Future work will include estimating a bulk uncertainty for CYGNSS reflectivity values and propagating that uncertainty to TB and NSSM.

SMAP/CYGNSS TB Algorithm Uncertainties
Near-zero and positive β values, generally in areas with poorly correlated SMAP emissivity and CYGNSS reflectivity observations, are likely the greatest source of uncertainty in SMAP/CYGNSS TBs. While we will attempt to reduce some of this uncertainty by replacing poorly correlated β values with representative values (Section 4.3), an incorrect β value has the potential to create significant uncertainty in calculated TBs. If we only consider β values with R < −0.4 (Section 4.3 and Figure 6b), the standard deviation of β values is 0.006 dB⁻¹. Using 5th to 95th percentile ranges in β, T_s, and ΔΓ, we determined that the potential RMSE in TBs based on β errors of ±0.006 dB⁻¹ is ~3.5 K.
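The sensitivity of TB to a β error can be illustrated with a first-order calculation. This sketch assumes the downscaling relation TB_F = TB_C + β · T_s · ΔΓ, so that a β error of δβ shifts TB_F by δβ · T_s · ΔΓ. It is not the percentile-based procedure summarized above, and the surface temperature of 290 K is a hypothetical value; the ΔΓ values are the median and the upper end of the range reported in Section 3.2.

```python
def tb_error_from_beta(beta_error, t_s, delta_gamma):
    """First-order TB error (K) from a beta error, assuming
    TB_F = TB_C + beta * T_s * delta_gamma."""
    return abs(beta_error * t_s * delta_gamma)

BETA_ERROR = 0.006   # 1/dB, standard deviation of beta quoted in the text
T_S = 290.0          # K, hypothetical surface temperature

err_median = tb_error_from_beta(BETA_ERROR, T_S, 1.21)  # median delta-gamma, dB
err_high = tb_error_from_beta(BETA_ERROR, T_S, 2.46)    # 95th percentile, dB

print(round(err_median, 2), round(err_high, 2))
```

Because the error scales linearly with ΔΓ, cells where the fine-scale reflectivity is close to the coarse value are barely affected by an incorrect β, while cells with large reflectivity contrasts are affected most.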
Other sources of SMAP/CYGNSS TB uncertainty include seasonal fluctuations in surface water not captured by our water mask, and uncertainty in surface temperature data. No water mask is perfect, and some TB uncertainty is expected near water bodies that fluctuate seasonally. Uncertainty related to surface temperature is primarily due to its coarse resolution. For consistency, we used the same surface temperature data as SMAP (GMAO GEOS-5), which has a native resolution of 0.25° × 0.3125°. This coarse spatial resolution does not adequately capture the spatial heterogeneity of surface temperature on the landscape. Using surface temperature data with a finer spatial resolution would likely improve our downscaling algorithm and will be considered in the future.

Conclusions
We merged SMAP enhanced TBs and CYGNSS reflectivity values to create 3 km SMAP/CYGNSS TBs. Future work includes using 3 km SMAP/CYGNSS TBs to create a 3 km NSSM product.
Globally, the correlations between SMAP emissivity and CYGNSS reflectivity values are strong (strongly negative), except in forested areas, barren areas, or areas with high topographic roughness. In this way, SMAP/CYGNSS β values are similar to SMAP active-passive β values [2].
Lower β values yield greater spatial heterogeneity in SMAP/CYGNSS TBs. Areas with very low mean annual precipitation and NSSM variability have poorly correlated linear regressions with near-zero β values. Areas with high topographic roughness or high mean NDVI also have poorly correlated linear regressions with near-zero β values. Well correlated linear regressions with lower (more negative) β values are generally found in areas with low topographic roughness (<350 m), moderate NSSM variance (~0.01–0.0325), low-to-moderate mean annual precipitation (~0.25–1.5 m), and moderate mean NDVI (~0.2–0.6). β values are lowest in croplands and grasslands and highest in forested and barren lands.
3 km SMAP/CYGNSS TBs are more spatially heterogeneous than 9 km SMAP TBs and capture expected soil moisture patterns on the landscape. The median RMSD between 3 km SMAP/CYGNSS TBs and 9 km SMAP TBs is 3.03 K. The spatial heterogeneity of 3 km SMAP/CYGNSS TBs is similar to the spatial heterogeneity of 3 km SMAP/Sentinel TBs.
In upcoming years, GNSS-R constellation sizes will increase, along with the number of received signals [33,34]. Additional GNSS-R constellations will provide more opportunities for downscaled NSSM products. Hopefully, the SMAP/CYGNSS TB algorithm can serve as a model for future downscaling efforts.

Supplementary Materials:
The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/rs14205262/s1: A NetCDF file containing β values, their corresponding correlations, and median RMSD TB and ΔΓ values for 2020 (as shown in Figures 6 and 7).
SMAP data are publicly available in the NSIDC DAAC [20,29,32]. CYGNSS data are publicly available in the PO.DAAC [10]. Downscaled SMAP/CYGNSS TB data will be publicly available with the downscaled SMAP/CYGNSS NSSM product in the future and are available on request from the corresponding author.

Figure 1. 9 km SMAP brightness temperatures (TBs) over (a) southern North America and (b) Australia, and CYGNSS reflectivity gridded at 3 × 3 km over (c) southern North America and (d) Australia on 9 March 2020.

Figure 2. We merged 9 km SMAP TBs with CYGNSS reflectivity data, gridded at 3 km, to create 3 km SMAP/CYGNSS TBs. 9 km SMAP TB data have a native resolution of approximately 33 km. To collocate SMAP and CYGNSS data, we derived 33 km boxes for each SMAP 9 km grid cell [2,3].

The goal of downscaling SMAP TB is to represent the spatial heterogeneity of TB at a finer scale than possible using passive microwave data alone. 3 km SMAP/CYGNSS TB spatial heterogeneity is represented by spatial variations in TB_F (Equation (3)) and is influenced by two factors. (1) Large spatial variations in CYGNSS reflectivity will create larger ΔΓ values (Γ_F − Γ_C), which will increase the spatial variations of TB_F. CYGNSS reflectivity varies due to natural spatial and temporal fluctuations in soil moisture, surface roughness, and vegetation. (2) More negative β values will also increase the spatial variations of TB_F. SMAP/CYGNSS β values are further discussed in Section 2.2.2. Before we used Equation (3) to calculate TB_F, we first collocated SMAP TB and CYGNSS reflectivity data and calculated β.

Figure 3. All displayed data fall within the 33 km box centered on the 9 km grid cell at [31.2N, 98.7W]. In (b-d), the black borders are the 33 km box and the red squares are the 9 km grid cell. (a) A month-long timeseries of collocated SMAP and CYGNSS data, where each temporal collocation period is ± half the time between successive SMAP observations. The black lines denote the time of each SMAP observation, and the alternating gray and white shaded regions show the temporal collocation periods. The blue shaded region shows the collocation period depicted in (b-d). The green squares are SMAP observations. The blue dots are CYGNSS observations. The red diamond is the Γ_C value shown in (c). (b) The Γ_F map shows all the CYGNSS reflectivity values collocated with the SMAP observation on 17 February 2020. (c) Γ_C is the median of all Γ_F and is used to calculate TB_F within the central 9 km grid cell. (d) The TB_F map shows the 3 km SMAP/CYGNSS TBs calculated using the displayed Γ_C and Γ_F values.

Figure 6. (a) Map and histograms of median ΔΓ values (described in Section 3.2), calculated using data from 2020. (b) Map and histograms of SMAP/CYGNSS β values (described in Section 2.2.2). (c) Map and histograms of the median RMSD between 9 km SMAP TBs and 3 km SMAP/CYGNSS TBs (described in Section 3.1), calculated using data from 2020. Gray histograms include all data. Red histograms only include data for 9 km grid cells with β values where R < −0.4 (Section 4.3).

Figure 8. A depiction of the effect that NSSM variability has on β values. All displayed data falls within the 33 km box centered on the 9 km grid cell at [36.1N, 119.6W]. (a) Bi-monthly SMAP emissivity and CYGNSS reflectivity data, collocated using ±half the time between successive SMAP observations, and their corresponding linear regressions. The seasonally varying β values are included in each plot. The SMAP/CYGNSS data points are colored to match the timeseries in (c). (b) 45-day SMAP emissivity and CYGNSS reflectivity data (as described in Section 2.2.2) and the corresponding β value. (c) SMAP NSSM timeseries with FLDAS monthly rainfall data. The NSSM data points are colored bi-monthly, to match the scatter plots in (a).
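The regressions in panels (a) and (b) relate SMAP emissivity (TB divided by surface temperature) to CYGNSS reflectivity, with β as the slope and R as the correlation. A minimal sketch of that calculation is given below. It assumes an ordinary least-squares fit with reflectivity as the predictor; the function names are hypothetical, and the details of the fit used in the paper (Section 2.2.2) may differ.

```python
import math


def emissivity(tb_kelvin, surface_temp_kelvin):
    """Emissivity as brightness temperature divided by surface temperature."""
    return tb_kelvin / surface_temp_kelvin


def beta_and_r(reflectivity, emissivities):
    """Return (beta, r) for collocated reflectivity/emissivity samples.

    beta is the ordinary least-squares slope of emissivity against
    reflectivity, and r is the Pearson correlation coefficient.
    """
    n = len(reflectivity)
    if n != len(emissivities) or n < 2:
        raise ValueError("need two equal-length series with at least 2 samples")
    mean_x = sum(reflectivity) / n
    mean_y = sum(emissivities) / n
    sxx = sum((x - mean_x) ** 2 for x in reflectivity)
    syy = sum((y - mean_y) ** 2 for y in emissivities)
    sxy = sum((x - mean_x) * (y - mean_y)
              for x, y in zip(reflectivity, emissivities))
    if sxx == 0 or syy == 0:
        raise ValueError("a series with zero variance has no defined slope or r")
    return sxy / sxx, sxy / math.sqrt(sxx * syy)
```

Running the function separately on each bi-monthly subset, and once on the pooled samples, would mirror the comparison between seasonally varying and single β values shown in the figure.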


Figure 9. The number of 9 km grid cells (n-values) used to calculate each boxplot is included above each subplot. (a) Relationship between SMAP/CYGNSS β values and annual SMAP NSSM variance. Each boxplot spans an NSSM variance range of 0.0025. (b) Relationship between SMAP/CYGNSS β values and FLDAS mean annual precipitation. Each boxplot spans a precipitation range of 0.25 m. (c) Relationship between SMAP/CYGNSS β values and MODIS/Terra mean NDVI. Each boxplot spans an NDVI range of 0.05. (d) Relationship between SMAP/CYGNSS β values and SRTM topographic roughness. Each boxplot spans a topographic roughness range of 50 m. (e) Relationship between SMAP/CYGNSS β values and MODIS IGBP landcover class.
3) to compare SMAP/CYGNSS β values with vegetation density. The relationship between SMAP/CYGNSS β values and mean NDVI is shown in Figure 9c. As expected, areas with high mean NDVI values (>0.8) have near-zero SMAP/CYGNSS β values. Areas with low mean NDVI values (<0.1) also have near-zero SMAP/CYGNSS β values, because barren areas are usually very dry. In contrast, the most negative SMAP/CYGNSS β values, which ultimately lead to the greatest spatial heterogeneity in downscaled TBs, are found in areas with moderate mean NDVI values (~0.2-0.6).

3.3.3. Relationship between Topographic Roughness and β Values

SMAP emissivity and CYGNSS reflectivity are often poorly correlated in areas with high topographic roughness. Topographic roughness increases the uncertainty in both passive and active microwave remote sensing observations, due to increased scattering of the signal. The positive relationship between SMAP/CYGNSS β values and SRTM topographic roughness (described in Section 2.3) is shown in Figure 9d, with β values approaching zero as topographic roughness increases. Since both dry and densely vegetated areas have near-zero β values, regardless of topographic roughness, we did not include any 9 km grid cells with mean annual precipitation less than 0.25 m or mean NDVI greater than 0.8 in our topographic roughness analysis.

3.3.5.
Figure 6b,c compare SMAP/CYGNSS β values and SMAP active-passive β values. While some spatial disagreements exist, mainly in Australia, the United States, and the Middle East, general spatial agreement can be seen across most of the globe. Complete spatial agreement between β values is not expected, as NSSM variability affects β (Section 3.3.1). Since there is no temporal overlap in SMAP active-passive data and CYGNSS data, and the calculation methods are not the same, a direct comparison of β values is not possible. The SMAP active-passive β values were calculated from 15 April-7 July 2015, and our SMAP/CYGNSS β values were calculated using data from April 2017-December 2020. The ~2.5 months of time-series data used to calculate SMAP active-passive β values is not representative of annual precipitation and NSSM variability, which may have caused incomplete β estimations, especially in arid regions [2].

4.3. Correcting Problematic β Values

Near-zero SMAP/CYGNSS β values yield low TB spatial heterogeneity, and positive β values are unrealistic. Before calculating global SMAP/CYGNSS TB and NSSM, we plan to replace erroneous near-zero and positive SMAP/CYGNSS β values with realistic, representative β values. Since most near-zero and all positive β values have poor (near-zero or positive) correlations, we will implement a correlation cutoff to determine which β values to replace. We chose a preliminary correlation cutoff of −0.4, to ensure the replacement of erroneous β values while minimizing the number of replaced β values. Future work will include testing a variety of correlation cutoffs during NSSM validation.

Unlike the SMAP radar, CYGNSS does not include any cross-polarization data, so we are unable to replicate the regression model between cross-polarization backscatter (representative of vegetation level) and β used to create representative β values for the SMAP active-passive product [2]. Instead, we will use the relationships between SMAP/CYGNSS β values and mean annual precipitation, mean NDVI, and NSSM variability to determine representative β values. Since the variation in β values produced by different landcover classes can be explained by the relationships between β values and mean annual precipitation, mean NDVI, and NSSM variability (Section 4.1), we will use landcover classes to create representative β values. Unfortunately, the correlation between β values and landcover class is not particularly high, as can be seen by the spread of the box and whisker plots in Figure 9e. We therefore removed β values with correlations greater than −0.4 from the analyses. We calculated the median β value for each landcover class, including only 9 km grid cells with β values with R < −0.4. We then replaced β values where R > −0.4 with their median landcover class β values. Since MODIS landcover class changes annually, SMAP/CYGNSS β values will change annually. Figure 11 shows a comparison of uncorrected and corrected β values.
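The replacement procedure described above can be sketched as follows. This is an illustrative outline under stated assumptions, not the code used to produce Figure 11: the record layout (dictionaries with `beta`, `r`, and `landcover` keys) and the function name are hypothetical, a grid cell with R exactly equal to the cutoff is treated as unreliable, and a landcover class with no reliable grid cells yields `None` because the text does not say how such a class is handled.

```python
from collections import defaultdict
from statistics import median

R_CUTOFF = -0.4  # preliminary correlation cutoff quoted in the text


def correct_betas(cells, r_cutoff=R_CUTOFF):
    """Return a list of corrected beta values, one per input grid cell.

    cells is a list of dicts with keys 'beta', 'r' and 'landcover'.
    Cells with r below the cutoff keep their own beta. All other cells
    receive the median beta of the reliable cells in the same landcover
    class, or None when that class has no reliable cells.
    """
    reliable_by_class = defaultdict(list)
    for cell in cells:
        if cell["r"] < r_cutoff:
            reliable_by_class[cell["landcover"]].append(cell["beta"])

    class_median = {lc: median(b) for lc, b in reliable_by_class.items()}

    corrected = []
    for cell in cells:
        if cell["r"] < r_cutoff:
            corrected.append(cell["beta"])
        else:
            corrected.append(class_median.get(cell["landcover"]))
    return corrected
```

Because the class medians depend on the landcover map, rerunning the function with each year's MODIS landcover classes would give annually varying corrected β values, as noted above.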

SMAP uses Passive Active L-band Sensor (PALS) flights during its calibration and validation campaigns. While no PALS flights are both spatially and temporally coincident with CYGNSS data, we can still use PALS data as a general comparison. During the SMAPVEX15 campaign in southeastern Arizona between 1 and 18 August 2015, the standard deviation of PALS 1 km TBs within 3 separate 36 km pixels during 7 overpasses varied

Figure 12. (a) RMSD between 3 km SMAP/Sentinel TBs and 9 km SMAP TBs. (b) RMSD between 3 km SMAP/CYGNSS TBs and 9 km SMAP TBs. Gray histogram TBs were calculated using uncorrected β values. Red histogram TBs were calculated using landcover class corrected β values. The remaining zero-value RMSD TBs in the β-corrected histogram are due to ΔΓ values of zero, which occur when Γ_C is calculated using a single CYGNSS reflectivity value (Section 4.6.1).
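As a rough illustration of the quantity plotted here, the sketch below computes an RMSD between the 3 km TBs inside one 9 km grid cell and that cell's single 9 km TB. The exact definition used in the paper is given in Section 3.1 and is not reproduced here, so this formulation and the function name are assumptions.

```python
import math


def rmsd_within_cell(tb_fine, tb_coarse):
    """Root mean square difference between 3 km TBs and one 9 km TB (kelvin)."""
    if not tb_fine:
        raise ValueError("need at least one 3 km TB value")
    squared = [(tb - tb_coarse) ** 2 for tb in tb_fine]
    return math.sqrt(sum(squared) / len(squared))
```

Under this formulation, a grid cell whose 3 km TBs all equal the 9 km TB has an RMSD of exactly zero, which matches the zero-value bars that the caption attributes to ΔΓ values of zero.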

Author Contributions: Conceptualization, C.C.C. and E.E.S.; methodology, L.J.W., C.C.C., E.E.S. and N.N.D.; software, L.J.W. and C.C.C.; validation, L.J.W.; formal analysis, L.J.W.; resources, C.C.C. and E.E.S.; data curation, L.J.W. and C.C.C.; writing-original draft preparation, L.J.W.; writing-review and editing, E.E.S., C.C.C. and N.N.D.; visualization, L.J.W.; supervision, E.E.S. and C.C.C.; project administration, C.C.C. and E.E.S.; funding acquisition, C.C.C. and E.E.S. All authors have read and agreed to the published version of the manuscript.

Funding: This research was funded by the NASA Soil Moisture Active-Passive (SMAP) Science Team, award number 80NSSC20K1793.

Data Availability Statement: β values, their corresponding correlations, and median RMSD TB and ΔΓ values for 2020 (as shown in Figures

3 km SMAP/CYGNSS TB spatial heterogeneity is represented by spatial variations in TB_F (Equation (3)) and is influenced by two factors. (1) Large spatial variations in CYGNSS reflectivity will create larger ΔΓ values (Γ_F − Γ_C), which will increase the spatial variations of TB_F. CYGNSS reflectivity varies due to natural spatial and temporal fluctuations in soil moisture, surface roughness, and vegetation. (2) More negative β values will also increase the spatial variations of TB_F. SMAP/CYGNSS β values are further discussed in Section 2.2.2.