Comprehensive Evaluation of Using TechDemoSat-1 and CYGNSS Data to Estimate Soil Moisture over Mainland China

Abstract: Spaceborne Global Navigation Satellite System Reflectometry (GNSS-R) provides a new opportunity for land observation. This study is the first to compare and evaluate the only two spaceborne GNSS-R satellite missions whose data are publicly available, i.e., the UK's TechDemoSat-1 (TDS-1) and the US's Cyclone Global Navigation Satellite System (CYGNSS), for soil moisture (SM) sensing over Mainland China, through a sensitivity analysis against the SMAP SM on a daily basis and SM estimation on a monthly basis. For the daily sensitivity analysis, the two data sets were matched up and compared for the period in which they coexisted, i.e., May 2017 through April 2018 (R = 0.561 vs. R = 0.613). For the monthly SM estimates, a back-propagation artificial neural network (BP-ANN) was used to construct a model using data from more than two years. The model was subsequently used to derive long-term and continuous SM maps over Mainland China. The results showed that the SM estimated from TDS-1 and CYGNSS agrees and correlates well with the SMAP SM in Mainland China (R = 0.676, MAE = 0.052 m³ m⁻³, and ubRMSE = 0.060 m³ m⁻³ for TDS-1; R = 0.798, MAE = 0.040 m³ m⁻³, and ubRMSE = 0.062 m³ m⁻³ for CYGNSS). The retrieved results were further validated using monthly in situ SM data from dense sites across Mainland China. The SM derived from TDS-1/CYGNSS also correlated well with the in situ SM (R = 0.687, MAE = 0.066 m³ m⁻³, and ubRMSE = 0.056 m³ m⁻³ for TDS-1; R = 0.724, MAE = 0.052 m³ m⁻³, and ubRMSE = 0.053 m³ m⁻³ for CYGNSS). These results suggest that TDS-1/CYGNSS and upcoming spaceborne GNSS-R missions could be new and powerful data sources for producing SM data sets at a large scale and with relatively high precision.


Introduction
Global Navigation Satellite System Reflectometry (GNSS-R) is a technique that uses GNSS satellites as the transmitters of a bistatic radar, together with a receiver capable of processing the signals scattered from the Earth's surface [1,2]. With the development of spaceborne observations, GNSS-R provides new opportunities for Earth observation on a global scale. The first spaceborne GNSS-R receiver was launched on the UK-Disaster Monitoring Constellation (UK-DMC) satellite in September 2003 and proved that spaceborne GNSS-R signals can reliably measure environmental parameters of the ocean and land surface [3,4]. TechDemoSat-1 (TDS-1), an experimental GNSS-R satellite launched in July 2014, has demonstrated the strong sensitivity of the GNSS-R signal to various ocean and land parameters [5-9]. However, TDS-1 is limited in data acquisition, both spatially and temporally, because it carries only one GPS-R payload, which was active for two out of every eight days and stopped transmitting data in December 2018. The Cyclone Global Navigation Satellite System (CYGNSS) mission, launched in December 2016, consists of eight microsatellites carrying the same payload as TDS-1. CYGNSS was designed to measure ocean winds in the tropics, but the reflections it observes have also proved sensitive to land parameters [10-13]. Table 1 shows general information on the TDS-1 and CYGNSS missions. Compared with TDS-1 (10-35 days) [14], the CYGNSS microsatellites receive GNSS-R signals randomly, with revisit times of approximately 2.8 to 7 hours [12,15].
Table 1. General information on the TechDemoSat-1 (TDS-1) and Cyclone Global Navigation Satellite System (CYGNSS) missions.
Soil moisture (SM) has a significant impact on the Earth's ecosystem by affecting hydrological processes and climate change, and it plays an important role in land surface evapotranspiration, water migration, and the carbon cycle [16,17]. Such applications therefore require SM information over long periods and large scales. Common remote sensing approaches to measuring SM on a large scale rely mainly on optical and microwave sensors [18,19].

Optical remote sensing data have high spatial resolution (hundreds of meters) but are strongly affected by cloud and mist. Passive microwave remote sensing data have high temporal resolution (one or two half-orbits per day) but low spatial resolution (tens of kilometers). Methods have been developed to downscale SM products, e.g., from the Soil Moisture and Ocean Salinity (SMOS) mission, from the approximately 40 km native resolution of the instrument to high-resolution SM maps at 1 km; this is closer to the spatial resolution achieved by GNSS-R instruments [20,21]. Spaceborne GNSS-R is an attractive approach for regional and global SM measurements because the GNSS signal is in the L-band, the same band used by dedicated SM missions such as the Soil Moisture Active Passive (SMAP) and SMOS satellites, which makes it well suited to SM remote sensing [22,23].
Additionally, a constellation of GNSS-R receivers shortens the revisit time compared to traditional microwave remote sensing sensors.
Given these advantages of monitoring SM with spaceborne GNSS-R, and in order to support the development of future GNSS-R satellites and constellations dedicated to SM monitoring, many studies have focused on developing algorithms for SM estimation using TDS-1 and CYGNSS observations [6,8,12,15,24,25]. Because TDS-1 and CYGNSS were both designed for ocean sensing, their data processing over land has inherent limitations; for example, they cannot acquire adequate data over high-altitude regions because land topography was not considered. Table 2 summarizes relevant studies using TDS-1 and CYGNSS to estimate SM. From Table 2, it can be concluded that notable gaps remain in evaluating the use of the two data sets to estimate SM:
1) No study compares the two existing spaceborne GNSS-R missions (i.e., TDS-1 and CYGNSS) whose data are publicly available. The two missions and their data have several points in common (e.g., the same payload, SGR-ReSI, and the same key observable, the DDM SNR, for SM sensing). For this reason, comparing them will bring new insights into making better use of their data and provide reference information for the design of future GNSS-R payloads for soil moisture sensing.
2) No attempt has been made to evaluate the performance of the two data sets in estimating both daily and monthly SM, and only a few studies have compared the SM derived from CYGNSS with the SM measured at a limited number of in situ sites.
Table 2. Summary of relevant studies using TDS-1 and CYGNSS to estimate SM. The "key results" column is summarized from the following two aspects: 1) key conclusion and 2) accuracy evaluations. N/A represents "not applicable."
To address these two issues, the objectives of this study are 1) to compare and evaluate, for the first time, the performance of the only two spaceborne GNSS-R satellite missions whose data are publicly available (i.e., TDS-1 and CYGNSS) for soil moisture estimation; and 2) to comprehensively validate the GNSS-R-derived soil moisture by taking advantage of the dense in situ networks over Mainland China. Specifically, the TDS-1 and CYGNSS data are matched up regionally and compared during their overlapping period (i.e., May 2017 through April 2018). For the daily sensitivity analysis, the effective reflected power (Pr,eff) and the surface reflectivity (SR) proposed by Chew et al. [6,12] were used as proxies to evaluate their ability to estimate SM. For the monthly SM estimates, to reduce vegetation and ground effects, a new SM retrieval algorithm combining variables affecting SM (i.e., vegetation cover, surface roughness, topography, and precipitation) and data for deriving SM (i.e., SR and SMAP SM data) was constructed based on the back-propagation ANN (BP-ANN). The model was applied to compute long-term (over one year) and large-scale (a 50-km grid for TDS-1 and a 10-km grid for CYGNSS) SM data sets over Mainland China. Finally, the results were validated against the SMAP SM products and in situ data.
The results of this study suggest that TDS-1/CYGNSS and the upcoming spaceborne GNSS-R missions could be new and powerful data sources for producing SM data sets at a large scale and with relatively high precision.

Spaceborne GNSS-R Dataset
The study area is Mainland China (73°E to 135°E and 18°N to 53°N), as shown in Figure 1. Data from TDS-1 and CYGNSS are used in this study. The two missions coexisted from May 2017 to April 2018. TDS-1 level 1b data were downloaded via ftp.MERRByS.co.uk, and CYGNSS level 1 data, version 2.1, were downloaded via https://podaac.jpl.nasa.gov/. The delay-Doppler map (DDM) is the main observable of the data. A DDM is a two-dimensional image of the power scattered from the surface at and around the specular point. DDMs are computed with a locally generated replica of the signal for different path delays and Doppler shifts. The signal-to-noise ratio (SNR) is computed as the ratio of the average of several delay-Doppler bins around the peak to the average noise floor, which is computed from delay lags preceding the reflected signal. Because the CYGNSS on-board data compression algorithm has an upper elevation limit of 600 m [12,13], valid observations are mainly distributed in the southeastern part of Mainland China. For TDS-1, the reflected waveforms from surfaces with elevations over 3000 m were excluded, as done previously [6]. The TDS-1 and CYGNSS data were then filtered to keep observations with 1) an antenna gain greater than 0 dB (because of uncertainties reported in the measured antenna gain patterns) [6] and 2) an elevation angle of the specular point higher than 30° (to keep good-quality Left Hand Circularly Polarized (LHCP) data) [12]. In addition, the DirectSignalInDDM parameter in the TDS-1 L1b data and the quality flags (i.e., direct signal in DDM, low confidence in the GPS EIRP estimate) in the CYGNSS L1b data were used to select good data acquisitions. The NASA Shuttle Radar Topographic Mission (SRTM) 90 m Digital Elevation Model (DEM) database was used to compute the elevations of the TDS-1 and CYGNSS coverage areas.
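The screening rules described above can be sketched as a simple predicate. This is an illustrative sketch only: the dictionary keys and the function name are hypothetical and do not correspond to variable names in the TDS-1 or CYGNSS products, and treating the 600 m CYGNSS compression limit as an explicit elevation filter is an assumption made here for symmetry with the 3000 m TDS-1 rule.

```python
# Elevation limits taken from the text; using the CYGNSS value as a filter is an assumption.
TDS1_MAX_ELEVATION_M = 3000.0
CYGNSS_MAX_ELEVATION_M = 600.0


def passes_quality_filter(obs, max_elevation_m):
    """Return True if one GNSS-R observation survives the screening rules.

    `obs` is a dict with hypothetical keys (not the products' own field names):
      antenna_gain_db       receiver antenna gain toward the specular point, in dB
      elevation_angle_deg   elevation angle of the specular point, in degrees
      surface_elevation_m   terrain height from the DEM, in meters
      direct_signal_in_ddm  True if the direct signal contaminates the DDM
    """
    return (
        obs["antenna_gain_db"] > 0.0              # antenna gain greater than 0 dB
        and obs["elevation_angle_deg"] > 30.0     # elevation angle higher than 30 degrees
        and obs["surface_elevation_m"] <= max_elevation_m
        and not obs["direct_signal_in_ddm"]       # reject direct-signal contamination
    )
```

A list of observations could then be screened with `[o for o in observations if passes_quality_filter(o, TDS1_MAX_ELEVATION_M)]`.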

In situ Measurements
Monthly averaged in situ SM data from 588 sites selected from the meteorological observation network of Mainland China were used for validation (Figure 1). Considering the complex geographical environment and climate conditions in China, the selected sites are distributed across seven provinces with different land covers, climate conditions, and terrain (Table 3). All the in situ SM data were collected at a depth of 10 cm. Additionally, as shown in Figure 1, to exclude the effects of vegetation cover, buildings, inland water bodies, etc., the selected sites are located in regions of bare soil and low vegetation density (i.e., vegetation height < 5 m) identified with the Global Land Cover Map for 2009 (GlobCover 2009).

Calculation of the Daily Surface Reflectivity (SR) and the Effective Reflected Power (Pr,eff)
For the daily sensitivity analysis, the surface reflectivity (SR) and the effective reflected power (Pr,eff) proposed by Chew et al. [6,12] are used as proxies for SM and compared with the SMAP SM products. The SR and Pr,eff are calibrated versions of the SNR, obtained after correcting for effects such as antenna gain, receiver noise, and range terms. Both depend on the reflectivity of the soil, which is related to its dielectric constant [27].
For TDS-1 data, as presented by Carreno-Luengo et al. [26] and Chew et al. [6], the effective reflected power (Pr,eff) is defined as the corrected SNR of the DDM, as shown in Equation (1). The SNR (in dB) correction is based on the bistatic radar equation describing the coherent component of the received power.
In Equation (1), SNR is the peak power minus the noise, Rts is the range from the transmitter to the specular reflection point, Rsr is the range from the specular reflection point to the receiver, Gr is the antenna gain toward the specular reflection point, and θ is the incidence angle.
For CYGNSS data, Chew et al. [12] proposed an expression for the SR (in dB) in which Ptr is the transmitted power, Gt is the gain of the transmitting antenna, and λ is the wavelength of the GPS L1-band signal (0.19 m).
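To illustrate how such a correction works, the sketch below inverts the textbook coherent bistatic radar equation, P_r = P_t G_t G_r λ² Γ / ((4π)² (Rts + Rsr)²), for the reflectivity Γ in dB. It is an assumption-laden illustration, not the exact expression of Chew et al.: it takes the received coherent power as its first input (whereas the text starts from the SNR), and the function and argument names are hypothetical.

```python
import math

GPS_L1_WAVELENGTH_M = 0.19  # wavelength of the GPS L1 signal quoted in the text


def surface_reflectivity_db(p_r_db, p_t_db, g_t_db, g_r_db, r_ts_m, r_sr_m,
                            wavelength_m=GPS_L1_WAVELENGTH_M):
    """Reflectivity in dB from the coherent bistatic radar equation.

    p_r_db, p_t_db  received and transmitted power in dBW
    g_t_db, g_r_db  transmitter and receiver antenna gains in dB
    r_ts_m, r_sr_m  transmitter-to-specular-point and specular-point-to-receiver ranges in m
    """
    return (
        p_r_db - p_t_db - g_t_db - g_r_db
        - 20.0 * math.log10(wavelength_m)       # removes the wavelength-squared factor
        + 20.0 * math.log10(r_ts_m + r_sr_m)    # undoes the spreading loss over the total path
        + 20.0 * math.log10(4.0 * math.pi)      # undoes the (4*pi)^2 factor
    )
```

Working in dB turns every multiplicative term of the radar equation into an addition or subtraction, which is why the calibration reduces to a sum of corrections applied to the measured signal.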

Monthly SM Estimation Using Neural Network
GNSS-R reflectivity is sensitive to SM and to other geophysical parameters, e.g., vegetation canopy, elevation, slope, surface roughness, and precipitation [28-31]. Thus, for the monthly SM estimates, a new model considering these variables was constructed using the BP-ANN to estimate continuous SM over the study area (Figure 2). The BP-ANN is a supervised learning algorithm based on a multi-layer feed-forward neural network with an input layer, one or more hidden layers, and an output layer. BP-ANN can be used in many tasks, e.g., classification and regression [32,33], and is also used in the geoscience field [34]. In principle, it can efficiently handle relations between input and output variables without being limited to linear relationships [35,36]. A multifactor non-linear regression model was applied in this study. In addition to the variables mentioned above (i.e., the normalized difference vegetation index (NDVI), vegetation water content (VWC), elevation, slope, and roughness), the noise floor derived from the native data, which reflects the DDM noise, was also considered. Additionally, numerous studies show that seasonal SM and precipitation interact significantly and that SM is probably influenced even by small precipitation changes; hence, precipitation was also considered [37,38]. To choose the optimal input variables for SM retrieval, the contributions of individual variables and their combinations were assessed using the correlation coefficient (R), RMSE, and the mean absolute error (MAE) as indicators, as shown in Table 4. The statistical indices from models 2 and 3 show that the combination of NDVI and VWC provides the best performance.
Models 3 to 5 show that elevation and precipitation both improve the model, with precipitation having a slightly lower impact than elevation. Models 6 and 7 were used to investigate the slope and surface roughness, and both show positive correlations with SM.
The contribution of noise is negative, as shown in models 8 and 9, so this variable was not selected.
Most importantly, to determine the contribution of SR, model 10 used all the other variables (excluding noise) but not SR. The correlation coefficient of model 10 is much lower than that of the other models, which indicates that SR has the highest impact on the overall performance. The best variable combination (i.e., model 7) was selected for the subsequent SM estimation. The model test was executed using CYGNSS data from April 2018. The model contains three kinds of layers, i.e., the input layer, the hidden layer, and the output layer. The input layer contains the parameters affecting SM, i.e., NDVI, VWC, elevation, slope, precipitation, and roughness, as shown in Figure 2. The NDVI and VWC were estimated from the Moderate-Resolution Imaging Spectroradiometer (MODIS) Aqua Surface Reflectance Daily Global 500 m data set. The elevation, slope, and roughness data were derived from the SRTM 90 m DEM data set. The precipitation was derived from the Global Precipitation Measurement (GPM) Level 3 data set. The SMAP Level 3 SM data with a 9-km resolution were used in the output layer during the training stage of the model to optimize the BP-ANN parameters. All data sets of the input and output layers were averaged to obtain monthly values. The VWC and roughness were computed from NDVI and slope, respectively, with empirical relations [39,40].
The data sets were normalized to values between 0 and 1 prior to training and were divided into a training set, a testing set, and a validation set, accounting for 60%, 20%, and 20% of the data, respectively. The training set was used to adjust the weights of the neural network, the testing set was used to test the network performance, and the validation set was used to minimize overfitting [34,36]. Training was repeated to obtain an optimal neural network with reasonable results. Table 5 shows the accuracy assessment of the multiple regression models. As shown in Table 5, the non-linear model is better than the corresponding linear model, two hidden layers give the highest precision, and the hyperbolic tangent performs better than the other activation functions. Hence, the ANN structure used in this paper is as follows. The input layer has seven nodes, the same as the number of features used. The output layer has a single node, which gives the predicted SM value. There are two hidden layers, and the number of nodes is 8. The hyperbolic tangent is chosen as the activation function. The last layer is a regression layer with no activation function. The maximum number of training iterations was set to 6000, RMSE was the error metric minimized, the error threshold was set to 0.001, and the learning rate was set to 0.05.
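The preprocessing and network structure described above can be sketched in plain Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: all function names are hypothetical, 8 nodes per hidden layer is one reading of "the number of nodes is 8", training uses simple per-sample gradient descent, and the validation-based overfitting control is omitted.

```python
import math
import random


def min_max_normalize(column):
    """Scale a list of numbers to [0, 1]; a constant column maps to 0.0."""
    lo, hi = min(column), max(column)
    span = hi - lo
    return [(v - lo) / span if span else 0.0 for v in column]


def split_60_20_20(samples, seed=0):
    """Shuffle, then split into training (60%), testing (20%), validation (20%)."""
    shuffled = list(samples)
    random.Random(seed).shuffle(shuffled)
    n_train = 6 * len(shuffled) // 10
    n_test = 2 * len(shuffled) // 10
    return (shuffled[:n_train],
            shuffled[n_train:n_train + n_test],
            shuffled[n_train + n_test:])


def init_network(sizes, seed=0):
    """One [weights, biases] pair per layer, e.g. sizes = [7, 8, 8, 1]."""
    rng = random.Random(seed)
    return [[[[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)],
             [0.0] * n_out]
            for n_in, n_out in zip(sizes[:-1], sizes[1:])]


def forward(net, x):
    """Activations of every layer: tanh hidden units, linear output node."""
    acts = [list(x)]
    for i, (w, b) in enumerate(net):
        z = [sum(wk * a for wk, a in zip(row, acts[-1])) + bj for row, bj in zip(w, b)]
        acts.append(z if i == len(net) - 1 else [math.tanh(v) for v in z])
    return acts


def train(net, data, lr=0.05, max_epochs=6000, tol=0.001):
    """Back-propagation with per-sample updates; stops once the epoch RMSE < tol."""
    rmse = float("inf")
    for _ in range(max_epochs):
        sq_err = 0.0
        for x, y in data:
            acts = forward(net, x)
            delta = [acts[-1][0] - y]           # error gradient at the linear output
            sq_err += delta[0] ** 2
            for layer in range(len(net) - 1, -1, -1):
                w, b = net[layer]
                prev = acts[layer]              # inputs to this layer
                back = []
                if layer > 0:
                    # Propagate the error through the old weights and the tanh derivative.
                    back = [sum(w[j][k] * delta[j] for j in range(len(delta)))
                            * (1.0 - prev[k] ** 2) for k in range(len(prev))]
                for j, d in enumerate(delta):
                    for k, a in enumerate(prev):
                        w[j][k] -= lr * d * a
                    b[j] -= lr * d
                delta = back
        rmse = math.sqrt(sq_err / len(data))
        if rmse < tol:
            break
    return rmse
```

With normalized feature vectors paired with SMAP SM targets, `net = init_network([7, 8, 8, 1])` followed by `train(net, training_pairs)` would mirror the settings quoted in the text (learning rate 0.05, at most 6000 iterations, error threshold 0.001); the pure-Python loops are slow and serve only to make the update rule explicit.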

Results and Comparisons
In this section, the TDS-1 and CYGNSS results are analyzed for sensitivity on a daily basis and for SM estimation on a monthly basis. TDS-1 stopped transmitting data in December 2018, and the data after April 2018 are very sparse. Statistical indicators, including R, were computed to quantify the accuracy.
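The accuracy statistics reported in this paper (R, MAE, and ubRMSE) can be computed as in the sketch below. It assumes the standard definitions, in particular ubRMSE² = RMSE² − bias²; the function name and the returned dictionary layout are hypothetical.

```python
import math


def accuracy_metrics(estimated, reference):
    """Pearson R, MAE, RMSE, bias, and unbiased RMSE of two equal-length sequences."""
    n = len(estimated)
    mean_e = sum(estimated) / n
    mean_r = sum(reference) / n
    diffs = [e - r for e, r in zip(estimated, reference)]
    bias = mean_e - mean_r
    mae = sum(abs(d) for d in diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    # Clamp at zero to guard against tiny negative values from rounding.
    ubrmse = math.sqrt(max(rmse ** 2 - bias ** 2, 0.0))
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimated, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimated)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    r_value = cov / math.sqrt(var_e * var_r)
    return {"R": r_value, "MAE": mae, "RMSE": rmse, "bias": bias, "ubRMSE": ubrmse}
```

Because ubRMSE removes the mean offset, an estimate that tracks the reference perfectly but sits at a constant offset yields a non-zero MAE and a ubRMSE close to zero, which is why both statistics are reported side by side.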

Sensitivity Analysis from TDS-1 and CYGNSS on a Daily Basis
Figure 3 shows an example. Because of the sparse distribution of TDS-1, the maximum number of matches between the TDS-1 Pr,eff and the SMAP SM in each grid cell (9 × 9 km) is 11, and most cells have five to seven matches, so the R estimated for each grid cell may not be accurate. Thus, only the R of the SR derived from CYGNSS against the SMAP SM for each grid cell over the entire time period is shown (Figure 4(a)). The inner box in Figure 4(a) illustrates the numerical distribution of R for CYGNSS at a daily scale. The VWC estimated from NDVI with empirical relations [39] is also presented to show the influence of vegetation on the derived SM (Figure 4(b)) [8,25]. The lower R over densely vegetated regions may be due to the absorption of the signal by the vegetation, which strongly attenuates the signal intensity; thus, the estimated SM is lower than the actual value [24].
To estimate SM under dense vegetation, algorithms suited to bare soil or low-vegetation surfaces should be improved in future research to remove the effects of vegetation. Biases with respect to the SMAP SM are also present (Figure 5). The underestimation by the SMAP SM products over highly vegetated regions may lead to this phenomenon [41,42]. The difference in scale between the SMAP pixels and the CYGNSS points may also be linked to the biases.
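The per-grid-cell correlation discussed in this subsection can be sketched as follows. The sketch groups matched observations into cells and computes R only where enough matches exist, which reflects the concern that five to seven matches give an unreliable R. A regular latitude/longitude grid stands in for the SMAP 9-km grid here, and `cell_deg`, `min_matches`, and the function names are hypothetical choices rather than values from the study.

```python
import math
from collections import defaultdict


def pearson_r(xs, ys):
    """Pearson correlation coefficient; None if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0.0 or vy == 0.0:
        return None
    return cov / math.sqrt(vx * vy)


def per_cell_correlation(matchups, cell_deg=0.09, min_matches=3):
    """Map each grid cell to the R between reflectivity and SMAP SM.

    `matchups` is an iterable of (lat, lon, reflectivity_db, smap_sm) tuples.
    Cells with fewer than `min_matches` pairs are skipped.
    """
    cells = defaultdict(list)
    for lat, lon, sr, sm in matchups:
        key = (math.floor(lat / cell_deg), math.floor(lon / cell_deg))
        cells[key].append((sr, sm))
    result = {}
    for key, pairs in cells.items():
        if len(pairs) < min_matches:
            continue
        r = pearson_r([p[0] for p in pairs], [p[1] for p in pairs])
        if r is not None:
            result[key] = r
    return result
```

Raising `min_matches` trades spatial coverage for more trustworthy correlations, which is the same trade-off that led to showing only the CYGNSS map in Figure 4(a).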

SM Estimation From TDS-1 and CYGNSS on a Monthly Basis
Compared with most atmospheric processes, soil moisture has a longer memory time and may affect climatic characteristics; hence, monthly SM is a key variable for climatology studies and soil hydrology [37,38]. Conversely, the SM in the tropical region (the gray box in Figure 6(a1)) tends to be high most of the time.
Figure 7 shows the CYGNSS results over Mainland China during the same four months as those of TDS-1 in Figure 6. Overall, the CYGNSS data reflect the SM dynamics during the observed period well, with different seasonal amplitudes, and there is good agreement between CYGNSS and SMAP. Spatially, the estimated SM varies significantly, similar to that of TDS-1 in Figure 6. Note that the estimated SM of the northeast (the box in Figure 7
May to October, which are relevant to the declining trend of the R. As shown in Figure 9(b), the CYGNSS R values are higher than 0.7 in most months but tend to be low from August to October. Similar to TDS-1, the CYGNSS MAE shows a rising trend from June to September, which indicates that the estimated SM was less accurate from June to September than in other months. This is probably due to the growth of vegetation and the increase in surface roughness.
The histograms of sample numbers are also shown. The sample numbers of CYGNSS differ little from month to month, whereas those of TDS-1 vary. For example, TDS-1 has very few samples in July and October 2017, which may be due to the instability of the payload. This may make the SM estimates in these two months less precise than in other months.

Validation of TDS-1 and CYGNSS Results with in situ Measurements
Figure 12(a-c) shows the time series at each site. Twenty-seven sites in each typical region were chosen. Because the in situ data were monthly averaged, each site contributes 12 points per year, one for each month.

Issues Related to the BP-ANN Model
Overall, the proposed BP-ANN model generated promising results with accuracy comparable to the reference SMAP data and the in situ measurements, demonstrating that it could be generalized for regional SM estimation. However, because the SMAP SM data were used in the output layer of the BP-ANN model during the training stage, the correlation coefficients between the estimated SM and the SMAP SM (R = 0.676 for TDS-1; R = 0.798 for CYGNSS) were similar to or better than those between the estimated SM and the in situ measurements (R = 0.687 for TDS-1; R = 0.724 for CYGNSS). Although similar validation methods have been used in other publications [12,24], it would be better to involve other data sources to cross-validate the SM results.
The model can be further trained with additional variables affecting SM (e.g., vegetation optical depth and soil composition), although an increase in the number of parameters does not always improve accuracy. Additionally, the model proposed in this study is not applicable at a daily scale. Future research should address this issue and improve the model by involving additional daily ancillary data from multiple sources.

Advantages and Limitations of Spaceborne GNSS-R for Estimating SM
The results of this study show that spaceborne GNSS-R missions such as TDS-1 and CYGNSS can estimate SM with accuracy comparable to that of L-band SM satellite missions such as SMAP. Moreover, as the first GNSS-R constellation, CYGNSS can provide detailed spatial variability of SM with a very short revisit time. The GNSS-R payload is lightweight and cost-effective, which makes it possible to design small-satellite constellations. Future spaceborne GNSS-R missions are expected to have better spatial and temporal resolutions for sensing SM.
Possible sources of error in estimating SM using TDS-1/CYGNSS are as follows:
1. Different spatial scales between in situ/SMAP points and TDS-1/CYGNSS points. Although ground measurements from dense sites were used to reduce this well-known issue, the differences in spatial resolution continue to introduce deviations. Future research may consider downscaling the SMAP SM product to the same resolution as the GNSS-R data.
2. Effects of VWC, roughness, elevation, etc. In the daily sensitivity analysis, vegetation severely affects the accuracy of SM, particularly over the central part of Mainland China (VWC > 6 kg/m² vs. R < 0.6). The accuracy of the monthly results improved because these variables were considered in the proposed neural network model. Nevertheless, surface roughness and complex terrain may still reduce the estimation accuracy. Subsequent research may attempt to reduce this impact by using changes in reflectivity instead of absolute reflectivity values.
3. Mismatch between the depth of microwave penetration and the depth of the in situ SM measurements. The in situ measurements used for validation were taken at 10 cm, whereas the GNSS L-band signal has a penetration depth between 0 cm and 20 cm, depending on the soil's wetness, as shown by Camps et al. [8].
4. Difficulty of matching different remote sensing data sets to each other and to the daily GNSS-R values. As mentioned before, the daily MODIS NDVI data set is severely affected by cloud and fog, and no commonly used NDVI product is currently available at a daily scale. Future studies may focus on generating daily, continuous surface soil moisture at high spatial resolution using spaceborne GNSS-R data; the daily NDVI estimation method proposed by Zhao et al. [43] may be a good inspiration.

Conclusions
This study, taking Mainland China as an example, gave a comprehensive evaluation and