Improving CyGNSS-Based Land Remote Sensing: Track-Wise Data Calibration Schemes

Cyclone Global Navigation Satellite System (CyGNSS) data have been used to generate several intermediate products, such as surface reflectivity (Γ), that support a wide variety of land remote sensing applications. The accuracy of Γ relies on precise knowledge of the effective isotropic radiated power (EIRP) of the transmitted GNSS signals in the direction of the specular reflection point, of the zenith antenna patterns (which in turn affect estimates of EIRP), and of the receive antenna patterns, among other factors. However, obtaining fully accurate estimates of these parameters remains a challenge. To address this problem, this paper proposes an effective method for calibrating the CyGNSS Γ product in a track-wise manner. Two criteria for selecting the data to calibrate and three reference options serving as calibration targets are examined. Accordingly, six calibration schemes corresponding to the six combinations are implemented, and the resulting Γ products are assessed by (1) visual inspection and (2) evaluation of their associated soil moisture retrieval results. Both assessments demonstrate the effectiveness of the proposed schemes: track-wise noisy data are removed or fixed, and retrieval accuracy improves with the calibrated Γ. Moreover, the schemes are tested using all the available CyGNSS Level 1 Version 3.0 data, and the good results obtained from such a large volume of data further illustrate their robustness. This work provides an effective and robust way to calibrate the CyGNSS Γ product, which can further improve relevant remote sensing applications in the future.

With public access to the massive data collected by the Cyclone GNSS (CyGNSS) mission, which covers the continents at latitudes below 40°, researchers have an unprecedented opportunity to explore land remote sensing applications using spaceborne GNSS-R, e.g., wetland classification [13], inland water detection [14,15], flash flood monitoring [16], and soil moisture (SM) retrieval [17-21]. In general, CyGNSS-based land remote sensing relies on the observed surface reflectivity Γ [22,23] or the bistatic radar cross section (BRCS) [24]. However, the reliability of BRCS/Γ measurements depends on good knowledge of the L1 Global Positioning System (GPS) effective isotropic radiated power (EIRP) in the direction of the specular point (SP), as well as of the zenith and receiver antenna patterns. Unfortunately, unstable GPS transmitted power, together with non-uniform antenna gain patterns, undermines confidence in EIRP estimates. Wang et al. proposed a dynamic calibration approach to compensate for GPS EIRP variability [25]. That work aimed at detecting and correcting the power fluctuations in GPS signals, but it relies on detailed knowledge of the antenna gain pattern, and its effectiveness can be further improved (see the demonstration in Section 3.2). A track-wise correction algorithm was first applied to sea surface wind speed estimation in [26], taking into account the impact of surface waves on the normalized BRCS (NBRCS), and was later investigated in [27,28] with the aid of an ancillary wind reanalysis product and a background numerical weather prediction model, respectively. However, a successful implementation of these approaches depends on the availability and quality of auxiliary reference data, as well as on a strong coupling between the desired and reference data.
In this paper, an effective and efficient method for track-wise calibration of CyGNSS Γ is proposed. Instead of depending on auxiliary data, this approach treats "historical" CyGNSS Γ data as a reference. Three forms of such reference data are investigated: the monthly medians, their mean, and their max/min. In addition, two criteria are tested for selecting the data to be calibrated. Consequently, six data correction schemes are examined, and their performances are evaluated. Experimental assessment of the calibrated data involves direct visual inspection, statistical analysis, and validation of the SM products retrieved subsequently. Although this work focuses on the calibration of CyGNSS Γ and the corresponding improvement in SM inversion, the proposed track-wise calibration schemes can be extended to other CyGNSS observables, e.g., BRCS/NBRCS, and can benefit various land remote sensing applications. After the introduction, Section 2 describes the CyGNSS data employed. Section 3 presents the derivation of CyGNSS Γ and the proposed calibration schemes, and the calibrated outcomes are displayed and appraised in Section 4. Section 5 presents a case study demonstrating the benefit of calibrated data in SM sensing. Section 6 provides a summary and possible future improvements of this work.

CyGNSS Data
The CyGNSS constellation is composed of eight micro-satellites offering GNSS-R measurements over different locations across most of the subtropics. CyGNSS is characterized by high spatial and temporal resolutions and good coverage from 37° S to 37° N. The CyGNSS Level 1 (L1) Science Data Record Version 3.0 (V3.0) dataset is used in this study (available at https://podaac-tools.jpl.nasa.gov/drive/files/allData/cygnss/L1/v3.0, accessed on 20 May 2021). Compared with the older L1 V2.1 data, which assume static GPS EIRP values, the V3.0 dataset uses a new real-time GPS EIRP monitoring technique to correct EIRP variations in the L1 variables. Details about the associated calibration process can be found in, e.g., [25]. Despite the improvement from V2.1 to V3.0, the effect of varying EIRP has yet to be fully mitigated in either version of the CyGNSS data (see the demonstration below). In addition, since V3.0 will supersede V2.1, the main focus here is on V3.0 data, spanning August 2018 to April 2021.
The CyGNSS L1 remote sensing data include the geo-located delay Doppler maps, the BRCS (σ), and other information about the measuring status and geolocation, e.g., the incidence angle (θ), signal-to-noise ratio (SNR), latitude and longitude at the SP, and distances from the SP to the transmitter and receiver (R_t and R_r). Here, CyGNSS data collected over land with an SNR above 0 dB (at the SP) are used. To suppress errors in the estimated CyGNSS SP locations, only the BRCS data with a peak position between the 4th and 15th delay bins are retained. A similar procedure is adopted in, e.g., [19,29,30]. In addition, data with the quality flag "SP in the sidelobe" are rejected, because confidence in the antenna gain is low for them. Since only the single bin at the SP is used from each measurement, the spatial resolution is approximately 0.5 × 3.5 km² under the assumption of coherent reflection.
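The screening rules above can be sketched as a boolean mask over per-sample arrays. This is a minimal illustration rather than the processing code used in this work: the function and argument names are hypothetical, the delay-bin range is assumed to be inclusive at both ends, and the bin indexing convention (0- or 1-based) is not specified in the text.

```python
import numpy as np

def select_land_samples(snr_db, peak_delay_bin, sidelobe_flag, is_land):
    """Return a boolean mask of samples that pass the screening rules.

    All argument names are hypothetical placeholders for per-sample
    arrays extracted from the L1 files.
    """
    snr_db = np.asarray(snr_db, dtype=float)
    peak_delay_bin = np.asarray(peak_delay_bin)
    keep = (
        np.asarray(is_land, dtype=bool)             # land samples only
        & (snr_db > 0.0)                            # SNR above 0 dB at the SP
        & (peak_delay_bin >= 4)                     # peak within bins 4..15
        & (peak_delay_bin <= 15)                    # (inclusive, an assumption)
        & ~np.asarray(sidelobe_flag, dtype=bool)    # reject "SP in the sidelobe"
    )
    return keep
```

The mask can then be applied to σ, R_t, R_r, and the geolocation arrays before any further processing.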

SMAP SM Product
The calibrated CyGNSS data will undergo a further SM retrieval process, and the effectiveness of the proposed calibration schemes will be validated by evaluating the retrieved results. The CyGNSS-derived SM products will be compared with those obtained from the Soil Moisture Active Passive (SMAP) mission, specifically, the SMAP L3 radiometer global daily 36 km equal-area scalable earth grid (EASE-Grid) version 7 SM data [31]. The spatial and temporal resolutions of the SMAP data are 36 × 36 km² and daily, respectively [32].
The SMAP data provide SM estimates, quality flags, and vegetation optical depth τ on the EASE-Grid. Data with a retrieval quality flag of 0 or 8, which indicate high-quality retrievals, are retained. The SMAP data acquired between August 2018 and April 2021 are used.

Method and Model
In this section, the computation of CyGNSS Γ is first introduced. Next, the detailed calibration procedures are described. Lastly, the adopted SM retrieval model is presented.

Derivation of CyGNSS Γ
Both CyGNSS BRCS and Γ have led to successful land remote sensing applications. For conciseness, surface reflectivity Γ is used in this work as a demonstration. In practice, CyGNSS-derived Γ can be readily computed from the CyGNSS BRCS σ under the assumption of coherent reflection (see e.g., [13,19,23,29]):

$$\Gamma = \frac{\sigma \, (R_t + R_r)^2}{4\pi \, R_t^2 \, R_r^2} \qquad (1)$$

where R_t and R_r are, respectively, the distances from the transmitter and receiver to the SP. As mentioned in Section 2, σ, R_t, and R_r are accessible from the CyGNSS L1 data. An example of CyGNSS Γ for January 2020 is presented in Figure 1. Please note that Γ in this work is processed in logarithmic form.
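As a worked illustration, the conversion from σ to Γ in dB can be sketched as follows. The expression used, Γ = σ (R_t + R_r)² / (4π R_t² R_r²), is the form commonly adopted in the literature under the coherent-reflection assumption and is taken here as an assumption; the function name is hypothetical, and σ is assumed to be in m² with ranges in m.

```python
import numpy as np

def reflectivity_db(sigma, r_t, r_r):
    """Surface reflectivity in dB from BRCS and transmitter/receiver ranges.

    Assumes the coherent-reflection relation
    Gamma = sigma * (R_t + R_r)**2 / (4 * pi * R_t**2 * R_r**2).
    """
    sigma = np.asarray(sigma, dtype=float)
    r_t = np.asarray(r_t, dtype=float)
    r_r = np.asarray(r_r, dtype=float)
    gamma = sigma * (r_t + r_r) ** 2 / (4.0 * np.pi * r_t ** 2 * r_r ** 2)
    # Gamma is handled in logarithmic form throughout this work.
    return 10.0 * np.log10(gamma)
```

Non-positive σ values would produce NaN or -inf in the logarithm, so they are expected to be screened out beforehand.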

Track-Wise Calibration Schemes
To improve the quality of CyGNSS σ, the L1 V3.0 dataset uses a real-time GPS EIRP monitoring configuration to correct its variations [25]. Even so, errors can still be observed in a track-wise manner (see Figure 1a,c,e). Here, a CyGNSS track is uniquely identified by its 'track_id' and 'prn_code' (both from the L1 metadata), and consists of the continuous measurements made over a single CyGNSS ground track from a given GPS pseudorandom number (PRN) code. In this work, a track-wise data calibration approach is developed to fix the abovementioned errors, and the detailed procedures are as follows.

First, a reference CyGNSS Γ needs to be set. Through investigation, the monthly medians are found to be more immune to track-wise biases than the monthly mean (compare Figure 1c,e with Figure 1d,f). To further explore the optimal baseline, three potential references are investigated: (1) the monthly medians (Med(Γ)), (2) the mean of the monthly medians (Mean[Med(Γ)]), and (3) the max/min of the monthly medians (Max/Min[Med(Γ)]). After the pre-processing introduced in Section 2, the CyGNSS Γ is derived and processed track by track, and the collocated reference Γ values in the three forms are extracted.

Next, the track-wise Γ is compared with the references to check whether a correction is needed. Two criteria are empirically designed and considered: (1) Is Γ outside the range of Mean(Med(Γ)) ± 2 Var(Med(Γ))? (2) Is Γ outside the range [Min(Med(Γ)), Max(Med(Γ))]? Any sub-track containing 10 or more consecutive "biased" Γ values is to be fixed. Through experiments, this threshold (10) was found to preserve the geophysical characteristics of land surfaces well while effectively capturing the fluctuations of GPS transmitted power, and it is thus adopted here. All sub-tracks awaiting correction are least-squares fitted to the corresponding references via an additive offset (in dB), under the assumption that the error is uniform over a sub-track.

To sum up, with three target references and two data selection criteria, six combinations of data calibration strategies are developed in this work. A flowchart summarizing the overall data correction process is presented in Figure 2.
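The detection and correction steps described above can be sketched for a single track as follows. This is a minimal illustration under stated assumptions, not the implementation used in this work: the function names are hypothetical, the acceptance bounds are passed in directly (so either selection criterion can be supplied), and the inputs are assumed to be free of NaNs. For a purely additive offset, the least-squares solution is simply the mean of the reference-minus-observation differences over the sub-track.

```python
import numpy as np

def biased_runs(flags, min_len=10):
    """Return (start, stop) index pairs of runs of True at least min_len long.

    stop is exclusive, so a run covers flags[start:stop].
    """
    flags = np.asarray(flags, dtype=bool)
    padded = np.concatenate(([False], flags, [False]))
    # Indices where the flag changes mark alternating run starts and stops.
    edges = np.flatnonzero(padded[1:] != padded[:-1])
    starts, stops = edges[0::2], edges[1::2]
    return [(int(s), int(e)) for s, e in zip(starts, stops) if e - s >= min_len]

def calibrate_track(gamma_db, ref_db, lower_db, upper_db, min_len=10):
    """Shift each sufficiently long 'biased' sub-track onto the reference.

    gamma_db : along-track reflectivity (dB)
    ref_db   : collocated reference reflectivity (dB)
    lower_db, upper_db : acceptance bounds (scalars or arrays, dB)
    """
    gamma_db = np.asarray(gamma_db, dtype=float)
    ref_db = np.asarray(ref_db, dtype=float)
    out = gamma_db.copy()
    outside = (gamma_db < lower_db) | (gamma_db > upper_db)
    for start, stop in biased_runs(outside, min_len):
        # Least-squares additive offset = mean difference over the sub-track.
        offset = np.mean(ref_db[start:stop] - gamma_db[start:stop])
        out[start:stop] = gamma_db[start:stop] + offset
    return out
```

Runs shorter than `min_len` are left untouched, which mirrors the intent of preserving genuine small-scale surface variability.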

SM Retrieval Model
The usefulness and robustness of the calibrated CyGNSS Γ data over land should be verified in relevant remote sensing practice. Here, these data are applied to soil moisture inversion as a case study.
In the present work, the model for retrieving SM is built on the correlation between the local fluctuations of SM (∆SM) and the variations of CyGNSS Γ (∆Γ), and is parameterized pixel by pixel on the EASE-Grid. The employed model, Equation (2), is written in a form that favors SM retrieval [17,30], with a and b being the coefficients to be determined.
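A pixel-wise fit of the two coefficients can be sketched as below. Since the exact expression of the model is not reproduced here, a linear relation ∆SM = a·∆Γ + b is assumed purely for illustration; the function names are hypothetical, and how ∆SM and ∆Γ are formed (e.g., as deviations from a per-pixel mean) is left to the caller.

```python
import numpy as np

def fit_pixel_model(delta_gamma, delta_sm):
    """Fit (a, b) for one grid cell, assuming delta_sm = a * delta_gamma + b."""
    delta_gamma = np.asarray(delta_gamma, dtype=float)
    delta_sm = np.asarray(delta_sm, dtype=float)
    # Degree-1 polynomial fit returns the slope first, then the intercept.
    a, b = np.polyfit(delta_gamma, delta_sm, deg=1)
    return float(a), float(b)

def predict_delta_sm(delta_gamma, a, b):
    """Apply the fitted coefficients to new reflectivity variations."""
    return a * np.asarray(delta_gamma, dtype=float) + b
```

In a full workflow, the fit would be repeated independently for every EASE-Grid cell using the training period, and the stored coefficients applied to the test period.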

Results and Analysis
In this section, the outcomes of the proposed data calibration schemes are presented and assessed. The effectiveness and robustness of the data calibration process are verified through (1) direct visual inspection and (2) comparison of the peak SNRs (PSNRs) obtained using different data processing schemes.
As stated above, the monthly medians and their mean/max/min are treated as optional baselines for calibration. Here, the CyGNSS data collected during 2020 were used to compute these values, because 2020 has the largest data volume among the calendar years from 2018 to 2021. Only the 2020 data were used as the reference in order to show that the proposed method does not require concurrent data as a reference. Following the correction steps described above, the CyGNSS L1 V3.0 data from August 2018 to April 2021 were calibrated with each of the six schemes (hereafter, Schemes I-VI). In addition to these six solutions, the thresholding method (Thres.) introduced in [29], which only preserves Γ below 0.1 (rejecting less than 4% of the original data), was also included in the comparison. The seven corresponding outcomes and the original uncalibrated (Uncali.) results for August 2020 are displayed in Figure 3. After calibration, the different Γ outcomes show a similar pattern over the globe. Still, some large-scale differences can be observed. For example, over the Congo Basin and the Qinghai-Tibet Plateau, the values of Γ derived using Schemes I-VI appear lower than those of the uncalibrated data. To demonstrate this further and to allow a closer inspection of these Γ results, the annual mean values for the region covering [95, 105]° E and [25, 35]° N (part of the Qinghai-Tibet Plateau) are shown in Figure 5; the data density over this region was found to be roughly homogeneous and is shown in Figure 4. The calibrated data clearly show sharper landscape patterns within this region. Notably, the inland water is less visible in the uncalibrated data than in the calibrated data (see the black ellipses), which can be verified against the water fraction map (Figure 6) adapted from the Global Surface Water dataset [33].
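The derivation of the reference fields from the monthly medians can be sketched as follows. This is an illustrative outline only: the function name and the dictionary keys are hypothetical, and the input is assumed to be a stack of gridded monthly median maps (in dB) for the reference year, with NaN marking cells that have no data in a given month.

```python
import numpy as np

def build_references(monthly_medians_db):
    """Per-cell statistics over a (n_months, n_lat, n_lon) stack of Med(Gamma)."""
    stack = np.asarray(monthly_medians_db, dtype=float)
    return {
        "mean": np.nanmean(stack, axis=0),  # Mean[Med(Gamma)]
        "min": np.nanmin(stack, axis=0),    # Min[Med(Gamma)]
        "max": np.nanmax(stack, axis=0),    # Max[Med(Gamma)]
        "var": np.nanvar(stack, axis=0),    # Var[Med(Gamma)]
    }
```

The monthly median maps themselves serve directly as the first reference option, while the per-cell mean and min/max fields provide the other two references and the bounds used by the two selection criteria.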
Furthermore, the PSNR of the corrected Γ for month i (Γ_ic) is calculated, where the subscript i indicates the month index. The PSNR was computed for each data processing strategy, and the results are summarized in Table 1. The calibrated data have the highest PSNR, followed by the thresholded and then the uncalibrated data.

The enhancement of data quality brought by the proposed calibration schemes can be clearly seen by simple visual inspection (see Figures 5 and 6). The presence of inland water can be readily identified in the calibrated data, whereas it is blurred by biases in the original data. This demonstrates the effectiveness of the calibration process and the need to include such a process in CyGNSS-based remote sensing, e.g., inland water mapping. Furthermore, the PSNR values (see Table 1) offer an informative, quantitative description of these results and confirm the benefit of the calibration process. Recall that Schemes I-III and Schemes IV-VI use two different data selection criteria, and that Schemes I and IV, Schemes II and V, and Schemes III and VI respectively share the same reference (see Figure 2). On the one hand, Schemes IV-VI apply a stricter selection criterion, so fewer sub-tracks are processed, which is ultimately reflected in a lower overall PSNR than that of their counterparts. On the other hand, since the monthly median of Γ, i.e., Med(Γ), is regarded as the "ground truth" for the CyGNSS product, it is natural for Scheme I (whose reference is Med(Γ)) to have a better PSNR than Schemes II-III, and for Scheme IV (whose reference is also Med(Γ)) to have a better PSNR than Schemes V-VI. The thresholding method performs better overall than the uncalibrated data, but it fails to reveal the water bodies, as shown in Figure 5d, because it cannot detect potential biases with a value lower than 0.1. It is worth mentioning that although the baseline Med(Γ) is determined using the CyGNSS data from 2020, the PSNR results are comparable across years. The overall good results indicate the generality of the developed approach, which does not require concurrent references.
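For orientation, a generic PSNR computation between a processed Γ map and a reference map can be sketched as below. The conventional definition 10·log10(peak²/MSE) is assumed here, with the monthly median map taken as the reference and, unless specified, the largest absolute reference value taken as the peak; both choices, as well as the function name, are assumptions for illustration and may differ from the exact definition used for Table 1.

```python
import numpy as np

def psnr_db(processed, reference, peak=None):
    """Generic PSNR (dB) over cells where both maps are finite."""
    processed = np.asarray(processed, dtype=float)
    reference = np.asarray(reference, dtype=float)
    valid = np.isfinite(processed) & np.isfinite(reference)
    err = processed[valid] - reference[valid]
    mse = np.mean(err ** 2)
    if peak is None:
        # Assumed peak value: largest magnitude found in the reference map.
        peak = np.max(np.abs(reference[valid]))
    return 10.0 * np.log10(peak ** 2 / mse)
```

With this definition, a processed map that lies closer to the reference yields a smaller mean squared error and hence a higher PSNR.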

Discussion
SM sensing using CyGNSS has been widely investigated. Here, the significance of calibrating the data is shown through the improvement in the resultant SM products. Furthermore, the completely independent SMAP dataset offers a new perspective for assessing the calibrated data. For these reasons, the outcomes achieved are further evaluated and discussed in this section. Note that the CyGNSS data are aggregated onto the EASE-Grid (36 km) to facilitate the comparison.
For each type of data (calibrated with Schemes I-VI, Thres., and Uncali.), the SM estimation model, i.e., Equation (2), was trained individually; that is, the coefficients a and b were determined separately for each category. The data collected in 2020 were used for training, and the rest (from August 2018 to December 2019, as well as the first four months of 2021) were used as test data. For illustration, density plots of the training performance for Scheme IV and the uncalibrated data are presented in Figure 7. The overall statistics for both the training and test groups with the different correction schemes are shown in Figure 8. The root mean square error (RMSE) generally decreases and the correlation coefficient (R) increases after a correction process is applied. In addition, the absolute errors (AE) between the SMAP and retrieved SM values were derived (see Table 2), and an example over Australia for November 2020 is shown in Figure 9. The improvements in RMSE, AE, and R are in fact quite small. This may be due to (1) the coarse spatial resolution (36 km) and (2) the simplicity of the retrieval model. To appraise the temporal variability of the retrieved SM, the variances of the SM products [Var(SM)] were calculated. The Var(SM) values derived from CyGNSS Γ using the various data processing schemes all agreed well with that of the reference SMAP data, and their differences were insignificant. As an example, Var(SM) for the SMAP and CyGNSS Scheme I data is presented in Figure 10.
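The three evaluation statistics can be computed with a short helper such as the one below. It is a minimal sketch: the function name is hypothetical, AE is taken here as the mean absolute error over all valid samples (an assumption about how the tabulated value is aggregated), and R is the Pearson correlation coefficient.

```python
import numpy as np

def retrieval_metrics(sm_retrieved, sm_reference):
    """Return (rmse, ae, r) between retrieved and reference SM."""
    x = np.asarray(sm_retrieved, dtype=float)
    y = np.asarray(sm_reference, dtype=float)
    valid = np.isfinite(x) & np.isfinite(y)
    x, y = x[valid], y[valid]
    diff = x - y
    rmse = float(np.sqrt(np.mean(diff ** 2)))   # root mean square error
    ae = float(np.mean(np.abs(diff)))           # mean absolute error (assumed)
    r = float(np.corrcoef(x, y)[0, 1])          # Pearson correlation
    return rmse, ae, r
```

Applying the same helper to the training and test periods separately reproduces the kind of split statistics discussed above.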
Based on the SM retrieval results, Scheme IV shows the best accuracy in terms of RMSE, AE, and R (see Figure 8 and Table 2). Compared with the original data, Scheme IV improves the SM estimates by an increase in R of 0.012 and decreases in AE and RMSE of 1.84 × 10⁻³ cm³/cm³ and 3.09 × 10⁻³ cm³/cm³, respectively. Further evidence of its better performance can be seen in Figure 9, in which only minor retrieval errors are observed with Scheme IV. These findings suggest that the use of Scheme IV can improve future land remote sensing applications. However, Schemes II and V produced lower accuracy than the original data. This may be because their reference is the mean of the monthly medians, which does not capture seasonality.

Conclusions
Because the correction applied to the CyGNSS L1 V3.0 data is incomplete, the products and their derivatives still suffer from errors caused by inaccurate estimates of GPS EIRP. In this paper, an effective method is proposed for calibrating the CyGNSS Γ data over land in a track-wise manner, realized through six schemes that combine two data selection criteria with three reference options. In the first stage of tests, all of the calibration schemes showed a higher PSNR than the original data and a better capability to capture inland water bodies. These results were further evaluated by assessing the corresponding SM estimation results. Scheme IV (which calibrates data beyond the max/min range of the monthly medians by referring to the monthly medians) produced the best retrieval accuracy and is thus recommended for land remote sensing applications. Still, the improvements in SM retrievals are quite limited; this work can therefore be further investigated with reference data of better spatial resolution and with more advanced retrieval models.
Although the ability of the calibrated data to better reveal inland water bodies was reported here, their benefit in relevant real-world applications was not quantified. Future work will consist of testing the calibrated data in other land remote sensing applications, as well as applying the calibration schemes to data collected over oceans.

Figure 1. Comparison between monthly: (a,c,e) mean and (b,d,f) median of uncalibrated Γ for January 2020. The mean Γ suffers from track-wise biases, as demonstrated by the two boxed regions (purple and red boxes) in (a,b). The zoomed views of these two boxed areas are shown in (c-f), respectively. The spatial resolution of each subplot is 0.1° × 0.1°.

Figure 2. Flowchart of the overall data correction process leading to Schemes I-VI.

Figure 4. Data density (normalized with respect to the max occurrence) of CyGNSS data over the region covering [95, 105]° E and [25, 35]° N.

Figure 5. Annual mean of Γ for 2020 with/without calibration: (a) Scheme I, (b) Scheme II, (c) Scheme III, (d) Thres., (e) Scheme IV, (f) Scheme V, (g) Scheme VI, and (h) Uncali. Inland water can be well identified from the calibrated data (see the black ellipse).

Figure 9. Absolute errors of retrieved SM for November 2020 using: (a) Scheme I, (b) Scheme II, (c) Scheme III, (d) Thres., (e) Scheme IV, (f) Scheme V, (g) Scheme VI, and (h) Uncali. In general, Schemes I and IV exhibit fewer errors (as indicated by the ellipses).

Figure 10. Variance of SM for: (a) the reference, (b) SM retrieved from CyGNSS Scheme I data, and (c) their difference.

Table 1. PSNR (in dB) of calibrated and uncalibrated data.