Exploration of Multi-Mission Spaceborne GNSS-R Raw IF Data Sets: Processing, Data Products and Potential Applications

Earth reflected Global Navigation Satellite System (GNSS) signals can be received by dedicated orbital receivers for remote sensing and Earth observation (EO) purposes. Different spaceborne missions have been launched during the past years, most of which can only provide the delay-Doppler map (DDM) of the power of the reflected GNSS signals as their main data products. In addition to the power DDM products, some of these missions have collected a large amount of raw intermediate frequency (IF) data, which are the bit streams of raw signal samples recorded after the analog-to-digital converters (ADCs) and prior to any onboard digital processing. The unprocessed nature of these raw IF data provides an unique opportunity to explore the potential of GNSS Reflectometry (GNSS-R) technique for advanced geophysical applications and future spaceborne missions. To facilitate such explorations, the raw IF data sets from different missions have been processed by Institute of Space Sciences (ICE-CSIC, IEEC), and the corresponding data products, i.e., the complex waveform of the reflected signal, have been generated and released through our public open-data server. These complex waveform data products provide the measurements from different GNSS constellations (e.g., GPS, Galileo and BeiDou), and include both the amplitude and carrier phase information of the reflected GNSS signal at higher sampling rate (e.g., 1000 Hz). To demonstrate these advanced features of the data products, different applications, e.g., inland water detection and surface altimetry, are introduced in this paper. By making these complex waveform data products publicly available, new EO capability of the GNSS-R technique can be further explored by the community. Such early explorations are also relevant to ESA’s next GNSS-R mission, HydroGNSS, which will provide similar complex observations operationally and continuously in the future.


Introduction
The opportunistic use of Earth reflected signals from Global Navigation Satellite System (GNSS) has been a valid option for remote sensing during the last decades. This technique, known as GNSS reflectometry (GNSS-R), was proposed ∼30 years ago for ocean scatterometry [1] and ocean altimetry [2]. It can take measurements along several reflection tracks from different GNSS transmitters in parallel, and thus provide exceptional spatio-temporal sampling capability and rapid revisit time over the Earth surface. As investigations progressed, dedicated experimental (e.g., [3][4][5][6]) and theoretical studies (e.g., [7,8]) have expanded this concept to remote retrieval of geophysical information from various types of natural covers. Comprehensive introductions of GNSS-R technique and its applications can be found in [9,10].
The landscape of GNSS-R space missions (as described in Section 2) shows that the potential of the GNSS-R technique has been well recognized among the scientific community after its early space demonstrations. Currently there is a good number of LEO satellites in orbit carrying dedicated GNSS-R payloads, from which different geophysical data products have been derived over ocean, land and polar ice. Most of these missions generate the delay-Doppler map (DDM) of the reflected signal as the main data product. Normally, the sampling rate of these power DDM product is 1 Hz (with 1 s coherent integration time), which corresponds to a spatial sampling resolution of 6-8 km. Meanwhile, new GNSS-R spaceborne missions and advanced instruments have been proposed or under study by different space agencies or even commercial sectors. These new missions trend to address different geophysical applications over different types of natural surface by fully exploiting the capability of the GNSS-R concept. In particular, the G-TERN mission [22] was proposed for high spatio-temporal resolution sea ice and ice sheet sensing, and the HydroGNSS mission aims to explore the potential for GNSS-R to sense hydrologically related Essential Climate Variables (ECVs) [23]. Meanwhile, with continuous evolution of GNSS constellations, there are more visible GNSS satellites in orbit and more available GNSS signals at different frequency bands. By using reflected signals from multi-GNSS constellations and multi-frequency bands rather than only GPS L1 signal, the spatialtemporal sampling efficiency of further GNSS-R missions and the geophysical information included in the data products can be further improved.
Lessons learned and data collected from the previous missions can directly benefit the development of new GNSS-R applications, the definition and optimization of future GNSS-R spaceborne missions and advanced GNSS-R instruments. In addition to the standard data products (mainly the DDM of reflected signal power), most of the previous missions have collected a large amount of raw IF data, which are the bit streams of raw signal samples recorded after the analog-to-digital converters (ADCs) and prior to any onboard digital processing. The unprocessed nature of these raw IF data provides an unique opportunity to fully explore the potential of GNSS-R technique for new geophysical applications and future spaceborne missions. To facilitate such explorations, a dedicated GNSS-R software receiver was developed by the Institute of Space Sciences (IEEC, CSIC), and corresponding data products have been generated from the raw IF datasets collected by different missions. Although some preliminary results have been already obtained with these datasets by ICE-CSIC/IEEC (e.g., [24][25][26][27]) as well as the other teams (e.g., [28][29][30][31][32]), more extensive efforts are still needed by engaging a broader community into such explorations. In this context, we have recently made the data products publicly available through ICE-CSIC/IEEC's public open-data server.
With the aim of encouraging new users and new studies, the paper soughts to explain in an understandably way the raw IF data processing chain, the data products, together with their potential applications. Section 3 continues with a brief description of the raw IF data sets collected by different spaceborne missions, together with the processing chain of the raw IF data. The data products extracted from the raw IF data, i.e., the complex waveform of the reflected signals, are explained in detail in Section 4, including their availability, the data format and structure, as well as the scheme and methods to process these products. In Section 5, a set of applications of the data products, such as inland water detection and surface altimetry, are demonstrated preliminarily. Lastly, the conclusions and future prospects are summarized in Section 6.

A Review of Spaceborne GNSS-R Missions
The evolution of these spaceborne GNSS-R missions are presented in Figure 1, with each of the mission summarized below. Figure 1. The evolution of spaceborne GNSS-R missions. Note: Spire grazing angle GNSS-R measurements have been acquired by upgrading the GNSS receivers' firmware since early 2019, which were launched nominally to collect GNSS radio occultation measurements. DoT-1 satellite started to collect GNSS-R measurement by upgrading its payload firmware on September 2020, i.e., more one year after its launch. UK DMC, UK TDS-1 and UK DOT-1 image credits: SSTL. CYGNSS image credit: University of Michigan. Bufeng-1 image credit: China Academy of Space Technology. Spire satellites image credit: Spire Global, Inc. FY-3E image credit: China Meteorological Administration.

UK-DMC:
In December 2003, the UK-DMC satellite was launched into a sun-synchronous orbit with an orbit altitude of about 680 km. With the aim of exploration of GNSS-R technique from space, SSTL built an experimental GPS reflectometry receiver into the UK-DMC satellite [33]. The experimental receiver consists of a navigational receiver, with modifications to allow recording raw IF signal for short periods of time (∼20 s). More than 100 sets of raw intermediate frequency (IF) signals had been collected by the UK-DMC satellite [34], from which reflected GPS signals have been detected over the ocean, ice and land [13]. The UK-DMC experiment and the data collections played an important role in optimizing the design of SSTL's new version GNSS-R instrument [35], which have been used in the UK TDS-1 and CYGNSS missions.
UK TDS-1: Following the successful detection of reflected signals by UK-DMC, SSTL launched the UK TDS-1 on 8 July 2014. The satellite was placed into a Low Earth Orbit with a nominal altitude of 635 km and an inclination of 98.4 • [14]. The TDS-1 satellite carries a more capable GNSS-R receiver known as the Space GNSS Receiver Remote Sensing Instrument (SGR-ReSI), which can process the reflected GPS signal in real-time to generate the Delay-Doppler Map (DDM) of GPS scattered power [34]. As a technique demonstration satellite, the SGR-ReSI operations alternated between the other different payloads, which only operated two days out of every eight days in orbit. The TDS-1 GNSS-R measurement was operational since September 2014, and delivered a unique dataset of globally distributed spaceborne GNSS-R data, spanning a period of four years. Although mainly used for ocean surface wind measurements [35], the TDS-1 DDM products have been also analyzed for some other remote sensing applications, such as ocean altimetry [36], soil moisture [37,38], sea ice detection and classification [39][40][41][42], ice sheet sensing [43][44][45] and freeze-thaw detection [46,47]. In addition to the standard DDM products, TDS-1 mission also collected the raw IF samples of the direct and reflected signals occasionally, from which new GNSS-R applications have been demonstrated [24,32,48].
CYGNSS: As a NASA's Earth Venture Program mission, CYGNSS mission consists of a constellation of eight microsatellites for innovative research and applications using GNSS-R technique [15]. The CYGNSS satellites were launched on 15 December 2016, and deployed in a common circular low Earth orbit at 35 • inclination and 520 km altitude. An evolutionary version of the SGR-ReSI receiver is embarked on each CYGNSS satellite. The primary objective is the measurement of sea surface wind speed in and near the inner core of tropical cyclones [15]. Similar to TDS-1, the main data products of CYGNSS are the delay-Doppler maps (DDMs) of the forward scattered power and the bistatic radar cross sections, which are stored in the Level 1 data product together with the corresponding metadata. These high density and frequency GNSS-R measurements have made it possible to demonstrate different GNSS-R applications other than ocean surface wind, such as soil moisture measurement (e.g., in [49]), and inland water and flooding detection (e.g., in [50,51]). In addition to standard science data, CYGNSS satellites have been also collected large amount of raw IF data for exploration of new GNSS-R applications (e.g., in [25,30]).
BuFeng-1: The first Chinese GNSS-R satellite mission, known as BuFeng-1 (BF-1) A/B twin satellites, were launched on 5 June 2019 [17]. Towards sea surface wind speed measurement during typhoons and hurricanes, the BF-1 satellites are also placed in a low inclination orbit (45 • ) with a nominal altitude of 579 km. BF-1's GNSS-R instrument is compatible with both reflected GPS signals at L1 band and BeiDou (BDS) signal at B1I band (1561.098 MHz), which makes it possible to acquire more GNSS-R observations within the field-of-view of its science antennas. The raw IF data at GPS L1 and BDS B1I bands have been collected occasionally (∼one raw IF dataset per week per BF-1 satellite) for diagnostic purpose.
Spire GNSS RO Satellites: The Spire satellite constellation currently has over 100 3-Unit CubeSats located in low-Earth orbit between 450 and 600 km altitude. Most of these satellites are dedicated towards collecting GNSS RO measurements [52]. The instrumental and antenna configurations of the Spire GNSS RO satellites make it possible to receive earth reflected signals from GNSS satellites at grazing angle (GA). Since early 2019, the GNSS receivers onboard four Spire Lemur-2 satellites have been updated to be able to collect GA GNSS-R events. During the time period between 9 January 2019 and 11 April 2019, more than 2800 GA GNSS-R events over sea ice and open ocean had been collected. The GNSS-R measurements observed by the Spire satellites include both amplitude and phase information of the reflected signal. In addition, these measurements are collected at dual frequencies (GPS L1 and L2) and associated with precise orbit determination data. These features make the datasets unique for accurate and precise carrier phase altimetry [18]. To develop advanced signal processing methods for GA GNSS-R phase altimetry, a couple of dual frequencies raw IF datasets have been collected over inland water, open ocean and sea ice. Spire currently operates about 30 satellites with the ability to collect GA GNSS reflection data and plans to add more of them to ramp up data collection in the near future [18].
Spire GNSS-R Satellites: With the GNSS RO constellation in orbit, it is a natural progression for Spire to add GNSS-R scatterometer satellites to its constellation. On 11 December 2019, the first batch of two GNSS-R CubeSats were launched for early orbit experience [19]. Each Spire GNSS-R satellite carries an advanced GNSS-R receiver, which is capable of processing 16-24 simultaneous GNSS reflections from GPS, QZSS, Galileo and SBAS satellites. The second batch of Spire GNSS-R satellite were launch in 2021 to provide operational GNSS-R products for different applications, such as soil moisture, ocean wind/waves, sea ice and wetlands/flood inundation [19]. In addition to the operational datasets, a limited number of low-level, raw IF data have been also collected by the GNSS-R satellites.
FSSCat: As the winner of the 2017 Copernicus Master ESA Sentinel Small Satellite Challenge, the FSSCat mission is an innovative concept consisting of two federated 6-Unit Cubesats, called ³Cat-5/A and ³Cat-5/B [16]. The main payload onboard is the Flexible Microwave Payload 2 (FMPL-2), which integrates two instruments in a single platform: a multi-constellation (GPS and Galileo) GNSS-R receiver, and a Total Power Radiometer (TPR) [53]. The FMPL-2 payload is able to produce scientific data over land and sea ice, such as sea ice detection and thickness monitoring, mapping water ponds over ice, and low resolution soil moisture measurement. On 2 September 2020, the FSSCat satellites were launched successfully into a 535 km Sun-synchronous orbit.
DoT-1: After the decommission of the UK DMC and TDS-1 satellites, SSTL has successfully demonstrated GNSS-R from its 18kg DoT-1 satellite since September 2020. The latest GNSS-R payload on-board the DoT-1 satellite is incorporated within the new small form factor Core Avionics module integral to all SSTL's future satellite platforms. This innovation paves the way for any SSTL satellite that can accommodate a nadir pointing antenna to become part of a GNSS-R small-sat constellation [20].
FengYun-3E: On 5 July 2021, China's FengYun-3 E (FY-3E) meteorological satellite was lauched with the GNOS II [54] GNSS remote sensing instrument, which is capable of making measurements of both GNSS RO and GNSS-R. The primary objective of the GNOS II GNSS-R data is to measure the near sea surface wind speed [21]. In addition to the standard DDM product, GNOS II instrument is also scheduled to collect GNSS-R raw IF data sets from specific targets over ocean, land and sea ice.
In addition to above-mentioned GNSS-R missions in orbit, some other spaceborne missions have been proposed or under study to use reflected GNSS signals and other signals of opportunity for different geophysical applications, such as PARIS in-orbit demonstrator mission [55], PRETTY mission [56], Cookie concept [57], ³CAT-2 mission [58], G-TERN mission [22], HydroGNSS mission [23], TRITON mission [59], and SNOOPI mission [60], which are not explained in detail in this paper.

GNSS-R Raw IF Data from Different Missions
In a GNSS-R receiver, the direct and reflected signals are received by the up-and down-looking antennas, and amplified and down-converted to IF (or baseband) in the radio frequency (RF) front-ends. These IF signals are digitized using a set of analog-todigital converters (ADCs) to generate the raw IF data streams, which are the first and most fundamental data available following the antennas (as shown in Figure 2). Due to their high data rate (a few mega samples per second to a few tens of mega samples per second), the raw IF data streams are normally not recorded continuously but processed directly by the digital signal processing unit to derive lower sampling rate (a couple of Hz) measurements (e.g., positioning observations from the direct signal and power DDM of the reflected signal). As the data density is reduced from IF data to the final products, information is irretrievably lost. Fortunately, as mentioned in Section 1, most of the GNSS-R missions are capable of recording and down-linking the raw IF data streams occasionally, which allows the recovery of much more flexible GNSS-R measurements through on-ground post-processing than these generated by the onboard real-time processors. To date, the ICE-CSIC/IEEC team gets access to the raw IF data sets collected by the TDS-1, CYGNSS, BF-1 A/B and SPIRE RO missions, which are either publicly available or obtained with the support of different projects. The main features of these raw IF datasets are summarized in Table 1, which shows that these data sets can complement each other in terms of the geographic coverage, bistatic geometry and frequency band of the received signal. First, the raw IF data collected by the CYGNSS and BuFeng-1 missions are limited within low latitude regions due to their orbit inclinations (i.e., 35 • and 45 • , respectively). While the TDS-1 and Spire's GNSS RO satellites are in polar orbits, which can provide opportunities for the collection and analyses of reflected signals from sea ice and ice sheet at high latitude. Second, Bufeng-1 and SPIRE GNSS RO satellites are capable of receiving direct and reflected signal at a second frequency band other than only GPS L1. GNSS-R measurements at different frequency bands can be potentially combined to improve the performance for different observations, e.g., the ionospheric delay can be corrected in altimetric retrieval by combining delay observations at GPS L1 and L2 frequencies. Finally, dedicated GNSS-R missions, i.e., TDS-1, CYGNSS and BuFeng-1, can only receive reflected signals from near nadir direction with enough antenna gain. In contrast, reflected GNSS signals at low elevation angle can be collected with enough SNR though the antennas facing toward the Earth's limb onboard Spire's GNSS RO satellites, which makes it possible to demonstrate GNSS-R carrier phase altimetry at grazing angle.

Raw IF Data Processing
The raw IF data processing is based on a software-defined radio GNSS receiver extended for remote sensing purposes. The earlier versions of this receiver have been used in processing GNSS-R and GNSS RO raw IF data sets collected in different groundbased and airborne experiments, e.g., [61][62][63]. With the accumulation of the spaceborne GNSS-R raw IF datasets, the software receiver is redesigned to the spaceborne scenarios by updating the computations open-loop tracking parameters. It is worth mentioning that the software receiver is selected as the official Level-0 raw IF data processor for the ESA's Scout-2 HydroGNSS mission, which will be adapted to process the raw IF data collected at both L1/E1 and L5/E5A bands.
The main outputs of the processing are the complex waveforms of the reflected signals, which can be generated following either the conventional GNSS-R processing or the interferometric processing [10]. However, in this paper as well as in the provided data products, only the conventional GNSS-R processing is introduced. For each GNSS satellite, the complex waveforms are computed by cross-correlating the reflected signal with a set of local generated clean replicas as in which t 0 is the start time of the cross-correlation indicating the time label of each complex waveform, τ i is the delay of each waveform lag, t c is the coherent integration time, c(·) is the local generated PRN code series after the navigation bit correction, s r is the reflected signal, Φ r code is the code phase of the local generated PRN code, f r carr and φ r carr are the frequency and phase of the local generated carrier replica, respectively. These local generated code and carrier replicas parameters, known as open-loop (OL) tracking model, are computed corresponding to the reflected signal through the specular point, which are estimated from the direct signal carrier and code parameters together with the bistatic geometry.
In addition to the raw IF samples of the direct and the reflected signals, the inputs of the processing also include the meta data of the mission science products (e.g., the Level 1 products), the GNSS orbit products from the International GNSS Service (IGS), and the earth surface elevation data derived by combining the GMTED2010 digital elevation model (DEM) [64] and the DTU18 mean sea surface (MSS) model [65]. The processing scheme of the reflected signal is presented in Figure 2, in which the main processing procedures, i.e., the direct signal processing and the open-loop model computation are described in detail as follows.

Direct Signal Processing
The direct signal is processed following a generic GNSS signal processing scheme, from which the code and carrier parameters are estimated through a combination of a delay locked loop (DLL) and a phase locked loop (PLL). These parameters are then fitted to 3rd degree polynomial functions to get the time series of the code offset Φ d code (t), the carrier frequency f d carr (t), and the carrier phase φ d carr (t). In addition, the navigation bits or the secondary code D * (t) are also decoded from the outputs of the tracking loops. By further processing of these navigation bits and code offset measurements from different GNSS satellites, the position, velocity and timing (PVT) information of the receiver can be also estimated, which can be used to calibrate and validate the PVT information provided in the meta data. With the direct signal parameters, the complex waveform of the direct signal z d (t 0 , τ i ) are also generated at the same epoch as the reflected signal (t 0 ).

Open-Loop Tracking Model Computation
The main inputs for the bistatic geometry computation are the positions and velocities of the receiver and GNSS transmitter. The position the receiver R(t M ) is extracted from the mission science data product, and the positions of the GNSS satellites T(t M ) are interpolated from the IGS Multi-GNSS Experiment (MGEX) products [66]. The coarse positions of the specular points are computed first by using the World Geodetic System 1984 (WGS84) ellipsoid as the reference surface, and the mean surface elevation along the SP track h elv is interpolated from the combined GMTED-DTU model. Then, an approximated Earth ellipsoid is placed at a height corresponding to the mean surface elevation h elv , and the specular point positions S(t M ) are recomputed on this resized ellipsoid. It is noted that the propagation time of the direct and reflected signals δt d and δt r are also taken into account in the computation of the specular point positions. With the positions of the transmitter, receiver and specular point, the bistatic delay δρ dr , defined as the range delay difference between the direct and reflected signal, can by computed by With the bistatic delay and the direct signal parameters from Section 3.2.1, the code phase (Φ r code ), carrier phase (φ r carr ) and carrier frequency ( f r carr ) of the reflected signals can be predicted by in which δτ dr OL (t) = δρ dr OL (t)/c is the time delay between the direct and the reflected signal fitted to a 3rd degree polynomial function, f RF and f C are the carrier frequency and the chip rate of the transmitted GNSS signal, e.g., f RF = 1575.42 MHz and f C = 1.023 MHz for the GPS L1 C/A code signal.

Remarks for Raw IF Processing
The following aspects are remarkable for the processing of the raw IF data: - The navigation signals transmitted by modernized GNSS satellites, e.g., Galileo E1 B/C and BDS-3 B1C, consist of both data and pilot components. The complex waveform for each signal component can be generated independently by cross-correlating the reflected signal to its PRN code. However, instead of the complex waveform for each signal component, the combined complex waveform are generated by crosscorrelating the reflected signal to the composite PRN codes of these signal components.
For Galileo E1 signal, both E1B and E1C components are in-phase modulated and the composite code c E1 (t) are generated by [67] c in which c E1B (t) and c E1C (t) are the PRN codes for the E1B and E1C components, and D E1B (t) and D E1C (t) are the navigation data bit or the secondary code modulated to the E1B and E1C components. The BDS-3 B1C data and pilot components are modulated in phase and quadrature, and the composite code c B1C (t) are generated by [68] c where c B1CD (t) and c B1CP (t) are the PRN codes for the B1CD and B1CP components, and D B1CD (t) and D B1CP (t) are the navigation databit or the secondary code modulated to the B1CD and B1CP components. Note that the navigation databits and the secondary codes are extracted from the direct signal processing channel. Depending on the power ratio of the data component to the pilot component, the coherent combination can improve the SNR of the reflected signal by 1.25-3 dB with respected to only using the pilot component. -Due to the limitation of the processing time and storage capability, only the complex waveform corresponding to the zero Doppler bin are included in the data products. Nevertheless, the software receiver is also capable of generating the complex DDM with configurable delay and Doppler ranges and resolutions. Moreover, the complex waveforms are generated only with the forward scattering configuration, i.e., corresponding to the signal reflected around the specular point. However, the waveform or DDM can be also generated for the signal scattered from other directions (e.g., backscattering in [48]) or staring at fixed surface regions (e.g., in [34]) by only tuning the open-loop tracking model. - The complex waveform is generated with a fixed coherent integration time of t c = 1 ms. As the complex waveform includes both phase and amplitude information, it is possible to further integrate them coherently with longer coherent integration time. Such configuration can facilitate to characterize the coherence of the reflected signal at different surface conditions and different bistatic geometry (e.g., elevation and azimuth). - The above-mentioned processing scheme is only for the processing of the reflected signal at one single frequency band. For the direct and reflected signal from the second frequency band, e.g., GPS L2 signal collected by SPIRE RO satellites and BDS-3 B1I signal collected by the BuFeng-1 satellites, the same processing scheme is applied by only changing the PRN code and carrier frequency parameters. For the processing of multi-frequency GNSS-R data, the complex waveforms from different frequencies are synchronized by using the same start time in (1) for the cross-correlations.

Currently Available Data Products
CYGNSS is the first mission that released its full raw IF data set, which is publicly available through NASA's Earth Observing System Data and Information System (EOSDIS) Physical Oceanography Distributed Active Archive Center (PO.DAAC). Under the support of HydroGNSS mission's [23] system consolidation study, the ICE-CSIC/IEEC team also got access to the full raw IF datasets collected by the UK TDS-1 mission. In addition, a limited sets of raw IF data from BuFeng-1 mission and Spire GNSS RO satellites have also been provided under the supports of different projects. Currently, these raw IF data sets have been processed following the procedure explained in Section 3.2 for all visible GNSS satellites from GPS, Galileo and BDS-3 constellations. The processing and results are organized by track, which is defined as the GNSS-R measurements obtained from one set of raw IF data and corresponding to one GNSS satellite. To guarantee the quality of the complex waveform products, the tracks with the mean SNR lower than −6 dB are removed.
The data products from CYGNSS and TDS-1 missions are fully available under the Creative Commons (CC) license, while these from Spire GNSS RO satellites and Bufeng-1 mission are available upon request. The geographical distributions of the product tracks for TDS-1 and CYGNSS are presented in Figure 3, with different colors indicating the specular points from different GNSS systems. There are 828 tracks of complex waveform products from the TDS-1 satellite (between 1 September 2014 and 25 March 2019) and 2646 tracks from the CYGNSS constellation (between 25 August 2017 and 20 January 2022). It is noted that the CYGNSS mission is still in operation and continues collecting new Raw IF data sets, from which the complex waveform products will be also updated periodically.

Format and Data Structure
The complex waveform product is stored in netCDF (network Common Data Form) format, for which compatible libraries with several programming languages exist. Each netCDF file is generated by following the processing procedure shown in Section 3.2, which contains the complex waveforms and relevant metadata for the reflected signal from a given GNSS transmitter. The variables are organized by groups in the netCDF file as shown in Figure 4. The root group ("/") includes the basic information of the track, e.g., the raw IF collection, the PRN of the GNSS transmitter. The complex waveforms of the direct and reflected signals are stored in the "cWF" group along the time and waveform lag dimensions. The "cWF" group also includes some intermediate results for the complex waveform generation, such as the direct signal tracking results and the OL tracking model. It is noted that there can be more than one "cWF" groups corresponding to the complex waveform of the reflected signals at different frequency bands. The "MetaData" group mainly includes the positions and velocities of the transmitter, receiver and specular point along the time dimension. Table 2 provides the detailed explanations of some important variables, and how these variables can be linked to the raw IF processing in Section 3.2. Note that the brief explanations for all the variables can be found in the self-describing attributes in the netCDF file.    (2) r_Code_Phase time Code phase of the reflected signal, which is interpolated at t 0 and applied to the local code replica generation Chip Φ d code in (1) and (3) r_Doppler time Carrier Doppler of the reflected signal, which is interpolated at t 0 and applied to the local carrier replica generation Hz f r carr in (1) and (3) r_Phase time Carrier phase of the reflected signal, which is interpolated at t 0 and applied to the local carrier replica generation Cycle Φ d code in (1) and (

Accessing and Processing of the Data Product
The complex waveform products are available at ICE-CSIC/IEEC's public opendata server (https://www.ice.csic.es/research/gold_rtr_mining/index.php, accessed on 4 February 2022) [69], which are organized by folders for different spaceborne missions. To make it easier for users to get started with these data products, a Python script is provided through ICE-CSIC/IEEC's Gitlab repository (https://gitlab.ice.csic.es/earthobservation/spaceborne_gnssr_cwf_processing.git, accessed on 4 February 2022) as an example for processing the complex waveform products. The script consists of a sets of basic functions to properly obtain, search and analyze the complex waveform products, which are presented in Figure 5 and explained in detail below: -Download_cWF_File_List: In addition to the complex waveform netCDF files, the lists of the product tracks are also available at the server. This function is to download the upto-date lists of the complex waveform products, which include the basic information of the raw IF tracks, such as the raw IF data ID, the data collection time, the PRN of the GNSS transmitter, the SNR of the direct and reflected signals, and the geolocation of the specular point. Read_CW_File: Given the filename of the complex waveform file, this function can return two labeled multi-dimensional arrays including all the variables in the "cWF" group and the "MetaData" group, respectively. -ShiftCW and CounterRotateCW: As the OL tracking model (Section 3.2.2) is computed without considering the variation of the surface elevation along the track, it is necessary to realign the complex waveform by recomputing the bistatic delay of the reflected signal following a refined surface elevation model. With the precise estimation of the delay evolution along the track, ShiftCW function is to align the complex waveforms by shifting each of them along the delay axis, and CounterRotateCW function is to counter rotate the phase of the complex waveform by means of a product with a phasor rotated with the corresponding delay difference. It is noted that some other delay correction terms, such as the ionosphere delay and the troposphere delay can be also included in the bistatic delay computation and applied in the complex waveform realignment. -Integration: In the raw IF processing stage (Section 3.2), the direct and reflected signals are integrated coherently for a fixed period of 1 ms. To decrease the impact of thermal and speckle noise, coherent integration (complex sum) and incoherent average (averaging of the squared amplitudes) are applied to the 1-ms complex waveforms of the direct and reflected signals by and in which Z d coh and Z r coh are the direct and reflected complex waveforms with a coherent average time of T c = N c t c , and Z d inc and Z r inc are the power waveforms by incoherently averaging N I complex waveforms (Z d coh or Z r coh ). -GNSSR_Obs: Different GNSS-R observables can be derived from the complex or power waveforms. Currently, the main observables provided with the example function include the SNRs of the direct and reflected signals, the carrier phase of the reflected signal (∆φ r carr ) and the coherence factor of the reflected signal (Γ r ). In addition, the geographic locations, i.e., the latitude and longitude, corresponding to these variables are also interpolated at each observation epoch.

Discussion on Potential Applications of the Data Products
With the complex waveform products and corresponding processing tools, different aspects of the GNSS-R technique can be further investigated, and new applications or approaches can also be envisaged. In this section, we preliminarily demonstrate some potential applications of these complex waveform products.

Spaceborne GNSS-R from Different GNSS Constellation and Frequency Bands
One of the main features of the complex waveform products is their multi-GNSS and multi-frequency feature. Indeed, the development of new GNSS constellations (e.g., Galileo and BDS-3) can significantly increase the number of available sources of signal of opportunity, and thus can improve by folds the spatial-temporal sampling efficiency of spaceborne GNSS-R systems. However, most of the GNSS-R missions can only provide their Level 1 products for GPS L1 C/A signal. The unprocessed nature of the raw IF data makes it possible to extract GNSS-R measurements from all available GNSS signals within the receiver bandwidth. Specifically, the Galileo E1 open service (OS) signal and the BDS-3 B1C signal share the same frequency band with the GPS L1 C/A signal, and thus the corresponding complex waveforms are also generated. In addition, the reflected signals with different carrier frequencies may provide additional geophysical information comparing to these from only the L1/E1/B1C band. Figure 6 shows a set of examples of the power waveforms obtained from different GNSS systems and frequency bands. The upper row corresponds to the power waveforms of reflected GPS L1 C/A, Galileo E1 B/C and BDS B1C signals, using the corresponding publicly available PRN codes. The bottom row corresponds to the power waveforms of reflected GPS L2C (1227.60 MHz) and BDS-3 B1I (1561.098 MHz) signals together with these of reflected L1 C/A and B1C signals as the references. The presented results show that the raw IF data processing software is capable of adequately processing the reflected signals from different GNSS constellations and at different frequency bands. The multi-GNSS and multi-frequency features of the complex data product can facilitate the community in the following explorations: -As the outputs of the raw IF data processing, the prototype receiver processing algorithms (e.g., the combination of these GNSS signals with multiplexing structures) have been applied to the reflected signal samples. Thus, the complex waveforms themselves cannot be used directly for the developments of low-layer signal processing algorithms for multi-constellation, multi-frequency GNSS-R. However these intermediate results (e.g., code phases and carrier frequencies/phases of the reflected signals) provided in the data product can facilitate the users to re-process the raw IF data by using advanced signal processing algorithms even without the direct signal processing and the computation of the open-loop tracking model. -These complex waveforms are integrated coherently for a relative short integration time (i.e., 1 ms), which can be used to assess the optimal coherent integration times for different GNSS-R signals. In fact in [71], by using the complex waveform products, it is clearly shown that the SNR of the reflected signal can be further improved by up to 1.2 dB for Galileo E1 B/C and BDS-3 B1C signal by increasing the coherent integration time to 4 ms. However, it still needs further investigation how the coherent integration times for different GNSS-R signals can be further optimized as the functions of GNSS-R geometry (e.g., elevation and azimuth angles) and reflecting surface (e.g., open ocean, land, sea ice or ice sheet). - The complex waveform products from the other GNSS constellations can be used to assess the performance of these new GNSS signals in different applications (e.g., in [27,71,72]). Such assessments are of great importance for the definition and optimization of mission and instrument specifications for future spaceborne GNSS systems, e.g., in establishing the link budget for reflected signals from different GNSS constellations. Moreover, the combinations of these simultaneous GNSS-R measurements at different frequencies (e.g., as shown in Figure 6d,e) can also be further investigated, which can improve the performances of the GNSS-R retrievals. For example, the combination of the L1 and L2 delay observations (phase delay or group delay) can be used to estimate the ionosphere delay for precise GNSS-R altimetry [18] or ionospheric total electron content measurement [73]. In addition, the combination of their power or amplitude observations at different frequencies can provide a more robust way to estimate the surface bistatic radar cross section or reflectivity, which can be also attempted in future studies.

Surface Characterization Using Coherence of the Reflected Signal
Most of the GNSS-R missions can only provide the power of the reflected signal at low sampling rate (e.g., 1-4 Hz), which corresponds to a spatial sampling resolution of 1.5-7 km. Instead, the complex waveform products can provide both amplitude and phase information of the reflected signal at a higher sampling rate (1000 Hz), which makes it possible to characterize the surface parameters at higher spatial sampling resolutions.
Here we demonstrate the use of high sampling rate complex waveform for surface characterization over around Lake Rousseau in Florida. One track of complex waveform data with the reflected signal from Galileo PRN 31 is used in this demonstration. The corresponding raw IF data set is collected by the CYGNSS FM02 satellite on 13 September 2018. The coherence coefficient of the reflected signal is used as the main observable in this exercise, which can be generated directly with the post-processing software introduced in Section 4.3. The coherence coefficient between two successive complex waveforms is defined as in which z r peak is the peak amplitude of the waveform. Then, the complex coherence coefficient can be further averaged to generate the mean coherent coefficient as in which M is the number of coherent coefficients to generate their mean value. From (8) and (9), it can be seen that the mean coherence coefficient is mainly determined by the phase differences between the successive complex waveforms. For the GNSS signal reflected from smooth surface, it can be assumed that the phase of the complex waveform does not change significantly within a short period of time (e.g., in 2 ms). Thus, the coherent average of the coherent coefficient results in a relative high mean coherent coefficient (close to 1). While for the reflected signal from rough surface, diffuse scattering can be the dominate component, and the phase differences between the successive complex waveforms are more random. In this case, the mean coherent coefficient is close to 0. Figure 7 shows an overlay of of the mean coherent coefficient in a Google Earth historical image, which clearly shows the consistence between high coherent coefficient pixels and the presence of inland water bodies. Note that the number of averaged coherence coefficients in (9) is 20 (i.e., M = 20) in this processing, which corresponds to an along-track resolution of ∼120 m. This resolution can be adapted to characterize flood inundation dynamics at appropriate scales by using different coherent integration times or number of averaged coherence coefficients. It is noted that a similar approach has been attempted using the CYGNSS raw IF data collected over Mississippi River in [27], and this new analysis presented here is just to provide the users a simply example on how to extract observables from the complex waveform for different geophysical application. It is also remarkable that a similar algorithm will be also implemented as the HydroGNSS mission's Level 2 inundation detection processor. However, the impact of dense vegetation and saturated soil on the performance of such detector should be further investigated. Note that CYGNSS mission has made dedicated raw IF data collections over inundation regions as well as tropical wetlands, e.g., Amazon basin, Mississippi watershed, Yucatan Lake, Florida Everglades and White River watershed. The complex waveforms generated from these raw IF data can be used to demonstrate the application of inland water detection, and can also facilitate the development and validation of the HydroGNSS baseline inundation processor. The user can set the target region in the function Find_RawIF_Track to find and download the complex waveforms data files in these regions.

Demonstration of GNSS-R Altimetry
Altimetry using reflected GNSS signal is one of the main potential applications of GNSS-R concept. However, there is still no dedicated GNSS-R altimetry mission in orbit. The publicly available Level 1 datasets from TDS-1 and CYGNSS are generated mainly for ocean scatterometry applications, and there are several limitations to using them for altimetry purpose. For example, the main variable for reconstructing the onboard OL delay in CYGNSS Level 1 data product, i.e., "add_range_to_sp", is truncated to 0.1 chip numeric precision (∼30 m), which can significant increase the total altimetry error and mask the truly achievable altimetry precision.
In our processing of the raw IF data sets, the OL delay model, i.e., δρ dr OL in (2), is provided in full precision in the netCDF files for each complex waveform, which makes it possible to demonstrate GNSS-R altimetry. In fact, such demonstrations have been attempted by using either group delay or carrier phase delay over different types of surface, e.g., over sea ice in [24,32], ice sheet in [43], open ocean in [26,31,74], and lakes in [25].
To facilitate the users' further explorations on GNSS-R altimetry using the complex waveform data products, here we explain briefly how to use the data products to generate the altimetry profiles along the specular point tracks: - The first step is to compute the delay of the reflected signal from the group delay or carrier phase observables. The group delay can be estimated from the incoherently averaged power waveform (i.e., Z r inc in (7)) through waveform retracking, which is to determine the position of the specular point in the waveform window. For the incoherent reflections from rough ocean, the determination of the specular point relies on proper modeling of the reflected waveform as introduced in [26]. On the other hand, for the GNSS signals reflected coherently from smooth surfaces (e.g., lakes or sea ice), the position of the specular point can be assumed to be the peak of the power waveform. Once the position of the specular point is determined in the waveform window, the group delay residual ∆ρ dr obs,gp relative to the OL delay can be computed from the variable "delay_of_bin". The observed bistatic delay of the reflected signal ρ dr obs,gp can be computed through ρ dr obs,gp = δρ dr OL + ∆ρ dr obs,gp For the coherent reflections, the carrier phase residual ∆φ r obs can be computed from the peak of the coherently integrated complex waveform by ∆φ r obs = angle z k peak (11) The bistatic carrier phase delay ρ dr obs,cp can be computed directly from the unwrapped carrier phase residual ∆Φ r obs by ρ dr obs,cp = δρ dr OL + (N L + ∆Φ r obs ) · λ L (12) in which N L is an unknown integer ambiguity and λ L is the wavelength of the carrier. It is noted that the carrier phase residual ∆φ r obs can be noisy and with cycle slips due to multipath or the other fading effects, and some advanced filtering and processing algorithms (e.g., [75]) can be applied to improve the robustness of the phase unwrapping. - The second step is to compute the modeled bistatic delay by considering the different propagation delay corrections by ρ dr model, * = ρ dr geo + ∆ρ dr tropo + ∆ρ dr iono, * + ∆ρ dr ots (13) in which ρ dr model, * is the modeled bistatic delay, the subscript ( * ) indicates the dependence of the delay on different types of measurements (i.e., group delay or phase delay) and carrier frequency (e.g., GPS L1, L2, L5 or their combinations), ρ dr geo is the geometric delay between the direct and reflected signal, ∆ρ dr tropo is the troposphere delay, ∆ρ dr iono, * is the ionosphere delay, and ∆ρ dr ots includes the other delay correction terms, e.g., the antenna baseline correction.

-
The last step is to retrieve the surface height using the measured and modeled bistatic delay by H obs, * = − ρ dr obs, * − ρ dr model, * 2 · cos i (14) in which i is the incidence angle, and H obs, * is the relative surface height above the reference surface. Note that the reference surface is the one used in the computation of the modeled bistatic delay.
An example of GNSS-R altimetry results using the complex waveform product is shown in Figure 8. The complex waveform is generated from the raw IF data collected by TDS-1 on 27 February 2016, when the sea surface is still frozen along the specular point track of GPS PRN 24. Strong coherent reflection has been detected along the first half of the track (red segment in Figure 8 Left). The delay of the reflected signal can be derived from both the group delay and the carrier phase observations, from which the corresponding surface height measurements can be obtained. As shown in Figure 8 (Right), the sea surface height measurements from both group delay and carrier phase follow the evolution of the DTU18 mean sea surface (MSS) model. In addition, the group delay altimetry results show much larger error comparing to the carrier phase ones (0.88 m vs. 0.04 m), which is mainly due to the longer wavelength of the GPS L1 C/A ranging code (∼300 m). It is important to note that due to the receiver orbit error, the carrier phase integer ambiguity in (14) and the residuals of the ionosphere and troposphere corrections, a linear trend has to been removed from the bistatic delay measurements, so that the retrieved sea surface height can better fit the MSS model. In fact, such correction is required for the most of the tracks in the complex waveform products, which should be considered by the users in altimetry exercises. In addition, to demonstrate the surface altimetry application using both the group delay and carrier phase measurements, it is suggested to use the complex waveform tracks over the polar regions from the TDS-1 mission or these over the high altitude lakes, e.g., Lake Qinghai, from the CYGNSS mission.

Conclusions
This paper presents the processing, data products and applications of a type of low level GNSS-R data set, i.e., the raw IF data, collected by different spaceborne GNSS-R missions. The unprocessed nature of these raw IF data provides an unique opportunity to explore the potential of GNSS-R technique for advanced GNSS-R applications and future spaceborne missions. Limited by the amount of available raw IF data collection, it is not possible to demonstrate any operational applications with these raw IF data sets. However, the post-processing of the raw IF data can make complements to the available higher level GNSS-R data (e.g., Level 1 power DDM data). To facilitate such explorations, a dedicated GNSS-R software receiver was developed by the Institute of Space Sciences (IEEC, CSIC), and corresponding data products have been generated from the raw IF datasets collected by different missions.
The main data product of the raw IF data processing, i.e., the complex waveform data, is publicly available for the TDS-1 and CYGNSS missions through ICE-CSIC/IEEC's public open-data server. The format and variables of the data products are introduced in detail. To facilitate the users accessing and processing of the data products, a software tool is also provided and introduced.
With the available complex waveform products and the corresponding processing tools, different aspects of the GNSS-R concept can be investigated. One of the main features of the products is that it provides GNSS-R waveforms from multi-GNSS and at multi-frequency, which can help the community to optimize the processing algorithms and analyse the link budgets of the other GNSS signals than only GPS L1 C/A signal. In addition, the complex data products provide both the amplitude and the phase information of the reflected signal at high sampling resolution (i.e., 1000 Hz), which can provide extra information to characterize the surface coherence properties or even to make high precision surface height measurements. A few examples of possible applications, such as inland water detection and surface altimetry, have been demonstrated in this paper. By making these complex waveform data products publicly available, new geophysical applications of the GNSS-R technique can be further explored by the community. Such explorations are of relevance at the moments, as the next ESA GNSS-R mission, HydroGNSS, will include for the first time ever, operational and continuous generation of these complex observations.
The data products will keep updating with the availability of new spaceborne GNSS-R raw IF data sets. For example, the ESA Scout-2 HydroGNSS mission (expected to be launched in H2 2024) is capable of collecting raw IF data at different frequencies and polarizations. Being selected as the official Level 0 raw IF data processor, ICE-CSIC/IEEC's software receiver will provide the complex waveform products with more advanced features. Moreover, new data products will be also developed by reprocessing the raw IF data with different algorithms or strategies, e.g., the complex or power DDM product with higher delay/Doppler resolution, the GNSS-R observation with stare or focused processing strategies [34], which will be also provided in the same data server.
Funding: This work of Weiqiang Li was supported by the Spanish Ministry of Science, Innovation and Universities through the Ramón y Cajal Program (No. RYC2019-027000-I). This research was partially funded by Spanish National Research Council (20215AT007) and the Spanish Ministry of Science, Innovation and Universities grant number RTI2018-099008-B-C22/AEI/10.13039/501100011033/FEDER, EU. This work was also supported by the Spanish program "Unidad de Excelencia María de Maeztu" (CEX2020-001058-M). The processing of the CYGNSS raw IF data was performed in the frame of the NASA CYGNSS Extended Science Team. The processing of the TDS-1 raw IF data was funded through the ESA Scout-2 HydroGNSS consolidation study (4000129140/19/NL/CT, Subcontract to IEEC MS000397). The processing of the BF-1 raw IF data was supported through the ESA-MOST China Dragon 5 Program (ID. 58070) project.