Developing and Testing Models for Sea Surface Wind Speed Estimation with GNSS-R Delay Doppler Maps and Delay Waveforms

: This paper focuses on sea surface wind speed estimation based on cyclone global navigation satellite system reﬂectometry (GNSS-R) data. In order to extract useful information from delay-Doppler map (DDM) data, three delay waveforms are presented for wind speed estimation. The delay waveform without Doppler shift is deﬁned as central delay waveform (CDW), and the integral of the delay waveforms with di ﬀ erent Doppler shift values is deﬁned as integral delay waveform (IDW), while the di ﬀ erence between normalized IDW (NIDW) and normalized CDW (NCDW) is deﬁned as di ﬀ erential delay waveform (DDW). We ﬁrst propose a data ﬁltering method based on threshold setting for data quality control. This method can select good-quality DDM data by adjusting the root mean square (RMS) threshold of cleaned DDW. Then, the normalized bistatic radar scattering cross section (NBRCS) and the leading edge slope (LES) of IDW are calculated using clean DDM data. Wind speed estimation models based on NBRCS and LES observations are then developed, respectively, and on this basis, a combination wind speed estimation model based on determination coe ﬃ cient is further proposed. The CYGNSS data and ECMWF reanalysis data collected from 12 May 2020 to 12 August 2020 are used, excluding data collected on land, to evaluate the proposed models. The evaluation results show that the wind speed estimation accuracy of the piecewise function model based on NBRCS is 2.3 m / s in terms of root mean square error (RMSE), while that of the double-parameter and triple-parameter models is 2.6 and 2.7 m / s, respectively. The wind speed estimation accuracy of the double-parameter and triple-parameter models based on LES is 3.3 and 2.5 m / s. The results also demonstrate that the RMSE of the combination method is 2.1 m / s, and the coe ﬃ cient of determination is 0.906, achieving a considerable performance gain compared with the individual NBRCS- and LES-based methods.


Introduction
Sea surface wind speed is a key factor that affects ocean circulation and global climate. In addition, as one of the most serious natural disasters, tropical cyclones greatly damage infrastructure and endanger lives. For these reasons, it is vital to monitor sea wind speed to study and predict some The power of scattered GNSS signals is a 2D function of time delay and doppler. For given time delay τ and doppler frequency shift f, the received signal power of scattered GNSS signals is the integral of the power distribution on the scattering surface. Zavorotny and Voronovich derived a model to describe received signal power of scattered (or reflected) signals [22], which is given by: where 2 Y(τ, f) is the reflected signal power at the receiver, which is a function of time delay τ and Doppler frequency shift f ; i T is the coherent integration time; λ denotes the signal wavelength; t P is the satellite transmitting power; t G is the satellite antenna gain; r G is the receiver antenna gain; t R and r R are distance from satellite transmitter and receiver to specular point of reflection, respectively. Λ(τ) is PRN code auto-correlation function, and sinc(f) is the sin function describing doppler frequency shift induced power reduction. 0 σ refers to the scattering coefficient per unit area of surface A, namely NBRCS, which is related to the roughness of the scattering surface. In this work, NBRCS ( 0 σ ) was used as a surface roughness sensitivity parameter (also known as an observable parameter) to estimate sea surface wind speed. The NBRCS equation is given as follows [34,22,40]: where ℜ is Fresnel reflection coefficient, whose calculation formula is shown in Equations (3)-(6) [22,41]; q is the scattering unit vector; z q is the vertical component (surface normal direction); ⊥ q is the horizontal component of the scattering unit vector;  is the vector from the specular point to the scattered point, and P is the probability density function (PDF). The power of scattered GNSS signals is a 2D function of time delay and doppler. For given time delay τ and doppler frequency shift f, the received signal power of scattered GNSS signals is the integral of the power distribution on the scattering surface. Zavorotny and Voronovich derived a model to describe received signal power of scattered (or reflected) signals [22], which is given by: where Y(τ, f ) 2 is the reflected signal power at the receiver, which is a function of time delay τ and Doppler frequency shift f ; T i is the coherent integration time; λ denotes the signal wavelength; P t is the satellite transmitting power; G t is the satellite antenna gain; G r is the receiver antenna gain; R t and R r are distance from satellite transmitter and receiver to specular point of reflection, respectively. Λ(τ) is PRN code auto-correlation function, and sinc( f ) is the sin function describing doppler frequency shift induced power reduction. σ 0 refers to the scattering coefficient per unit area of surface A, namely NBRCS, which is related to the roughness of the scattering surface. In this work, NBRCS (σ 0 ) was used as a surface roughness sensitivity parameter (also known as an observable parameter) to estimate sea surface wind speed. The NBRCS equation is given as follows [22,34,40]: where is Fresnel reflection coefficient, whose calculation formula is shown in Equations (3)-(6) [22,41]; q is the scattering unit vector; q z is the vertical component (surface normal direction); q ⊥ is the horizontal component of the scattering unit vector; is the vector from the specular point to the scattered point, and P is the probability density function (PDF).
Remote Sens. 2020, 12, 3760 5 of 24 In the above, θ is the angle of incidence, and ε is the complex permittivity. R,L,V, and H represent right-hand circular polarization (RHCP), left-hand circular polarization (LHCP), vertical polarization, and horizontal polarization, respectively. Let the permittivity of sea water be ε Water = 73.42 + 56.07i [42], then we can derive the relation between the modulus of Fresnel reflection coefficient and the elevation angle as shown in Figure 2. It can be seen that the LHCP component of reflected signal increases with the increase in GNSS satellite elevation angle, while its RHCP component decreases with the increase in GNSS satellite elevation angle. The horizontal component of the reflected signal polarization decreases with the increase in the elevation angle of GNSS satellite, while its vertical component increases with the increase in the elevation angle. Note that the above trend only appears when the elevation angle is greater than a certain value (about 6.8 • ). This indicates that after the incident RHCP GNSS satellite signal is scattered over the sea surface, its polarization characteristics will change-that is, the signal dominated by RHCP will change to that dominated by LHCP, and the energy ratio of conversion is relatively large.  [42], then we can derive the relation between the modulus of Fresnel reflection coefficient and the elevation angle as shown in Figure 2. It can be seen that the LHCP component of reflected signal increases with the increase in GNSS satellite elevation angle, while its RHCP component decreases with the increase in GNSS satellite elevation angle. The horizontal component of the reflected signal polarization decreases with the increase in the elevation angle of GNSS satellite, while its vertical component increases with the increase in the elevation angle. Note that the above trend only appears when the elevation angle is greater than a certain value (about 6.8 °). This indicates that after the incident RHCP GNSS satellite signal is scattered over the sea surface, its polarization characteristics will change-that is, the signal dominated by RHCP will change to that dominated by LHCP, and the energy ratio of conversion is relatively large. Spaceborne GNSS-R can be regarded as a bistatic radar scatterometer. In order to illustrate the power of the scattered GNSS signals, a theoretical integral delay waveform (IDW) model is proposed in [22,25], shown as follows: Spaceborne GNSS-R can be regarded as a bistatic radar scatterometer. In order to illustrate the power of the scattered GNSS signals, a theoretical integral delay waveform (IDW) model is proposed in [22,25], shown as follows: Remote Sens. 2020, 12, 3760 6 of 24 where W I (τ) is the IDW, which is obtained by accumulating the time delay waveform corresponding to all doppler frequency shift values. Y incoh (τ, f i ) 2 is the scattered power value minus the noise floor, at the delay and Doppler coordinates τ and f i . For CYGNSS, N τ = 17, N f = 11 and the Doppler bandwidth ∆ f = 1/2T i = 500 Hz. It can be seen from [22,43] that when the width of the glistening zone is smaller than the Doppler bandwidth, the influence of Doppler effect can be ignored. At this point, a central delay waveform (CDW) without Doppler frequency shift can be obtained as follows: where most of the terms are the same as those in Equations (1) and (2), and L 2 is the function of power antenna footprint. When IDW and CDW are normalized, respectively, the normalized multi-delay waveform (NIDW) and the normalized central delay waveform (NCDW) can be obtained, as shown in Equations (9) and (10): where W max I and W max C represent the maximum values of IDW and CDW before normalization, respectively. Differential delay waveform (DDW) can be obtained by differentiating between NIDW and NCDW, shown as follows: In the retrieval of sea surface wind speed, the power values of time-delay 1D correlation function (also called time-delay waveform) are closely related to the physical parameters such as sea surface wind speed and wind direction. Figure 3a-d show the DDM maps under different wind speed conditions, and Figure 3e-g present the corresponding delay waveform. It can be seen that the time delay waveform of reflected signal has different characteristics under different wind speed conditions. In view of this, the time delay waveform is used as the basic observations to retrieve wind speed in our study. See Sections 2.2 and 3.2 below for details.

Data Set
In order to derive sea surface wind speed from DDM and delay waveform, we, respectively, adopt the methods of NBRCS, delay waveform LES, and optimal weight eigenvalue combination to construct GMF. We introduce two data sets, namely CYGNSS L1 DDM data and European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data.
In 2016, the CYGNSS was deployed, which was developed by NASA, the University of Michigan, and the Southwest Research Institute. It consists of 8 microsatellites, with an orbital inclination of 35 • , a height of 510 km, and a spatial resolution of 25 × 25 km [44]. Delay doppler mapping instrument (DDMI) [45] is equipped onboard each CYGNSS satellite, which is specially used for GNSS reflectometry. The CYGNSS products mainly include four levels of products, namely, level 1, level 2, level 3, and level 4. The first three levels of products are open to the public, and now available in NASA's Physical Oceanography Distributed Active Archive Center (PO. DAAC) on (https://podaac.jpl.nasa.gov/) [46], with the data being in NetCDF format. In our wind speed retrieval, we use level 1 V2.1 product, downloaded from the website (https://podaac-tools.jpl.nasa.gov/drive/files/allData/cygnss/L1/v2.1). We downloaded the observation data from 12 May 2020 to 12 August 2020, with day-of-year ranging from 133 to 225.

Data Set
In order to derive sea surface wind speed from DDM and delay waveform, we, respectively, adopt the methods of NBRCS, delay waveform LES, and optimal weight eigenvalue combination to construct GMF. We introduce two data sets, namely CYGNSS L1 DDM data and European Centre for Medium-Range Weather Forecasts (ECMWF) reanalysis data.
In 2016, the CYGNSS was deployed, which was developed by NASA, the University of Michigan, and the Southwest Research Institute. It consists of 8 microsatellites, with an orbital inclination of 35°, a height of 510 km, and a spatial resolution of 25 × 25 km [44]. Delay doppler mapping instrument (DDMI) [45] is equipped onboard each CYGNSS satellite, which is specially used for GNSS reflectometry. The CYGNSS products mainly include four levels of products, namely, level 1, level 2, level 3, and level 4. The first three levels of products are open to the public, and now available in NASA's Physical Oceanography Distributed Active Archive Center (PO. DAAC) on (https://podaac.jpl.nasa.gov/) [46], with the data being in NetCDF format. In our wind speed retrieval, we use level 1 V2.1 product, downloaded from the website (https://podaac-tools.jpl.nasa.gov/drive/files/allData/cygnss/L1/v2.1).
We downloaded the observation data from 12 May 2020 to 12 August 2020, with day-of-year ranging from 133 to 225.
The latest ECMWF data, called ERA5, can be downloaded from Copernicus climate change service (C3S) climate database (https://cds.climate.copernicus.eu/cdsapp#!/home). ERA5 provides hourly information on the atmosphere, land surface, and ocean waves from 1979 to the present. The spatial resolution of wind speed products is 0.5° × 0.5°, and there are two data formats: GRIB and NetCDF. For long-term large-scale observations of sea surface wind speed, the current wind speed product of ECMWF can be used as the ground-truth data in CYGNSS wind speed retrieval. The latest ECMWF data, called ERA5, can be downloaded from Copernicus climate change service (C3S) climate database (https://cds.climate.copernicus.eu/cdsapp#!/home). ERA5 provides hourly information on the atmosphere, land surface, and ocean waves from 1979 to the present. The spatial resolution of wind speed products is 0.5 • × 0.5 • , and there are two data formats: GRIB and NetCDF. For long-term large-scale observations of sea surface wind speed, the current wind speed product of ECMWF can be used as the ground-truth data in CYGNSS wind speed retrieval. ECMWF assimilates meteorological data from different sources and obtains an ERA-5 reanalysis data set containing wind speed of 10 m above the sea surface. Therefore, NBRCS and LES of IDW calculated from CYGNSS DDM data are matched with wind speed of ECMWF reanalysis data, respectively, to develop an empirical geophysical function model.

Methodology
In order to obtain accurate wind speed retrieval results, a series of processing work was performed for the selected data sets. Data preprocessing is necessary to improve the efficiency of subsequent data processing and the reliability of the algorithm. This section mainly deals with data preprocessing, with a focus on data filtering and DDM data alignment. We proposed a data filtering method based on threshold setting for data quality control.
First, the method described in [47,48] was used to reduce the noise in the noisy DDM data. Since the initially denoised DDM data usually contain significant noise, it is necessary to select better Remote Sens. 2020, 12, 3760 8 of 24 quality data from the initially denoised DDM data. The traditional selection method generally uses the SNR peak as the index of data filtering. For example, in [26], the threshold of SNR peak is set to 0 dB, and the DDMs with SNR peak greater than the threshold are selected; in [29,49], the SNR peak threshold is set to 3 and 4 dB, respectively. Although the above SNR peak method can be effectively used for DDM selection, there are some deficiencies in threshold selection. If the selected threshold is too large, good quality data may be lost, while if the selected threshold is too small, noise data cannot be well excluded, resulting in performance degradation. In view of this, in [43], a DDM selection method based on differential time-delay waveform was proposed. The RMSE of 0.5 of differential delay waveform was set as the threshold, which is employed in our data selection method. We use DDM data of CYGNSS L1 level, which is composed of 17 time-delay rows and 11 doppler columns of data. Since the differential data were used, [43] is not completely applicable to the data adopted in this paper, so in the data filtering process, the calculation of IDW has a slight improvement in the processing method, as shown in Equation (12). Figure 4 shows a map of DDM ( Figure 4a) of CYGNSS and its corresponding delay waveforms (NIDW, NCDW, and DDW, Figure 4b). We mainly selected the grids corresponding to the 17 time-delay rows and the 4-8th Doppler-shift columns in the DDM data to be accumulated and calculated to obtain IDW and then normalized. Then, we calculated the difference between NIDW and NCDW to obtain the differential delay waveform (DDW).
In order to obtain accurate wind speed retrieval results, a series of processing work was performed for the selected data sets. Data preprocessing is necessary to improve the efficiency of subsequent data processing and the reliability of the algorithm. This section mainly deals with data preprocessing, with a focus on data filtering and DDM data alignment. We proposed a data filtering method based on threshold setting for data quality control.
First, the method described in [47,48] was used to reduce the noise in the noisy DDM data. Since the initially denoised DDM data usually contain significant noise, it is necessary to select better quality data from the initially denoised DDM data. The traditional selection method generally uses the SNR peak as the index of data filtering. For example, in [26], the threshold of SNR peak is set to 0 dB, and the DDMs with SNR peak greater than the threshold are selected; in [29,49], the SNR peak threshold is set to 3 and 4 dB, respectively. Although the above SNR peak method can be effectively used for DDM selection, there are some deficiencies in threshold selection. If the selected threshold is too large, good quality data may be lost, while if the selected threshold is too small, noise data cannot be well excluded, resulting in performance degradation. In view of this, in [43], a DDM selection method based on differential time-delay waveform was proposed. The RMSE of 0.5 of differential delay waveform was set as the threshold, which is employed in our data selection method. We use DDM data of CYGNSS L1 level, which is composed of 17 time-delay rows and 11 doppler columns of data. Since the differential data were used, [43] is not completely applicable to the data adopted in this paper, so in the data filtering process, the calculation of IDW has a slight improvement in the processing method, as shown in Equation (12). Figure 4 shows a map of DDM ( Figure 4a) of CYGNSS and its corresponding delay waveforms (NIDW, NCDW, and DDW, Figure 4b). We mainly selected the grids corresponding to the 17 time-delay rows and the 4-8th Doppler-shift columns in the DDM data to be accumulated and calculated to obtain IDW and then normalized. Then, we calculated the difference between NIDW and NCDW to obtain the differential delay waveform (DDW).    Figure 5b) and the corresponding delay waveforms. Both DDM data were recorded over the ocean on 12 May 2020. As shown in Figure 5, in the presence of large noise (low SNR = 0.5207 dB), significant variation and dispersion occurred in different time-delay waveform, while in the presence of small noise (high SNR = 2.6943 dB), the variation in the differential waveform was much smaller. Large noise may be caused by the space environment of signal propagation, such as atmospheric attenuation, space signal interference, and propagation loss. Data quality is also affected by incident angle, antenna gain, and satellite attitude. After denoising the DDM data, we calculated the RMS values of DDW, as shown in Equation (13). We selected good-quality DDM data by adjusting the root mean square (RMS) threshold of cleaned DDW.
where RMS DDW represents the RMS of DDW, and DDW p denotes the waveform value corresponding to each time-delay value. Figure 5 shows the two DDMs in the presence of large noise ( Figure 5a) and small noise ( Figure 5b) and the corresponding delay waveforms. Both DDM data were recorded over the ocean on 12 May 2020. As shown in Figure 5, in the presence of large noise (low SNR = 0.5207 dB), significant variation and dispersion occurred in different time-delay waveform, while in the presence of small noise (high SNR = 2.6943 dB), the variation in the differential waveform was much smaller. Large noise may be caused by the space environment of signal propagation, such as atmospheric attenuation, space signal interference, and propagation loss. Data quality is also affected by incident angle, antenna gain, and satellite attitude. After denoising the DDM data, we calculated the RMS values of DDW, as shown in Equation (13). We selected good-quality DDM data by adjusting the root mean square (RMS) threshold of cleaned DDW.
where DDW RMS represents the RMS of DDW, and p DDW denotes the waveform value corresponding to each time-delay value. We selected 4767 DDM data and calculated the RMS of DDW using formula (15) as shown in Figure 6. The results show that when the threshold is set as RMS = 0.25, about 88.15% DDWs can be selected; when the threshold is set as RMS = 0.2, about 92.96% DDWs can be selected; when the threshold is set as RMS = 0.1, about 95.51% DDWs can be selected. Considering both data utilization and data quality, in our study, DDW data with RMS greater than 0.2 were considered to contain large noise and thus excluded. We selected 4767 DDM data and calculated the RMS of DDW using formula (15) as shown in Figure 6. The results show that when the threshold is set as RMS = 0.25, about 88.15% DDWs can be selected; when the threshold is set as RMS = 0.2, about 92.96% DDWs can be selected; when the threshold is set as RMS = 0.1, about 95.51% DDWs can be selected. Considering both data utilization and data quality, in our study, DDW data with RMS greater than 0.2 were considered to contain large noise and thus excluded. Now, it comes to the last step of data preprocessing. We use the alignment algorithm to align DDM data with time delay and Doppler zero values as reference. In this data matching stage, the obtained high-quality DDM data and delay waveforms are used to calculate NBRCS and LES of DDW, respectively. After that, they are matched with ECMWF data, respectively, and the DDM data are randomly divided into training data set and test data set according to the proportion of 70% and 30%, respectively. In the third stage (model development, model selection, and performance evaluation), the model is first developed using training data, selected characteristic parameters (NBRCS, LES), and curve fitting methods (such as nonlinear least square fitting). Then, the proposed empirical model is applied to the test data to obtain the sea surface wind speed estimate and evaluate the performance of different models. The flow chart of wind speed estimation model construction is shown in Figure 7. Now, it comes to the last step of data preprocessing. We use the alignment algorithm to align DDM data with time delay and Doppler zero values as reference. In this data matching stage, the obtained high-quality DDM data and delay waveforms are used to calculate NBRCS and LES of DDW, respectively. After that, they are matched with ECMWF data, respectively, and the DDM data are randomly divided into training data set and test data set according to the proportion of 70% and 30%, respectively. In the third stage (model development, model selection, and performance evaluation), the model is first developed using training data, selected characteristic parameters (NBRCS, LES), and curve fitting methods (such as nonlinear least square fitting). Then, the proposed empirical model is applied to the test data to obtain the sea surface wind speed estimate and evaluate the performance of different models. The flow chart of wind speed estimation model construction is shown in Figure 7.   Now, it comes to the last step of data preprocessing. We use the alignment algorithm to align DDM data with time delay and Doppler zero values as reference. In this data matching stage, the obtained high-quality DDM data and delay waveforms are used to calculate NBRCS and LES of DDW, respectively. After that, they are matched with ECMWF data, respectively, and the DDM data are randomly divided into training data set and test data set according to the proportion of 70% and 30%, respectively. In the third stage (model development, model selection, and performance evaluation), the model is first developed using training data, selected characteristic parameters (NBRCS, LES), and curve fitting methods (such as nonlinear least square fitting). Then, the proposed empirical model is applied to the test data to obtain the sea surface wind speed estimate and evaluate the performance of different models. The flow chart of wind speed estimation model construction is shown in Figure 7.

Wind Speed Estimation Model
In this section, ECMWF reanalysis data from 12 May 2020 to 12 August 2020 are used as training and test data sets to derive wind speed estimation models. The reanalysis data provide a spatially continuous wider range of wind speeds and correspond to a large number of data sets. We developed new empirical models for sea surface wind speed estimation by using NBRCS, LES, and the combination of NBRCS and LES observations, respectively.

Modelling Using NBRCS Methods
We first selected the preprocessed DDM data to calculate NBRCS, and then matched the NBRCS with ECMWF data. We selected 70 % of the data for modeling. However, the spatial, temporal resolution of ECMWF reanalysis data was 0.5 • × 0.5 • , 1 h [50], while the spatial, temporal resolution of NBRCS was about 0.082 • × 0.082 • and 0.5 s. Considering these differences, we used the biharmonic spline interpolation (also called V4 interpolation) to combine them into DDM data with the spatial resolution being 0.1 • × 0.1 • , and the alignment criterion was 1 h.

NBRCS Calculation
In order to calculate NBRCS (σ 0 ), the Equation (1) can be rewritten as follows [29]: where P DDM is the power calibration result of Y(τ, f ) 2 , Y(τ, f ) 2 is derived from Equation (1), and other definitions are consistent with Equation (1). It should be noted that the double integral in Equation (14) is determined only by the geometry. NBRCS has good accuracy only near specular reflection points. Therefore, up to 0.75 delay chips and 2500 Hz Doppler shifts were selected to calculate NBRCS.
Model-1 and Model-2 are derived from the training data in this paper, while the piecewise function in Model-3 is modified on the basis of [26,29]. They are expressed as follows: Model-1: Model-2: Model-3: where f (σ 0 ) is the wind speed estimated by the model; σ 0 is NBRCS; A, B, and C are unknown parameters, which are determined by fitting the training data set. As shown in Figure 8, there is a functional relationship between NBRCS and true wind speed, which can be described by Model-1, Model-2, and Model-3. Using nonlinear least squares fitting method, Model-1 and Model-2 are obtained, which are shown in Figure 8 (left) as dotted lines and solid red lines, respectively. Model-3, constructed by piecewise functions, is shown in Figure 8  This takes into account the fitting effect of high wind speed and low wind speed separately. The fitting parameters of the three models are given in Table 1.
solid red lines, respectively. Model-3, constructed by piecewise functions, is shown in Figure 8  speed. This takes into account the fitting effect of high wind speed and low wind speed separately. The fitting parameters of the three models are given in Table 1.

Modelling Using LES Methods
Using the same process as in Section 3.1, we first selected the preprocessed DDM data to calculate LES and matched LES with ECMWF data. Then, we selected 70 % of the data for modeling, while the remaining 30 % was used to test model performance.

LES Calculation
The least squares fitting is usually used to calculate the slope of multiple points. The calculation is expressed as Equation (18).
where n (n ≥ 2) denotes the number of the selected points used in the slope fitting, and τ i represents the abscissa value of each point used in fitting the leading edge slope, namely the time

Modelling Using LES Methods
Using the same process as in Section 3.1, we first selected the preprocessed DDM data to calculate LES and matched LES with ECMWF data. Then, we selected 70 % of the data for modeling, while the remaining 30 % was used to test model performance.

LES Calculation
The least squares fitting is usually used to calculate the slope of multiple points. The calculation is expressed as Equation (18).
where n (n ≥ 2) denotes the number of the selected points used in the slope fitting, and τ i represents the abscissa value of each point used in fitting the leading edge slope, namely the time delay values. W LES i is the ordinate value, namely the leading edge waveform value. τ and W LES are the average of the abscissa value and that of the ordinate value.
In this paper, LES is defined as the slope of the leading edge of the IDW. In other words, after incoherent accumulation of DDM corresponding to Doppler frequency within a certain range to obtain one-dimensional delay waveform, the least squares fitting is adopted to obtain the optimal slope of the function, which is the leading edge slope of this delay waveform.

Model-1 and Model-2
As shown in Figure 9, the model trained by LES and ECMWF reanalysis data follows the form of power function. We developed the double-parameter Model-1 and triple-parameter Model-2, respectively. When the determination coefficient between the fitted wind speed and the true wind speed is close to 1 and the RMSE is the smallest, the fitting model is optimal. The dashed curve is the best fitting function of Model-1, and the red curve is the best fitting function of Model-2. The model formulas are described as follows: Model-2: where ( ) LES F x is the estimated wind speed; LES x is LES; G, M, and L are undetermined parameters, which are obtained by fitting the training data set.

Modeling Using Determination Coefficient-Based Combination
The solution of fitting parameters is an optimization problem, and we use the minimum mean square error (MSE) criterion to achieve the goal. In other words, taking the comparison data as the ideal truth value, we found that the MSE between the retrieval results and the comparison data is minimized.
Suppose that the estimated value of wind speed is x , and the true value of wind speed is y .
due to the randomness of x , the MSE can be expressed statistically by the following equation: Model-1: Model-2: where F(x LES ) is the estimated wind speed; x LES is LES; G, M, and L are undetermined parameters, which are obtained by fitting the training data set.

Modeling Using Determination Coefficient-Based Combination
The solution of fitting parameters is an optimization problem, and we use the minimum mean square error (MSE) criterion to achieve the goal. In other words, taking the comparison data as the ideal truth value, we found that the MSE between the retrieval results and the comparison data is minimized.
Suppose that the estimated value of wind speed isx, and the true value of wind speed is y. due to the randomness ofx, the MSE can be expressed statistically by the following equation: It can be seen that the MSE consists of variance (σ(x) 2 ) and square of deviation ([E(x) − y] 2 ). Whenx is the unbiased estimation of y, the MSE is reduced to variance, and the uniformly minimum mean square error estimation is the uniformly minimum variance unbiased estimation.
Under the minimum mean square error criterion, we exploit the determination coefficient to measure the actual performance of a model and use it as the optimization index in fitting. The equation is as follows: where R 2 is the determination coefficient, y i , and y i are the wind speed and average wind speed of ECMWF data, respectively. y model,i is the wind speed of the fitting model. Based on the minimum MSE criterion, by calculating the determination coefficient of the constructed model in Sections 3.1 and 3.2, we found that the fitting result of Model-3 was better than that of the other two models (Model-1 and Model-2) when the NBRCS method was used to construct the model. When the LES method was used to construct the model, the fitting result of Model-2 was better than that of Model-1. In view of this, the combination model was established by using the determination coefficient of Model-3 in the NBRCS method and that of Model-2 in LES method. In other words, the wind estimates of NBRCS and LES methods were linearly combined into a best weighted estimator, using determination coefficient. Here, the combination model based on determination coefficient (CMDC) is defined as follows: where U Model is a CMDC-based wind speed estimate, κ 1 and κ 2 are weights, y

Results and Performance Evaluation of Retrieving Wind Speed
This section uses the remaining 30 % test data to verify the performance of the empirical wind speed estimation models deduced above. As shown in Figure 10

Assessment Method
In order to verify the performance of the wind speed estimation model, the estimated wind speed was compared with ERA-5 reanalysis data of ECMWF. RMSE, mean absolute error (MAE), Pearson correlation coefficient, and mean absolute percentage error (MAPE) were used to evaluate the performance of wind speed estimated by the models [51]. Their calculation formulas are as follows: where m is the number of samples, h i is the estimated wind speed, h i is the mean of h i , y i is the wind speed derived from ERA-5 reanalysis data, and y i is the average of y i .

Performance Assessment of wind Speed Estimation Model Based on NBRCS Method with Double Parameters (Model-1), Triple Parameters (Model-2), and Piecewise Function (Model-3)
In this section, we first selected the data pairs for 6 days in the test data set, and then compared the wind speed estimated by the models with the ECMWF reanalysis wind speed. See Figure 12 for statistical results. The wind speed estimation values were shifted by the same amount to avoid overlapping. From Figure 12, we can observe that the accuracy of the estimated wind speed of Model-3 was better than that of Model-1 and Model-2. For example, during the observation period 20,200,710 02:00-02:30, the wind speed estimation accuracy of Model-3 was 0.9 m/s, which is 56.49 % and 57.21% better than that of Model-1 and Model-2, respectively. The estimation accuracy of Model-3 was 1. To sum up, whether from the perspective of a single day and/or a certain period of time wind speed estimation results, or from the general (the remaining 30%) test data of wind speed estimation results, Model-3 achieved the best wind speed estimation performance among the three models, which illustrates that the construction of Model-3 to estimate the wind speed has the best feasibility, applicability, and robustness. However, in the case of high wind speed (the wind speed is greater than 20 m/s), the wind speed estimation errors became larger for the three models generally, but Model-3 had a certain improvement compared with the other two models. The error sources of the model estimation of wind speed mainly include: (1) The time resolutions of CYGNSS and ECMWF reanalysis data are not exactly the same. The matching of some sample instants is based on a close approximation of time, which may lead to differences in wind speed. (2) ERA-5 is reanalysis data absorbing a large amount of historical data from different sources, which have some errors and uncertainty as well [44].  Then, correlation analysis was conducted between the model's estimated wind speed and the ECMWF's ERA-5 reanalysis data. Figure 13 shows the scatter plot of the estimated wind speed of Model-1, Model-2, and Model-3 and the ECMWF wind speed, respectively. RMSE, MAE, R 2 , and MAPE are marked in Figure 12a. The RMSE, MAE, and MAPE of the estimated wind speed using Model-1 were 2.6 m/s, 2.0 m/s, and 18.65 %, respectively, and the coefficient of determination R 2 was 0.836. As shown in Figure 12b, the RMSE, MAE, and MAPE of the estimated wind speed using Model-2 were 2.7 m/s, 2.2 m/s, and 22.42 %, and the coefficient of determination R 2 was 0.876, respectively. In Figure 12c, the RMSE, MAE, and MAPE of the estimated wind speed using Model-3 were 2.3 m/s, 1.9 m/s, and 18.43 %, respectively, and the coefficient of determination R 2 was 0.881.  Colors (from cold to warm) represent data density.

Double-Parameter (Model-1) and Triple-Parameter (Model-2) Performance Assessment Based on LES Observations
As described in Section 4.2, we first selected data pairs for 6 days in the test data set, and then compared the estimated wind speed with ECMWF reanalysis wind speed. The statistics results are presented in Figure 14. The wind speed estimation values were shifted by the same amount to avoid overlapping. We can observe that the performance of the triple-parameter Model (Model-2) constructed in LES method was better than that of the double-parameter Model (Model-1), and Model-2 had good robustness. The average RMSE of Model-1 and Model-2 was 2.6 and 1.9 m/s, respectively, and the wind speed estimation accuracy of Model-2 was about 26.92% higher than that of Model-1. Then, the model developed by the LES method was used to estimate the wind speed and analyze the correlation between ECMWF ERA-5 reanalysis data. Figure 15 shows the scatter plots of Model-1-and Model-2-based wind speed estimates versus ECMWF wind speed, respectively. RMSE, MAE, R 2 , and MAPE are marked in Figure 15. As shown in Figure 15a, the RMSE, MAE, and MAPE of Model-1 estimated wind speed were 3.3, 2.7, and 26.73%, respectively, and the coefficient of determination R 2 was 0.714. As shown in Figure 15b, the RMSE, MAE, and MAPE of Model-2-based wind speed estimation were 2.5, 2.0, and 18.28%, respectively, and the coefficient of determination R 2 was 0.843.
To sum up, the performance of Model-2 was better than Model-1 in wind speed estimation. Different models have a great impact on wind speed estimation accuracy, especially under high wind speed cases (wind speed greater than 20 m/s), where the error of wind speed estimation is To sum up, whether from the perspective of a single day and/or a certain period of time wind speed estimation results, or from the general (the remaining 30%) test data of wind speed estimation results, Model-3 achieved the best wind speed estimation performance among the three models, which illustrates that the construction of Model-3 to estimate the wind speed has the best feasibility, applicability, and robustness. However, in the case of high wind speed (the wind speed is greater than 20 m/s), the wind speed estimation errors became larger for the three models generally, but Model-3 had a certain improvement compared with the other two models. The error sources of the model estimation of wind speed mainly include: (1) The time resolutions of CYGNSS and ECMWF reanalysis data are not exactly the same. The matching of some sample instants is based on a close approximation of time, which may lead to differences in wind speed. (2) ERA-5 is reanalysis data absorbing a large amount of historical data from different sources, which have some errors and uncertainty as well [44].

Double-Parameter (Model-1) and Triple-Parameter (Model-2) Performance Assessment Based on LES Observations
As described in Section 4.2, we first selected data pairs for 6 days in the test data set, and then compared the estimated wind speed with ECMWF reanalysis wind speed. The statistics results are presented in Figure 14. The wind speed estimation values were shifted by the same amount to avoid overlapping. We can observe that the performance of the triple-parameter Model (Model-2) constructed in LES method was better than that of the double-parameter Model (Model-1), and Model-2 had good robustness. The average RMSE of Model-1 and Model-2 was 2.6 and 1.9 m/s, respectively, and the wind speed estimation accuracy of Model-2 was about 26.92% higher than that of Model-1. Then, the model developed by the LES method was used to estimate the wind speed and analyze the correlation between ECMWF ERA-5 reanalysis data. Figure 15 shows the scatter plots of Model-1-and Model-2-based wind speed estimates versus ECMWF wind speed, respectively. RMSE, MAE, R 2 , and MAPE are marked in Figure 15. As shown in Figure 15a, the RMSE, MAE, and MAPE of Model-1 estimated wind speed were 3.3, 2.7, and 26.73%, respectively, and the coefficient of determination R 2 was 0.714. As shown in Figure 15b, the RMSE, MAE, and MAPE of Model-2-based wind speed estimation were 2.5, 2.0, and 18.28%, respectively, and the coefficient of determination R 2 was 0.843.

Performance Assessment of the Combination Model Based on Determination Coefficient (CMDC)
Based on the construction of wind speed estimation model by NBRCS and LES, a CMDC method is proposed to achieve a performance gain as described earlier. As shown in Figure 16, 2000 sampling points were randomly selected in the remaining 30% test data set to compare the wind speed estimated by CMDC method, Model-3 in NBRCS method, and Model-2 in LES method with the ECMWF wind speed. It can be seen that the RMSE value of CMDC method was lower than that of Model-3 in NBRCS method and Model-2 in LES method. Figure 17 shows the scatter plot of CMDC estimation of wind speed and ECMWF wind speed, Table 2 shows the RMSE, MAE, R2, and MAPE of wind speed estimated by NBRCS, LES, and CMDC methods. As shown in Figure 17 and Table 2, from the comparison of wind speed estimated by the CMDC method with ECMWF wind speed for the 30% test data set, the RMSE of the estimated wind speed was 2.1 m/s, and the coefficient of determination R 2 was 0.906, which improved by 8.69% compared with Model-3 in NBRCS method, and by 16.0% compared with that of Model-2 in the LES method.
In order to perform a differentiated analysis for low and high wind speed regimes, Table 3 shows the results of wind speed estimation by the three models against ECMWF reanalysis wind speed data when the wind speed is below and above 15 m/s. We can see from Table 3 that when the wind speed was less than 15 m/s, the RMSE, MAE, and MAPE of the wind speed estimation by the CMDC method were 1.5 m/s, 1.3 m/s, and 9.1%, respectively, compared with NBRCS and LES methods. The accuracy of the wind speed estimation improved by 6.3% and 16.7%, respectively. When the wind speed was greater than 15 m/s, the RMSE, MAE, and MAPE of the CMDC method were 2.3 m/s, 1.6 m/s, and 14.7%, respectively, and the wind speed estimation accuracy improved by 11.5% and 28.1%, respectively. Compared with NBRCS and LES methods, the combined model showed good performance enhancement. The results show that the wind speed estimation accuracy of the three models was higher when the wind speed was lower than 15 m/s, but it was lower when the wind speed was greater than 15 m/s. Despite all this, when the wind speed was greater than 15 m/s, the RMSE, MAE, and MAPE of wind speed estimation by the CMDC method were 2.3 m/s, 1.6 m/s, and 14.7%, respectively. To sum up, the performance of Model-2 was better than Model-1 in wind speed estimation. Different models have a great impact on wind speed estimation accuracy, especially under high wind speed cases (wind speed greater than 20 m/s), where the error of wind speed estimation is larger. However, compared with Model-1, Model-2 showed some improvement, with an accuracy increase of about 24.24%.

Performance Assessment of the Combination Model Based on Determination Coefficient (CMDC)
Based on the construction of wind speed estimation model by NBRCS and LES, a CMDC method is proposed to achieve a performance gain as described earlier. As shown in Figure 16, 2000 sampling points were randomly selected in the remaining 30% test data set to compare the wind speed estimated by CMDC method, Model-3 in NBRCS method, and Model-2 in LES method with the ECMWF wind speed. It can be seen that the RMSE value of CMDC method was lower than that of Model-3 in NBRCS method and Model-2 in LES method. Figure 17 shows the scatter plot of CMDC estimation of wind speed and ECMWF wind speed, Table 2 shows the RMSE, MAE, R2, and MAPE of wind speed estimated by NBRCS, LES, and CMDC methods. As shown in Figure 17 and Table 2, from the comparison of wind speed estimated by the CMDC method with ECMWF wind speed for the 30% test data set, the RMSE of the estimated wind speed was 2.1 m/s, and the coefficient of determination R 2 was 0.906, which improved by 8.69% compared with Model-3 in NBRCS method, and by 16.0% compared with that of Model-2 in the LES method.
In order to perform a differentiated analysis for low and high wind speed regimes, Table 3 shows the results of wind speed estimation by the three models against ECMWF reanalysis wind speed data when the wind speed is below and above 15 m/s. We can see from Table 3 that when the wind speed was less than 15 m/s, the RMSE, MAE, and MAPE of the wind speed estimation by the CMDC method were 1.5 m/s, 1.3 m/s, and 9.1%, respectively, compared with NBRCS and LES methods. The accuracy of the wind speed estimation improved by 6.3% and 16.7%, respectively. When the wind speed was greater than 15 m/s, the RMSE, MAE, and MAPE of the CMDC method were 2.3 m/s, 1.6 m/s, and 14.7%, respectively, and the wind speed estimation accuracy improved by 11.5% and 28.1%, respectively. Compared with NBRCS and LES methods, the combined model showed good performance enhancement. The results show that the wind speed estimation accuracy of the three models was higher when the wind speed was lower than 15 m/s, but it was lower when the wind speed was greater than 15 m/s. Despite all this, when the wind speed was greater than 15 m/s, the RMSE, MAE, and MAPE of wind speed estimation by the CMDC method were 2.3 m/s, 1.6 m/s, and 14.7%, respectively.

Discussion
As mentioned above, in the wind speed estimation based on GNSS-R, if the DDM data with large observation noise are not eliminated, the accuracy of wind speed inversion will be degraded. Therefore, we first proposed a data filtering method based on threshold setting for data quality control. This method can select good-quality DDM data by adjusting the root mean square (RMS) threshold of cleaned DDW. Then, five models were developed using observations, and a fusion-based model was also developed by combining the results of two models. In the model test phase, we compared the wind speed estimations of different models with the reanalysis wind speed data of ECMWF. Wind speed was divided into two regions, lower than 15 m/s and higher than 15 m/s to analyze the impact of low and high wind speed. Compared with NBRCS and LES methods, the CMDC model showed good performance enhancement. However, when the wind speed is higher than 15m/s, the accuracy of wind speed estimation is reduced; this is mainly due to the reduced sensitivity of ocean scattering cross section to the change of heavy wind speed and that the random error in DDM signal is increased. This is in accordance with the results in the literature; that is, the larger the sea surface wind speed, the greater the error of wind speed estimation.

Conclusions and Future Opportunities
This paper focused on empirical modeling for sea surface wind speed estimation using NBRCS and LES observations which are generated by processing GNSS-R delay Doppler map (DDM) data. True wind speed derived from ECMWF reanalysis data and NBRCS and LES observations were employed to train and test the model. Accordingly, three NBRCS-based models (double-parameter, triple-parameter, and piecewise function models) and two LES-based models (double-parameter and triple-parameter models) were generated from linear regression analysis and least-squares fitting. The combination model combines the wind speed estimate based on NBRCS and that based on LES with the weights determined in an optimal way. By testing these models, we observed that among the three NBRCS-based models, the piecewise function model had the smallest RMSE of 2.3 m/s, considerably less than those of the other two models. Among the two LES-based models, the triple-parameter model had an RMSE of 2.5 m/s, which was less than that of the double-parameter model. Based on these results, we finally proposed the combination model based on determination coefficient to improve the wind speed estimation performance by combing the NBRCS and LES observations Through this study, the RMSE of the combination model for wind speed estimation was 2.1 m/s, and the coefficient of determination R 2 was 0.906.
In the future, we will further process GNSS-R signals scattered over the ocean surface to obtain more accurate wind speed estimates. Specifically, it is useful to develop new methods to effectively mitigate the noise and interference, so as to reduce the error in the wind speed inversion model and improve the precision of wind speed retrieval. Moreover, a study on the sea surface wind speed under heavy rainfall is required to improve the NBRCS behavior in the presence of high wind speeds such as hurricane. Finally, it is interesting to study the sea surface wind direction estimation based on GNSS-R observations, which does not attract significant attention.