2.1. Gaofen-3 Wave Mode
The Chinese Gaofen-3 satellite, which carries a C-band (5.3 GHz) SAR sensor, has been in orbit since August 2016. Gaofen-3 SAR can operate in 12 imaging modes, of which wave mode is dedicated to ocean wave detection. In wave mode, Gaofen-3 SAR collects small SAR images (called imagettes) over the open ocean, each covering approximately 5 km × 5 km with a nominal spatial resolution of 4 m, every 50 km along the flight direction. It provides quad-polarimetric (HH (horizontal–horizontal) + HV (horizontal–vertical) + VH + VV) capability, and its incidence angle can be switched from 20° to 50°, corresponding to 27 radar beams (denoted by ID, ranging from 189 to 216). In this paper, the Level-1A single-look complex (SLC) wave mode imagettes for the years 2016 to 2020 were collected. SAR scenes contaminated by non-wave phenomena were rejected using the following procedure: (1) power-saturated data were rejected by checking the ‘echoSaturation’ value provided in the Gaofen-3 SAR product annotation file; (2) imagettes contaminated by ice and land/islands were excluded; (3) homogeneity was checked according to the method proposed by Schulz-Stellenfleth [26]. The quality controls rejected approximately 30% of the data, and approximately 11,200 Gaofen-3 SAR imagettes were finally selected for this study (Figure 1).
Figure 2 displays a typical example of the quad-polarization Gaofen-3 SAR imagettes, acquired on 7 February 2017 at 18:17 UTC. The images shown in Figure 2 were normalized by the min–max method, and a clear wave pattern can be seen in them.
In previous studies (e.g., [4,21,27]), the normalized radar cross-section (NRCS), the normalized image variance (cvar), and the azimuth cutoff wavelength ($\lambda_c$) were the three SAR features assumed to be strongly correlated with SWH, and they were the features most commonly used for SAR SWH inversion. In addition, the radar incidence angle (θ) has been treated as an important parameter for SAR SWH retrieval in recent studies. Therefore, these four features were selected in this paper. The methods used to extract them are described below.
(1) Normalized radar cross-section (NRCS)
The NRCS of the SAR image is typically related to the ocean surface wind and can thus represent information on short-wave roughness [25]. The Gaofen-3 NRCS values at HH, HV, VH, and VV polarizations can be obtained by the following formula:
$$\sigma^{0}_{pq} = 10\log_{10}\left(\langle DN_{pq} \rangle\right) - K_{pq}$$
where pq denotes the polarization state, $\sigma^0$ is the NRCS in dB, $\langle DN_{pq} \rangle$ denotes the mean value, $DN = I_s \times (qv/32767)^2$ denotes the image intensity, $I_s = I^2 + Q^2$ with I (Q) being the value of the real (imaginary) channel of the single-look complex SAR image, qv is the maximum qualified value stored in the product annotation file for each polarization, and K is the calibration constant, also stored in the product annotation file for each polarization. However, only a small portion of the official Gaofen-3 wave mode products provide the quad-polarization K values. Moreover, problems remain with the official radiometric calibration, despite the considerable effort that has been made. Figure 3a compares the Gaofen-3 NRCS values calibrated using the officially released calibration constants with those predicted by the empirical geophysical model functions (GMFs) at HH, HV, VH, and VV polarizations. The GMF CMOD5.n was used for VV; the combination of CMOD5.n and the VV-HH polarization ratio (PR) model proposed in Zhang et al. [27] was used for HH; and the C-3PO model developed in Zhang et al. [28] was used for HV and VH. As seen, the NRCSs calibrated with the officially released constants deviated significantly from the GMF predictions, with an RMSE as large as ~4 dB even in the best-performing case, VV polarization. That is, additional calibration is needed to improve the accuracy of the Gaofen-3 SAR wave mode products.
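The NRCS computation described above can be sketched in a few lines. This is a minimal illustration, assuming the relation σ0 = 10·log10(⟨DN⟩) − K (in dB) implied by the definitions of DN, qv, and K; the function name `nrcs_db` and all numerical values are hypothetical and are not taken from any Gaofen-3 product.

```python
import math

def nrcs_db(i_vals, q_vals, qv, k_db):
    """Return the NRCS (dB) of one polarization channel of an SLC imagette.

    Assumes sigma0 = 10*log10(<DN>) - K, with DN = (I^2 + Q^2) * (qv/32767)^2.
    """
    scale = (qv / 32767.0) ** 2
    # Per-pixel intensity DN from the real (I) and imaginary (Q) channels
    dn = [(i * i + q * q) * scale for i, q in zip(i_vals, q_vals)]
    mean_dn = sum(dn) / len(dn)
    return 10.0 * math.log10(mean_dn) - k_db

# Hypothetical pixel values and annotation parameters, for illustration only
i_vals = [120, -80, 45, 200]
q_vals = [-60, 150, 30, -10]
sigma0 = nrcs_db(i_vals, q_vals, qv=32767.0, k_db=30.0)
print(f"sigma0 = {sigma0:.2f} dB")
```

In practice, I and Q would be read from the Level-1A SLC product, and qv and K from its annotation file, separately for each polarization.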
This paper performs an ocean recalibration of the quad-polarization Gaofen-3 wave mode imagettes based on the GMFs CMOD5.n, CMOD5.n+PR, and C-3PO. The recalibration dataset was obtained by interpolating the 10-m height ocean winds from ERA5, available at a 0.25° spatial and 1-hour temporal resolution, to the acquisition times and center locations of the Gaofen-3 wave mode imagettes. For every imagette, the mean DN values at HH, HV, VH, and VV polarizations were computed by averaging all DN values within the corresponding 5 km × 5 km Gaofen-3 SAR image. The corresponding GMF-based NRCS values were computed using the collocated ERA5 winds. Subtracting the GMF-predicted NRCS values (in dB) from the mean Gaofen-3 measured DN values (in dB) gave new quad-polarization calibration constants for every imagette. Finally, 24 groups of quad-polarization recalibration constants were determined by averaging these new calibration constants within each Gaofen-3 radar beam (24 rather than 27, because the collected Gaofen-3 wave mode dataset contained no imagettes for the three radar beams with IDs 192, 194, and 196). The values of the quad-polarization recalibration constants for the 24 Gaofen-3 radar beams are provided in Table A1 of Appendix A.
Figure 3b compares the Gaofen-3 NRCS values calibrated using the ocean recalibration procedure with the GMF predictions at HH, HV, VH, and VV polarizations. The recalibrated NRCSs show good agreement with the GMF predictions.
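The recalibration step can be illustrated as follows for a single polarization. This is a sketch under stated assumptions: each record holds a beam ID, the mean DN of one imagette in linear units, and a GMF-predicted NRCS in dB; the per-imagette constants are averaged in dB; and the function name and sample numbers are hypothetical.

```python
import math
from collections import defaultdict

def recalibration_constants(records):
    """Average per-imagette calibration constants within each radar beam.

    records: iterable of (beam_id, mean_dn_linear, gmf_nrcs_db) tuples.
    """
    per_beam = defaultdict(list)
    for beam_id, mean_dn, gmf_db in records:
        # Per-imagette constant: mean DN in dB minus the GMF-predicted NRCS in dB
        per_beam[beam_id].append(10.0 * math.log10(mean_dn) - gmf_db)
    # Beam-wise average (taken in dB, which is an assumption of this sketch)
    return {beam: sum(ks) / len(ks) for beam, ks in per_beam.items()}

# Hypothetical records for two beams
records = [(189, 1000.0, -10.0), (189, 100.0, -18.0), (190, 10.0, -20.0)]
k_new = recalibration_constants(records)
print(k_new)
```

The same routine would be run four times, once per polarization, with the GMF appropriate to that channel supplying the predicted NRCS.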
(2) Normalized image variance
The normalized image variance (cvar) contains information on the sea state of longer waves. It is defined as the variance of the Gaofen-3 image normalized by the mean intensity:
$$cvar_{pq} = \mathrm{var}\left(\frac{DN_{pq} - \langle DN_{pq} \rangle}{\langle DN_{pq} \rangle}\right)$$
where $\langle DN_{pq} \rangle$ is the mean intensity of the pq-polarization Gaofen-3 image in linear units. In this study, the normalized variances for the HH-, HV-, VH-, and VV-polarized wave mode images were considered.
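As an illustration of the normalized variance, the sketch below computes the variance of the intensity after it has been centered and scaled by its mean. It assumes the population (not sample) variance and uses a hypothetical list of linear-unit DN values.

```python
def cvar(dn):
    """Variance of the mean-normalized intensity, var((DN - <DN>) / <DN>)."""
    mean_dn = sum(dn) / len(dn)
    normalized = [(v - mean_dn) / mean_dn for v in dn]
    # The normalized values have zero mean, so the variance is the mean square
    return sum(x * x for x in normalized) / len(normalized)

# Hypothetical intensities in linear units
example = cvar([1.0, 2.0, 3.0, 4.0])
print(example)
```

A spatially uniform image gives a value of zero, while stronger wave-induced intensity modulation gives larger values.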
(3) Azimuth cutoff
In the azimuth direction, SAR image processing relies on analysis of the backscattered signal phase, assuming a homogeneous and frozen surface, to achieve high resolution. Over the ocean, according to the velocity bunching mechanism of SAR ocean imaging, surface wave motions may distort the phase history of the backscattered signal, leading to a nonlinear transformation between the local wave field and the SAR image. As a result, short wave components propagating near the azimuth direction may be blurred. This leads to a cutoff value: waves with wavelengths below the cutoff cannot be resolved by SAR. Using linear wave theory, the azimuth cutoff ($\lambda_c$), in meters, can be written as:
where F is the wave spectrum, f is the wave frequency, ω = 2πf denotes the angular frequency, and β = R/V, with R being the satellite slant range and V being the satellite velocity. The magnitude of the spectral integral is directly related to the sea state conditions [23]. Therefore, the azimuth cutoff, normalized by the ratio β, was chosen as another input parameter for our models. The azimuth cutoff can be estimated by fitting a Gaussian function to the inter-correlation of the SAR cross-spectrum (real part) [29]. The Gaussian fit function C is stated as follows:
where x denotes the spatial distance in the azimuth direction.
Figure 4 shows the estimation of the azimuth cutoff from the imagette shown in Figure 2 for the HH, HV, VH, and VV polarization channels. As seen, the azimuth cutoff values obtained from the SAR images differed between polarizations. The HV and VH cutoffs were larger than the HH and VV estimates. This is probably because the smearing effects of cross-polarization SAR were larger for shorter coherence times [30]. The HH cutoff was slightly larger than the VV estimate, which may be related to the larger HH-polarization modulation transfer function [30]. The azimuth cutoff estimates under the HH, HV, VH, and VV configurations were all considered in our models.
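The Gaussian fitting idea can be sketched as below. Two assumptions are made that are not stated in the text: the fit function is taken as C(x) = exp(−(πx/λc)²), a form common in the azimuth cutoff literature, and the fit is done by linear least squares on ln C rather than by whatever nonlinear fitting routine the authors used. The correlation profile here is synthetic.

```python
import math

def fit_azimuth_cutoff(x, c):
    """Estimate lambda_c by fitting ln C = -(pi / lambda_c)^2 * x^2.

    x: azimuth lags in meters; c: normalized correlation values (peak = 1).
    """
    num = 0.0
    den = 0.0
    for xi, ci in zip(x, c):
        if ci <= 0.0:
            continue  # the logarithm is undefined for non-positive samples
        num += xi ** 2 * math.log(ci)
        den += xi ** 4
    if den == 0.0:
        raise ValueError("need at least one usable sample with non-zero lag")
    slope = num / den  # least-squares estimate of -(pi / lambda_c)^2
    if slope >= 0.0:
        raise ValueError("correlation does not decay; Gaussian fit failed")
    return math.pi / math.sqrt(-slope)

# Synthetic correlation profile generated with a known cutoff of 150 m
true_cutoff = 150.0
lags = [float(d) for d in range(-200, 201, 10)]
corr = [math.exp(-(math.pi * d / true_cutoff) ** 2) for d in lags]
estimated = fit_azimuth_cutoff(lags, corr)
print(f"estimated cutoff: {estimated:.1f} m")
```

A log-space fit gives more weight to the tails than a direct nonlinear fit would, so with noisy correlation profiles it is usually sensible to restrict the fit to lags near the central peak.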
(4) Incidence angle
The incidence angle (θ) is an important parameter that should be considered when building empirical models for SAR SWH retrieval (e.g., [4]). Unlike the wave mode imagettes from European SAR satellites, which involve only one or two specific incidence angles, the incidence angle of the Gaofen-3 wave mode can be switched from 20° to 50°. Figure 5 shows the histogram of incidence angles, in 1° bins, for the Gaofen-3 wave mode data used in this study. As seen, the incidence angles were mostly distributed around 36° and 40°. Inspired by Wang H. et al. [24], and considering the amount of data, we categorized the Gaofen-3 wave mode data into five incidence-angle groups: WV01 for 20–33°, WV02 for 33–37°, WV03 for 37–42°, WV04 for 42–46°, and WV05 for 46–50°. Details are listed in Table 1. The incidence angle range of WV01 was set this wide because of the small amount of data in that range. The incidence angle was considered in two ways: first, it was included as an independent variable in the models; second, separate models were built for each incidence-angle group.
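The grouping can be expressed as a simple lookup. The text does not say which group a boundary angle such as 33° belongs to, so this sketch assumes half-open bins (lower bound included, upper bound excluded), with 50° itself assigned to WV05.

```python
# Incidence-angle groups from the text: (name, lower bound, upper bound) in degrees
GROUPS = [
    ("WV01", 20.0, 33.0),
    ("WV02", 33.0, 37.0),
    ("WV03", 37.0, 42.0),
    ("WV04", 42.0, 46.0),
    ("WV05", 46.0, 50.0),
]

def incidence_group(theta_deg):
    """Return the group name for an incidence angle, or None if out of range."""
    for name, lower, upper in GROUPS:
        if lower <= theta_deg < upper:
            return name
    # Assumption: the upper edge of the last bin is inclusive
    if theta_deg == GROUPS[-1][2]:
        return GROUPS[-1][0]
    return None

labels = [incidence_group(t) for t in (25.0, 36.0, 40.0, 44.5, 50.0)]
print(labels)
```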
2.2. Buoy, Altimeter, and ERA5 SWH Data
SWH observations from the standard meteorological data of 61 moored buoys in the waters around the U.S., operated by the National Data Buoy Center (NDBC), were collected for this paper. All the buoys were located more than 50 km from land in water deeper than 150 m. The NDBC SWH observations are of very high quality, with an accuracy of approximately 0.2 m [31]. They were used as an independent data source to validate the SWH derived from the models, and also to assess the quality of the altimeter and ERA5 SWH data.
SWH observations from the Jason-3 altimetry mission were selected as an additional data source for independent verification. The Jason-3 satellite was launched in January 2016 and carries a dual-frequency (Ku- and C-band) radar altimeter. The geophysical data records distributed by the Archiving, Validation, and Interpretation of Satellite Oceanographic Data (AVISO) service for the period 2016–2020 were collected, and the SWHs retracked from the Ku-band data were selected here. The Jason-3 SWH observations are recognized as being of good quality. The comparison between Jason-3 SWH and buoy SWH, with correlation coefficient (Corr), root mean square error (RMSE), mean bias (Bias), and scattering index (SI), is shown in Figure 6a. The Jason-3 SWH observations were consistent with the buoy observations, with an RMSE of about 0.252 m.
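The four comparison statistics can be computed as sketched below. The text does not define them explicitly, so the definitions here are assumptions: Bias is the mean of (estimate − reference), and SI is the standard deviation of the bias-removed error divided by the mean reference value, which is one of several conventions in use. The sample values are hypothetical.

```python
import math

def swh_metrics(estimate, reference):
    """Return Corr, RMSE, Bias, and SI for paired SWH samples (in meters)."""
    n = len(reference)
    mean_e = sum(estimate) / n
    mean_r = sum(reference) / n
    diffs = [e - r for e, r in zip(estimate, reference)]
    bias = sum(diffs) / n
    rmse = math.sqrt(sum(d * d for d in diffs) / n)
    cov = sum((e - mean_e) * (r - mean_r) for e, r in zip(estimate, reference))
    var_e = sum((e - mean_e) ** 2 for e in estimate)
    var_r = sum((r - mean_r) ** 2 for r in reference)
    corr = cov / math.sqrt(var_e * var_r)
    # Assumed SI definition: bias-removed error spread over the mean reference
    si = math.sqrt(sum((d - bias) ** 2 for d in diffs) / n) / mean_r
    return {"Corr": corr, "RMSE": rmse, "Bias": bias, "SI": si}

# Hypothetical altimeter and buoy SWH pairs
metrics = swh_metrics([1.1, 2.1, 3.1], [1.0, 2.0, 3.0])
print(metrics)
```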
ERA5 is the fifth-generation ECMWF atmospheric reanalysis of the global climate and weather [32]. It combines as many observations as possible with model estimates using advanced modeling techniques and the latest data assimilation systems, and it provides new best estimates of the state of the atmosphere, ocean waves, and land surface. Compared with its predecessor, ERA-Interim, ERA5 has an improved temporal resolution: hourly instead of 6-hourly. The ERA5 hourly data on single levels published so far cover the period from 1 January 1950 to near real time. This dataset provides estimates of a number of ocean-wave variables on a regular latitude–longitude grid of 0.5°; the significant height of combined wind waves and swell, i.e., SWH, is the variable of interest here. The accuracy of the ERA5 SWH was quantitatively assessed by comparison with the buoy observations. Figure 6b shows the comparison of ERA5 SWH with buoy SWH. The ERA5 SWH estimates were consistent with the corresponding buoy SWH observations, with an RMSE of about 0.243 m.
The Gaofen-3 SAR imagettes were collocated separately with the NDBC buoy SWH observations and the Jason-3 altimeter SWH observations, using the criteria of a time separation within 1 h and a spatial separation of less than 100 km. This procedure yielded only 43 SAR-buoy matching points, for which the buoy SWHs were mainly between 2 and 3 m. The collocation with Jason-3 yielded 215 points, for which the Jason-3 SWHs were between 1 and 7 m. Each Gaofen-3 imagette was also collocated with the ERA5 SWH interpolated in time and space, yielding approximately 11,200 matchups, with collocated ERA5 SWHs ranging roughly from 0.3 to 8 m. The SAR-ERA5 collocations were used to maximize the number of samples, since the SAR-buoy and SAR-altimeter collocations were too few for model training. That is, the SAR-ERA5 data were used to train the PolR and GPR models, and the SAR-buoy and SAR-altimeter data were never seen by the models during tuning, which ensures an independent verification. The SAR-ERA5 data were randomly divided into two subsets, one for training (70% of the data) and one for testing (30% of the data), both used in the development of the models. The training set was used to tune the parameters of the PolR and GPR models, while the testing set served as the validation set used to cross-validate and determine the parameters. The effects of polarization and incidence angle on the models for estimating SWH from Gaofen-3 wave mode data were also analyzed based on the SAR-ERA5 data.
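The collocation criteria for the buoy and altimeter matchups can be sketched as a simple predicate. This is an illustration only: the great-circle (haversine) distance on a spherical Earth of radius 6371 km is an assumption, as are the dictionary layout, the function names, and the sample positions and times.

```python
import math
from datetime import datetime, timedelta

EARTH_RADIUS_KM = 6371.0  # mean Earth radius, spherical approximation

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2.0 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))

def is_collocated(sar, obs, max_hours=1.0, max_km=100.0):
    """True if obs is within 1 h and closer than 100 km to the SAR imagette."""
    close_in_time = abs(sar["time"] - obs["time"]) <= timedelta(hours=max_hours)
    distance = haversine_km(sar["lat"], sar["lon"], obs["lat"], obs["lon"])
    return close_in_time and distance < max_km

# Hypothetical imagette center and three candidate buoy records
sar = {"time": datetime(2017, 2, 7, 18, 17), "lat": 30.0, "lon": -150.0}
near_buoy = {"time": datetime(2017, 2, 7, 18, 0), "lat": 30.5, "lon": -150.0}
far_buoy = {"time": datetime(2017, 2, 7, 18, 0), "lat": 32.0, "lon": -150.0}
late_buoy = {"time": datetime(2017, 2, 7, 20, 0), "lat": 30.5, "lon": -150.0}
matches = [is_collocated(sar, b) for b in (near_buoy, far_buoy, late_buoy)]
print(matches)
```

Only the first candidate passes both criteria; the second is too far away and the third is too late.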
2.3. PolR and GPR Models
The polynomial regression (PolR) model and the Gaussian process regression (GPR) model were adopted in this study for multi-incidence-angle polarimetric Gaofen-3 SAR SWH retrieval. The PolR model uses the basic formulation of the CWAVE model:
$$H_s = a_0 + \sum_{i=1}^{n} a_i s_i + \sum_{i=1}^{n} \sum_{j=i}^{n} a_{i,j} s_i s_j$$
where $H_s$ is the SWH, $s_i$ represents the SAR-based parameters, and $a_{i,j}$ (i ≤ j ≤ n) represents the tuned coefficients. The PolR model expresses the SWH as a linear combination of the SAR-derived parameters ($s_1, \ldots, s_n$) and their second-order products, with the extended coefficient vector ($a_0, \ldots, a_n, a_{1,1}, \ldots, a_{n,n}$) of dimension 0.5($n^2$ + 3n + 2). The second-order terms in the model function reflect the nonlinear combinations among the SAR image parameters. The PolR model was derived from the collocated Gaofen-3 SAR wave mode imagettes and ERA5 SWH data using a least squares minimization procedure.
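The structure of the PolR model can be made concrete with a short sketch that builds the extended feature vector (constant, linear, and second-order terms with i ≤ j) and evaluates the model for given coefficients. The ordering of the second-order terms and all numerical values are assumptions for illustration; the coefficients would in practice come from the least squares fit, which is not shown here.

```python
def polr_features(s):
    """Return [1, s_1..s_n, s_i*s_j for i <= j] for SAR parameters s."""
    n = len(s)
    features = [1.0] + list(s)
    for i in range(n):
        for j in range(i, n):
            features.append(s[i] * s[j])
    return features

def polr_predict(coeffs, s):
    """Evaluate Hs as the dot product of coefficients and extended features."""
    features = polr_features(s)
    if len(coeffs) != len(features):
        raise ValueError("coefficient vector has the wrong dimension")
    return sum(a * f for a, f in zip(coeffs, features))

# Hypothetical example with n = 2 parameters: dimension 0.5*(4 + 6 + 2) = 6
s = [2.0, 3.0]
features = polr_features(s)
hs = polr_predict([0.5, 0.1, 0.2, 0.0, 0.01, 0.0], s)
print(features, hs)
```

Because the model is linear in the coefficients, stacking one such feature vector per training sample gives the design matrix of an ordinary least squares problem.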
GPR is a machine learning model with strong adaptability and good generalization ability for dealing with high-dimensional nonlinear data. It is a flexible nonparametric Bayesian approach that uses a nonlinear mapping to relate the output to the input [33]. The salient feature of GPR is that it directly defines a prior probability over a latent function. The functional relationship of GPR is typically expressed in the form:
$$y = f(X) + \varepsilon$$
where y is the model output, X is the model input, ε is independent, identically distributed Gaussian noise with zero mean and constant variance, and f(X) is a Gaussian process specified by its mean (taken to be zero) and covariance matrix K. The elements of K can be computed using a kernel function. Several kernel functions were evaluated here, and the anisotropic exponential kernel was found to be the most suitable. This exponential kernel function can be expressed as:
where $k(x_i, x_j)$ is the (i, j) element of the covariance matrix K, $x_i$ and $x_j$ are the ith and jth input parameters, and $\theta_1$ and $\theta_2$ represent hyper-parameters that should be optimized. In this work, the hyper-parameters of the kernel function were estimated by minimizing the negative log marginal likelihood (NLML) [34], using a quasi-Newton optimization method. The features extracted from the polarimetric Gaofen-3 SAR images were used as the input, and the ERA5 SWH was used as the training output. The inputs were standardized to have a mean of 0 and a standard deviation of 1. Notably, the GPR model does not need squared terms and cross-terms as inputs, because it can model the nonlinear interactions between the input variables.
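The input standardization and the construction of the covariance matrix can be sketched as follows. Several assumptions are made: the kernel is written in a simple isotropic form, k = θ1²·exp(−r/θ2) with r the Euclidean distance between two standardized input vectors, which is a simplification of the anisotropic kernel named in the text; θ1 is treated as a signal standard deviation and θ2 as a length scale; the population standard deviation is used; and the feature values are hypothetical. Hyper-parameter optimization is not shown.

```python
import math

def standardize(columns):
    """Scale each feature column to zero mean and unit standard deviation."""
    standardized = []
    for col in columns:
        mean = sum(col) / len(col)
        std = math.sqrt(sum((v - mean) ** 2 for v in col) / len(col))
        standardized.append([(v - mean) / std for v in col])
    return standardized

def exponential_kernel(xi, xj, theta1, theta2):
    """Assumed isotropic exponential kernel: theta1^2 * exp(-r / theta2)."""
    r = math.sqrt(sum((a - b) ** 2 for a, b in zip(xi, xj)))
    return theta1 ** 2 * math.exp(-r / theta2)

def covariance_matrix(samples, theta1, theta2):
    """Covariance matrix with one row and column per input sample."""
    return [[exponential_kernel(a, b, theta1, theta2) for b in samples]
            for a in samples]

# Hypothetical features for three imagettes: NRCS (dB) and cvar
feature_columns = [[-12.0, -10.0, -8.0], [0.8, 1.0, 1.5]]
z_columns = standardize(feature_columns)
samples = list(zip(*z_columns))  # one standardized feature vector per imagette
cov = covariance_matrix(samples, theta1=1.5, theta2=2.0)
print(cov)
```

An anisotropic variant would replace the single length scale θ2 with one length scale per input feature, so that each feature contributes to the distance r with its own weight.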