Global Wind and Wave Climate Based on Two Reanalysis Databases: ECMWF ERA5 and NCEP CFSR

Abstract: In the present work, the global wind and wave climate is studied on the basis of two well-known reanalysis products, namely ERA5 and CFSR-W (WW3 hereafter). Several statistical features of the datasets are assessed, such as seasonal variability, quantiles of the probability distribution, monthly, annual and inter-annual variability, and several error metrics. The time span covers a period of 31 years (1979–2009), which ensures that most of the long-scale features are equally present in both datasets. The analysis is presented on both global and regional scales. The results are also assessed by means of a global satellite altimeter dataset.


Introduction
The study of ocean wave climate is of great importance for a large number of applications, including, among others, ocean climate change studies, design of ships and offshore/coastal structures, planning of sea operations, and wind and wave energy conversion.
There are various sources of wind and wave data, namely (i) visual observations [1], (ii) satellite altimeters [2,3], (iii) in situ buoy measurements, and (iv) numerical models [4,5]. Each one of them has its own advantages and disadvantages. The great advantage of numerical models is their wide coverage at high temporal and spatial resolution on a global scale, making it possible to produce long-term wind and wave climatologies without gaps.
The quality of wave data depends heavily on, among other factors, the quality of the wind forcing and on improvements in physical modelling and assimilation techniques. At least two meteorological centers (National Centers for Environmental Prediction, NCEP, and European Centre for Medium-Range Weather Forecasts, ECMWF) have worked continuously over the last few decades towards the improvement of their forecast models. To this end, apart from the results of their operational versions, they periodically perform reanalysis studies to offer homogenized wave products. Stopa [6] (Table 1) summarizes most of the well-known wind datasets used throughout the years to generate global wave hindcasts. These wind products led to several global reanalyses of wave datasets over the last few decades. Among them, it is worth noting ERA-15, ERA-40, and ERA-Interim from ECMWF [7,8], and NCEP/NCAR and CFSR-W from NCEP [8][9][10]. Other climatologies include HIPOCAS [11] and GOW1 and 2 [12,13].
The emergence of these reanalysis databases has been followed by intercomparison studies against each other and/or against other sources of data such as buoy and satellite measurements. For example, Caires et al. [14] compared six different reanalysis datasets (ERA-40 and NCEP/NCAR among them) against NOAA buoy and TOPEX/Poseidon altimeter datasets. Semedo et al. [15] compared ERA-40 against visual observations and satellite data. Stopa, in a series of works, analyzed CFSR-W wave data in relation to climate indices [16], against ERA-Interim and a number of altimeter datasets [17], and using 12 different wind fields as input and against two different satellite datasets [6]. Finally, [18] compared the results of the HIPOCAS climatology against CFSR-W and ERA-Interim.
In the present work, a global wind and wave climate analysis is performed by intercomparing two reanalysis databases: (a) the CFSR-W [8,10] called for simplicity WW3, and (b) the newly released ERA5 by ECMWF through Copernicus Climate Data Store [19]. The former has been extensively studied by Stopa and coauthors [16,17], whereas the latter is studied here for the first time. Preliminary results, including only results for waves, have also been presented in [20].
In the next section, a brief description of the two datasets is given, as well as a description of the analysis procedure followed. Then, numerical results are presented and commented on, and finally, conclusions are drawn.

Data Used
In the present study, three datasets were used for the intercomparison, covering the entire globe. Two of them were generated by means of third-generation numerical models, namely WAVEWATCH III [21] and WAM [4], and the third one consisted of satellite altimeter measurements merged from several satellite missions. The model data were in regular gridded NetCDF format (361 lats × 720 lons = 259,920 datapoints), while the satellite data were stored in tracks.
The first dataset, called WW3 hereafter for simplicity, is a reanalysis generated by means of the WAVEWATCH III model [10]. It consists of fields of significant wave height and wind speed covering the entire globe for the period 1979-2009 in 3-h intervals (31 years × 2920 3-h = 90,520 time instances). For a more detailed description of the data and the model setup, see: https://polar.ncep.noaa.gov/waves/hindcasts/nopp-phase2.php (last accessed: 10 September 2021).
The second dataset, ERA5, is a reanalysis generated by means of the WAM model and has recently been released [19]. It also consists of fields of significant wave height and wind speed. Although the data cover the period 1979-present in hourly intervals, only the period 1979-2009 in 3-hourly intervals was taken into account to facilitate the comparison with WW3. For a more detailed description, see: https://www.ecmwf.int/en/forecasts/datasets/reanalysis-datasets/era5 (last accessed: 10 September 2021).
In addition, and for comparison purposes, satellite altimeter data from the archive of IFREMER were used. The archive contains data from nine altimeter missions, namely ERS-1,2, ENVISAT, TOPEX/Poseidon, Jason-1,2, GEOSAT-FO, Cryosat-2, SARAL, covering the period 1992-2016. Since some of the altimeters (ERS-1,2, ENVISAT, Jason-1, Cryosat-2, SARAL) have already been assimilated in ERA5 [19], they were excluded from the calculations. More detailed information about the missions, as well as about the validation against buoy data and the induced corrections can be found in [22].
Since satellite data have a different time-space data structure, and in order to make possible a comparison between them and the two model datasets, mean monthly values were calculated from all sources for 13 different subregions; see Figure 1. These discrete non-overlapping subregions have been defined by [23], such that the wave conditions within each of them are qualitatively similar [24].
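The regional averaging of along-track data described above can be sketched as follows. This is a minimal illustration only: the rectangular boxes `BOX_A` and `BOX_B`, their bounds, and the function names are hypothetical placeholders, not the subregions of [23], which are not reproduced here.

```python
from collections import defaultdict
from statistics import fmean

# Hypothetical rectangular boxes (lat_min, lat_max, lon_min, lon_max).
# The actual subregions of the paper follow [23] and are NOT these boxes.
REGIONS = {
    "BOX_A": (30.0, 60.0, -60.0, 0.0),
    "BOX_B": (-15.0, 15.0, -40.0, 10.0),
}


def region_of(lat, lon, regions=REGIONS):
    """Return the name of the first box containing (lat, lon), or None."""
    for name, (lat0, lat1, lon0, lon1) in regions.items():
        if lat0 <= lat < lat1 and lon0 <= lon < lon1:
            return name
    return None


def regional_monthly_means(samples, regions=REGIONS):
    """Average along-track samples per (region, year, month).

    samples: iterable of (year, month, lat, lon, value) tuples.
    Returns a dict {(region, year, month): mean value}.
    """
    groups = defaultdict(list)
    for year, month, lat, lon, value in samples:
        region = region_of(lat, lon, regions)
        if region is not None:  # samples outside every box are ignored
            groups[(region, year, month)].append(value)
    return {key: fmean(values) for key, values in groups.items()}
```

The same grouping can be applied to the gridded model data by treating each grid point and time instance as one sample, which puts the three sources on a common footing.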

Statistical Analysis Procedure
The data were available as fields indexed by (i, j, k), where i runs over the time instances, and j, k over the latitudes and longitudes, respectively. In the sequel, two kinds of analysis were performed: (a) field analysis, showing results for the entire field, and (b) averaged analysis, showing results averaged over all (or a subset of) datapoints. In this way, various statistics related to different aspects of the datasets were depicted. First, the time index was reparametrized according to the Buys-Ballot triple index [25] in order to properly treat variability at different time scales. Hence, the following triple index (y, m, n) was used. The first component y is the yearly index. The second, m = {1, 2, ..., 12}, is the monthly index. The third, n = {1, 2, ..., N_m}, represents the time within a month, with N_m being the number of 3-hourly observations within the m-th month.
The three indices (y, m, n) represent three different time scales, making it possible to explicitly define statistics with respect to each one of them, separately.
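As an illustration of the triple index, the following sketch maps a timestamp to (y, m, n) and computes N_m. It assumes 3-hourly records at 00, 03, ..., 21 UTC and uses the real calendar length of each month; the function names are illustrative and not taken from the paper.

```python
from datetime import datetime


def buys_ballot_index(t, step_hours=3):
    """Map a timestamp to the triple index (y, m, n).

    y is the calendar year, m the month (1..12) and n the 1-based
    position of the record within its month, assuming one record
    every `step_hours` hours starting at 00 UTC on the first day.
    """
    hours_into_month = (t.day - 1) * 24 + t.hour
    n = hours_into_month // step_hours + 1
    return t.year, t.month, n


def records_in_month(year, month, step_hours=3):
    """N_m: number of `step_hours`-hourly records in the given month."""
    if month == 12:
        next_year, next_month = year + 1, 1
    else:
        next_year, next_month = year, month + 1
    days = (datetime(next_year, next_month, 1) - datetime(year, month, 1)).days
    return days * 24 // step_hours
```

For example, the last 3-hourly record of January 1979 (31 January, 21:00) receives n = 248, which equals N_m for that month.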

Seasonal Analysis
First, the fields of monthly values of mean value, µ_3(y, m, ·, ·), and standard deviation, σ_3(y, m, ·, ·), were formed; see Equations (3) and (4). Then, the mean monthly values µ_3(m, ·, ·) and σ_3(m, ·, ·) were obtained by averaging Equations (3) and (4) over the years Y; see Equations (5) and (6). These parameters are also known as seasonal mean value and seasonal standard deviation, depicting the seasonal patterns of the data, and they have been used in a nonstationary time series modelling suitable for metocean and maritime parameters; see [25][26][27].
If, further, one averages over all φ_j's and λ_k's, an averaged picture for the mean monthly values in Equations (5) and (6) is obtained; see Equations (7) and (8). Combining Equations (5) and (6), or, equivalently, Equations (7) and (8), one can calculate the coefficient of variation depicting the Mean Monthly Variability (MMV) of the field (equiv. of the averaged data). Similarly, one can calculate the yearly values of mean value and standard deviation. Then, the Mean Annual Variability (MAV) is obtained as the corresponding coefficient of variation, where µ_32(·, ·) and σ_32(·, ·) are calculated in a similar way as in Equations (5) and (6). Further, the Inter-Annual Variability (IAV) is defined as the coefficient of variation formed from the mean value and standard deviation of µ_32(y, ·, ·); see Equation (10). By investigating these parameters, one can look into the year-to-year variability; see, e.g., [16].
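The two-step averaging (first within each month of each year, then over the years) can be sketched for a single grid point as follows. This is an illustration under stated assumptions: the population form of the standard deviation is assumed (the normalization used in the paper is not stated here), and the function names are hypothetical.

```python
from collections import defaultdict
from statistics import fmean, pstdev


def monthly_stats(records):
    """Mean and standard deviation per (year, month) for one grid point.

    records: iterable of (year, month, value) tuples.
    Returns {(year, month): (mean, std)}; std is the population form
    (an assumption of this sketch).
    """
    groups = defaultdict(list)
    for year, month, value in records:
        groups[(year, month)].append(value)
    return {ym: (fmean(v), pstdev(v)) for ym, v in groups.items()}


def seasonal_stats(records):
    """Average the monthly mean and std over the years.

    Returns {month: (seasonal_mean, seasonal_std)}.
    """
    by_month = defaultdict(list)
    for (_, month), stats in monthly_stats(records).items():
        by_month[month].append(stats)
    return {
        month: (
            fmean([mu for mu, _ in stats]),
            fmean([sigma for _, sigma in stats]),
        )
        for month, stats in by_month.items()
    }
```

For the field analysis, the same computation would be repeated independently at every grid point.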
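The variability measures above are all coefficients of variation, i.e., ratios of a standard deviation to a mean value. A minimal sketch follows, assuming the standard definition of the coefficient of variation and the population form of the standard deviation; the function names are illustrative and the exact normalization of the paper's equations is not reproduced here.

```python
from statistics import fmean, pstdev


def coefficient_of_variation(mean_value, std_value):
    """Ratio of a standard deviation to the corresponding mean value.

    Applied to the seasonal mean and std of one month it gives an
    MMV-type value; applied to annual mean and std, an MAV-type value.
    """
    if mean_value == 0:
        raise ValueError("mean value must be non-zero")
    return std_value / mean_value


def inter_annual_variability(yearly_means):
    """IAV-type value: coefficient of variation of the yearly means."""
    return coefficient_of_variation(fmean(yearly_means), pstdev(yearly_means))
```

Because the measure is dimensionless, values for wind speed and significant wave height can be compared directly.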

Probability Analysis
Given the probability distribution function of month m and year y, its quantiles x_p are obtained by inverting it at the probability level p. In the present paper, the quantiles of p = 50% (median), 90%, 95%, 99%, and 99.9% were calculated.
The overall probability function, and consequently the associated quantiles, could be obtained by adding the frequencies of each month m and year y. Similarly, in the averaged analysis, the probability was obtained by adding the frequencies of all points together.
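Pooling by adding frequencies can be sketched with binned counts, as below. This is an illustrative implementation, not the authors' code: the bin width, the convention of returning the upper bin edge, and the function names are assumptions of the sketch.

```python
from collections import Counter


def histogram(values, bin_width=0.1):
    """Frequency count per bin; keys are integer bin indices."""
    return Counter(int(v // bin_width) for v in values)


def pooled_quantile(histograms, p, bin_width=0.1):
    """Quantile of the distribution obtained by adding several histograms.

    Returns the upper edge of the first bin at which the cumulative
    relative frequency reaches p (0 < p <= 1). All histograms must
    have been built with the same bin_width.
    """
    total = Counter()
    for h in histograms:
        total.update(h)  # adds the frequencies bin by bin
    n = sum(total.values())
    cumulative = 0
    for b in sorted(total):
        cumulative += total[b]
        if cumulative >= p * n:
            return (b + 1) * bin_width
    raise ValueError("no data in the supplied histograms")
```

Working with counts rather than raw values means that monthly histograms can be stored once and then combined into yearly, overall, or regional distributions without revisiting the full time series.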

Error Analysis
The following statistics were used as error metrics in order to assess the difference between the two reanalysis datasets: Bias, Root-Mean-Square Error (RMSE), Scatter Index (SI), and Pearson's correlation coefficient (CorrCoeff).
Following the parametrization of the previous sections, the monthly values of these metrics were computed with X ≡ µ_3(y, m, ·, ·), calculated using Equation (3), where index 1 stands for the ERA5 dataset and index 2 for the WW3 dataset.
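A minimal sketch of the four metrics for paired samples is given below. The sign convention of the bias (dataset 1 minus dataset 2) follows the text; the SI is assumed here to be the centred root-mean-square difference divided by the mean of dataset 2, which is one common definition and may differ from the one used in the paper. Function and key names are illustrative.

```python
from math import sqrt
from statistics import fmean, pstdev


def error_metrics(x1, x2):
    """Bias, RMSE, SI and Pearson correlation for paired samples.

    x1: values of dataset 1 (ERA5), x2: values of dataset 2 (WW3).
    A negative bias means that dataset 1 is lower than dataset 2.
    """
    if len(x1) != len(x2) or len(x1) < 2:
        raise ValueError("need two paired samples of length >= 2")
    m1, m2 = fmean(x1), fmean(x2)
    s1, s2 = pstdev(x1), pstdev(x2)
    if s1 == 0 or s2 == 0 or m2 == 0:
        raise ValueError("degenerate input (zero spread or zero mean)")
    diffs = [a - b for a, b in zip(x1, x2)]
    bias = fmean(diffs)
    rmse = sqrt(fmean([d * d for d in diffs]))
    # Centred RMS difference: the bias is removed before squaring.
    centred = sqrt(fmean([(d - bias) ** 2 for d in diffs]))
    covariance = fmean([(a - m1) * (b - m2) for a, b in zip(x1, x2)])
    return {
        "bias": bias,
        "rmse": rmse,
        "si": centred / m2,  # assumed definition of the scatter index
        "corr": covariance / (s1 * s2),
    }
```

With this definition, RMSE² equals bias² plus the squared centred difference, which is why RMSE and the correlation coefficient carry only partially overlapping information.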

Seasonal Analysis
Following the analysis procedure presented in the previous sections, the mean monthly variability was assessed on the basis of the mean monthly values µ_3(m, ·, ·) and σ_3(m, ·, ·), calculated by Equations (5) and (6). According to these, there was a zonal distribution of the values with distinctly different behaviour between the Northern and the Southern Hemisphere. In particular, around the equator there was a zone with the least variability in all months. The variability of the mean values of ERA5 appeared to be slightly lower than that of WW3 in both hemispheres.
In addition, the seasonal variability on a regional basis was studied by analysing separately the 13 subregions shown in Figure 1. In Figure 2, the mean monthly values µ_3(m) and σ_3(m), calculated using all points included in each subregion, are depicted for 8 out of the 13 subregions: two showing the variability in the northern extratropical zone (ETNP, ETNA), four in the tropical zone (TENP, TNAO, TNIO, TSIO) and two in the southern extratropical zone (ETSP, ETSA).
Additionally, and for comparison purposes, the mean monthly values of satellite data are also plotted. Although the satellite measurements did not have the structure of a time series, but rather of a time-space series, it was possible to calculate equivalent mean monthly values µ_3(m) and σ_3(m) by considering all points included in each subregion [25]. In addition, this was expected to reduce the error due to undersampling of altimeters, as suggested by [28]. As expected, the general picture showed more pronounced variability in the extratropical areas (ETNA, North Atlantic; ETNP, North Pacific; ETSP, South Pacific; ETSA, South Atlantic), and less variability in the tropical zone (TWNP, TNAO, TESP, TSAO). Additionally, in TNIO and TSIO the regional monsoon phenomenon was observed.
Overall, ERA5 and WW3 agreed better with each other than with the satellite data, especially in the extratropical subregions (ETNP, ETNA, ETSP, ETSA). In the tropical zone, satellite data were also in agreement with ERA5 and WW3 (TWNP, TNAO, TESP, TSAO). Generally, small deviations between satellite and model data could be attributed to the fact that the former were estimated using 18 years (1992-2009), while the latter were estimated using 31 years (1979-2009).
Further, the mean annual variability was investigated in terms of cv_32(·, ·); see Figure 3. The agreement between ERA5 and WW3 datasets was good. Especially for the waves, more pronounced variability was shown by both models in the North Pacific and North Atlantic Ocean.
Finally, the inter-annual variability was studied on the basis of cv_32(·, ·); see Equation (14). In Figure 4, the coefficient of variation is plotted for both datasets, with WW3 exhibiting greater variability in most of the areas (for waves), and especially in the Southern Hemisphere (for wind). It is worth noting that the present WW3 results were in accordance with the findings in [16].

Probability Analysis
In this part of the analysis, the monthly empirical CDFs of ERA5 and WW3 were calculated, and then the quantiles of 50% (median), 90%, 95%, 99%, and 99.9% were chosen to describe the behaviour of the empirical distribution. Because the distribution was not symmetric, the median described the central behaviour of the distribution better than the mean value. The other four were used for the description of the tail.
In Figure 5, the overall (averaged over all points) mean monthly values of the quantiles are plotted, giving a bird's-eye view of the monthly variability of the distribution. The WW3 distribution had higher values in all quantiles. In addition, ERA5 exhibited greater (month-to-month) variability in the 99.9% quantile. On average, the differences in each quantile between the two datasets were as follows: (a) significant wave height: x_50%: −0.98%, x_90%: −9.05%, x_95%: −9.21%, x_99%: −6.53%, x_99.9%: −1.18%; (b) wind speed: x_50%: −2.80%, x_90%: −5.73%, x_95%: −6.78%, x_99%: −5.93%, x_99.9%: −4.41%, where minus means that ERA5 had lower values than WW3. Further, it should be noted that some behaviors, e.g., the low variability of the 99% quantile of wind speed, might be attributed to the large spatial differences of this quantile (see Figures 6 and 7), which were evened out once all datapoints were taken into account in the calculation.
In Figures 6 and 7, the mean annual fields of the 50%, 90%, 95%, 99% and 99.9% quantiles of significant wave height and wind speed are depicted for ERA5 and WW3. In general, there was again a zonal distribution of the quantiles, and ERA5 seemed to exhibit less variability than WW3 (especially for the wave height). Additionally, quantile values in the Southern Hemisphere seemed to be higher than those in the Northern Hemisphere, except for the last two (resp. three) quantiles for waves (resp. wind). The results were in accordance with [29,30], based on altimetry data, and [16], based on WW3 data.

Error Analysis
Here, an assessment of the differences between the two reanalysis datasets was performed on the basis of the error metrics defined in Section 2.2.3. Recall that, in the definition of these metrics, 1 stands for the ERA5 dataset and 2 for the WW3 dataset. Thus, a "minus" sign in the results means lower ERA5 values with respect to the WW3 ones.
In Figure 8, the overall (averaged over all points) mean monthly variability of the error measures is given, namely bias, RMSE, SI, CorrCoeff, and NBias, NSTD, CRMSE (left panels: significant wave height, right panels: wind speed). In addition, in Tables 1 and 2, the monthly values for the plots are given.
Similarly, in Figure 9, the overall (averaged over all points) mean annual variability of the same quantities is also plotted (left panels: significant wave height, right panels: wind speed), and in Table 2, the monthly values are given.
Most of them seemed to have no significant variability at all. One might see a decrease in mean annual bias after 1994, which meant better agreement between the two reanalyses after that year. However, this was not considered statistically significant, since the values were relatively low. The same held true for the normalized bias, NBias, which showed a more stable behavior throughout the years. It is worth noting that CorrCoeff was near 1 (0.9 for waves and 0.8 for wind), which means that the two datasets were well correlated.
Further, the overall (averaged over all the years) mean annual variability of the same measures is depicted in Figures 10-12. In these figures, one can observe lower values of ERA5 (negative bias and NBias) in the Northern and the Southern Hemisphere, and higher values (positive bias and NBias) in the swell-dominated tropical zone. The greatest differences, in absolute value, between the two datasets were exhibited in the Southern Hemisphere, as shown by RMSE. In addition, the study of NSTD revealed that the variability of WW3 was generally higher than that of ERA5, while the reverse was observed in the areas near the coasts of South America for both wind and waves, and additionally near the coast of Canada for waves.
Additionally, following the spatial distribution of SI, it seemed that wind speed exhibited larger variability than wave height, especially in areas around the Equator in Southeast Asia, west coast of Central America and West Africa.
Further, the two datasets were more correlated in the extratropical zones in both the Northern and the Southern Hemisphere according to the CorrCoeff, with waves showing higher values than winds. This was also justified by CRMSE, which exhibited its lowest values (low relative variability of the differences) in the same areas.
One may argue that there was an inconsistency between the spatial distribution of the values of RMSE and those of CorrCoeff. However, the two error metrics should be seen as complementary, giving only partially overlapping (and not exactly the same) statistical information. According to [31], a more convenient measure to be related with CorrCoeff is the CRMSE, which is normalized with the standard deviation of the data; see Equation (25). Indeed, one can observe that in the areas where CorrCoeff suggested low correlation between the two datasets, CRMSE exhibited its highest values.
For the comparison of ERA5 and WW3 against the satellite data in the subregions, see Tables 3 and 4. In most of the regions, bias was negative (satellite data had greater values than ERA5 and WW3), except for the regions TESP (ERA5 greater than satellite) and TSAO (both ERA5 and WW3 greater than satellite). The values of RMSE were quite close for each area. As already mentioned, CorrCoeff was near 1, which was an indication of the correlation of the datasets. NSTD, which showed the ratio of the standard deviations of the two datasets, varied between 1.17 and 1.71 for "WW3 vs. satellite" (satellite variability greater by 17-71%), and between 1.19 and 1.99 for "ERA5 vs. satellite" (satellite showed even greater variability compared to ERA5).

Concluding Remarks
In the present paper, the newly released ERA5 wind and wave climatology is compared against WW3 climatology, as well as a merged satellite database. The intercomparison covers a period of 31 years (1979-2009), and has been performed using three-hourly wave fields for the entire globe.
The mean monthly values of the datasets are used as cornerstones of the analysis. First, a seasonal analysis is performed based on their seasonal characteristics, showing a very good agreement. Moreover, the computed mean annual and inter-annual variability are in accordance with the findings of other researchers [16].
Then, quantiles of the empirical cumulative distribution are calculated in order to get a picture of the variability of the distribution and of several percentiles of interest; especially in the tail of the distribution.
Finally, several error measures are derived in order to assess the agreement between the datasets.
In addition to the field analysis, data are considered for thirteen non-overlapping subregions following [23], and the above mentioned analysis (seasonal, probability, error) is performed for each one of them.
All in all, the two datasets are in very good agreement, with WW3 having slightly greater variability than ERA5.
Some particular comments are as follows. The two datasets are well correlated in the extratropical zones in both the Northern and the Southern Hemisphere. ERA5 has lower values than WW3 in the Northern and the Southern Hemisphere, and higher values in the swell-dominated tropical zone. The variability of WW3 is generally higher than the one exhibited by ERA5, while the reverse situation is exhibited in the areas near the coasts of South America for both wind and waves, and additionally near the coast of Canada for waves. It also seems that wind speed exhibits larger variability than wave height, especially in areas around the Equator in Southeast Asia, west coast of Central America and West Africa.

Data Availability Statement: The data used for this paper have been retrieved from the following sites: https://polar.ncep.noaa.gov/waves/hindcasts/nopp-phase2.php, https://cds.climate.copernicus.eu/#!/home, and ftp://ftp.ifremer.fr/ifremer/cersat/products/swath/altimeters/waves/ (last accessed: 10 September 2021).