Intercomparison of Assimilated Coastal Wave Data in the Northwestern Pacific Area

Abstract: Assimilated coastal wave data are useful for wave climate studies, coastal engineering, and the design of marine disaster protection. However, few assimilated coastal wave datasets exist. Here, wave analysis data produced by the JMA (Japan Meteorological Agency) and ERA5 wave data were compared with wave data measured by GPS (Global Positioning System) buoys. In addition, the accuracy of ERA5 wave data under various conditions was investigated. The JMA analysis wave height was more accurate than the ERA5 wave height. The ERA5 wave height was increasingly underestimated as the wave height increased. The accuracy of the ERA5 wave height differed significantly between fetch-unlimited and fetch-limited conditions. The difference in the skill metrics between fetch-unlimited and fetch-limited conditions was due to the overestimation of the fetch in the ERA5 grid. This result also applied to the wave period.


Introduction
The study of wave climate is critical to the assessment of the utilization of all types of marine vehicles, ships, drilling, and other civilian uses. The wave data on the grid with high spatial resolution are useful for the study of the wave climate. One of the types of wave data with high spatial resolution is the hindcast wave data, which are the output of the wave model forced by the analyzed wind. There are databases of wave hindcast in various areas. Reguero et al. [1] introduced a wave hindcast dataset with a resolution of 0.5° covering the world coastline. Shi et al. [2] established a wave hindcast database with a resolution of 1 km along the Chinese coast. Groll and Weisse [3] produced a wave hindcast dataset in the North Sea covering the period 1949-2014. Haakenstad et al. [4] presented wave hindcasts for the Norwegian Sea, the North Sea, and the Barents Sea.
There are studies of wave climate based on hindcast in the Northwestern Pacific area. Shimura and Mori [5] hindcasted and analyzed the wave climate around Japan. Taniguchi [6] verified the wave hindcast in the Japan Sea and predicted the future wave climate of the Japan Sea. Hu et al. [7] simulated wave parameters associated with typhoons in the Northwestern Pacific and investigated the wave climate by typhoons. Hindcast wave data were used to study the wave climate of various regions, including the Indian Ocean [8], North Sea [9], Hawaiian coast [10], Brazilian coast [11], Swedish coast [12], Black Sea [13,14], and Persian Gulf [15].
Assimilated wave data are more useful for studying the wave climate, because observed wave data are incorporated into them. A typical assimilated wave dataset is ERA5 (the fifth generation of the European Centre for Medium-Range Weather Forecasts atmospheric reanalyses of the global climate), which supersedes ERA-Interim (the interim version of the European Centre for Medium-Range Weather Forecasts reanalysis). The ERA-Interim and ERA5 reanalysis data are global atmospheric and oceanic reanalysis data produced by the European Centre for Medium-Range Weather Forecasts (ECMWF). The ERA5 reanalysis data have been improved compared with the ERA-Interim data, especially in spatial and temporal resolution [16]. A comparison with independent buoy data showed that the ERA5 wave height was more accurate than the ERA-Interim wave height [17]. However, the spatial resolution of ERA5 wave data is 0.36°, which is not sufficient for coastal wave data.
The Japan Meteorological Agency (JMA) produces wave analysis data by data assimilation in the marginal seas or coastal seas in the Northwestern Pacific area [18]. The hindcast wave spectra are modified using optimal interpolation with observations of significant wave heights from the radar altimeters of satellites, buoys, coastal wave recorders, and ships. The JMA wave analysis data are used as the initial values of the wave forecast of the JMA. The short-term wave forecast is improved by using assimilated data [18]. The spatial resolution is 0.05°, which is sufficient for coastal wave data in many cases.
Wave analysis data such as JMA and ERA5 data are expected to be used for wave climate research. These data will also provide the basis of the guidelines for designing port structures such as breakwaters, coastal structures such as seawalls, and marine structures such as platforms for offshore oil drilling. In addition, wave analysis data will also be used to select suitable locations for wave power generation [19-21]. Therefore, it is necessary to evaluate the accuracy of JMA and ERA5 wave data.
Many studies have validated ERA-Interim wave data against buoy-measured wave data in the global ocean [22], Atlantic Ocean [23], Indian Ocean [24], and Arabian Sea [25], and showed good agreement. Some studies have also validated ERA5 wave data against buoy-measured wave data [26,27] or wave data remotely sensed by altimetry [28].
For example, Bruno et al. [26] compared buoy-measured wave parameters with ERA5 wave parameters. However, the buoy was close to the coast in the shallow area, and it was impossible to compare buoy-measured wave parameters with ERA5 wave parameters directly. Sifnioti et al. [27] also compared buoy-measured wave parameters in the shallow area near the coast with ERA5 wave parameters and hindcast wave data. However, the positions of the ERA5 wave data were different from those of the buoys. No studies have been conducted to intercompare ERA5 wave data, other wave assimilation data, and observed data in the coastal area. The objective of this paper is to evaluate the assimilated wave data near the coast by comparing ERA5 wave data and JMA wave analysis data with buoy-measured wave data. The accuracy of ERA5 wave data for various conditions is also investigated. Section 2 describes the JMA wave data, ERA5 wave data, and GPS buoy data. The method of analysis is also described in Section 2. Section 3 presents the result of the comparison of wave heights. The comparison of wave periods is also presented in Section 3. Section 4 discusses the interpretation of the results. The conclusions are summarized in Section 5.

Data and Methods
The JMA wave analysis data are produced by assimilating observations into hindcast wave data from the Coastal Wave Model (CWM). The CWM is based on the MRI (Meteorological Research Institute)-III, a third-generation wave model developed by the MRI of the JMA [29]. The wave spectra are predicted from the energy balance equation. The parameterization of the source function is described in [30]. The Global Wave Model (GWM) is also operated by the JMA, and the boundary conditions of the CWM are given by the output of the GWM. The wave spectra are corrected based on the significant wave height by using optimal interpolation (OI) with observations from altimeters, buoys, coastal wave recorders, and ships [30]. The JMA wave analysis data cover the area from 20° N to 50° N and from 120° E to 150° E. The data can be downloaded from http://database.rish.kyoto-u.ac.jp/arch/jmadata/. The time interval of the JMA wave analysis data is 6 h. The JMA significant wave heights ($H_j$), peak wave periods ($T_{pj}$), peak wave directions, and surface winds $\mathbf{U}_j = (u_j, v_j)$ are archived in the datasets.
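The ERA5 significant wave heights ($H_e$), peak wave periods ($T_{pe}$), and mean wave periods ($T_{me}$) were also used for comparison, where the mean wave period is defined as

$$T_{m-1} = \frac{M_{-1}}{M_0}, \qquad M_n = \int f^n F(f)\, df, \tag{1}$$

where $f$ is the wave frequency, and $F(f)$ is the wave frequency spectrum. The native spatial resolution of ERA5 wave data is 0.36°, and the data available for download from the Climate Data Store (https://cds.climate.copernicus.eu/) have been converted to a 0.5° grid [17]. The ERA5 winds $\mathbf{U}_e$ were also used for the analysis.

The wave parameters observed by the GPS buoys were used for validation. The significant wave heights of both the JMA wave analysis data and the ERA5 wave data were estimated from the wave spectrum as $H_s \equiv 4 M_0^{1/2}$ (Equation (1)). On the other hand, the wave height ($H_g$) and period ($T_g$) of the moored GPS buoys were estimated by the zero-up-crossing method ($H_g = H_{1/3}$, $T_g = T_{1/3}$) from 1024 surface elevations [31], where $H_{1/3}$ and $T_{1/3}$ are the significant wave height and period estimated by the zero-up-crossing method, respectively. The value of $H_s / H_{1/3}$ ranges from 1.01 to 1.07 [32]. The wave height ($H_g$) and period ($T_g$) were estimated at 20 min intervals. The GPS wave data can be downloaded from https://nowphas.mlit.go.jp.

Figure 1 shows the locations of the GPS buoys. There are 18 buoys (A-R) near the coast of Japan. Most of the GPS buoys are located between 10 km and 20 km from the coast. The water depth at most of the GPS buoy locations is over 100 m. Figure 1 also shows the land grid points at 0.5° intervals on the ERA5 grid. The distances from the coastline to the GPS buoys are not well resolved in the ERA5 grid. The ERA5 wave data were bilinearly interpolated to the buoy positions. If at least one of the four ERA5 grid points surrounding a GPS buoy position was a land grid point, the wave data of that buoy were not used for comparison.

The skill metrics for the comparisons are:

$$R_a = \frac{\langle Y \rangle}{\langle X \rangle}, \tag{2}$$

$$R_d = \left\langle (Y - X)^2 \right\rangle^{1/2}, \tag{3}$$

$$r_c = \frac{\left\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \right\rangle}{\left\langle (X - \langle X \rangle)^2 \right\rangle^{1/2} \left\langle (Y - \langle Y \rangle)^2 \right\rangle^{1/2}}, \tag{4}$$

$$\mathrm{SI} = \frac{\left\langle \left[ (Y - \langle Y \rangle) - (X - \langle X \rangle) \right]^2 \right\rangle^{1/2}}{\langle X \rangle}, \tag{5}$$

$$C_{rmsd} = \frac{\left\langle \left[ (Y - \langle Y \rangle) - (X - \langle X \rangle) \right]^2 \right\rangle^{1/2}}{\left\langle (X - \langle X \rangle)^2 \right\rangle^{1/2}}, \tag{6}$$

$$S_{sdn} = \frac{\left\langle (Y - \langle Y \rangle)^2 \right\rangle^{1/2}}{\left\langle (X - \langle X \rangle)^2 \right\rangle^{1/2}}, \tag{7}$$

where $X$ and $Y$ denote the observed and computed values and $\langle \ldots \rangle$ denotes averaging.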
The skill metric $R_a$ is the ratio of the mean values; $R_d$ is the root mean squared difference (RMSD); $r_c$ is the correlation coefficient; SI is the scatter index; $C_{rmsd}$ is the normalized centered RMSD (CRMSD); and $S_{sdn}$ is the normalized standard deviation (NSD). The closer $R_a$ is to 1, the closer $R_d$ is to 0, and the closer $r_c$ is to 1, the higher the accuracy of $Y$. The SI is defined as the ratio of the standard deviation of the differences to the mean observed value. The CRMSD and NSD are normalized by the standard deviation of the observations, so they can be compared across different data groups. The closer SI is to 0, the closer CRMSD is to 0, and the closer NSD is to 1, the higher the accuracy of $Y$.
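As an illustration, the skill metrics described above can be computed from paired samples along the following lines. This is a minimal sketch based on the verbal definitions in the text (ratio of means, RMSD, correlation coefficient, scatter index, normalized centered RMSD, normalized standard deviation); the function name `skill_metrics` and the dictionary keys are hypothetical, and population (not sample) statistics are assumed.

```python
import numpy as np

def skill_metrics(x, y):
    """Skill metrics for observed values x and computed values y.

    Assumes population statistics (division by N); names are illustrative.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    ok = np.isfinite(x) & np.isfinite(y)      # drop pairs with missing values
    x, y = x[ok], y[ok]

    dx = x - x.mean()                         # observed anomalies
    dy = y - y.mean()                         # computed anomalies
    sx = np.sqrt(np.mean(dx**2))              # std of observations
    sy = np.sqrt(np.mean(dy**2))              # std of computed values
    centered = np.sqrt(np.mean((dy - dx)**2)) # std of the differences

    return {
        "R_a": y.mean() / x.mean(),           # ratio of means
        "R_d": np.sqrt(np.mean((y - x)**2)),  # RMSD
        "r_c": np.mean(dx * dy) / (sx * sy),  # correlation coefficient
        "SI": centered / x.mean(),            # scatter index
        "C_rmsd": centered / sx,              # normalized centered RMSD
        "S_sdn": sy / sx,                     # normalized standard deviation
    }
```

For example, `skill_metrics(h_g, h_e)` would return the metrics for ERA5 wave heights against buoy wave heights, where `h_g` and `h_e` are time-matched arrays (hypothetical variable names).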
The period of the intercomparison of GPS buoy wave data, ERA5 wave data, and JMA wave data is from 2014 to 2018. The error of the wave forecast by the GWM has been small since 2014 [30]. The period of the validation of ERA5 wave data is from 2012 to 2018.

Figure 2 shows an example of significant wave heights from ERA5 and JMA. The wave heights at some of the GPS buoy locations cannot be evaluated from the ERA5 wave heights because of the low spatial resolution. The patterns of wave heights are similar to each other. However, there are some differences between them. For example, the JMA wave heights around Buoys A, B, and C are higher than the ERA5 wave heights.

Figure 3 shows the scatter density plots between $H_g$ and $H_e$ and between $H_g$ and $H_j$. This figure shows the ratio of the number of wave height data in each 0.2 m bin to the total number of data. For example, the number of pairs $(H_g, H_e)$ satisfying 0.8 m ≤ $H_g$ < 1 m and 0.8 m ≤ $H_e$ < 1 m is 3222; the number of comparisons is $N_c = 58411$; and the ratio is 3222/58411 = 5.5% (Figure 3a). Table 1 summarizes the skill metrics of the comparison of the wave heights for each GPS buoy. Data from 10 GPS buoys were used for the comparison. For the other eight GPS buoys, at least one of the four surrounding ERA5 grid points is a land grid point.
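The interpolation and land-rejection rule used to select the buoys can be sketched as follows. This is an illustrative implementation only, assuming a regular latitude-longitude grid with ascending coordinates and a boolean land mask; the function name `interp_to_buoy` and its arguments are hypothetical and do not correspond to any ERA5 tool.

```python
import numpy as np

def interp_to_buoy(lons, lats, field, land, lon_b, lat_b):
    """Bilinearly interpolate a gridded field to a buoy position.

    lons, lats : 1-D ascending grid coordinates (degrees)
    field      : 2-D array indexed [lat, lon]
    land       : 2-D boolean array, True at land grid points
    Returns None if the buoy lies outside the grid or if any of the
    four surrounding grid points is land.
    """
    # Indices of the lower-left corner of the enclosing grid cell
    i = int(np.searchsorted(lons, lon_b, side="right")) - 1
    j = int(np.searchsorted(lats, lat_b, side="right")) - 1
    if not (0 <= i < len(lons) - 1 and 0 <= j < len(lats) - 1):
        return None
    if land[j:j + 2, i:i + 2].any():
        return None  # reject: at least one surrounding point is land

    # Fractional position inside the cell
    tx = (lon_b - lons[i]) / (lons[i + 1] - lons[i])
    ty = (lat_b - lats[j]) / (lats[j + 1] - lats[j])
    f = field[j:j + 2, i:i + 2]
    return float((1 - tx) * (1 - ty) * f[0, 0] + tx * (1 - ty) * f[0, 1]
                 + (1 - tx) * ty * f[1, 0] + tx * ty * f[1, 1])
```

Returning `None` rather than a masked average mirrors the rule in the text: a buoy is excluded entirely when any of its four neighbouring grid points is land.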

Comparison of Wave Heights
The skill metrics are [...], and $\mathrm{SI}(H_g, H_e) > \mathrm{SI}(H_g, H_j)$ for most of the buoys, except Buoy K. Thus, as explained in Section 2, the skill metrics for $H_j$ are better than those for $H_e$. The accuracy of the JMA wave heights is better than that of the ERA5 wave heights.

Comparison under Various Conditions
We investigated the accuracy of ERA5 and JMA wave heights with increasing wave height. Figure 4a shows the ratios of mean wave heights ($R_a$) and the CRMSD ($C_{rmsd}$) as a function of $H_T$ for $H_g \geq H_T$.

Figure 4b [...] (Equation (2)). Blue: $R_a(H_g, H_j)$. Red: $C_{rmsd}(H_g, H_e)$ (Equation (6)). Green: $C_{rmsd}(H_g, H_j)$. The period of the comparison is from 2014 to 2018.

Figure 5 shows the Q-Q (quantile-quantile) plots between $H_g$ and $H_e$ and between $H_g$ and $H_j$. The ERA5 wave heights are underestimated in higher wave conditions, while the JMA wave heights are not underestimated.

The development of wind waves depends not only on the wind speed, but also on the fetch length. We investigated the accuracy of ERA5 wave heights in the fetch-unlimited condition and the fetch-limited condition. The two conditions are classified from the JMA winds $(u_j, v_j)$ at the buoy locations. The fetch-limited conditions are $u_j > 0$ for Buoys D, E, F, G, and J and $v_j < 0$ for Buoys K, L, O, P, and Q (Figure 1). For example, if $u_j > 0$ at the position of Buoy D, the fetch is limited at that time.

Figure 6 shows the Taylor diagram between $H_g$ and $H_e$ for individual buoys in the fetch-unlimited condition and in the fetch-limited condition. The data for validation are from 2012 to 2018. The plots of the fetch-unlimited condition (blue) and of the fetch-limited condition (red) each form a cluster (Figure 6). The cluster of the fetch-unlimited condition is close to $(r_c, C_{rmsd}) = (0.94, 0.39)$, and the cluster of the fetch-limited condition is close to $(r_c, C_{rmsd}) = (0.9, 0.5)$. The skill metrics in the fetch-unlimited conditions are better than those in the fetch-limited conditions, except for the NSD. Table 2 summarizes the skill metrics of ERA5 wave heights in the fetch-limited condition (F) and the fetch-unlimited condition (U) for each GPS buoy.
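The fetch classification rule stated above can be expressed compactly in code. The sketch below is only an illustration of that rule: the buoy groups come from the text, while the function name `fetch_limited` and the set names are hypothetical.

```python
import numpy as np

# Buoy groups as listed in the text
U_COMPONENT_BUOYS = {"D", "E", "F", "G", "J"}  # fetch-limited when u_j > 0
V_COMPONENT_BUOYS = {"K", "L", "O", "P", "Q"}  # fetch-limited when v_j < 0

def fetch_limited(buoy, u_j, v_j):
    """Return a boolean array, True where the fetch is limited.

    buoy     : buoy label, e.g. "D"
    u_j, v_j : JMA wind components at the buoy location (same length)
    """
    u_j = np.asarray(u_j, dtype=float)
    v_j = np.asarray(v_j, dtype=float)
    if buoy in U_COMPONENT_BUOYS:
        return u_j > 0
    if buoy in V_COMPONENT_BUOYS:
        return v_j < 0
    raise ValueError(f"no fetch rule given for buoy {buoy!r}")
```

The resulting mask can be used to split the paired wave heights into the two groups, for example `h_g[mask]` for fetch-limited and `h_g[~mask]` for fetch-unlimited samples (hypothetical array names). With the strict inequalities of the text, calm or exactly alongshore winds fall into the fetch-unlimited group.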
In total, $R_a(H_g, H_e, U) < R_a(H_g, H_e, F)$, $r_c(H_g, H_e, U) > r_c(H_g, H_e, F)$, $R_d(H_g, H_e, U) > R_d(H_g, H_e, F)$, $C_{rmsd}(H_g, H_e, U) < C_{rmsd}(H_g, H_e, F)$, and $\mathrm{SI}(H_g, H_e, U) > \mathrm{SI}(H_g, H_e, F)$, where $R_a(H_g, H_e, U)$ indicates $R_a$ (Equation (2)) in the fetch-unlimited condition, and $R_a(H_g, H_e, F)$ indicates $R_a$ in the fetch-limited condition. The same applies to the other skill metrics with U and F.

Table 2. Comparisons between $H_g$ and $H_e$ in the fetch-unlimited conditions (U) and the fetch-limited conditions (F) for various buoys from 2012 to 2018. A single asterisk (*) denotes that the probability of $C_{rmsd}(H_g, H_e, U) < C_{rmsd}(H_g, H_e, F)$ (or of $|S_{sdn}(H_g, H_e, U) - 1| > |S_{sdn}(H_g, H_e, F) - 1|$) is greater than 95%. Double asterisks (**) denote that the probability is greater than 90% but less than 95%.

The statistical significance of $C_{rmsd}(H_g, H_e, U) < C_{rmsd}(H_g, H_e, F)$ and $|S_{sdn}(H_g, H_e, U) - 1| > |S_{sdn}(H_g, H_e, F) - 1|$, metrics which can be compared across different data groups, is explored. The bootstrap method [33,34] is used for the validation. The effective sample size $N_e$, which is smaller than the number of comparisons $N_c$ ($1 \leq N_e \leq N_c$), is evaluated from the serial data of $(H_g, H_e)$ by the method in [33,35]. $N_e$ pairs $(H_g, H_e)$ were resampled randomly with replacement, so a pair may be drawn more than once. The skill metrics were computed from the $N_e$ resampled pairs. The resampling and the computation of the skill metrics were repeated 1000 times, and the probabilities of $C_{rmsd}(H_g, H_e, U) < C_{rmsd}(H_g, H_e, F)$ and $|S_{sdn}(H_g, H_e, U) - 1| > |S_{sdn}(H_g, H_e, F) - 1|$ were evaluated from the 1000 resamples. This method is almost the same as those in [34,36]. The probabilities of $C_{rmsd}(H_g, H_e, U) < C_{rmsd}(H_g, H_e, F)$ were greater than 95% at nine of the 10 buoys.
The result that the CRMSD in the fetch-unlimited case is smaller than that in the fetch-limited case is statistically significant at more than the 95% confidence level. The probabilities of $|S_{sdn}(H_g, H_e, U) - 1| > |S_{sdn}(H_g, H_e, F) - 1|$ were greater than 95% at seven of the 10 buoys. Figure 7 shows the Q-Q plots between $H_g$ and $H_e$ in the fetch-limited condition and the fetch-unlimited condition. The ERA5 wave heights are underestimated as the wave height increases in both cases. Although the ratio $R_a(H_g, H_e)$ is larger in the fetch-limited condition than in the fetch-unlimited condition, the ERA5 wave heights in the fetch-limited condition are also underestimated in high wave conditions.
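The bootstrap comparison used in this section can be sketched roughly as follows. This is a simplified illustration, not the procedure of [33-36]: the effective sample sizes `ne_u` and `ne_f` are assumed to have been estimated beforehand and are passed in as plain integers, and the function names and the fixed random seed are hypothetical.

```python
import numpy as np

def crmsd(x, y):
    """Normalized centered RMSD of computed y against observed x."""
    dx = x - x.mean()
    dy = y - y.mean()
    return np.sqrt(np.mean((dy - dx) ** 2)) / np.sqrt(np.mean(dx ** 2))

def prob_crmsd_u_below_f(xu, yu, xf, yf, ne_u, ne_f, n_boot=1000, seed=0):
    """Bootstrap probability that CRMSD(U) < CRMSD(F).

    xu, yu     : observed/computed pairs in the fetch-unlimited condition
    xf, yf     : observed/computed pairs in the fetch-limited condition
    ne_u, ne_f : effective sample sizes (assumed given)
    """
    rng = np.random.default_rng(seed)
    xu, yu, xf, yf = (np.asarray(a, dtype=float) for a in (xu, yu, xf, yf))
    hits = 0
    for _ in range(n_boot):
        # Draw N_e pairs with replacement, keeping each (x, y) pair together
        iu = rng.integers(0, len(xu), size=ne_u)
        jf = rng.integers(0, len(xf), size=ne_f)
        if crmsd(xu[iu], yu[iu]) < crmsd(xf[jf], yf[jf]):
            hits += 1
    return hits / n_boot
```

A returned value above 0.95 would correspond to a single asterisk in Table 2. The same loop applies to the NSD comparison by replacing `crmsd` with the absolute deviation of the normalized standard deviation from one.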

Comparison of Wave Periods
The comparisons of wave periods are summarized in Table 3. The correlations between the peak wave periods ($T_{pe}$ and $T_{pj}$) and the significant wave periods ($T_g = T_{1/3}$) from the GPS buoys satisfy $r_c(T_g, T_{pe}) > r_c(T_g, T_{pj})$. On the other hand, their SIs satisfy $\mathrm{SI}(T_g, T_{pe}) > \mathrm{SI}(T_g, T_{pj})$, but the difference is small.

Table 3. Comparison of periods ($X = T_g$; $Y = T_{pe}$, $Y = T_{pj}$, or $Y = T_{me}$). T: total. U: fetch-unlimited conditions. F: fetch-limited conditions. Other symbols are defined in Equations (2)-(6). The comparison period with $T_{pe}$, $T_{pj}$, and $T_{me}$ for the total cases (T) is from 2014 to 2018. The comparison period with $T_{me}$ for the fetch-unlimited (U) and the fetch-limited (F) cases is from 2012 to 2018.

The mean period $T_{m-1}$ is the closest to $T_{1/3}$ [37]. The value of $R_a(T_g, T_{me})$ (Equation (2)) is close to one. In total, the ERA5 mean wave periods in the fetch-unlimited conditions are closer to $T_g$ than those in the fetch-limited conditions.

Figure 8 shows the Taylor diagram between $T_g$ and $T_{me}$ for individual buoys in the fetch-unlimited condition and in the fetch-limited condition from 2012 to 2018. The plots are more scattered than those in Figure 6. In particular, the skill metrics in the fetch-limited conditions are more scattered than those in the fetch-unlimited conditions. The CRMSDs ($C_{rmsd}$) in the fetch-unlimited conditions tend to be smaller than those in the fetch-limited conditions. The NSD ($S_{sdn}$) in the fetch-unlimited conditions is closer to one than that in the fetch-limited conditions at most of the buoys.

The probabilities of $C_{rmsd}(T_g, T_{me}, U) < C_{rmsd}(T_g, T_{me}, F)$ and $|S_{sdn}(T_g, T_{me}, U) - 1| < |S_{sdn}(T_g, T_{me}, F) - 1|$ are also explored by the bootstrap method. The probabilities of $C_{rmsd}(T_g, T_{me}, U) < C_{rmsd}(T_g, T_{me}, F)$ were more than 90% at four of the 10 buoys. The probabilities of $|S_{sdn}(T_g, T_{me}, U) - 1| < |S_{sdn}(T_g, T_{me}, F) - 1|$ were more than 95% at seven of the 10 buoys.
The accuracy of the ERA5 wave period in the fetch-unlimited condition is better than that in the fetch-limited condition. However, the result is not as robust as that for wave height, because the skill metrics of wave periods are scattered (Figure 8), while the skill metrics of wave heights are clustered (Figure 6).
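For reference, the Taylor diagrams in Figures 6 and 8 can show the correlation coefficient, the NSD, and the CRMSD together because the three quantities are linked by one identity. The short derivation below is a sketch that assumes the CRMSD and NSD are both normalized by the standard deviation of the observations $\sigma_X$ and that population statistics are used; $\sigma_X$ and $\sigma_Y$ are notation introduced here for the standard deviations of the observed and computed values.

```latex
% Assumed definitions: S_sdn = sigma_Y / sigma_X,
% C_rmsd = (centered RMSD) / sigma_X, r_c = covariance / (sigma_X sigma_Y)
\begin{align}
C_{rmsd}^2
  &= \frac{\left\langle \left[ (Y - \langle Y \rangle) - (X - \langle X \rangle) \right]^2 \right\rangle}{\sigma_X^2} \\
  &= \frac{\sigma_Y^2 + \sigma_X^2
           - 2 \left\langle (X - \langle X \rangle)(Y - \langle Y \rangle) \right\rangle}{\sigma_X^2} \\
  &= S_{sdn}^2 + 1 - 2\, S_{sdn}\, r_c .
\end{align}
```

Under these assumptions, any two of the three metrics determine the third, which is why a single point per buoy and condition is sufficient on the diagram.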

Discussion
The intercomparison of ERA5 wave data, JMA wave assimilation data, and GPS-observed wave data in the coastal area was conducted. The period of the intercomparison was from 2014 to 2018. The accuracy of the wave forecast by the GWM of the JMA was not good in 2013, because the wind field associated with a typhoon was not reconstructed accurately [30]. In fact, the skill metrics from 2012 to 2018 were $r_c(H_g, H_j) = 0.901$, $R_d(H_g, H_j) = 0.366$ m, and $\mathrm{SI}(H_g, H_j) = 0.230$ for $N_c = 79613$. These skill metrics of the JMA analysis wave height were poorer than those in Figure 3b and Table 1.
The mean wave heights were $H_e < H_j$ at all of the buoy positions (Table 1). Figure 9 shows the mean ERA5 wind speeds and mean JMA wind speeds from 2014 to 2018. Figure 9a and Figure 9b are similar to each other. The wind speeds at the GPS buoy positions range from about 5 m/s to 7 m/s. Figure 10 shows the differences between the mean JMA wind speeds and the mean ERA5 wind speeds from 2014 to 2018 ($\langle |\mathbf{U}_j| \rangle - \langle |\mathbf{U}_e| \rangle$). The differences of the mean wind speeds at the GPS buoy positions range from about -0.5 m/s to 0.3 m/s. The mean wind speeds are $\langle |\mathbf{U}_j| \rangle < \langle |\mathbf{U}_e| \rangle$ at five of the 10 buoy positions. The difference of the local wind speeds therefore cannot explain the result that $H_e < H_j$ at all of the buoy positions.

GPS buoy-measured wave height data were assimilated into the JMA wave data. Therefore, the accuracy of the JMA wave height was better than that of ERA5. On the other hand, GPS buoy-measured wave period data were not assimilated into the JMA wave data, and the accuracy of the JMA wave period was not better than that of ERA5.
The spatial resolution of the ERA5 wave model was 0.36°. On the other hand, the JMA model had a spatial resolution of 0.5° outside the CWM region (the CWM region is from 20° N to 50° N and from 120° E to 150° E), which was lower than that of the ERA5 wave model. It is presumed that the ERA5 wave model therefore reproduced the swell propagating from outside the CWM region better than the JMA model. As a result, the accuracy of the JMA wave period was not better than that of ERA5.
The ERA5 wave height was underestimated in higher wave conditions. The underestimation of the ERA-Interim wave height in higher wave conditions was shown in [22]. The same underestimation holds for the ERA5 wave height.
It was found that the accuracy of ERA5 wave height was significantly different between the fetch-limited and fetch-unlimited conditions. In particular, the correlation of wave heights in fetch-limited conditions was lower than that in fetch-unlimited conditions, although the correlation cannot be compared across different data groups.
On the other hand, the ratio $R_a = \langle H_e \rangle / \langle H_g \rangle$ and the NSD ($S_{sdn}(H_g, H_e)$) were closer to one in the fetch-limited conditions than in the fetch-unlimited conditions. The ERA5 wave heights tended to be underestimated. However, the distance between the buoy position and the coast was overestimated in the ERA5 grid (Figures 1 and 2a). The fetch was therefore overestimated at the buoy locations, and $R_a$ was larger. The normalized variability of the ERA5 wave height in the fetch-limited condition was larger than that in the fetch-unlimited condition. This was also related to the overestimation of the fetch in the ERA5 grid. On the other hand, the Q-Q plots in the fetch-limited condition and the fetch-unlimited condition were similar to each other. Even though the fetch was overestimated, the ERA5 wave height was underestimated when the wave height was high (Figure 7).
The statistical significance of the difference of the skill metrics between the fetch-limited conditions and the fetch-unlimited conditions was investigated for each buoy's data. The wave height changes seasonally, and the effective sample sizes $N_e$ were smaller than the number of comparisons $N_c$. Therefore, the uncertainties of the skill metrics were large. However, the differences of the skill metrics were statistically significant at more than the 90% confidence level at most of the buoys. The uncertainties of the skill metrics are expected to decrease as the data period is extended.

Conclusions
The conclusions are summarized as follows:
• The accuracy of the JMA analysis wave height is better than that of the ERA5 wave height, because observation data near the coast are incorporated.
• The accuracy of the JMA analysis wave period is not better than that of the ERA5 wave period.
• The ERA5 wave height is underestimated at higher wave heights.
• The accuracy of the ERA5 wave height in the fetch-limited conditions is significantly lower than that in the fetch-unlimited conditions.
• The accuracy of the ERA5 wave period in the fetch-limited conditions is also lower than that in the fetch-unlimited conditions, but this result is not as robust as that for wave height.
From these conclusions, the JMA wave height analysis data can be used as wave climate data around Japan. In addition, if ERA5 wave data are to be used in marine development guidelines, they should be treated separately for the fetch-limited conditions and the fetch-unlimited conditions. Moreover, since ERA5 wave heights tend to be underestimated at higher wave heights, this underestimation needs to be taken into account when using the ERA5 wave height for marine disaster prevention.