Numerical Simulation of Wind Wave Using Ensemble Forecast Wave Model: A Case Study of Typhoon Lingling

Abstract: A numerical wave forecast simulation was performed for Typhoon Lingling around the Korean Peninsula and in the East Asia region using sea winds from 24 members produced by the Ensemble Prediction System for Global (EPSG) of the Korea Meteorological Administration (KMA). Significant wave heights observed by ocean data buoys were used to verify the ensemble wave model, and the results of the ensemble members were analyzed through probability verification. The forecast performance for the significant wave height improved by approximately 18% in root mean square error at the three-day lead time compared to that of the deterministic model, and the difference in performance was particularly distinct towards mid-to-late lead times. The ensemble spread was relatively appropriate, even at longer lead times, and all ensemble model runs were stable. The probability verification provided information on uncertainty that the deterministic model could not provide. The areas under all the Relative Operating Characteristic (ROC) curves were 0.9 or above, demonstrating good predictive performance, and the ensemble wave model is expected to be useful in identifying and determining hazardous weather conditions.


Introduction
The ensemble prediction technique is widely used to compensate for the limitations of a deterministic forecast model, whose prediction errors stem from uncertainty in the initial conditions and in the numerical model itself [1][2][3]. This technique performs probabilistic prediction by considering a range of possible initial conditions, physical processes, and boundary conditions, and it provides both the prediction information given by a conventional deterministic forecast model and information on forecast uncertainty [4][5][6]. Because it is based on multiple possible marine weather scenarios, it generally has better predictive performance than the deterministic forecast model and is extremely useful for determining various hazardous weather conditions. Recently, studies have been conducted to improve the prediction accuracy of ensemble models using machine learning and related methods [7,8]. However, the cost and computational efficiency need to be considered when using an ensemble model, since a high-performance supercomputer capable of processing a large amount of data is required. In addition, because a large amount of prediction data is produced, the prediction results may deviate widely depending on the numerical model, and it may be difficult to obtain the probability analysis data. Nevertheless, the technique remains extremely useful, as it can identify the uncertainty in prediction and provide information on various hazardous weather conditions. Wave forecast models to which the ensemble technique is applied are run operationally to provide prediction information and are used for probabilistic wave forecast research on hazardous weather [9][10][11][12].
In a forecast model, prediction uncertainty is affected by physical processes, initial conditions, and boundary conditions, but wave model results are known to be affected most directly by the accuracy of the sea surface winds predicted by atmospheric models [13][14][15]. Although the result of the wave forecast model depends on the accuracy of the input sea surface wind, a deterministic forecast model provides only a single scenario and therefore cannot quantify the prediction uncertainty, even though the prediction of the actual event is uncertain. In contrast, the ensemble model can identify the uncertainty of prediction, because each ensemble member is driven by a different sea surface wind scenario [16].
Verification indices of the probabilistic forecast performance of the ensemble model include the Brier score (BS), Brier skill score (BSS), reliability diagram, and Relative Operating Characteristic (ROC), which are generally used to determine the accuracy of the probabilistic forecast and the uncertainty in prediction [17][18][19]. In addition, the ensemble spread, which is the standard deviation of the members about the ensemble mean, is a representative verification index for diagnosing the prediction results of the ensemble model [20][21][22]. It is an important diagnostic tool for determining how well the members represent an actual event, and it is used to verify the probabilistic prediction of the ensemble model.
In this study, a numerical wave prediction simulation was performed for Typhoon Lingling, which caused great damage to the Korean Peninsula in September 2019, using the sea surface wind data produced by the Ensemble Prediction System for Global (EPSG) of the Korea Meteorological Administration (KMA) and the third-generation wave model WAVEWATCH III, and the results were evaluated by applying probabilistic prediction verification methods. In addition, the prediction accuracy of the ensemble model for marine weather that changes rapidly under hazardous conditions was evaluated by comparing it with the prediction results of the deterministic forecast model, and the prediction uncertainty was analyzed.

Ensemble Wave Model Setup
The computational domain of the ensemble wave model is the Northeast Asian region spanning 20-50°N and 115-150°E, the same domain as the regional wave model operated by KMA. The bathymetry, shown in Figure 1, was built from ETOPO1 and the Global Self-consistent, Hierarchical, High-resolution Shoreline (GSHHS) coastline data provided by the National Geophysical Data Center (NGDC). The model uses a spherical coordinate system with a resolution of 1/12°, which is approximately 8 km.
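The relation between the 1/12° resolution and the quoted spacing of roughly 8 km can be checked with a short calculation. The sketch below is illustrative only: the Earth radius, the choice of 35°N as a representative latitude, and the assumption that both domain edges are grid points are not taken from the model configuration.

```python
import math

EARTH_RADIUS_KM = 6371.0   # mean Earth radius (assumed value)
RES_DEG = 1.0 / 12.0       # grid resolution stated in the text

def grid_spacing_km(lat_deg, res_deg=RES_DEG):
    """Return (north-south, east-west) grid spacing in km at a latitude."""
    km_per_degree = math.pi * EARTH_RADIUS_KM / 180.0
    dy = km_per_degree * res_deg
    # Meridians converge towards the poles, so east-west spacing shrinks.
    dx = dy * math.cos(math.radians(lat_deg))
    return dy, dx

def n_points(start_deg, end_deg, res_deg=RES_DEG):
    """Number of grid points, assuming both domain edges are grid points."""
    return round((end_deg - start_deg) / res_deg) + 1

n_lat = n_points(20.0, 50.0)      # latitude range from the text
n_lon = n_points(115.0, 150.0)    # longitude range from the text
dy_km, dx_km = grid_spacing_km(35.0)  # 35°N: mid-latitude of the domain
```

Under these assumptions the spacing at 35°N is about 9.3 km north-south and 7.6 km east-west, which is consistent with the stated "approximately 8 km".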
The best track of Typhoon Lingling, provided by the Joint Typhoon Warning Center (JTWC), is shown in Figure 1.
The ensemble wave forecast model is based on WAVEWATCH III, developed by the U.S. National Weather Service, with the wave energy physics package proposed by Ardhuin et al. [23]; a version whose accuracy was improved by optimizing the physical parameters of the wave model was used [24][25][26]. For the wind forcing, 10 m sea surface wind data from a total of 24 ensemble members produced by the EPSG of KMA were used. For each ensemble member, the prediction from the run started 12 h earlier was used as the initial field, and the boundary field was taken from the boundary data of the regional wave model, a deterministic forecast model run on the same computational domain. The ensemble wave model was run twice a day (00 UTC and 12 UTC) and predicted up to 120 h from the start time of the model. In this study, the forecast performance was verified for lead times up to 120 h at 3-h intervals (Table 1). The overall operation of the EPSG and the ensemble wave forecast system is shown in Figure 2. To prevent overloading of computing resources, the EPSG uses a time-lag technique: it combines the sea surface winds of 11 members predicted 6 h in advance with those of 13 members, including the control member, predicted in the current run. The prediction was performed up to 240 h at 3-h intervals, using the sea wind prediction data of all 24 members as input data for the wave model.
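The time-lag composition described above (11 members from the run started 6 h earlier plus 13 members, including the control, from the current run) can be sketched as follows. The member numbering and the function name are hypothetical and chosen only for illustration; they do not reflect the actual EPSG member identifiers.

```python
from datetime import datetime, timedelta

N_CURRENT = 13   # current-run members, including the control (from the text)
N_LAGGED = 11    # members taken from the run started 6 h earlier (from the text)
LAG_HOURS = 6

def time_lagged_members(cycle):
    """List (member_id, wind_run_start, is_control) for one wave-model cycle.

    Member ids are hypothetical: 0 is the control, 1-12 are current-run
    perturbed members, and 13-23 are the lagged members.
    """
    members = [(i, cycle, i == 0) for i in range(N_CURRENT)]
    lagged_start = cycle - timedelta(hours=LAG_HOURS)
    members += [(N_CURRENT + i, lagged_start, False) for i in range(N_LAGGED)]
    return members

members = time_lagged_members(datetime(2019, 9, 5, 0))
```

Only 13 atmospheric members need to be integrated per cycle in this arrangement, while the wave model still receives 24 wind scenarios.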

Numerical Method
In this study, the ensemble wave model was used to perform numerical simulations for Typhoon Lingling, which passed near the Korean Peninsula in September 2019. The model was run at 00 UTC and 12 UTC using a total of 24 members, including the control member, and the analysis period was from 00 UTC on 1 September 2019 to 12 UTC on 9 September 2019. To verify the predictive performance of the model, the results predicted at 3-h intervals up to 120 h from the start time of each run were used, and they were classified and analyzed by lead time relative to that start time.
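The bookkeeping implied by this setup can be sketched in a few lines. The assumption that the final 12 UTC cycle on 9 September is included in the analysis, and all names in the code, are illustrative rather than taken from the operational system.

```python
from datetime import datetime, timedelta

START = datetime(2019, 9, 1, 0)    # first cycle of the analysis period
END = datetime(2019, 9, 9, 12)     # last cycle (assumed to be included)
CYCLE_STEP = timedelta(hours=12)   # runs at 00 UTC and 12 UTC
LEAD_HOURS = list(range(0, 121, 3))  # +0 h to +120 h at 3-h intervals

def forecast_cycles(start=START, end=END, step=CYCLE_STEP):
    """Return the start times of all model runs in the analysis period."""
    cycles = []
    t = start
    while t <= end:
        cycles.append(t)
        t += step
    return cycles

def valid_time(cycle, lead_hour):
    """Time at which a forecast with the given lead time is valid."""
    return cycle + timedelta(hours=lead_hour)

cycles = forecast_cycles()
```

Under these assumptions there are 18 cycles and 41 lead times per cycle, and verification by lead time pairs each forecast with the buoy observation at `valid_time(cycle, lead_hour)`.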
The significant wave height data observed at 17 ocean data buoys operated by KMA around the Korean Peninsula (Table 2) were used for the probabilistic prediction verification of the ensemble wave model; cases in which data were lost due to the influence of the typhoon were excluded from the verification. In addition, the prediction performance of the ensemble wave model was compared with that of the deterministic forecast model, for which the prediction results of the regional wave model run on the same computational domain as the ensemble model were used. Like the ensemble wave model, the regional wave model was run twice a day at 00 UTC and 12 UTC, and the prediction was performed up to 120 h at 3-h intervals. To verify the wave model, the prediction results of each ensemble member were compared with the buoy observations. The distribution of the model prediction results and the ability of the ensemble members to represent the actual event were first diagnosed through the rank histogram and the spread-skill graph. The prediction accuracy and prediction uncertainty of the probabilistic wave model in hazardous weather conditions were then evaluated through the probabilistic prediction verification indices of the BS and ROC.

Sea Wind Prediction Results of 24 Ensemble Members
When the initial field of the atmospheric model is produced, the initial fields of 23 members are generated through a perturbation process using the ensemble transform Kalman filter (ETKF). Together with the control member, which is run without any perturbation, the perturbed members, whose initial fields include the ETKF perturbations, yield 24 different sea wind prediction results. The sea wind prediction results of three of the 24 members from the run at 00 UTC on 5 September 2019 are shown in Figure 3. The sea surface winds of the control member at forecast lead times of +00 h, +24 h, and +48 h are shown in Figure 3a-c, those of the 13th member in Figure 3d-f, and those of the 24th member in Figure 3g-i. The prediction result of each member differed slightly from that of the control member, and the deviation increased with the lead time. This growing deviation reflects the uncertainty of the prediction model and suggests that a single deterministic forecast may carry a large prediction error at the final lead time.

Ensemble Wave Model Forecasting Results
The wave model was run for 5 days before the typhoon moved north, using the sea wind data of the 24 members. This spin-up allowed the waves to become sufficiently developed by the wind so that the simulation would be realistic. The spaghetti contours of the 3 m and 5 m significant wave heights predicted at 00 UTC on 3 September 2019 are shown in Figures 4 and 5, respectively, for forecast lead times of 00 h, 24 h, 48 h, and 72 h. Overall, high significant wave heights were predicted around the eye of the typhoon along its track, and differences were found between the ensemble mean and the significant wave height predicted by each of the 24 members. In addition, although the results of the deterministic forecast model tended to be similar to those of the ensemble model, they do not reflect the prediction uncertainty, because the deterministic model was run under a single set of conditions; thus, it has limitations for use in probabilistic forecasting.
The significant wave heights predicted by each ensemble member were compared with the observed significant wave heights, and the appropriateness of the spread of the ensemble wave model was diagnosed through the rank histogram for all buoy locations. The rank histogram is a representative diagnostic tool for evaluating the ensemble spread; it is generated by ranking each observed value within the ensemble members sorted from lowest to highest [17]. A left- or right-skewed rank histogram means that the ensemble is biased; a U-shaped rank histogram means that the observations too often fall outside the ensemble range, that is, the ensemble spread is too narrow; and a dome-shaped rank histogram means that the ensemble spread is too wide.
Lastly, flat rank histograms indicate that the probability of the observed values belonging to each interval of the ensemble spread is similar and that the ensemble spread is relatively appropriate.
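The construction of a rank histogram described above can be sketched with NumPy. This is a minimal illustration under stated assumptions, not the verification code used in the study: the array layout (one row per buoy-time pair, one column per member) is an assumption, and ties between an observation and a member are ignored.

```python
import numpy as np

def rank_histogram(ensemble, observations):
    """Count how often the observation falls in each rank interval.

    ensemble:     array of shape (n_cases, n_members)
    observations: array of shape (n_cases,)
    Returns an array of n_members + 1 counts; index 0 means the observation
    is below all members, index n_members means it is above all members.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observations = np.asarray(observations, dtype=float)
    n_members = ensemble.shape[1]
    # Rank = number of members strictly below the observation.
    ranks = np.sum(ensemble < observations[:, None], axis=1)
    return np.bincount(ranks, minlength=n_members + 1)

# Synthetic check: observations drawn from the same distribution as the
# members should give a roughly flat histogram with 25 intervals.
rng = np.random.default_rng(0)
counts = rank_histogram(rng.normal(size=(5000, 24)), rng.normal(size=5000))
```

With 24 members there are 25 rank intervals; excess counts in the first and last intervals produce the U-shape associated with an ensemble that does not spread out enough.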
The rank histograms for lead times of 00 h, 24 h, 48 h, and 72 h are shown in Figure 6a-d. Up to a lead time of 24 h, the rank histogram is U-shaped, with excess counts at the left and right ends, indicating that the ensemble does not spread out enough. After 24 h, the histogram is flat and shows no bias, so the spread of the ensemble members can be considered appropriate and the range predicted by the model reliable. The ensemble mean and spread were then compared to determine whether the ensemble members could represent the actual event. The root-mean-square error (RMSE) of the ensemble mean against the observed data was calculated, and the spread, which is the standard deviation of the members about the ensemble mean, was calculated through Equation (1) [27].
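A form of Equation (1) consistent with the definitions that follow can be written as below. The $1/(M-1)$ sample normalization is an assumption; a $1/M$ normalization is also in common use and differs only slightly for $M = 24$.

```latex
\mathrm{Spread} = \sqrt{\frac{1}{M-1}\sum_{m=1}^{M}\left(f_m - \bar{f}\right)^{2}} \tag{1}
```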
where $M$ denotes the size of the ensemble, $f_m$ denotes the predicted value of the $m$-th member, and $\bar{f}$ denotes the ensemble mean. In general, it is known that there is a high correlation between RMSE and spread at the beginning of the forecast lead time and that the correlation decreases as the forecast continues [20]. In a spread-skill graph, the closer the points lie to the diagonal line, the better the ensemble spread represents the actual event. The relationship between the spread and the RMSE for the significant wave height over all buoy observation locations is shown in Figure 7, together with a histogram of the frequency of each ensemble spread interval. The 72 h forecast deviated slightly more from the diagonal line than the forecasts at the beginning of the lead time, although it matched the diagonal line well up to approximately 1 m. No significant dependence on the lead time was found, as this was a verification of a short-term probabilistic prediction for the typhoon period, and the results were generally consistent with the diagonal line, indicating that the ensemble spread represents the actual event well.
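The spread-skill comparison can be sketched as follows. This is an illustrative implementation under assumptions: cases are binned by their spread, the spread uses the sample normalization (`ddof=1`), and the bin edges and array layout are hypothetical rather than those used for Figure 7.

```python
import numpy as np

def ensemble_spread(ensemble):
    """Standard deviation of the members about the ensemble mean, per case.

    ddof=1 (division by M - 1) is an assumption about the normalization.
    """
    return np.std(np.asarray(ensemble, dtype=float), axis=1, ddof=1)

def spread_skill_bins(ensemble, observations, bin_edges):
    """Mean spread and RMSE of the ensemble mean within each spread bin.

    ensemble: (n_cases, n_members); observations: (n_cases,).
    Bins without any case are returned as NaN.
    """
    ensemble = np.asarray(ensemble, dtype=float)
    observations = np.asarray(observations, dtype=float)
    spread = ensemble_spread(ensemble)
    sq_err = (ensemble.mean(axis=1) - observations) ** 2
    idx = np.digitize(spread, bin_edges) - 1   # bin index of each case
    n_bins = len(bin_edges) - 1
    mean_spread = np.full(n_bins, np.nan)
    rmse = np.full(n_bins, np.nan)
    for b in range(n_bins):
        in_bin = idx == b
        if in_bin.any():
            mean_spread[b] = spread[in_bin].mean()
            rmse[b] = np.sqrt(sq_err[in_bin].mean())
    return mean_spread, rmse

ms, rm = spread_skill_bins([[1.0, 2.0, 3.0], [2.0, 4.0, 6.0]],
                           [2.0, 5.0], [0.0, 1.5, 3.0])
```

Plotting `rmse` against `mean_spread` gives the spread-skill graph; points near the diagonal indicate that the spread is a good estimate of the error of the ensemble mean.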

Ensemble Wave Model Forecast Performance and Probabilistic Verification Result
The prediction results of the deterministic forecast model and the predicted result of the ensemble mean were compared with the observed significant wave height, and the bias and RMSE for each lead time were calculated through Equations (2) and (3).
$$\mathrm{Bias} = \frac{1}{n}\sum_{i=1}^{n}\left(F_i - A_i\right) \tag{2}$$
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(F_i - A_i\right)^{2}} \tag{3}$$
where $F_i$ is the predicted value, $A_i$ is the observed value, and $n$ is the sample size. The results predicted at 3-h intervals up to 120 h from the start of each run were compared with the significant wave heights observed at the 17 ocean data buoys. For comparison with the deterministic model result, the bias and RMSE values of the ensemble members were averaged over all buoy observation points. Both the ensemble model and the deterministic model tended to overestimate the significant wave height for the typhoon. The prediction error tended to increase with the lead time in both the ensemble model and the deterministic forecast model, as shown in Figure 8. The prediction error of the deterministic forecast model was larger than that of the ensemble model, and the deterministic model tended to overestimate after 3 days of lead time. The forecast performance of the ensemble wave model for the significant wave height improved by approximately 18% in RMSE at the 3-day lead time compared to the deterministic forecast model. In addition, the typhoon intensity declined after 96 h of forecast lead time, and the positive bias decreased at the same lead time. The time series of the observed significant wave height and the data predicted by both models at all buoy observation points were compared to examine the variations of the bias and RMSE values seen in Figure 8. Differences were found between the wave model runs started before and after the typhoon passed the buoy observation points. Overall, the runs started before the typhoon passed the observation points predicted that the maximum significant wave height would occur earlier than it was observed by the buoys, which is consistent with the positive bias and the RMSE variations appearing later in the forecast lead time.
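The bias and RMSE defined above, and a relative RMSE improvement of the kind quoted for the 3-day lead time, can be computed as in the sketch below. The definition of the improvement as a percentage reduction relative to the deterministic RMSE is an assumption, and the sample values are made up for illustration.

```python
import numpy as np

def bias(forecast, observed):
    """Mean of (forecast - observed); positive values mean overestimation."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.mean(forecast - observed))

def rmse(forecast, observed):
    """Root-mean-square error between forecast and observed values."""
    forecast = np.asarray(forecast, dtype=float)
    observed = np.asarray(observed, dtype=float)
    return float(np.sqrt(np.mean((forecast - observed) ** 2)))

def rmse_improvement_percent(rmse_ensemble, rmse_deterministic):
    """Percentage RMSE reduction relative to the deterministic model
    (assumed definition of 'improvement')."""
    return 100.0 * (rmse_deterministic - rmse_ensemble) / rmse_deterministic

# Made-up significant wave heights in metres, for illustration only.
predicted = [2.0, 3.0, 4.0]
observed = [1.5, 3.0, 3.5]
b = bias(predicted, observed)
r = rmse(predicted, observed)
```

In a verification by lead time, these functions would be applied separately to the forecast-observation pairs of each lead time, pooled over all buoys and cycles.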
The wave forecasts started after the typhoon had passed did not show significant differences between the predicted and observed values over the entire forecast lead time, which seems to have reduced the positive bias and the forecast errors.
The BS and ROC were used to verify the probabilistic prediction performance of the ensemble wave model. The BS is a representative probabilistic forecast verification index that measures the probabilistic prediction accuracy of the ensemble model. It consists of three terms, as shown in Equation (4) [28].
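$$\mathrm{BS} = \frac{1}{N}\sum_{k=1}^{K} n_k\left(f_k - \bar{o}_k\right)^{2} - \frac{1}{N}\sum_{k=1}^{K} n_k\left(\bar{o}_k - \bar{o}\right)^{2} + \bar{o}\left(1 - \bar{o}\right) \tag{4}$$
where $N$ is the total number of forecasts, $K$ is the number of probability intervals, $f_k$ and $\bar{o}_k$ are the predicted probability and the mean of the observation frequencies in probability interval $k$, $n_k$ is the number of forecast samples in that interval, and $\bar{o}$ is the mean of the total observation frequencies.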
The first term represents reliability, which measures how close the predicted probability is to the actual frequency of occurrence. A reliability value of 0 indicates a perfectly reliable prediction. The second term represents resolution, which measures how far the observed frequency in each probability interval is from the mean of the total observation frequencies; larger values indicate better discrimination between situations. Finally, the third term is uncertainty, which is the uncertainty inherent in the actual phenomenon. It is unrelated to the accuracy of the predicted probability, since it represents the difficulty of the prediction situation. The BS combines these three terms, and a BS close to 0 indicates that the accuracy of the probabilistic prediction is high.
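The three-term decomposition can be checked numerically. The sketch below is a minimal illustration under assumptions: the forecast probability is taken as the fraction of members exceeding the threshold, and the probability intervals are the distinct forecast values, in which case the decomposition reproduces the BS exactly. Function names and sample values are hypothetical.

```python
import numpy as np

def exceedance_probability(ensemble, threshold):
    """Fraction of members above the threshold for each case."""
    return np.mean(np.asarray(ensemble, dtype=float) > threshold, axis=1)

def brier_score(prob, outcome):
    """Mean squared difference between forecast probability and outcome."""
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    return float(np.mean((prob - outcome) ** 2))

def brier_decomposition(prob, outcome):
    """Return (reliability, resolution, uncertainty).

    Cases are grouped by distinct forecast probability, so that
    BS = reliability - resolution + uncertainty holds exactly.
    """
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    n_total = prob.size
    o_bar = outcome.mean()                 # overall observed frequency
    reliability = 0.0
    resolution = 0.0
    for f_k in np.unique(prob):
        in_bin = prob == f_k
        n_k = in_bin.sum()
        o_k = outcome[in_bin].mean()       # observed frequency in this bin
        reliability += n_k * (f_k - o_k) ** 2
        resolution += n_k * (o_k - o_bar) ** 2
    return (reliability / n_total, resolution / n_total,
            o_bar * (1.0 - o_bar))

# Made-up probabilities and binary outcomes (1 = threshold exceeded).
prob = [0.0, 0.0, 0.5, 0.5, 1.0, 1.0]
outcome = [0, 0, 0, 1, 1, 1]
rel, res, unc = brier_decomposition(prob, outcome)
bs = brier_score(prob, outcome)
```

Because the uncertainty term depends only on the observed frequency, a rarer event (a higher wave height threshold) lowers this term regardless of forecast quality, which should be kept in mind when comparing BS values across thresholds.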
The total BS and the values of its individual terms for the ensemble wave model, calculated using Equation (4), are shown in Figure 9. The BS increased as the lead time increased, and although the occurrence probability decreased when the threshold significant wave height was large, the accuracy of the ensemble prediction was higher than in the case where the threshold significant wave height was small.
ROC is a method of evaluating the predictive performance of a binary classification system; it is a graph of the true- and false-positive rates for events above a certain threshold [29,30]. In general, the x-axis of the graph is the false-positive rate, and the y-axis is the true-positive rate. The forecast is perfect when the area under the ROC curve is 1, whereas an area of 0.5 or below indicates that the forecast has no skill. The ROC graph for a significant wave height threshold of 2 m is shown in Figure 10. Overall, the accuracy of the forecast was relatively high, since the area under the curve was close to 1, although it tended to decrease slightly as the forecast lead time increased. This trend was also found in the ROC graph for a significant wave height threshold of 4 m, as shown in Figure 11, and although the frequency of observation decreased as the threshold for the significant wave height increased, the probabilistic predictive performance tended to increase.
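The ROC curve and its area can be sketched for an ensemble probability forecast as follows. This is an illustrative implementation under assumptions: a warning is issued when the forecast probability reaches a probability threshold k/M, both events and non-events are present in the sample, and the area is obtained with the trapezoidal rule. It is not the verification code used in the study.

```python
import numpy as np

def roc_points(prob, outcome, n_members):
    """False-positive and true-positive rates for all probability thresholds.

    prob:    forecast probabilities (fraction of members above the wave
             height threshold), shape (n_cases,)
    outcome: 1 or True where the event was observed, shape (n_cases,)
    Assumes the sample contains at least one event and one non-event.
    """
    prob = np.asarray(prob, dtype=float)
    outcome = np.asarray(outcome).astype(bool)
    n_event = outcome.sum()
    n_nonevent = (~outcome).sum()
    # Thresholds from "never warn" (above 1) down to "always warn" (0).
    thresholds = np.arange(n_members + 1, -1, -1) / n_members
    fpr, tpr = [], []
    for p in thresholds:
        warn = prob >= p - 1e-9            # small tolerance for rounding
        tpr.append((warn & outcome).sum() / n_event)
        fpr.append((warn & ~outcome).sum() / n_nonevent)
    return np.array(fpr), np.array(tpr)

def roc_area(prob, outcome, n_members):
    """Area under the ROC curve by the trapezoidal rule."""
    fpr, tpr = roc_points(prob, outcome, n_members)
    return float(np.sum(np.diff(fpr) * (tpr[1:] + tpr[:-1]) / 2.0))

# Made-up example in which events always receive higher probabilities.
area_perfect = roc_area([0.0, 0.25, 0.75, 1.0], [0, 0, 1, 1], n_members=4)
# Identical probabilities for an event and a non-event carry no information.
area_no_skill = roc_area([0.5, 0.5], [0, 1], n_members=2)
```

The ROC area measures discrimination only and is insensitive to bias in the forecast probabilities, which is why it is usually read together with the reliability term of the BS.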

Conclusions
In this study, a numerical wave forecast simulation was performed for the period when Typhoon Lingling moved north, using the sea wind forecast fields of the EPSG, and the forecast performance of the ensemble wave model was verified using the significant wave heights observed by the ocean data buoys around the Korean Peninsula.
In the probability verification, the appropriateness of the distribution of the ensemble spread was diagnosed through the rank histogram, and the spread was found to be appropriate and without bias after 24 h of lead time. The BS verification index showed that the accuracy of the probabilistic prediction of the ensemble model decreased as the lead time increased and that the BS was closer to 0 when the significant wave height threshold increased. For the ROC with a significant wave height threshold of 2 m, the area under the curve remained close to 1 but decreased gradually as the lead time increased, and the same trend was observed for the ROC with a threshold of 4 m. This means that the accuracy of the probabilistic prediction decreased with the lead time, and although the observation frequency decreased as the threshold for the significant wave height increased, the forecast performance tended to be higher than at the lower threshold with the higher observation frequency.
The averaged RMSE of the ensemble members became smaller than the RMSE of the deterministic forecast model run during the same period as the lead time increased, and the difference was particularly distinct after 3 days of lead time. The relatively stable forecast performance of the ensemble model was confirmed, despite the rapid changes in the sea wind due to the typhoon. Furthermore, the simulation result of the ensemble forecast model includes uncertainty information. The prediction uncertainty is reflected in the wave model through the ensemble technique, which showed improved accuracy and forecast performance compared to the prediction result of the deterministic forecast model. Although the ensemble model requires more computational time than the deterministic model, it showed greater prediction accuracy in this case. The probabilistic forecast performance was found to be adequate, even though this result is based on a short-term verification performed during the typhoon period. With sufficient verification data to further improve the accuracy of the probabilistic prediction, the ensemble wave model is expected to be useful in the field of probabilistic forecasting.