Wind Features Extracted from Weather Simulations for Wind-Wave Prediction Using High-Resolution Neural Networks

: Nearshore wave forecasting is susceptible to changes in regional wind ﬁelds and envi-ronments. However, surface wind ﬁeld changes are difﬁcult to determine due to the lack of in situ observational data. Therefore, accurate wind and coastal wave forecasts during typhoon periods are necessary. The purpose of this study is to develop artiﬁcial intelligence (AI)-based techniques for forecasting wind–wave processes near coastal areas during typhoons. The proposed integrated models employ combined a numerical weather prediction (NWP) model and AI techniques, namely numerical (NUM)-AI-based wind–wave prediction models. This hybrid model comprising VGGN-Net and High-Resolution Network (HRNet) was integrated with recurrent-based gated recurrent unit (GRU). Termed mVHR_GRU, this model was constructed using a convolutional layer for extracting features from spatial images with high-to-low resolution and a recurrent GRU model for time series prediction. To investigate the potential of mVHR_GRU for wind–wave prediction, VGGNet, HRNet, and Two-Step Wind-Wave Prediction (TSWP) were selected as benchmark models. The coastal waters in northeast Taiwan were the study area. The length of the forecast horizon was from 1 to 6 h. The mVHR_GRU model outperformed the HR_GRU, VGGNet, and TSWP models according to the error indicators. The coefﬁcient of mVHR_GRU efﬁciency improved by 13% to 18% and by 13% to 15% at the Longdong and Guishandao buoys, respectively. In addition, in a comparison of the NUM–AI-based model and a numerical model simulating waves nearshore (SWAN), the SWAN model generated greater errors than the NUM–AI-based model. The results of the NUM–AI-based wind–wave prediction model were in favorable accordance with the observed results, indicating the feasibility of the established model in processing spatial data.


Introduction
Taiwan is located at the turning point of most typhoon paths in the western North Pacific-East Asia region [1]. Strong winds accompany typhoons, and storm surges, an abnormal rise in sea level, can cause damage to offshore facilities near coastal areas. Taiwan's high mountains play a pivotal role in changing the wind distribution of typhoons around the island and in reducing wind speeds. When waves abate under weakened winds, they become unpredictable [2]. Therefore, a useful scheme for wind-wave forecasting during typhoons is highly desirable for Taiwan.
Coastal waves are highly correlated to local coastal winds in small basins, such as lakes, or under influence of very extreme events, such as typhoons [3,4]. However, changes in coastal land and offshore wind fields cannot be comprehensively examined because in situ observations (e.g., those made at onshore weather stations and offshore buoys) are location-specific and lacking in data. Instead of using wind data from a single source, a numerical weather prediction (NWP) model can be used to simulate wind speed changes

•
Regarding spatial data processing, the HRNet [16] model is constructed using convolutional layers, which facilitates feature extraction from images with high-to-low resolution. An HRNet model maintains HR representations by connecting convolutions of high-to-low resolution in parallel and by repeatedly conducting multiscale fusion across parallel convolutions. • A recurrent-based GRU layer was employed to address the time delays in waves and winds. The GRU layer enables memory block units to determine the memory time length and provide the most effective memory length of time series data on the time axis.

Related Work
A typhoon with intense and rapidly varying wind fields forms dramatically different offshore waves under spatiotemporal constraints [18][19][20]. Numerical wave models are based on the concept of a wind-wave energy spectrum, which rises and falls in response to changes in the wind field and the wave energy propagated from other regions. The WAM [21] and WAVEWATCH III [22] for the deep ocean and Simulating WAves Nearshore (SWAN; [23]) for nearshore regions are the most commonly used numerical wave models for estimating ocean waves [24]. The third-generation wave model SWAN has been used extensively to model waves in coastal regions with gradual bathymetric variations [25][26][27]. SWAN has also been widely applied to the numerical simulation of typhoon waves [28][29][30]. Generally, physical models are complex in terms of their input requirements and execution time.
Wind forcing is a key determinant of the wave field [31], and wave models rely on the wind features to simulate the wave fields accurately. Data from in situ observations [32][33][34], remote sensing [35][36][37], and numerical modeling [38][39][40][41] have demonstrated the complex spatial distribution of wind and wave fields.
Multiple AI models, such as multilayer perceptrons [42], recurrent neural networks (RNNs; [43,44]), and CNNs [45], have been developed based on various DL algorithms. Notably, RNNs use their internal state (memory) to process input sequences of variable lengths [46]. Long short-term memory (LSTM) and GRU neural networks are RNNs that can learn the temporal dynamics of sequential data and address the vanishing gradient problem [47][48][49]. RNNs, which efficiently use temporal information for the sequential analysis of input data, are also suitable for addressing wind-wave prediction problems. Huang et al. [50] and Cheng et al. [51] have constructed accurate RNN-based wind speed prediction models, and Sadeghifar et al. [52], Demetriou et al. [53], Zhang et al. [54], and Zhou et al. [55] have conducted coastal wave height prediction using RNN-based models. In addition, in our previous study, we developed a Two-Step Wind-wave Prediction (TSWP) typhoon wind-wave prediction model, using deep RNNs to forecast wind speed and wave height during typhoons. However, because in situ observational data were applied, the wind speed data could not describe the spatial wind field features, resulting in the serious underestimation of peak values.
According to [56], CNNs have two main representations. Low-resolution representations are mainly used for image classification, whereas HR representations are essential for vision problems such as semantic segmentation and object detection. The fully convolutional network (FCN; [57]) computes low-resolution representations through the removal of the fully connected layers in a classification network, performing estimation from their coarse segmentation confidence maps. The FCN model is widely used as the backbone of VGGNet [13]. Studies that have employed low-resolution networks, such as [58,59], determined the spatiotemporal correlations of wind speed prediction using a predictive deep CNN. Silva et al. [60] identified rip currents with breaking waves by using a faster R-CNN. HR representations were maintained throughout the process through the use of a network constructed with connecting multiresolution (from high-to low-resolution) parallel convolutions, with repeated information exchange occurring across parallel convolutions [16]. Representative HR models include GoogLeNet [14], ResNet [15], SegNet [61], and U-Net [62]. Some researchers have studied HR networks in relation to wind-wave predictions. For example, Shivam et al. [63] employed ResNet for short-term wind speed prediction, and Choi et al. [64] estimated significant wave heights from ocean images using VGGNet, GoogLeNet, ResNet, and DenseNet.
HRNet [16], a new-generation HR CNN, maintains HR representations through the connection of convolutions of high-to-low resolution in parallel, where multiscale fusion occurs repeatedly across parallel convolutions. HRNet constitutes a novel convolution operation technique suitable for extracting the spatiotemporal features of wind fields and can therefore be further applied to enhance wind-wave prediction. To examine the prediction performance of high-and low-resolution representations, we compared the performance of the HRNet and VGGNet models with that of an established hybrid model integrating VGGNet, HRNet, and a recurrent-based GRU layer, referred to as the VHR-GRU model. In addition, we used the TSWP model [65] as a benchmark to compare the error range between the TSWP, in which only in situ observational data were applied, and the NUM-AI-based model, in which the NWP-based wind field simulation data were incorporated, and further evaluated whether the NUM-AI-based model could effectively increase prediction accuracy. Although we focused on developing the AI-based windwave model, we also analyzed the simulation results of the SWAN model to identify inter-model differences.

Framework of the NUM-AI-Based Model
In this section, we describe the methodology of the proposed compound numerical outputs and AI-based models for wind-wave prediction during typhoons. The methodology comprised the following three stages: (1) WRF wind numerical model simulation, (2) NUM-AI-based wind-wave prediction model development, and (3) model verification using numerical SWAN wave model simulations. The methodology diagram in Figure 1 outlines the data flow of these three stages (presented as cloud blocks). The processes of the stage simulations and techniques are described as follows. diction performance of high-and low-resolution representations, we compared the performance of the HRNet and VGGNet models with that of an established hybrid model integrating VGGNet, HRNet, and a recurrent-based GRU layer, referred to as the VHR-GRU model. In addition, we used the TSWP model [65] as a benchmark to compare the error range between the TSWP, in which only in situ observational data were applied, and the NUM-AI-based model, in which the NWP-based wind field simulation data were incorporated, and further evaluated whether the NUM-AI-based model could effectively increase prediction accuracy. Although we focused on developing the AI-based windwave model, we also analyzed the simulation results of the SWAN model to identify intermodel differences.

Framework of the NUM-AI-Based Model
In this section, we describe the methodology of the proposed compound numerical outputs and AI-based models for wind-wave prediction during typhoons. The methodology comprised the following three stages: (1) WRF wind numerical model simulation,

Wind Field Simulation Using the WRF Numerical Model
Advanced Research WRF version 3.9.1 was used to simulate the HR field data of typhoons. WRF is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. Additional details can be found in the WRF Version 3 Modeling System User's Guide [66]. The WRF model was employed as the numerical simulation-based model for precomputing numerical solutions to determine the wind field using the grid system. These simulated wind field outcomes are used for training the NUM-AI-based prediction model in the next stage. In this stage, the NOAA National Center for Environmental Prediction (NCEP) Global Data Assimilation System (GDAS) FNL (Final) Operational Global Analysis data were collected for the initial field and boundary conditions. FNL refers to the final analysis and the products from the data assimilation and NCEP global NWP system.

Wind Field Simulation Using the WRF Numerical Model
Advanced Research WRF version 3.9.1 was used to simulate the HR field data of typhoons. WRF is a next-generation mesoscale numerical weather prediction system designed for both atmospheric research and operational forecasting applications. Additional details can be found in the WRF Version 3 Modeling System User's Guide [66]. The WRF model was employed as the numerical simulation-based model for precomputing numerical solutions to determine the wind field using the grid system. These simulated wind field outcomes are used for training the NUM-AI-based prediction model in the next stage. In this stage, the NOAA National Center for Environmental Prediction (NCEP) Global Data Assimilation System (GDAS) FNL (Final) Operational Global Analysis data were collected for the initial field and boundary conditions. FNL refers to the final analysis and the products from the data assimilation and NCEP global NWP system.
The WRF model provides various options for physical parameters, such as weather patterns, geographic locations, and simulation scenarios. We referred to [17,[67][68][69] in selecting the physical parameters. These researchers studied the typhoon track, rainfall, and wind forecasts for WRF model simulations using reasonable physical parameters for typhoons approaching Taiwan. The suitable schemes in the respective WRF typhoon mod-els were as follows: Single-Moment 6-Class Microphysics Scheme [70] for microphysical processes, Yonsei University [71] for the planetary boundary layer, Kain-Fritsch [72] for cumulus convection, Rapid Radiative Transfer Model [73] for long-wave radiation, and Dudhia [74] for short-wave parameters.

Wave Outcomes Obtained Using the NUM-AI-Based Prediction Model
In this stage, the outcomes of the wind field from the WRF model were used for training the NUM-AI model. The NUM-AI-based prediction model was then developed to predict the wave heights in the study area. Note that the developed NUM-AI model is useful only for specific locations and not for wave field predictions. To train an NUM-AIbased model, the wind field outcomes of the WRF model and in situ data were applied as input attributes, with the wave heights at (t + i) i = 1,6 of a buoy representing the target. The forecast horizon was from 1 to 6 h.
The NUM-AI-based model combines the WRF numerical model with DL techniques to predict winds and waves in the study area. The network of the NUM-AI-based model ( Figure 2) executes feature extraction and recurrent operations. First, the feature extraction operation, involving cascade pooling and convolutional layers, was employed to extract the feature maps from the 2D wind field simulated using the WRF model. The recurrent operation was then used to construct the time series-based models. The time series data encompassed the wind field simulated using the WRF model and the in situ observational data (i.e., typhoon information, wind data from gauges, and wave data from buoys) and were transformed into one-dimensional (1D) array data.
The WRF model provides various options for physical parameters, such as weather patterns, geographic locations, and simulation scenarios. We referred to [17,[67][68][69] in selecting the physical parameters. These researchers studied the typhoon track, rainfall, and wind forecasts for WRF model simulations using reasonable physical parameters for typhoons approaching Taiwan. The suitable schemes in the respective WRF typhoon models were as follows: Single-Moment 6-Class Microphysics Scheme [70] for microphysical processes, Yonsei University [71] for the planetary boundary layer, Kain-Fritsch [72] for cumulus convection, Rapid Radiative Transfer Model [73] for long-wave radiation, and Dudhia [74] for short-wave parameters.

Wave Outcomes Obtained Using the NUM-AI-Based Prediction Model
In this stage, the outcomes of the wind field from the WRF model were used for training the NUM-AI model. The NUM-AI-based prediction model was then developed to predict the wave heights in the study area. Note that the developed NUM-AI model is useful only for specific locations and not for wave field predictions. To train an NUM-AIbased model, the wind field outcomes of the WRF model and in situ data were applied as input attributes, with the wave heights at (t + i)i = 1,6 of a buoy representing the target. The forecast horizon was from 1 to 6 h.
The NUM-AI-based model combines the WRF numerical model with DL techniques to predict winds and waves in the study area. The network of the NUM-AI-based model ( Figure 2) executes feature extraction and recurrent operations. First, the feature extraction operation, involving cascade pooling and convolutional layers, was employed to extract the feature maps from the 2D wind field simulated using the WRF model. The recurrent operation was then used to construct the time series-based models. The time series data encompassed the wind field simulated using the WRF model and the in situ observational data (i.e., typhoon information, wind data from gauges, and wave data from buoys) and were transformed into one-dimensional (1D) array data.  Figure 2. Artchiteture of an NUM-AI-based prediction model constructed with convolutional and recurrent neural networks and using numerical weather model outcomes.

Output layer
As detailed in Figure 2, feature extraction was performed using fully convolutional feature extractors, namely VGGNet [13] and HRNet [16]. We also developed a VGGNNet-HRNet hybrid called VHRNet, which was inspired by the backbones of both architectures. These architectures are described as follows. VGGNet networks [13], such as VGG11 and VGG16 (the common variant of VGGNet; [75]), structurally consists of five blocks of multiple convolutional layers and a single max pooling layer. Herein, the four blocks were employed to ensure consistency with the four-block architecture of HRNet. Figure 3 illustrates the modified VGGNets of VGG11 and VGG16 (termed mV11 and mV16, respectively). The four-block structure of VGG11 uses one, one, two, and two convolutional layers, whereas that of VGG16 uses two, two, three, and three convolutional layers. Both modified VGGNet models completed operations with three fully connected layers.

Figure 2.
Artchiteture of an NUM-AI-based prediction model constructed with convolutional and recurrent neural networks and using numerical weather model outcomes.
As detailed in Figure 2, feature extraction was performed using fully convolutional feature extractors, namely VGGNet [13] and HRNet [16]. We also developed a VGGNNet-HRNet hybrid called VHRNet, which was inspired by the backbones of both architectures. These architectures are described as follows. VGGNet networks [13], such as VGG11 and VGG16 (the common variant of VGGNet; [75]), structurally consists of five blocks of multiple convolutional layers and a single max pooling layer. Herein, the four blocks were employed to ensure consistency with the four-block architecture of HRNet. Figure 3 illustrates the modified VGGNets of VGG11 and VGG16 (termed mV11 and mV16, respectively). The four-block structure of VGG11 uses one, one, two, and two convolutional layers, whereas that of VGG16 uses two, two, three, and three convolutional layers. Both modified VGGNet models completed operations with three fully connected layers. HRNet [16] introduces a multiscale parallel design to produce HR features without requiring a decoding stage. As illustrated in Figure 4, the HRNet architecture consists of four parallel branches [76], with the upper branch (branch 1) storing the spatial details through the convolutions while maintaining HR. In the lower branches (branches 2-4), strided convolutions were applied to reduce the spatial size of the feature maps and enlarge the receptive fields [76]; these branches corresponded to the four high-to-low spatial resolutions. In Figure 4, "convs" refers to a convolutional block consisting of a convolutional layer and a pooling layer. HRNet [16] introduces a multiscale parallel design to produce HR features without requiring a decoding stage. As illustrated in Figure 4, the HRNet architecture consists of four parallel branches [76], with the upper branch (branch 1) storing the spatial details through the convolutions while maintaining HR. In the lower branches (branches 2-4), strided convolutions were applied to reduce the spatial size of the feature maps and enlarge the receptive fields [76]; these branches corresponded to the four high-to-low spatial resolutions. In Figure 4, "convs" refers to a convolutional block consisting of a convolutional layer and a pooling layer.  The proposed mVHRNet is a combination of the modified VGGNet and HRNet architectures ( Figure 5). To extract the feature maps, several blocks in the mV11 and mV16 were embedded into the backbone of a HRNet structure (hereafter mV11HR and mV16HR, respectively). In the original HRNet, the first convs block downsamples the inputs to a quarter of their original size. To obtain more spatial information from the input images, we applied half-image downsampling in block 1 of mVHRNet. Furthermore, we added a fifth branch of feature maps at high to low resolution. After feature extraction, The proposed mVHRNet is a combination of the modified VGGNet and HRNet architectures ( Figure 5). To extract the feature maps, several blocks in the mV11 and mV16 were embedded into the backbone of a HRNet structure (hereafter mV11HR and mV16HR, respectively). In the original HRNet, the first convs block downsamples the inputs to a quarter of their original size. To obtain more spatial information from the input images, we applied half-image downsampling in block 1 of mVHRNet. Furthermore, we added a fifth branch of feature maps at high to low resolution. After feature extraction, the highly abstract features were obtained and fed in a concatenation layer in which 1D and 2D data were fused and flattened into a 1D vector, which was then connected with the fully connected layers. Subsequently, model training with the fully connected layers was performed. The proposed mVHRNet is a combination of the modified VGGNet and HRNet architectures ( Figure 5). To extract the feature maps, several blocks in the mV11 and mV16 were embedded into the backbone of a HRNet structure (hereafter mV11HR and mV16HR, respectively). In the original HRNet, the first convs block downsamples the inputs to a quarter of their original size. To obtain more spatial information from the input images, we applied half-image downsampling in block 1 of mVHRNet. Furthermore, we added a fifth branch of feature maps at high to low resolution. After feature extraction, the highly abstract features were obtained and fed in a concatenation layer in which 1D and 2D data were fused and flattened into a 1D vector, which was then connected with the fully connected layers. Subsequently, model training with the fully connected layers was performed. RNNs, although suitable for time series data, are difficult to train and are detrimentally affected by vanishing or exploding gradients; therefore, they cannot solve the longterm dependency problem [77]. Proposed by [49], the GRU is a variant of LSTM, with a gated mechanism in the RNN structure. Specifically, the GRU has two gates (update and reset gates), and the LSTM has three (input, output, and forget gates) [78]. Because the GRU has fewer training parameters and usually converges quicker than the LSTM during training [7], we employed a GRU neural network to compute the temporal dynamics of time series data and shorten the computational time. After the dropout layer, wherein the RNNs, although suitable for time series data, are difficult to train and are detrimentally affected by vanishing or exploding gradients; therefore, they cannot solve the long-term dependency problem [77]. Proposed by [49], the GRU is a variant of LSTM, with a gated mechanism in the RNN structure. Specifically, the GRU has two gates (update and reset gates), and the LSTM has three (input, output, and forget gates) [78]. Because the GRU has fewer training parameters and usually converges quicker than the LSTM during training [7], we employed a GRU neural network to compute the temporal dynamics of time series data and shorten the computational time. After the dropout layer, wherein the weights of a certain proportion of nodes are randomly dropped to prevent overfitting, the results are produced from the output layer.

Comparison with the Wave Numerical Simulations
The SWAN numerical models were used to simulate the nearshore wave field during typhoons to enable comparison with the AI-based model. Additional details can be found in the SWAN User Manual [79]. According to [80,81], the application of the SWAN model to medium-and small-grid networks is suitable for simulating the wave state in nearshore Taiwan. Based on a Eulerian formulation of the discrete spectral balance of action density that accounts for refractive propagation over arbitrary bathymetry and current fields [25], SWAN is driven by boundary conditions and local winds. Here, the wind fields obtained from WRF results were used as inputs for the SWAN model simulations in the study.
To simulate the typhoon waves, the computational domain ranges from 10 • to 35 • N and 110 • to 140 • E. The near-field nested computation domains in northern Taiwan at Longdong and Guishandao buoys. The various settings of parameters were referred to [82]. For the far-field wave simulations the grid spacing is 12 km and the 20 exponentially spaced frequencies from 0.01 to 0.5 Hz with 60 evenly spaced directions for a time step of 15 min are used. For numerical computations, a grid size of 2 km and time step of 10 min are used in the coastal nested domain, while in near-field nested domains, a grid size of 300 m and a time step of 5 min are employed. The SWAN simulation results were verified using the wave heights at the experimental sites (Longdong and Guishandao buoys) and were compared against the results obtained from the previous stage of the NUM-AI-based wind-wave prediction model (see Section 5.3). To simulate the typhoon waves, the computational domain ranges from 10° to 35° N and 110° to 140° E. The near-field nested computation domains in northern Taiwan at Longdong and Guishandao buoys. The various settings of parameters were referred to [82]. For the far-field wave simulations the grid spacing is 12 km and the 20 exponentially spaced frequencies from 0.01 to 0.5 Hz with 60 evenly spaced directions for a time step of 15 min are used. For numerical computations, a grid size of 2 km and time step of 10 min are used in the coastal nested domain, while in near-field nested domains, a grid size of 300 m and a time step of 5 min are employed. The SWAN simulation results were verified using the wave heights at the experimental sites (Longdong and Guishandao buoys) and were compared against the results obtained from the previous stage of the NUM-AIbased wind-wave prediction model (see Section 5.3).    Pengjiayu. Information on typhoon characteristics, including the hourly climatologic data, was collected with reference to the pressure at the typhoon center, the maximum radius of winds over 15.5 m/s, the movement speed of the typhoon, and the maximum wind speed at the typhoon center. Surface wind speeds on the ground and notable wave heights at the buoys were sourced from CWB-managed stations. Table 2 lists these attributes by using the minimum, maximum, and mean values of the typhoon, ground, and buoy station information.

WRF Simulations
We collected the FNL Operational Global Analysis data from the NOAA NCEP GDAS to examine the WRF model's initial field and boundary conditions. The typhoon information, including the estimated track, intensity, and strength of the typhoons, was collected from the Regional Specialized Meteorological Center Tokyo-Typhoon Center of the Japan Meteorological Agency. A two-level nested grid was set (Figure 7).  The outer domain (Domain 1) included the typhoon movement path ranging between 117° E-132° E and 15° N-30° N, with a 15-km resolution (grid size = 0.15°, number of grids = 100 × 100). The inner domain (Domain 2) included the simulated circulation systems of medium-and small-scale typhoons and the wind-and wave-forming areas in northeastern Taiwan. To more precisely describe the spatial scale and resolution of windgenerating waves in Domain 2, we referred to [81], setting the range between 120° E-125.12° E and 22° N-27.12° N, with a 4-km resolution (grid size = 0.04°, number of grids = 128 × 128). Furthermore, 32 vertical levels were set. Each typhoon was simulated for several consecutive days, with numerical prediction and analysis conducted four times a day for 12 h each time. To more precisely describe the spatial scale and resolution of wind-generating waves in Domain 2, we referred to [81], setting the range between 120 • E-125.12 • E and 22 • N-27.12 • N, with a 4-km resolution (grid size = 0.04 • , number of grids = 128 × 128). Furthermore, 32 vertical levels were set. Each typhoon was simulated for several consecutive days, with numerical prediction and analysis conducted four times a day for 12 h each time.

Model Construction
We used the WRF model to simulate the hourly wind fields of the 38 typhoons in the study area listed in Table 1. After the simulations, the wind field analysis data, such as the x-and y-plane axes and the z height axis of each typhoon event, were obtained. The output data of the WRF model constituted the surface wind field data (including u10 and v10, the x-and y-axis wind speed, respectively) and were used as the input data for the coastal wind-wave simulations [81]. Therefore, we output the wind field data at a surface height of 10 m, obtaining the simulation values of 2436 pieces of hourly data. To establish the NUM-AI-based model, data partitioning was conducted according to the sequence of the typhoons. The training set was composed of the data of 24 typhoons from 2005 to 2012 (accounting for 63% of the total number of typhoons, with 1562 pieces of hourly data). The validation set comprised the five 2013 typhoons (accounting for 13% of the total typhoons, with 270 pieces of hourly data), and the testing set consisted of nine typhoons occurring between 2014 and 2019 (accounting for 24% of the total typhoons, with 604 pieces of hourly data).
We employed the Python programming language to establish the models, with the Tensorflow (version 2.3) and Keras libraries of Python used for model computation. When constructing an NUM_AI-based model, feature extraction processing is first initiated, and the 2D matrices of the wind fields are passed through the four-block architectures of mV16, HRNet, and mVHRNet (Figures 3-5). The 2D matrices of 128 × 128 pixels generated by the WRF model (using Domain 2) served as the inputs for the model calculations. These 2D wind field inputs were then encoded into two channels, u10 and v10. The settings of the three convolutional feature extractors are described as follows. For the modified VGGNet, the parameter settings were as follows. The kernel size was 3, the padding method was "same", the max pooling involved a pool size of 2, and the activation function was the rectified linear unit (ReLU). The number of filters in blocks 1-5 was 32, 64, 128, and 256, respectively.
For HRNet, the parameter settings were as follows. The kernel size is 3 in each convs, the padding method was "same", the max pooling involved a pool size of 4, and the activation function was ReLU. The stride size of the strided convolution was 2. The resolution increase was implemented through the bilinear upsampling approach [56]. The number of filters in the four convs was 32, 64, 128, and 256, respectively.
For mVHRNet, the parameter settings were as follows. The kernel size was 3 in blocks 1 to 4, the padding method was "same", the max pooling involved a pool size of 2, and the activation function was ReLU. The stride size of the strided convolution was 2, and the upsampling approach was the same as that employed in HRNet. The number of filters in blocks 1-4 was 32, 64, 128, and 256, respectively.
After the 2D feature maps and 1D attributes were flattened, both flattened arrays were combined using a concatenate layer. Next, the GRU layers were applied under the following settings. The activation function was ReLU, the dropout rate of the dropout layer was 0.2, min-max normalization was used to rescale the range of features in (0, 1), the loss function was the mean squared errors, the metric was the root-mean-square errors (RMSEs), the epoch size was 300, and the optimizer was an adaptive moment estimation optimizer with the standard parameter values (l = 10 −3 , b 1 = 0.9, b 2 = 0.999, e = 10 −8 ) recommended by [84]. This provided sufficient training time for even the slowest-learning architectures to reach a validation plateau.
The hyperparameters of the number of GRU layers and the number of neurons in a GRU layer were calibrated; their calibration ranges were 1 to 5 and 200 to 2000, respectively. The parameter calibrations of five models, mV11_GRU, mV16_GRU, HR_GRU, mV11HR_GRU, and mV16HR_GRU, with feature extractors connecting to GRU, are listed in Tables 3 and 4. The calibration results of the optimal parameters obtained using a vali-dation set from the Longdong and Guishandao buoys are displayed. Possible parameter combinations were searched, and those with the smallest RMSEs were retained as the optimal parameters. The RMSE is defined as where N is the total number of observations, P i is the predicted wave height at record i, and O i is the observed wave height at record i.

Typhoon Testing and Prediction
We used multiple typhoons in the testing set for model prediction and evaluation. Figures 8 and 9 illustrate the predicted wave height values within a lead time of 1 to 6 h from the Longdong and Guishandao buoys in relation to the typhoon test set. The optimal parameters obtained in Section 4 (i.e., mV11_GRU, mV16_GRU, HR_GRU, mV11HR_GRU, and mV16HR_GRU) were applied for prediction. Furthermore, the TSWP model was employed to compare the differences in performance of models employing observation data only versus employing both observation data and NWP wind simulation data. In these figures, we found that prediction accuracy was inversely related to the lead time. This is because when the prediction time length was longer, the uncertainty of the windwave readings increased. However, the waveform changes were still discernible overall. Moreover, when the lead time reached 4 h and above, the prediction values tended to underestimate peak values and add a delay. A possible cause is that the rapidly changing typhoon circulation structure leading to the WRF weather simulation increasing the higher wind errors when future time t > = 4 h. J. Mar. Sci. Eng. 2021, 9, x FOR PEER REVIEW 13 of 24 Figure 8. Wave heights predicted and measured using mV11_GRU, mV16_GRU, HR_GRU, mV11HR_GRU, mV16HR_GRU, and TSWP at the Longdong buoy at the following lead times: (a) t + 1, (b) t + 2, (c) t + 4, and (d) t + 6.   Wave heights predicted and measured using mV11_GRU, mV16_GRU, HR_GRU, mV11HR_GRU, mV16HR_GRU, and TSWP at the Guishandao buoy at the following lead times: (a) t + 1, (b) t + 2, (c) t + 4, and (d) t + 6.

Model Performance Levels
To examine the prediction errors of each model, evaluation indicators, namely the RMSE, relative RMSE (rRMSE), mean absolute error (MAE), and relative mean absolute error (rMAE), were measured. These are defined as follows: where O is the average of all the observed wave heights at a specific buoy. The MAE measures the average magnitude of the errors in a set of forecasts, indicating that all the individual differences are weighted equally in the average calculation. The RMSEs are squared before they are averaged, conferring a relatively high weight to large errors; thus, this measure is most useful when large errors are particularly undesirable [85]. Figure 10 presents the RMSEs, rRMSEs, MAEs, and rMAEs of the data from both buoys. The performance of HR_GRU, mV11HR_GRU, and mV16HR_GRU was more satisfactory than those of mV11_GRU, mV16_GRU, and the TSWP model. In sum, the three HR-based structures exhibited relatively few prediction errors at high values and less bias compared with the actual observation values.    Table 5 presents a summary of the mean RMSE and MAE over a lead time of 1 to 6 h. Compared with the observation values from the Longdong buoy, the mean RMSE and MAR of mV16HR_GRU were the most satisfactory among the remaining models, as were the mean RMSE and MAE of mV11HR_GRU for the observed values at the Guishandao buoy. On the basis of the error indicators of the TSWP model, we defined the improvement rate (IR) for the RMSE and MAE of the models as follows: where X k is the metric (i.e., RMSE and MAE) of a specific model k and X TSWP is the metric of the TSWP model. The results of the calculation (Table 5) are as follows.
(1) With regard to the observation values from the Longdong buoy, the RMSE of the three HR-based models improved by 10% to 15% compared to the TSWP model, whereas the MAE improved by 12% to 19%. The RMSE and MAE of the mV11_GRU and mV16_GRU models improved by 6% to 9% and by 4% to 5%, respectively. (2) In terms of the observation data from the Guishandao buoy, the IR in the RMSE (IR RMSE ) of the three HR-based structures was 10% to 12%, and the IR in the MAE (IR MAE ) was 14% to 18%. The IR RMSE and IR MAE of the mV11_GRU and mV16_GRU models were 5% to 7% and 6% to 8%, respectively. Compared with the VGG-based models, the HR-based models more effectively reduced the error between the observed and predicted values. This may be attributable to the HR-based models' integration of feature maps with high-to-low resolution; the VGG-based models only integrated images with a single spatial resolution. Therefore, the HR-based models generated superior calculation results.
In addition, the efficiency coefficient CE, which indicates the goodness-of-fit, was used, as was the correlation coefficient r. They are defined as where P is the average of all the predicted wave heights at a specific buoy. Figure 11 plots the CE and r of the data from both buoys. Figure 11a,b illustrates the reduction in the CE and r along with the increase in the forecast horizon. The three HR-based models had higher prediction efficiency. The increasing forecast horizon verified that the HR-based models maintained higher prediction efficiency than did the VGG-based models and TSWP model. The CE of the HR-based models was not substantially reduced.
where � is the average of all the predicted wave heights at a specific buoy. Figure 11 plots the CE and r of the data from both buoys. Figure 11a,b illustrates the reduction in the CE and r along with the increase in the forecast horizon. The three HRbased models had higher prediction efficiency. The increasing forecast horizon verified that the HR-based models maintained higher prediction efficiency than did the VGGbased models and TSWP model. The CE of the HR-based models was not substantially reduced.  Table 6 lists the mean CE and r and the evaluation data for improvement efficiency. The analytical results indicated that mV16HR_GRU and mV11HR_GRU had the highest CE and r compared with the observation data from the Longdong and Guishandao buoys. The prediction efficiency of the HR-based models (HR_GRU, mV11HR_GRU, and mV16HR_GRU) differed according to the testing site. mV16HR_GRU requires more layers of convolution operations than does mV11HR_GRU; therefore, a more complex network structure was required for prediction at the Longdong buoy to generalize problems and effectively represent the testing site. In terms of the marine environment, the Longdong buoy is closer to the coast (approximately 1 km away) than the Guishandao buoy  Table 6 lists the mean CE and r and the evaluation data for improvement efficiency. The analytical results indicated that mV16HR_GRU and mV11HR_GRU had the highest CE and r compared with the observation data from the Longdong and Guishandao buoys. The prediction efficiency of the HR-based models (HR_GRU, mV11HR_GRU, and mV16HR_GRU) differed according to the testing site. mV16HR_GRU requires more layers of convolution operations than does mV11HR_GRU; therefore, a more complex network structure was required for prediction at the Longdong buoy to generalize problems and effectively represent the testing site. In terms of the marine environment, the Longdong buoy is closer to the coast (approximately 1 km away) than the Guishandao buoy (approximately 9 km away). Because the winds and waves are more susceptible to the topographic factors of the nearshore region, a more complex model structure was required to facilitate the processing of complex spatiotemporal data. Here, the IR for the CE and r is calculated as follows: where Y k is the metric (i.e., CE and r) of a specific model k and Y TSWP is the metric of the TSWP model.
As detailed in Table 6, compared with the TSWP model, the IR CE of the three HRbased structures were 13% to 18% and 13% to 15% at the Longdong and Guishandao buoys, respectively; in addition, their IR r was 3% to 6% and 5% to 7%, respectively.
To compute a term-by-term comparison of the relative errors in the prediction with respect to the actual value of the variable, the mean absolute percentage errors (MAPEs) and root mean squared percentage errors (RMSPEs) were calculated. Their formulas are presented as follows: The MAPE is an unbiased statistic for measuring the predictive capability using a term-by-term comparison of each record. The RMSPE provides the same RMSE properties as a percentage, also through the use of a term-by-term comparison of each record [86]. Figure 12 displays the MAPE and RMSPE of the data from both buoys. Figure 12a,b show the MAPE rising with the increased forecast horizon. As indicated in Figure 12c,d, when the forecast horizon lengthened, the RMSPE of the six models differed considerably; the TSWP model exhibited the largest slope, followed by mV11_GRU and mV16_GRU. The rise in the slopes of the HR_GRU, mV11HR_GRU, and mV16HR_GRU were relatively small. The results demonstrated that the HR-based models performed favorably in terms of the RMSPE; that is, such models were less likely to underestimate peak values under more intense wind-wave conditions. The IR MAPE of the three HR-based models at the Longdong and Guishandao buoys was 20% to 24% and 12% to 14%, respectively, and their IR RMSPE was 70% to 77% and 71% to 73%, respectively ( IRRMSPE was 70% to 77% and 71% to 73%, respectively ( Table 7). The definition and calculation of IRMAPE and IRRMSPE are provided in Equation (5).

Comparison with the SWAN Simulations
Herein, we used the results of the SWAN numerical analysis conducted by [87]. To initialize the numerical model, the wind field outcomes (at a 10-m elevation) derived from the WRF simulation analysis were considered. The various typhoons were simulated to acquire sufficient data for the grids. The grid setting was consistent with the grid range of the WRF simulation model (see Section 4.1). The wind field data used in each simulation were the wind field simulation data obtained from the WRF model at a lead time ranging from 1 to 6 h. Hsieh [87] applied the SWAN model to simulate the six typhoons that occurred between 2014 and 2016 (i.e., Matmo, Fung-wong, Chan-hom, Soudelor, Malakas, and Megi) for comparison. To ensure a balanced evaluation, we averaged the prediction values with a lead time ranging from 1 to 6 h obtained using the AI-based models and compared them with the simulation values obtained using the SWAN model. Figure 13 presents the wave height simulation results of the SWAN model at the Longdong and Guishandao buoys. Compared with the observation data from the Longdong buoy, the SWAN model yielded overestimations for the Malakas and Megi typhoons but underestimated results for Soudelor. Compared with the observation data from the Guishandao buoy, the SWAN model yielded overestimations for the Matmo, Fung-wong, Malakas, and Megi typhoons but underestimated results for Chan-hom and Soudelor. The peak values were sometimes overestimated or underestimated because of the initial conditions of the wind field and the numerical methods applied. Differing from the AIbased model in which data-driven methods were used to estimate the peak values, the AI-based model was insufficiently trained because the peak value records were too few, which subsequently resulted in inaccurate estimations. The peak values were sometimes overestimated or underestimated because of the initial conditions of the wind field and the numerical methods applied. Differing from the AI-based model in which data-driven methods were used to estimate the peak values, the AI-based model was insufficiently trained because the peak value records were too few, which subsequently resulted in inaccurate estimations. Figure 13. Wave heights predicted and measured using mV11_GRU, mV16_GRU, mV11HR_GRU, mV16HR_GRU, and SWAN at the: (a) Longdong buoy, (b) Guishandao buoy. Figure 14 plots comparisons of the SWAN and AI-based models in relation to the RMSE and CE metrics of the Longdong and Guishandao buoys. The SWAN model produced greater RMSEs and a lower CE compared with those of the AI-based mV11_GRU, mV16_GRU, mV11HR_GRU, and mV16HR_GRU models. Although the SWAN model had less accuracy compared to AI-based models, a numerical based SWAN model could output reasonable results by adjusting the boundary and initial conditions. Therefore, the SWAN model remained reliable and reasonable in empirical applications.  Figure 14 plots comparisons of the SWAN and AI-based models in relation to the RMSE and CE metrics of the Longdong and Guishandao buoys. The SWAN model produced greater RMSEs and a lower CE compared with those of the AI-based mV11_GRU, mV16_GRU, mV11HR_GRU, and mV16HR_GRU models. Although the SWAN model had less accuracy compared to AI-based models, a numerical based SWAN model could output reasonable results by adjusting the boundary and initial conditions. Therefore, the SWAN model remained reliable and reasonable in empirical applications. By contrast, the establishment of the AI-based model was based on the statistical regression values of past data. To some extent, the prediction values were obtained through extrapolation analysis. Performance evaluation of the AI-based model could be regarded as an investigation into the suitability of the applied extrapolation techniques. Accordingly, comparing the size of the errors of the numerical models and AI-based models was unnecessary. The initial results obtained using the AI-based models could be used as the initial solutions for numerical models to accelerate their calculation convergence, enabling them to provide more accurate simulation data.

Conclusions
Taiwan is subject to frequent typhoons in summer and autumn, with waves caused by strong typhoons larger and more damaging than those driven by typical seasonal winds. Therefore, accurate wind and coastal wave forecasts during typhoon periods are necessary. Conventionally, typhoon wind field and wind-wave prediction is conducted using numerical meteorological and wave models. The advantage of numerical models is that the forecast is based on physical processes; however, the simulation process is timeconsuming, and such models cannot provide real-time data. AI techniques can increase calculation efficiency but cannot provide sufficient descriptions of physical processes.
The main findings are presented as follows. First, according to the error indicators, HR_GRU, mV11HR_GRU, and mV16HR_GRU outperformed mV11_GRU, mV16_GRU, and TSWP. This indicated that the network architectures of the high-resolution network (i.e., HR_GRU model) and the convolutional layers using high-to-low resolution structure with combined recurrent GRU layers (i.e., mVHR_GRU-based models) were more suitable for extracting spatial wind field features than those of traditional models of VGGNets and the TSWP (a two-step wind-wave neural network-based prediction model).
Moreover, they improved the spatial data required for wind-wave estimation. Compared with the TSWP model, the IR of the CE of mVHR_GRU at the Longdong and Guishandao buoys was 13% to 18% and 13% to 15%, respectively. Second, for the various HR-based models (HR_GRU, mV11HR_GRU, and mV16HR_GRU), various numbers of convolution operation layers were required at the two testing sites to optimize the predic-

Conclusions
Taiwan is subject to frequent typhoons in summer and autumn, with waves caused by strong typhoons larger and more damaging than those driven by typical seasonal winds. Therefore, accurate wind and coastal wave forecasts during typhoon periods are necessary. Conventionally, typhoon wind field and wind-wave prediction is conducted using numerical meteorological and wave models. The advantage of numerical models is that the forecast is based on physical processes; however, the simulation process is timeconsuming, and such models cannot provide real-time data. AI techniques can increase calculation efficiency but cannot provide sufficient descriptions of physical processes.
The main findings are presented as follows. First, according to the error indicators, HR_GRU, mV11HR_GRU, and mV16HR_GRU outperformed mV11_GRU, mV16_GRU, and TSWP. This indicated that the network architectures of the high-resolution network (i.e., HR_GRU model) and the convolutional layers using high-to-low resolution structure with combined recurrent GRU layers (i.e., mVHR_GRU-based models) were more suitable for extracting spatial wind field features than those of traditional models of VGGNets and the TSWP (a two-step wind-wave neural network-based prediction model).
Moreover, they improved the spatial data required for wind-wave estimation. Compared with the TSWP model, the IR of the CE of mVHR_GRU at the Longdong and Guishandao buoys was 13% to 18% and 13% to 15%, respectively. Second, for the various HR-based models (HR_GRU, mV11HR_GRU, and mV16HR_GRU), various numbers of convolution operation layers were required at the two testing sites to optimize the prediction efficiency. mV16HR_GRUand mV11HR_GRU had the highest CE at the Longdong and Guishandao buoys, respectively. Third, comparison between the NUM-AI-based prediction model and the numerical SWAN model revealed that the SWAN model produced greater errors than the AI-based mV11_GRU, mV16_GRU, mV11HR_GRU, and mV16HR_GRU models. We conclude that our NUM-AI-based wind-wave prediction models yield results of reasonable accuracy.
In the future, this study suggests that the wind speed could be generated from WRF and NUM-AI models, and then the SWAN-based wave results obtained using the inputs of WRF-based and NUM-AI-based results could be compared against the buoy's measurements. Second, in this paper the length of the forecast horizon was from 1 to 6 h. However, the forecast horizon is short. Being close to a wind-wave nowcast, the length of horizon was recommended to extend to three-day or one-week forecasts. Third, regarding input attributes of formulating NUM-AI-based model, this study used the WRF winds (u10, v10) and some in situ observational data (i.e., typhoon information, wind data from gauges, and wave data from buoys). In a future study, the wave variables (such as significant wave height, the peak period, and the mean direction at the peak period) and the whole wave spectrum could be used as extra inputs in order to improve the wave predictions.

Data Availability Statement:
The related data were provided by Taiwan's CWB, which are available at https://rdc28.cwb.gov.tw/ (accessed on 1 October 2021). The Final Operational Global Analysis data were obtained from the Global Data Assimilation System of the US government's National Center for Environmental Protection, which is available at https://rda.ucar.edu/datasets/ds083.2/ (accessed on 21 August 2020).