Article

Spatiotemporal Deep Learning to Forecast Storm Surge Water Levels and Storm Trajectory: Case Study Hurricane Harvey

1 Department of Mechanical and Manufacturing Engineering, Tennessee State University, Nashville, TN 37209, USA
2 School of Physics and Electronic Information, Huaibei Normal University, Huaibei 235000, China
3 Department of Computer Science, Tennessee State University, Nashville, TN 37209, USA
4 Department of Physics & Mathematics, Tennessee State University, Nashville, TN 37209, USA
* Author to whom correspondence should be addressed.
J. Mar. Sci. Eng. 2025, 13(9), 1780; https://doi.org/10.3390/jmse13091780
Submission received: 14 August 2025 / Revised: 2 September 2025 / Accepted: 12 September 2025 / Published: 15 September 2025
(This article belongs to the Section Physical Oceanography)

Abstract

Using Hurricane Harvey as a case study, this paper takes the hurricane track, wind velocity and pressure, bathymetry, Manning’s n coefficients, tidal forcing, and storm surge results generated by the ADCIRC+SWAN model as inputs to construct a unified spatiotemporal deep learning model for storm surge forecasting. The model transforms the inputs into embeddings and performs feature fusion and extraction. Its regression layer outputs the predicted storm surge water elevations, station water level time series, and hurricane tracks with attributes. To analyze the model’s adaptability and robustness as a surrogate for ADCIRC, ablation experiments are conducted on up to 10 input variables to investigate the impact of each input on the results. Heat maps comparing the 3, 6, 9, and 12 h horizon predictions with the targets show excellent surrogate performance across a large number of nodes and multiple inputs on the training, validation, and test sets. When the model is used to forecast water levels at 12 observation stations, the 9 h forecasts are generally as good as or better than the ADCIRC simulation results. When the model is used to predict hurricane tracks and attributes, the 12 h forecasts are relatively close to the observed values. The model is developed and tested using only Hurricane Harvey data and storm surge results; developing a generalized prediction model would require a large amount of data and storm surge results from many hurricanes.

1. Introduction

Hurricane Harvey was the second costliest and one of the most devastating natural disasters in U.S. history. The unprecedented rainfall and widespread flooding resulted in 107 deaths nationwide and $158.8 billion in damages, according to the National Oceanic and Atmospheric Administration [1]. Before landfall, the National Hurricane Center (NHC) had accurately predicted the track and initial landfall north of Corpus Christi [2]. Even so, it was challenging for models to predict its final strength, although the improved methods using satellite data later correctly forecasted hurricane strength [3]. Medium-range forecast models also forecasted a week in advance that Harvey would stall along the Texas coast and produce extreme rainfall [4]. Forecasting the intensity, duration, and extent of associated rainfall was challenging. Various factors, such as wave height in coastal areas, river discharges, and rainfall, determine the degree and scope of flooding, which is also the leading cause of loss of life and property. Therefore, using Hurricane Harvey data to explore a model to study the trajectory and water levels of storm surges caused by hurricanes is of great significance.
The prediction of hurricane storm surges and trajectories spans a broad spectrum of approaches: statistical, numerical, and machine learning models. Statistical models, pioneered by extreme value theory [5] and later extended through combined probability methods [6], quantify surge risk from factors such as wind speed, direction, and pressure.
Numerical hydrodynamic modeling is dominated by high-resolution, physics-based simulations of surge, waves, and coastal hydrodynamics [7,8]. Representative models include SLOSH (Sea, Lake, and Overland Surges from Hurricanes) [9], ADCIRC (ADvanced CIRCulation) [10], Delft3D [11], and SCHISM (Semi-implicit Cross-scale Hydroscience Integrated System Model) [12]. These models solve fluid dynamics equations with finite element or finite difference methods for operational forecasting and hindcasts, and numerous case studies (e.g., Ike, Harvey) have validated their performance against observations [13,14]. Event-based analyses complement these efforts by linking observed coastal flooding and atmospheric dynamics [15]. Other studies focus on nonlinear tide–surge–wave interactions and coupled atmosphere–ocean frameworks, both critical for accurately resolving surge extremes [16,17]. Advances in data assimilation further enhance operational forecast skill by integrating buoy and tide gauge observations [18].
Machine learning models have been used as real-time surrogates for high-fidelity numerical models such as ADCIRC, which require substantial computational resources [19,20]. They have also been integrated with models such as SLOSH and ADCIRC to enhance reliability [21]. Surrogate modeling methods simplify the mapping between input parameters and outputs, mimicking the results of full numerical simulations. Gaussian process regression (Kriging models) [22], for example, has been applied to water level prediction in the Venetian Lagoon [23], real-time hurricane scenario assessment [24], and coastal flood peak estimation [25].
A wide variety of deep learning models have been developed for hurricane track forecasting, storm surge prediction, and surrogate modeling of computationally intensive numerical systems. In time-series forecasting approaches, Recurrent Neural Networks (RNNs) and their variants have been applied to hurricane track and water-level prediction tasks. Alemany et al. developed an RNN to forecast hurricane tracks every six hours using data from the National Hurricane Center (NHC), including latitude, longitude, wind speed, and pressure [26]. To mitigate the vanishing gradient problem common in RNNs, Long Short-Term Memory (LSTM) networks have been explored, as seen in the prediction of groundwater responses to storm events in Norfolk, Virginia [27]. Hybrid models combining RNNs with Deep Neural Networks (DNNs) have been used for fast tide predictions [28], while XGBoost-LSTM architectures have been applied to predict wave heights and periods under varying lake conditions [29]. In spatial-temporal deep learning approaches, researchers have increasingly employed Convolutional Neural Networks (CNNs) and hybrid models to capture spatial dependencies alongside temporal dynamics. Combinations such as CNN-LSTM [30,31], LSTM-CNN, and ConvLSTM [32,33] improve storm surge prediction by modeling spatial-temporal interactions. Some models incorporate attention mechanisms to highlight critical features and enhance prediction accuracy [34]. Similarly, Temporal Convolutional Networks (TCNs), which use dilated convolutions to capture long-range dependencies, have been applied to sequence modeling tasks such as water quality prediction [35]. In graph-based neural network approaches, Graph Neural Networks (GNNs) offer a powerful framework when storm surge prediction involves multiple geographically distributed tidal stations. Zhang et al. (2023) proposed a Graph Convolutional Recurrent Network (GCRN) that jointly learns spatial and temporal dependencies to forecast tide levels across a network of stations, demonstrating improved performance over purely temporal models [36].
Despite these advances, most AI-based models remain constrained in scope, often focusing on predicting tidal levels or hurricane trajectories at specific stations or regions. To date, no AI approach has fully replicated or replaced high-resolution numerical models that simulate dynamics across thousands to millions of grid nodes with multiple interacting physical processes.
Furthermore, most existing approaches do not integrate numerical models with deep learning into a unified framework that can both approximate numerical simulations and predict water levels at observation stations, as well as hurricane tracks and properties. To address this gap, we use Hurricane Harvey as a case study. Hurricane Harvey is selected as the case study due to our extensive experience applying ADCIRC+SWAN to hindcast its storm surges and the availability of a comprehensive simulation dataset. Moreover, the water elevation time series data for buoy stations are readily available from NOAA, and hurricane track data are accessible from the National Hurricane Center, providing a solid data foundation for model development.
The study domain, shown in Figure 1, represents the surrogate region for ADCIRC, with the red box highlighting the area of interest and the enlarged inset displaying the distribution of twelve observation stations. These stations are used to forecast water level variations, while hurricane tracks and attributes are marked for trajectory prediction. Given the inherently temporal nature of the data, we introduce a novel model, Spatiotemporal Embeddings with Bottleneck ResNet and Channel Attention (STERCHA), to capture and predict these complex dynamics. This approach is inspired by the streamlined and practical aspects of the STID model [37] in traffic forecasting. The modeling process begins by creating embeddings for all input data, followed by data fusion, feature extraction, and final regression analysis. This design aims to handle complex spatiotemporal data efficiently while maintaining computational simplicity. To achieve unified modeling, the model uses storm surge results from ADCIRC+SWAN and observational data from Hurricane Harvey to perform the following functions:
  • Establish a surrogate model of ADCIRC+SWAN (multiple nodes with multiple features). The model uses the storm surge results of Hurricane Harvey generated by ADCIRC+SWAN as data and selects an area in the Gulf of Mexico near Houston as the region of interest to predict up to 12 h of water elevation. The inputs are time-varying forcing and hydrodynamic data (such as past wind speed and pressure, water elevations, and velocities) and static nodal geographic and topographic data (i.e., bathymetry and Manning’s n); the output is the time-varying water elevation of each node. The targets are compared with the predictions for the next 3, 6, 9, and 12 h.
  • Forecast water level changes at observation stations (multiple nodes with a single feature). The study selected water level changes at 12 observation stations in the waters near Texas for up to 9 h of prediction and compared them with the observed data, as well as with ADCIRC+SWAN model results.
  • Forecast the trajectory and attributes of Hurricane Harvey (single node with multiple features), including 1, 6, and 12 h prediction of latitude, longitude, wind velocity, atmospheric pressure, categories, storm speed, and storm direction, and compare with the observed data.
Section 2 describes the relevant research datasets and sources, feature engineering, and model details, and Section 3 contains the predicted results and analysis.

2. Methodology

2.1. Datasets

The performance of the unified model was evaluated using three datasets: (1) as a surrogate for ADCIRC+SWAN, it predicted water elevations based on ADCIRC+SWAN simulations of Hurricane Harvey; (2) as a water-level time series forecasting tool for observation stations, it generated predictions using historical buoy data recorded during Hurricane Harvey; and (3) as a predictive model for hurricane tracks and characteristics, it employed historical parameters derived from the storm’s trajectory.

2.1.1. Storm Surge Results from ADCIRC+SWAN

To generate the Harvey storm surge results [38], the ADCIRC+SWAN [10] model was used. It was implemented on the standard, well-vetted Hurricane Surge On-Demand Forecasting System (HSOFS) mesh, which covers the U.S. Atlantic and Gulf coasts with approximately 3.6 million nodes and 1.8 million elements, offering coastal resolutions of ~150–500 m. The bathymetry data are included in the mesh file and are usually extracted from different sources, including Digital Elevation Models (DEMs) and Light Detection and Ranging (LiDAR) data. The model was run in spherical coordinates for 53.5 days to establish tidal equilibrium, with tidal forcing applied at the open Atlantic boundary. Wind and pressure forcing were provided by Ocean Weather Inc.’s (OWI) reanalysis using the IOKA method. By applying time-varying tidal conditions, zero-flux boundary conditions, and weir boundary conditions, the model produces time-dependent storm surge water elevation and velocity, maximum water elevation, significant wave height, average wave period, and other relevant parameters. The model details can be found in [38]. The model mesh comprises 267,597 nodes, each with geographical coordinates (longitude and latitude), bathymetry, and Manning’s n. Figure 2a displays the original grid points in the Gulf of Mexico around Houston. After eliminating dry nodes, which were assigned the value −99,999 during the ADCIRC+SWAN simulation, 88,091 nodes were retained in the dataset, as shown in Figure 2b. For quick predictions, we selected 10% of these nodes (8810) at equal intervals along latitude, as shown in Figure 2c. A total of 180 time instances with a 1 h sampling period cover the period from the hurricane appearing in the window to landfall along the Texas coast, that is, from 23 August 2017, 12:00:00 PM (all times are UTC) to 28 August 2017, 12:00:00 AM. To avoid overfitting, spline interpolation inserts 3–7 points between consecutive data points in chronological order to obtain the best possible prediction results.
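The node selection described above can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions: the function names are hypothetical, a node is treated as dry if it takes the sentinel value at any time step, and "equal intervals along latitude" is read as sorting the wet nodes by latitude and keeping every tenth one. The authors' actual procedure may differ.

```python
DRY_VALUE = -99999.0  # sentinel assigned to dry nodes, as stated in the text


def wet_node_indices(elevation_by_node):
    """Return indices of nodes that never take the dry sentinel value.

    Assumption: a node counts as dry if it is dry at any time step.
    """
    return [
        i
        for i, series in enumerate(elevation_by_node)
        if all(v != DRY_VALUE for v in series)
    ]


def subsample_by_latitude(indices, latitudes, step=10):
    """Sort the kept nodes by latitude and keep every `step`-th one."""
    ordered = sorted(indices, key=lambda i: latitudes[i])
    return ordered[::step]


# Toy example: 5 nodes, 2 time steps each (values are made up).
elev = [[0.1, 0.2], [DRY_VALUE, 0.3], [0.0, 0.1], [0.5, DRY_VALUE], [0.2, 0.2]]
lats = [29.0, 28.5, 28.0, 27.5, 28.7]

wet = wet_node_indices(elev)                      # nodes 1 and 3 are dropped
picked = subsample_by_latitude(wet, lats, step=2)  # every 2nd wet node by latitude
```

With `step=10` the same call would keep roughly 10% of the wet nodes, which matches the ratio quoted in the text.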

2.1.2. Buoy Data

To predict water level changes at observation stations, we selected the historical records of the National Data Buoy Center [39] for the 12 observation stations near Texas where Hurricane Harvey passed by. Table 1 lists the station names and geolocation of these observatories. We have annotated this information in the zoomed-in area of Figure 1. The dataset spans a period of 48 h preceding the hurricane’s landfall. Specifically, the time series for each observation station spans from 23 August 2017, at 00:00:00 to 31 August 2017, at 23:54:00, with measurements recorded at 6 min intervals. We also used the 15 min sampling data generated by the ADCIRC+SWAN simulation to compare the prediction quality of our model.

2.1.3. Track Data

The data used to predict Hurricane Harvey’s track and its attributes come from the International Best Track Archive for Climate Stewardship (IBTrACS) [40,41], which includes all data from Harvey starting on 16 August 2017, 06:00:00 and ending on 2 September 2017, 00:12:00. Sampling points every 3 h, with hurricane severity, are geographically marked in Figure 1. We selected 7 features: Latitude, Longitude, Wind Speed, Atmospheric Pressure, Saffir-Simpson Hurricane Wind Scale (SSHS), Storm Speed, and Storm Direction. Linear interpolation is applied to resample the data to hourly intervals to ensure sufficient training data.
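As a rough illustration of the hourly resampling, the sketch below linearly interpolates a 3-hourly series onto a 1 h grid. The function name and the sample latitudes are hypothetical, and the sketch assumes integer-hour timestamps. Angular features such as storm direction and categorical features such as SSHS may need special handling that is not shown here.

```python
def interpolate_hourly(times_h, values):
    """Linearly interpolate a series sampled at `times_h` (hours) onto a 1 h grid.

    Assumes `times_h` is increasing and contains whole hours.
    """
    out = []
    for i in range(len(times_h) - 1):
        t0, t1 = times_h[i], times_h[i + 1]
        v0, v1 = values[i], values[i + 1]
        span = int(t1 - t0)
        # Emit the left endpoint and the interior hourly points of this segment.
        for k in range(span):
            out.append(v0 + (v1 - v0) * k / span)
    out.append(values[-1])  # close the series with the final sample
    return out


# Made-up 3-hourly latitudes (degrees north) at t = 0, 3, 6 h.
lat_3h = [25.0, 25.6, 26.5]
lat_1h = interpolate_hourly([0, 3, 6], lat_3h)  # 7 hourly values
```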

2.2. Feature Engineering

All three datasets can be represented as multi-node, multivariate spatiotemporal series evolving with hurricane progression. Accordingly, a single model can be applied to them, with only input feature transformations required to match the desired outputs. The forecasting follows a time-series approach, leveraging historical data in each case to predict future outcomes. The spatiotemporal series inputs and outputs are processed using a sliding window technique [42], which involves moving the window forward one step at a time, creating a rolling sequence of input and output time steps. The input time steps, also called lag features [43], represent historical data points that have shifted backward in time, allowing the model to consider past values when making predictions. Conversely, the output time steps, known as the “horizon”, involve shifting the series forward to access future values [44]. While a single feature can yield reasonable predictions for spatiotemporal series, incorporating multiple features greatly improves accuracy [45,46]. Our study extends beyond time-series features to include spatial, static node, and temporal features, enabling a more comprehensive analysis and more robust predictions.
  • Time Series: The three datasets treat time series in distinct ways: (1) As a surrogate for ADCIRC+SWAN, the model relies primarily on historical water elevation (which already includes the wave contribution) to forecast future levels, while incorporating additional features, such as water velocity, wind velocity, and atmospheric pressure, to capture causal relationships and improve accuracy. (2) As an observation-station water level predictive model, it predicts future water levels from each station’s historical time series. Because all stations are affected by the same hurricane, they are connected, forming a multi-node, single-feature time series. (3) As a hurricane trajectory and property prediction model, the time-series features are sourced from historical and geographical data, including wind speed, pressure, forward speed, and direction. A hurricane is represented as a single node whose location and attributes evolve over time, with both inputs and outputs expressed as time series of this multi-feature node.
  • Spatial Feature: Only the datasets produced by the ADCIRC+SWAN simulation and the buoy records from observation stations contain this feature. As demonstrated in Figure 2, the ADCIRC mesh explicitly defines the geolocation of each node, whereas Table 1 clearly specifies the geographical coordinates of the observation stations. These location data cannot be directly incorporated into the model; instead, they must influence the model parameters through learnable embeddings [47].
  • Static Features: These are used only with the ADCIRC+SWAN storm surge data. The nodal attributes bathymetry and Manning’s n are two essential static input conditions for accurate hydrodynamic simulations. Because these static factors do not change over time, we treat them as coefficients and transform them into learnable embeddings through a fully connected layer.
  • Temporal Feature: This feature is applied in the surrogate model of ADCIRC+SWAN and in forecasting water levels at observation stations, drawing on approaches originally introduced in traffic forecasting based on temporal variations in traffic volume [48]. Similarly, fluctuations in water level are partly driven by regular tidal cycles, which depend on the time of day. Experimental results demonstrate that model accuracy can be further enhanced by constructing a learnable matrix to generate temporal embeddings that capture the influence of tides on water levels across different times of the day.
Table 2 shows which of the above features are used for each dataset. In the results section, the contribution of the individual input features is analyzed in the input ablation experiments.
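The sliding-window construction of lag and horizon steps described in this section can be sketched as follows. This is a minimal, single-series illustration with a hypothetical function name and made-up water levels; the actual pipeline operates on multi-node, multi-feature tensors.

```python
def sliding_windows(series, n_lag, n_horizon):
    """Split a time-ordered list into (input, target) pairs.

    Each pair uses `n_lag` past steps as input and the following
    `n_horizon` steps as target; the window advances one step at a time.
    """
    pairs = []
    last_start = len(series) - n_lag - n_horizon
    for start in range(last_start + 1):
        x = series[start:start + n_lag]
        y = series[start + n_lag:start + n_lag + n_horizon]
        pairs.append((x, y))
    return pairs


# Made-up water levels (m) at six consecutive time steps.
levels = [0.1, 0.2, 0.4, 0.7, 1.1, 1.6]
pairs = sliding_windows(levels, n_lag=3, n_horizon=2)
```

Here the six samples yield two training pairs: the first maps steps 1 to 3 onto steps 4 to 5, and the second is the same window shifted forward by one step.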

2.3. Model

Figure 3 shows the architecture of the unified model. Dataset-specific input and embedding layers (green), generated through feature engineering for the datasets listed in Table 2, feed into the shared fusion and feature extraction layers and the regression layer (orange), with outputs displayed at the top.
The embedding layers process inputs into distinct embeddings depending on the dataset, shown on the left, bottom, and right of Figure 3. In the ADCIRC surrogate model (bottom), inputs include time-series data (water elevation, currents, wind, and pressure), static features (Manning’s n and bathymetry), spatial node information, and tidal variations. Note that the water elevation already includes wave contributions, since the data were generated by the ADCIRC+SWAN model. These data are embedded into 512-, 64-, 128-, and 32-dimensional vectors, respectively, with sizes optimized to balance complexity and accuracy [49]. For water-level forecasting at observation stations (left), the smaller dataset of 12 nodes with a single feature reduces spatial and time-series embeddings to 16 and 32 dimensions, respectively. For hurricane tracks (right), a multi-layer CNN extracts temporal features directly without embedding.
The embeddings are processed by a residual-channel attention module (Figure 4, right), where residual blocks perform downsampling and channel attention emphasizes key features. This CNN-based design reduces memory demand and runs efficiently on GPUs. Graph-based approaches (e.g., STGCN [50], Graph WaveNet [51], BigST [52]) were tested but were found to be unsuitable for large-scale, multi-feature data.
Finally, the regression layer applies 2D convolution for decoding and dimensional transformation. Outputs (top of Figure 3) include 12 h water elevation predictions, 9 h water-level forecasts at the observation stations, and 1, 6, and 12 h hurricane trajectory forecasts. Figure 4 further outlines the program flow of the surrogate model.
Given a multivariable time series $X_{t-T_I+1:t}$ with $T_I$ time steps up to time $t$, our goal is to predict the subsequent $T_O$ time steps after $t$ by training the parameters $\theta$ of the model $M(\cdot)$. It can be denoted as
$$\left[X_{t-T_I+1}, \ldots, X_t\right] \xrightarrow{M_\theta} \left[Y_{t+1}, \ldots, Y_{t+T_O}\right] \tag{1}$$
where $X_i \in \mathbb{R}^{T_I \times N \times C_I}$ and $Y_i \in \mathbb{R}^{T_O \times N \times C_O}$ are the input and output of the $i$-th time step, $N$ is the number of nodes in the space, $C_I$ is the number of input features of each node, and $C_O$ is the number of output features of each node.

2.3.1. Embedding Layer

The left side of Figure 4 shows the embedding after different transformations or mappings of the inputs, and their features are described in detail in Section 2.2.
  • Time Series Embedding
This layer preserves the native information of the global time series. The six features of all nodes, namely water elevation, water velocity $(u, v)$, wind velocity $(u, v)$, and atmospheric pressure, are mapped using a one-dimensional convolution with a 1 × 1 filter. The convolution input dimension, $T_I \times C_I$, combines the input time steps and the input features, while the output dimension corresponds to the embedding length $L_1$. The resulting feature embedding for $N$ nodes is given by Equation (2):
$$E_{INP} = \mathrm{Conv1D}\left(X_{t-T_I+1:t}\right), \quad E_{INP} \in \mathbb{R}^{L_1 \times N} \tag{2}$$
  • Topographic Embeddings
The two static node-level factors, bathymetry and Manning’s n, are embedded through fully connected layers. The model transforms $X_{Bat} \in \mathbb{R}^{N}$ and $X_{Man} \in \mathbb{R}^{N}$ into embeddings of dimensions $L_2$ and $L_3$, respectively, as shown in Equations (3) and (4):
$$E_{Bat} = \mathrm{FC}\left(X_{Bat}\right), \quad E_{Bat} \in \mathbb{R}^{L_2 \times N} \tag{3}$$
$$E_{Man} = \mathrm{FC}\left(X_{Man}\right), \quad E_{Man} \in \mathbb{R}^{L_3 \times N} \tag{4}$$
  • Temporal embedding
The temporal embedding encodes daily periodic effects from tidal cycles. For ADCIRC simulations, $N_d = 24$ corresponds to hourly sampling, while for observation-station water levels, $N_d = 240$ reflects sampling every six minutes. For each time series $t-T_I+1:t$, all $N$ nodes share the same temporal embedding, stacked as $X_{Tid} \in \mathbb{R}^{N_d \times N}$. A learnable dictionary, $T_{tid} \in \mathbb{R}^{N_d \times L_4}$, where $L_4$ is the length of the embedding, maps timestamps to embeddings, producing $E_{Tid} \in \mathbb{R}^{L_4 \times N}$.
  • Spatial embedding
Spatial embeddings encode the geographic attributes of each node. Through continuous training, the model learns spatial relationships across nodes, represented as $E_{Spa} \in \mathbb{R}^{L_5 \times N}$, where $L_5$ is the length of the embedding.
  • Concatenate
All embeddings are concatenated to form the unified input to the fusion and feature extraction layers, as expressed in Equation (5):
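$$Z_t^i = E_{INP} \,\Vert\, E_{Bat} \,\Vert\, E_{Man} \,\Vert\, E_{Tid} \,\Vert\, E_{Spa}, \quad Z_t^i \in \mathbb{R}^{L \times N} \tag{5}$$
where $\Vert$ denotes concatenation along the embedding dimension and $L = L_1 + L_2 + L_3 + L_4 + L_5$.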
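Two pieces of the embedding layer lend themselves to a small sketch: mapping a timestamp to its time-of-day slot (the row of the learnable temporal dictionary) and concatenating per-node embeddings along the embedding axis. The function names and toy values are hypothetical, and the nested lists stand in for tensors.

```python
from datetime import datetime


def time_of_day_index(ts, n_d):
    """Map a timestamp to one of `n_d` equal slots in the day.

    n_d = 24 gives hourly slots; n_d = 240 gives 6 min slots.
    """
    seconds = ts.hour * 3600 + ts.minute * 60 + ts.second
    return seconds * n_d // 86400


def concat_embeddings(*parts):
    """Concatenate embeddings along the embedding axis.

    Each part is a list of L_k rows, every row holding N node values,
    so the result has sum(L_k) rows and N columns.
    """
    n_nodes = len(parts[0][0])
    assert all(len(row) == n_nodes for part in parts for row in part)
    return [row for part in parts for row in part]


# Hourly slot for an ADCIRC time step and 6 min slot for a buoy record.
idx_hourly = time_of_day_index(datetime(2017, 8, 26, 3, 0, 0), 24)
idx_6min = time_of_day_index(datetime(2017, 8, 23, 23, 54, 0), 240)

# Toy embeddings for N = 3 nodes with lengths 2 and 1.
e_inp = [[0.1, 0.2, 0.3], [0.4, 0.5, 0.6]]
e_tid = [[0.7, 0.7, 0.7]]  # same temporal embedding shared by all nodes
z = concat_embeddings(e_inp, e_tid)  # 3 rows x 3 nodes
```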

2.3.2. Feature Fusion and Extraction Layer

In time-series analysis, 1D convolution aggregates information within a receptive field to capture local dependencies, trends, and patterns. To enhance hierarchical learning and expand the receptive field, we employ an $l$-layer residual network with attention. As shown in Figure 4 (right), the bottleneck structure compresses inputs into compact representations, emphasizing essential features. The $(l+1)$-th fusion layer is defined in Equation (6):
$$\left(Z_t^i\right)^{l+1} = \left(Z_t^i\right)^{l} + \mathrm{ChAttn}^{l}\left(\mathrm{Bottleneck}^{l}\left(\left(Z_t^i\right)^{l}\right)\right) \tag{6}$$
The module $\mathrm{Bottleneck}$ follows a ResNet design of three stacked 1D convolutions (Equation (7)): the first reduces dimensionality ($j = 1$), the second learns spatial patterns ($j = 2$), and the third restores dimensions ($j = 3$):
$$\left(Z_t^i\right)^{l} = \mathrm{LN}_j\left(\sigma_j\left(\mathrm{Conv1D}_j\left(Z_t^i\right)\right)\right), \quad j = 1, 2, 3 \tag{7}$$
Here, $\sigma$ denotes the GELU activation [53], and $\mathrm{LN}$ is layer normalization [54], which stabilizes training by reducing internal covariate shift. Dropout is applied in the first layer to prevent over-reliance on specific neurons.
Channel attention ($\mathrm{ChAttn}$) further refines features by suppressing less informative channels [55]. Following the SENet structure [56], the operation is expressed as Equation (8):
$$Z_t^i = \sigma\left(\mathrm{Conv1D}_k\left(\mathrm{1DGAP}\left(Z_t^i\right)\right)\right) \odot Z_t^i \tag{8}$$
where ⊙ denotes element-wise multiplication, and 1DGAP is one-dimensional global average pooling [57]. GAP averages across nodes for each embedding dimension, generating global descriptors that are projected through a 1D convolution to capture cross-channel interactions. The kernel size k is empirically determined based on embedding dimension. The refined attention weights are then applied to the original embeddings to highlight critical features.
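The channel-attention step can be illustrated with a small pure-Python sketch. Several things here are assumptions rather than the authors' implementation: the gate is taken to be a sigmoid (as in SE-style attention), a fixed averaging kernel stands in for the learned 1D convolution weights, the kernel size is assumed odd, and nested lists replace tensors. The `residual_step` helper follows the form of Equation (6) with any shape-preserving callable standing in for the bottleneck.

```python
import math


def channel_attention(z, k=3):
    """Re-weight channels of z (L channels x N nodes) by a learned-style gate."""
    # 1D global average pooling: one descriptor per channel (mean over nodes).
    desc = [sum(ch) / len(ch) for ch in z]
    # 1D convolution across neighbouring channels with zero padding.
    # Assumption: a fixed averaging kernel replaces the learned weights.
    w = [1.0 / k] * k
    pad = k // 2
    padded = [0.0] * pad + desc + [0.0] * pad
    mixed = [sum(w[j] * padded[i + j] for j in range(k)) for i in range(len(desc))]
    # Assumed sigmoid gate in (0, 1), then element-wise scaling of each channel.
    gate = [1.0 / (1.0 + math.exp(-m)) for m in mixed]
    return [[g * v for v in ch] for g, ch in zip(gate, z)]


def residual_step(z, bottleneck, k=3):
    """One fusion layer: z + ChAttn(Bottleneck(z)), as in Equation (6)."""
    refined = channel_attention(bottleneck(z), k)
    return [[a + b for a, b in zip(r0, r1)] for r0, r1 in zip(z, refined)]


# Toy input: 3 channels, 2 nodes.
z_in = [[1.0, 3.0], [0.0, 0.0], [-2.0, -2.0]]
z_att = channel_attention(z_in, k=3)
z_out = residual_step(z_in, bottleneck=lambda x: x)  # identity as a placeholder
```

Because the gate is a single scalar per channel, the relative pattern across nodes within a channel is preserved; only the channel's overall weight changes.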

2.3.3. Regression Layer

The regression layer decodes embeddings into outputs $\hat{Y}^i_{t:t+T_O} \in \mathbb{R}^{T_O \times N \times C_O}$, where $T_O$ is the forecast horizon, $N$ is the number of nodes, and $C_O$ is the number of output features. This is achieved by a two-dimensional convolution (Equation (9)):
$$\hat{Y}^i_{t:t+T_O} = \mathrm{Conv2D}_k\left(Z_t^i\right), \quad Z_t^i \in \mathbb{R}^{L \times N \times 1} \tag{9}$$
Here, $L$ stands for the embedding dimension. The 1 × 1 convolution kernel performs dimensional transformation and downsampling to align embeddings with the required output space.

2.4. Evaluation Metrics

In storm surge spatiotemporal series analysis, the mean absolute error (MAE) is often used as the loss function. To evaluate the model’s prediction performance, the root mean square error (RMSE) quantifies the spread of the residuals, and the mean absolute percentage error (MAPE) measures the average relative deviation of the predictions from the targets.
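$$\mathrm{MAE}\left(\hat{Y}, Y\right) = \frac{1}{N T_O} \sum_{i=1}^{N} \sum_{j=1}^{T_O} \left| \hat{Y}_j^i - Y_j^i \right|$$
$$\mathrm{RMSE}\left(\hat{Y}, Y\right) = \sqrt{\frac{1}{N T_O} \sum_{i=1}^{N} \sum_{j=1}^{T_O} \left( \hat{Y}_j^i - Y_j^i \right)^2}$$
$$\mathrm{MAPE}\left(\hat{Y}, Y\right) = \frac{1}{N T_O} \sum_{i=1}^{N} \sum_{j=1}^{T_O} \left| \frac{\hat{Y}_j^i - Y_j^i}{Y_j^i} \right| \times 100\%$$
where $\hat{Y}$ is the output, $Y$ is the target, $N$ is the number of nodes, and $T_O$ is the number of output time steps.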
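The three metrics can be written directly from their definitions. The sketch below uses nested lists indexed as `[node][time step]`; the function names and sample values are illustrative only. Note that MAPE divides by the target, so it is undefined wherever the target is exactly zero; how such points are handled in the study is not stated, and this sketch does not guard against them.

```python
import math


def _flatten_pairs(pred, target):
    """Pair up predictions and targets over all nodes and time steps."""
    return [(p, t) for pr, tr in zip(pred, target) for p, t in zip(pr, tr)]


def mae(pred, target):
    pairs = _flatten_pairs(pred, target)
    return sum(abs(p - t) for p, t in pairs) / len(pairs)


def rmse(pred, target):
    pairs = _flatten_pairs(pred, target)
    return math.sqrt(sum((p - t) ** 2 for p, t in pairs) / len(pairs))


def mape(pred, target):
    # Undefined when any target value is zero (division by zero).
    pairs = _flatten_pairs(pred, target)
    return 100.0 * sum(abs((p - t) / t) for p, t in pairs) / len(pairs)


# Two nodes, two output time steps each (made-up values, in metres).
y_hat = [[1.0, 2.0], [4.0, 4.0]]
y_true = [[1.0, 4.0], [2.0, 4.0]]

mae_value = mae(y_hat, y_true)
rmse_value = rmse(y_hat, y_true)
mape_value = mape(y_hat, y_true)
```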
In addition, an ADAM optimizer [58] is used to update the model’s parameters while minimizing the loss function, because the decoupled weight decay can help the model generalize better on complex, high-dimensional data. Meanwhile, a ReduceLROnPlateau callback function [59] is used as a decaying learning rate schedule, which reduces the learning rate once the monitored loss has stopped decreasing for a set number of epochs. An early stopping mechanism is also used to avoid overfitting: when the validation loss does not improve within the patience window, the model’s performance is considered to be starting to decline, and training is stopped.
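The interplay of plateau-based learning rate decay and early stopping can be sketched with a small controller class. This is a simplified stand-in written for illustration, not PyTorch's `ReduceLROnPlateau` implementation (which has additional options such as cooldown and thresholds). The class name and the `lr_patience` parameter are hypothetical; the default learning rate, decay factor, and stopping patience mirror the values reported for the ablation experiments.

```python
class PlateauController:
    """Simplified reduce-on-plateau schedule combined with early stopping."""

    def __init__(self, lr=1e-3, factor=0.97, lr_patience=5, stop_patience=50):
        self.lr = lr
        self.factor = factor                # multiplicative learning rate decay
        self.lr_patience = lr_patience      # hypothetical: epochs between decays
        self.stop_patience = stop_patience  # epochs without improvement before stopping
        self.best = float("inf")
        self.since_best = 0                 # epochs since the last improvement

    def step(self, val_loss):
        """Record one validation loss; return True when training should stop."""
        if val_loss < self.best:
            self.best = val_loss
            self.since_best = 0
            return False
        self.since_best += 1
        # Decay the learning rate every `lr_patience` epochs without improvement.
        if self.since_best % self.lr_patience == 0:
            self.lr *= self.factor
        return self.since_best >= self.stop_patience


# Toy run with short patience values so the behaviour is visible.
ctl = PlateauController(lr=1.0, factor=0.5, lr_patience=2, stop_patience=4)
val_losses = [1.0, 0.9, 0.95, 0.95, 0.95, 0.95]
stop_flags = [ctl.step(v) for v in val_losses]
```

In this toy run the loss stops improving after the second epoch, the learning rate is halved twice, and the stop flag is raised on the sixth epoch.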

2.5. Experiment

The model is implemented with the deep learning framework PyTorch 2.4 [60]. To ensure the model has sufficient data for training while also balancing the model’s predictive capability with actual results, the datasets are split into training, validation, and test sets in a 3:3:4 ratio for the storm surge results from ADCIRC+SWAN, a 3:3:4 ratio for the buoy data, and a 5:3:2 ratio for the track data. Training was performed on a computing server with an Intel(R) Xeon(R) Gold 5218R CPU @ 2.10 GHz, 384 GB of RAM, and 4 NVIDIA RTX 3090 Ti graphics cards. To improve the model’s prediction capability, sufficient data before the arrival of the peak surge are needed to train the model correctly. In the present study, we utilized the available data. The data are split as shown in Table 3.
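A ratio-based split of a time-ordered dataset can be sketched as below. The sketch assumes the split is chronological (earliest samples for training, latest for testing), which is a common choice for forecasting but is an assumption here; the function name and the sample count of 100 are illustrative, and the actual sample counts are those given in Table 3.

```python
def chronological_split(n_samples, ratios=(3, 3, 4)):
    """Split sample indices 0..n_samples-1 into train/val/test by ratio, in time order."""
    total = sum(ratios)
    n_train = n_samples * ratios[0] // total
    n_val = n_samples * ratios[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_samples)  # remainder goes to the test set
    return train, val, test


# 3:3:4 as used for the storm surge and buoy data, 5:3:2 for the track data.
train_idx, val_idx, test_idx = chronological_split(100, (3, 3, 4))
track_train, track_val, track_test = chronological_split(100, (5, 3, 2))
```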

3. Results and Discussion

3.1. Surrogate Model of ADCIRC+SWAN

The ADCIRC+SWAN storm surge results for Hurricane Harvey are used as the data for our model (see Table 3). Considering the computing capability of our servers, we inserted 7 interpolated points between consecutive data points for the 88,091 nodes, reducing the time step from one hour to 8.6 min. At the same time, 84 time steps (12 h) of lags and horizons are used to test the long-term prediction capability. Note that establishing the model as a general prediction tool would require a large amount of storm surge results and data from many hurricanes, which is beyond the scope of the current study.
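As a quick check of the time-step arithmetic, assuming each hourly interval is divided into 7 equal sub-intervals (which is what the stated 8.6 min step implies):

$$\Delta t = \frac{60\ \text{min}}{7} \approx 8.6\ \text{min}, \qquad 84 \times \Delta t = 84 \times \frac{60}{7}\ \text{min} = 720\ \text{min} = 12\ \text{h}.$$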
Figure 5a shows the validation-set target on 26 August 2017, 00:00:00. The hurricane made landfall at 28.06° N, 96.78° W near San José Island, causing a surge height of 1.88 m. Figure 5b shows the corresponding 6 h horizon prediction; the highest predicted water level, 1.52 m, is located at 28.03° N, 96.81° W. In the difference map (Figure 5c), the largest difference, up to 0.39 m, occurs in the lower right corner of the eye of the storm because the target and the prediction place it at different locations. In addition, an anomalous area produced by the model appears in the deep sea. This is an abrupt drop of the output to zero caused by inappropriate initialization of the model parameters, which needs to be mitigated by selecting an appropriate initialization function for the learnable parameters. Where the values change relatively smoothly, the predicted values are almost the same as the target; for example, the difference near the coast is close to zero.
Figure 5d shows the test-set target on 28 August 2017, 19:00:00. Hurricane Harvey circled inland and moved southeastward back to the sea, with its center reaching the waters west of Houston at 28.4° N, 95.9° W, causing a water height of more than 0.5 m along the coast near Houston. Comparing the target (Figure 5d) with the 9 h horizon prediction (Figure 5e), the difference map (Figure 5f) shows an average difference of 0.1 m close to the coast and up to 0.3 m in some areas. In the deep sea, the difference in water height is close to 0.17 m, and water elevation was overpredicted in most areas.
Figure 5g shows the test-set target on 29 August 2017, 18:00:00. The hurricane moved to 28.5° N, 94.2° W and was about to make landfall again, causing a water height of more than 0.5 m along the coast east of Houston. Comparing the target (Figure 5g) with the 12 h horizon prediction (Figure 5h), most areas were 0.17–0.24 m above the target value, except for the coastal areas, where the water level was slightly underestimated.
Figure 5j shows the high water marks [61] during the hurricane’s movement on all time frames. From a direct comparison of the target Figure 5j and the 12 h prediction horizon Figure 5k, along the hurricane’s track, water elevations are mostly overestimated, with some areas even exceeding 0.5 m, except for the area near the landfall where water levels were underestimated by 0.1–0.3 m. Judging from the difference map, Figure 5l, on the right, the prediction is 0.1–0.2 m higher than the target in most areas. The extreme case occurred in the area where the hurricane first landed, with many places exceeding 0.3–0.6 m of difference, and a few places had a difference of 0.9 m. As analyzed above, the reason for the significant difference is that long-term forecasts will cause the eye of the storm to shift more. When we use 6 or 9 h forecasts for difference analysis, we can see that the shorter the forecast, the smaller the difference, which means that the prediction of high water marks is more accurate. This behavior is expected, and those results are not presented here for brevity.
However, from the perspective of actual forecasting needs, accurately predicting the location and magnitude of high water marks in coastal areas can help correctly infer the flooded areas, which is of great significance for organizing emergency evacuations, planning flood prevention preparations in advance, and planning the post-disaster recovery mission.

3.1.1. Ablation Experiment of the Input Layer

Because the model is a surrogate, its inputs comprise the inputs and outputs of the ADCIRC+SWAN simulation model together with embeddings representing temporal and spatial features. All features are denoted by numbers to simplify the following discussion and the expression of feature combinations in Figure 6, as follows: 0: Water Elevation, 1: Water Velocity U, 2: Water Velocity V, 3: Wind Velocity U, 4: Wind Velocity V, 5: Atmospheric Pressure, 6: Bathymetry, 7: Manning’s n, 8: Time in Day. Meanwhile, to ensure that the results are comparable across input combinations, the training parameters were fixed after repeated experiments: Node Number = 88,091, Optimizer = ADAM, Scheduler = ReduceLROnPlateau, Learning rate = 0.001, Learning rate decay = 0.97, Early stopping patience = 50, Input/output time steps = 36/36 (12 h), Batch = 8, Input embedding dim = 512, Spatial embedding dim = 128, Time in day embedding dim = 32, Bathymetry embedding dim = 48, Manning’s n embedding dim = 48, Interpolations = spline, 3 interpolated points. In addition, the experiments included the spatial embedding by default; combinations run without it are marked with * after the combination number. To observe the effect of each input on the model, we evaluate the contribution of multivariate inputs to the output (the water elevation feature only) through the MAE and MAPE produced by different input combinations, as described below.
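The numeric feature codes and training settings listed above can be captured in a small lookup, sketched here for illustration. The dictionary keys and the helper `decode_combination` are hypothetical names introduced for this example; only the feature numbering, the `*` convention, and the setting values come from the text.

```python
from typing import List, Tuple

# Feature numbering as listed in the text.
FEATURES = {
    "0": "Water Elevation",
    "1": "Water Velocity U",
    "2": "Water Velocity V",
    "3": "Wind Velocity U",
    "4": "Wind Velocity V",
    "5": "Atmospheric Pressure",
    "6": "Bathymetry",
    "7": "Manning's n",
    "8": "Time in Day",
}

# Training settings as listed in the text; the key names are hypothetical.
TRAINING_SETTINGS = {
    "node_number": 88091,
    "optimizer": "ADAM",
    "scheduler": "ReduceLROnPlateau",
    "learning_rate": 0.001,
    "learning_rate_decay": 0.97,
    "early_stopping_patience": 50,
    "input_time_steps": 36,
    "output_time_steps": 36,
    "batch": 8,
    "input_embedding_dim": 512,
    "spatial_embedding_dim": 128,
    "time_in_day_embedding_dim": 32,
    "bathymetry_embedding_dim": 48,
    "mannings_n_embedding_dim": 48,
}


def decode_combination(code: str) -> Tuple[List[str], bool]:
    """Decode a combination such as '03458' or '0123458*'.

    Returns the feature names and a flag that is False when the trailing
    '*' marks a run without the spatial embedding.
    """
    uses_spatial_embedding = not code.endswith("*")
    digits = code.rstrip("*")
    if not digits:
        raise ValueError("combination code contains no features")
    unknown = [d for d in digits if d not in FEATURES]
    if unknown:
        raise ValueError(f"unknown feature codes: {unknown}")
    return [FEATURES[d] for d in digits], uses_spatial_embedding


names, spatial = decode_combination("03458")
print(names, "spatial embedding:", spatial)
```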
First, when predicting the future water elevation, the historical elevation plays the most crucial role. When the input combinations on the right side of the figure (345, 3456, 34567, 345678) do not include past water elevation, the MAPE on the test set reaches its highest value of 8%, indicating relatively poor prediction. Nevertheless, the input wind velocity (3, 4) and air pressure (5) are directly related to the output water elevation (0), which means that the model can still predict the output from them, although with significantly reduced accuracy. When water elevation is included in the inputs (0345, 03457, 03458, 034578), the output accuracy improves compared with water elevation alone on both the test set and the entire data set. In particular, the combination 03458 has the best MAPE, which shows that the historical water elevation, wind velocity, atmospheric pressure, tidal influence, and spatial features all play a significant role in the accuracy of the prediction. Another combination that performs well in MAE and MAPE is 0123458, which reflects a strong causal relationship between the historical water velocities (1, 2) and the current water elevation.
Second, the learnable static embeddings, bathymetry (6) and Manning’s n (7), did not contribute positively to the model with the present data set. Comparing the four pairs 012 and 01267, 012345 and 01234567, 0123458 and 012345678, and 345 and 34567, the MAE and MAPE are worse when the combination includes the static features 6 and 7. In addition, the combinations 012345678*, 0123458*, and 012345*, from which the spatial embedding was removed, have MAE and MAPE values very close to those of their counterparts with the spatial embedding. This means that static features and spatial embedding are less significant for our model, data set, and region of interest near the coast. The most likely explanation is that both Manning’s n and bathymetry were already included in the ADCIRC+SWAN simulation, so their effects are embedded in the water elevation data used in the study.
Third, let us look at the contribution of Time in Day (8). By comparing the three pairs 012345 and 0123458, 01267 and 012678, and 0345 and 03458, we can see that adding the temporal embedding (8) makes a small contribution to improving accuracy. The contribution comes from the hourly tidal variations, which enter the data set through the ADCIRC+SWAN results.
Finally, taking the single feature 0 (water elevation) as the baseline, adding other features does not worsen the model, which reflects the model’s stability, resistance to interference, and robustness. On the other hand, because the input variables are interdependent, it is difficult to determine the contribution of each input variable to the model, an area that requires further exploration. It should be noted that the effects of all input variables are already embedded in the water elevation data through the ADCIRC+SWAN simulation results. However, the water elevation data alone may not represent these effects in the best possible combination. Therefore, incorporating other input variables appears to enhance the overall results in the present model.
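The ablation comparisons above rest on MAE and MAPE averaged across nodes. The following is a minimal sketch of the two metrics using their standard definitions. How the study treats targets at or near zero in the MAPE is not stated, so skipping them (controlled by the hypothetical parameter `eps`) is an assumption made here only to keep the example well defined.

```python
from typing import Sequence


def mae(pred: Sequence[float], target: Sequence[float]) -> float:
    """Mean absolute error, in the units of the data (metres here)."""
    if len(pred) != len(target) or not target:
        raise ValueError("pred and target must be non-empty and equally long")
    return sum(abs(p - t) for p, t in zip(pred, target)) / len(target)


def mape(pred: Sequence[float], target: Sequence[float], eps: float = 1e-8) -> float:
    """Mean absolute percentage error, in percent.

    Assumption: targets with |t| <= eps are skipped to avoid division by zero.
    """
    if len(pred) != len(target):
        raise ValueError("pred and target must be equally long")
    terms = [abs((p - t) / t) for p, t in zip(pred, target) if abs(t) > eps]
    if not terms:
        raise ValueError("no target value is far enough from zero")
    return 100.0 * sum(terms) / len(terms)


# Hypothetical water elevations at four nodes (metres).
target = [1.0, 2.0, 0.5, 0.0]
pred = [1.1, 1.8, 0.5, 0.1]

print(f"MAE  = {mae(pred, target):.3f} m")
print(f"MAPE = {mape(pred, target):.2f} %")
```

Because many water elevation values lie close to zero, the choice of how to treat small targets can change the MAPE noticeably, which is one reason to read it alongside the MAE.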

3.1.2. Module Substitution Experiments

The MAE and MAPE in Figure 6 represent the average error across all nodes. The error of individual nodes can be presented in the form of a heat map (Figure 5) or a regression plot (Figure 7). To assess the performance of the fusion and extraction layers in our model, we retain the input/embedding layers and the regression layer of STERCHA and substitute the fusion and extraction layers with other modules in Figure 7a, including ConvNeXt V2 [62], STID [37], and ConvCANeXt. ConvNeXt V2 is a fully convolutional masked autoencoder framework with a new Global Response Normalization (GRN) layer. One part of this module consists of a depthwise convolution layer with a kernel size of 7 and a pointwise fully connected layer, which can extract features from nodes in depth and breadth. GRN aims to increase the contrast and selectivity of channels. Compared to the other models, STID has a straightforward and effective structure that consists only of multilayer perceptrons; this lightweight structure is well suited to fast prediction. ConvCANeXt is another proposed module combining ConvNeXt, Bottleneck ResNet, and Channel Attention. Its heavy structure makes prediction slow, but it gives better long-term predictions. For details on the structure and performance of each model, readers are referred to the respective papers.
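To make the role of GRN more concrete, the sketch below follows the three steps given in the ConvNeXt V2 paper [62]: a per-channel L2 norm over all positions, division by the mean norm across channels, and a learnable calibration with a residual connection. It is written in plain Python for readability. Treating the mesh nodes as the positions, and the name `global_response_norm`, are assumptions for this example rather than details of the authors' implementation.

```python
import math
from typing import List, Sequence


def global_response_norm(
    x: Sequence[Sequence[float]],
    gamma: Sequence[float],
    beta: Sequence[float],
    eps: float = 1e-6,
) -> List[List[float]]:
    """Apply GRN to x[n][c]: n positions (assumed to be nodes), c channels."""
    n_channels = len(gamma)
    # Step 1: aggregate each channel globally with an L2 norm over positions.
    gx = [math.sqrt(sum(row[c] ** 2 for row in x)) for c in range(n_channels)]
    # Step 2: normalize each channel's norm by the mean norm across channels.
    mean_gx = sum(gx) / n_channels
    nx = [g / (mean_gx + eps) for g in gx]
    # Step 3: calibrate the input with learnable gamma/beta and add a residual.
    return [
        [gamma[c] * row[c] * nx[c] + beta[c] + row[c] for c in range(n_channels)]
        for row in x
    ]


# Two positions, two channels: channel 0 is active, channel 1 is silent.
x = [[3.0, 0.0], [4.0, 0.0]]
out = global_response_norm(x, gamma=[1.0, 1.0], beta=[0.0, 0.0])
print(out)
```

The active channel is amplified relative to the silent one, which is the sense in which GRN increases channel contrast. With `gamma` and `beta` set to zero, the layer reduces to the identity.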
Figure 7 shows the regression plot for the target and the 1, 6, and 12 h prediction horizons, including all time frames on the entire dataset. The dotted line is the ideal regression line, and the solid line is the predicted regression line; the angle between the two lines reflects the model’s bias. Prediction accuracy is reflected in the concentration of points around the regression line: the closer the points cluster to the line, the higher the accuracy; conversely, greater dispersion from the line indicates poorer predictive performance. Since most sea level values are concentrated near zero, nonlinear effects cause noticeable fluctuations around this baseline, resulting in the olive-shaped distribution observed in the 1 h forecasts across all models. As the forecast horizon extends, the model’s nonlinear fit gradually weakens, and the predicted values become increasingly scattered, reflecting the growing uncertainty in long-term sea level behavior. By analyzing our model STERCHA in Figure 7b, it is evident that most of the nodes in the 1 h lead forecast are concentrated near the regression line, indicating that the predicted value is very close to the target value. In the 6 and 12 h prediction horizons, the regression line gradually tilts towards the horizontal line. This shows that larger positive and smaller negative values deviate more from the target, which is consistent with our previous analysis of the high water marks in Figure 5g–i. From the fit of the regression line, all models’ 1 h horizon predictions are close to the targets, but the 6 and 12 h horizon predictions deviate significantly. In particular, for the 12 h horizon, when the target value tends toward the maximum and minimum values, the majority of the predicted values of ConvNeXt V2 and STID tend toward zero.
In terms of data distribution, STID best fits the 1 h prediction horizon because all nodes are close to the regression line, which illustrates that this model has very good short-term prediction performance. In the 6 and 12 h prediction horizons, STERCHA and ConvCANeXt outperform ConvNeXt V2 and STID because their data distributions are concentrated along the regression line. In addition, the predictions of STERCHA and ConvCANeXt fluctuate considerably around zero in the 6 and 12 h horizons, which is related to suboptimal initialization of the model parameters and requires further tuning.
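The tilt of the fitted regression line can be quantified with an ordinary least-squares fit of predictions against targets. The sketch below is illustrative only: the toy values and the helper name `fit_line` are hypothetical, and the "long horizon" predictions are simply targets shrunk toward zero to mimic the behavior described above.

```python
import math
from typing import Sequence, Tuple


def fit_line(target: Sequence[float], pred: Sequence[float]) -> Tuple[float, float]:
    """Least-squares slope and intercept of pred regressed on target."""
    n = len(target)
    if n != len(pred) or n < 2:
        raise ValueError("need at least two paired values")
    mean_t = sum(target) / n
    mean_p = sum(pred) / n
    sxx = sum((t - mean_t) ** 2 for t in target)
    if sxx == 0.0:
        raise ValueError("target values must not all be equal")
    sxy = sum((t - mean_t) * (p - mean_p) for t, p in zip(target, pred))
    slope = sxy / sxx
    return slope, mean_p - slope * mean_t


def tilt_degrees(slope: float) -> float:
    """Angle between the fitted line and the ideal line of slope 1."""
    return math.degrees(math.atan(1.0) - math.atan(slope))


# Hypothetical water elevations (metres).
target = [-1.0, -0.5, 0.0, 0.5, 1.0, 2.0]
pred_short = list(target)                     # idealized short horizon
pred_long = [0.6 * t + 0.1 for t in target]   # extremes shrunk toward zero

for name, pred in (("short", pred_short), ("long", pred_long)):
    slope, intercept = fit_line(target, pred)
    print(f"{name}: slope={slope:.2f}, intercept={intercept:.2f}, "
          f"tilt={tilt_degrees(slope):.1f} deg")
```

A slope below 1 means that large positive and large negative targets are both pulled toward zero, which corresponds to the regression line tilting toward the horizontal.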

3.2. Water Level Prediction of Observation Stations

For the observation station water level prediction (left side of the model structure in Figure 3), we considered that the water level at each station is influenced mainly by the historical water level, the station’s geolocation, and the time of day of each sample. The 2160 data points, sampled every 6 min and obtained from the National Data Buoy Center [39], were split in the ratio 0.3:0.3:0.4 (see Table 3). The purpose of this data split is to evaluate more accurately the model’s ability to predict the peak surge during hurricane landfall. After repeated experimental tests, the model showed good prediction ability for the 9 h water level change. In Figure 8, the water level changes over the past 9 h and the other influencing factors are used jointly to predict the water level change over the next 9 h. We also added ADCIRC+SWAN simulation results for comparison.
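The data split and the 9 h input/output windows can be sketched as follows. This is an illustration under stated assumptions: the split is taken to be chronological in the order training, validation, test; windows are built inside a single segment; and a 9 h window at 6 min sampling is taken to be 90 steps. The function names are hypothetical, and Table 3 remains the authoritative description of the split.

```python
from typing import List, Sequence, Tuple


def chronological_split(n_points: int, ratios: Sequence[float]) -> List[Tuple[int, int]]:
    """Return (start, end) index pairs for consecutive train/val/test segments."""
    if len(ratios) != 3 or abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must be three values summing to 1")
    train_end = round(n_points * ratios[0])
    val_end = train_end + round(n_points * ratios[1])
    return [(0, train_end), (train_end, val_end), (val_end, n_points)]


def sliding_windows(
    series: Sequence[float], n_in: int, n_out: int
) -> List[Tuple[Sequence[float], Sequence[float]]]:
    """Pair each n_in-step history with the n_out steps that follow it."""
    last_start = len(series) - (n_in + n_out)
    return [
        (series[s:s + n_in], series[s + n_in:s + n_in + n_out])
        for s in range(last_start + 1)
    ]


SAMPLE_MINUTES = 6
HORIZON_HOURS = 9
steps = HORIZON_HOURS * 60 // SAMPLE_MINUTES  # 90 steps per 9 h window

segments = chronological_split(2160, (0.3, 0.3, 0.4))
print("segments:", segments)

# Placeholder series standing in for one station's water level record.
series = [0.0] * 2160
train_start, train_end = segments[0]
train_windows = sliding_windows(series[train_start:train_end], steps, steps)
print("training windows:", len(train_windows))
```

With these assumptions, the three segments hold 648, 648, and 864 points, and the training segment yields 469 overlapping window pairs.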
The prediction starts from 25 August 2017, at 16:48:00. Hurricane Harvey made its first landfall at 27.8° N, 96.8° W on 26 August 2017, 00:00:00, causing several nearby observation stations, such as Aransas, Port Aransas, Aransas Wildlife Refuge, and Bob Hall Pier, to record water levels exceeding 1 m, with Port Aransas approaching 2 m. After that, a surge of more than 1.5 m also appeared at the Seadrift station. Aransas experienced a sensor failure during this period, resulting in missing records. Our model followed the missing records because its prediction is based on historical water level data, whereas ADCIRC+SWAN can simulate surge changes during this period because that model is based on wind speed and pressure records. At the nearby Rockport station, the water level rose slightly by 0.4 m before the hurricane made landfall and then dropped to −0.2 m afterwards. Comparing the model predictions, the predictions at the Aransas Wildlife Refuge and Bob Hall Pier stations were more than 0.5 m lower than the actual values, and the prediction at Port Aransas differed from the actual peak by about 1 m. The predicted peak at the Seadrift station was also 0.5 m lower than the actual value, and the peak values at other stations deviated by 0.1–0.4 m. Comparing the 1, 3, 6, and 9 h prediction horizons, the longer the horizon, the worse the prediction accuracy and the tracking of the peak.
After making landfall, the hurricane continued to move northward, circled southeast of San Antonio, turned away from the coast, and returned to the sea. The hurricane’s landing and departure caused two floods in Port O’Connor. The two peaks of the 9 h prediction horizon are 0.25–0.4 m lower than the actual value, which is slightly better than the ADCIRC simulation result. When the hurricane moved northeastward from the sea and made landfall again, it passed between Texas Point and Calcasieu Pass. Calcasieu Pass was in the direction of the hurricane’s advance, so the water level peaked at more than 0.7 m and then dropped sharply to less than −0.5 m. Our model followed the trend of rising and falling water levels from 30 August to 31 August; however, the 9 h prediction horizon is still 0.3–0.4 m lower than the actual value.
In contrast, the ADCIRC+SWAN model accurately captures the rising and falling water levels. Texas Point is opposite to the hurricane’s advance, so the water level kept falling. Our model has followed this station well, but the ADCIRC+SWAN model has deviated significantly. Looking at the Freeport Harbor and Galveston Bay Entrance, between the hurricane’s departure and re-landing, there are regular fluctuations in water levels when the hurricane arrives, and a significant drop occurs when it leaves. Our model follows the regular fluctuation trend well; however, the water level changes are lower than expected in the 9 h prediction horizon.

3.3. Prediction of Hurricane Track and Attributes

We divided the 415 interpolated data points into three parts in the ratio 0.5:0.3:0.2 (see Table 3). The purpose of splitting the data set in this ratio is to create validation and test sets that fall within the range after landfall, ensuring that the predictions made are more meaningful. Figure 9 compares the forecasts with 1, 6, and 12 h horizons and the observations of the hurricane trajectory. When the hurricane first made landfall, the forecast with a 12 h horizon was close to the observation. The hurricane’s position while hovering inland varies greatly, with the 1 and 6 h forecasts being closer to San Antonio than the observed position. In its subsequent movement, the 12 h forecast entered the sea further than the observation, and the hurricane’s trajectory was more southeasterly. Overall, the observed trajectory is closer to the average of the 1, 6, and 12 h forecasts. The hurricane track prediction model utilizes the longitude and latitude of the hurricane’s eye locations, which are trained separately in our model. Because the model works like a black box and the variables may have unintended nonlinear relations among them, ranking the 1, 6, and 12 h forecasts against one another is not meaningful.
Figure 10 compares the forecasts with the 1, 6, and 12 h horizons and the original attributes, including wind speed, atmospheric pressure, category, storm speed, storm direction, latitude, and others. In general, the forecasts are very close to the observed values. The 1 h horizon forecast was close to the observed value, the 6 h horizon forecast was slightly higher than the actual value, and the 12 h horizon forecast was somewhat lower than the observed value. In the predictions of the Saffir–Simpson hurricane wind scale (SSHWS) and direction, there were oscillations or overshoots at the lowest and highest points of the sharp changes. The longer the forecast horizon, such as the 12 h horizon, the stronger the oscillation or overshoot, indicating the model’s instability in capturing changes in long-term forecasts. This oscillation is also evident in the forecast of storm speed. For example, in the trough on the 23rd, the minimum value of the 12 h horizon forecast is about 2.5 knots lower than the observed value.
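One way to make the visual comparison of tracks quantitative is the great-circle distance between a forecast and an observed eye position. The study does not report such a metric, so the sketch below is purely illustrative: the haversine formula is a standard swapped-in technique, the forecast positions are hypothetical, and only the landfall position (27.8° N, 96.8° W) comes from the text. West longitudes are written as negative values.

```python
import math
from typing import Sequence, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two (lat, lon) points in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def mean_position(points: Sequence[Tuple[float, float]]) -> Tuple[float, float]:
    """Arithmetic mean of (lat, lon) pairs; adequate only over small regions."""
    lats = [p[0] for p in points]
    lons = [p[1] for p in points]
    return sum(lats) / len(lats), sum(lons) / len(lons)


observed = (27.8, -96.8)  # first landfall position from the text
# Hypothetical forecast positions for the same time stamp.
forecasts = {"1 h": (27.9, -96.9), "6 h": (27.7, -96.6), "12 h": (27.8, -96.9)}

for horizon, (lat, lon) in forecasts.items():
    error = haversine_km(lat, lon, *observed)
    print(f"{horizon} forecast: {error:.1f} km from the observed position")

avg = mean_position(list(forecasts.values()))
print(f"mean of forecasts: {haversine_km(*avg, *observed):.1f} km from the observed position")
```

Averaging latitude and longitude directly is a simplification that is reasonable over a region of this size but would not hold near the poles or across the antimeridian.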

3.4. Summary of Innovation

The application of three entirely different datasets to a single model represents the most significant innovation of this study, while also introducing several unprecedented challenges. As a surrogate model of ADCIRC+SWAN, our model is lightweight yet incorporates more nodes and input features than any existing spatiotemporal series models [63,64,65]. The causal relationships among these nodes and features are demonstrated in Figure 6 through extensive experimental comparisons. Nevertheless, fully uncovering these correlations requires novel methodologies and substantial experimental support, as they cannot be adequately represented using simple error metrics such as MAE and MAPE.
To provide a more intuitive assessment of model performance, we employed complementary visualization methods. Figure 5 presents a heatmap distribution comparison that highlights the effectiveness of forecasting, while Figure 7 illustrates the influence of different model structures on long-term predictions. These approaches offer a more precise and comprehensive perspective than traditional evaluations, which are limited to overall performance metrics such as MAE, MAPE, and accuracy. However, constructing large-scale node graphs of this nature remains technically challenging, which likely explains their rarity in existing model designs [66,67].
When applied to station-level water level predictions, the model accounts for the causal relationship between hurricane parameters and observed water levels by converting all inputs into embeddings, thereby enabling the learning of these dependencies. Although the accuracy achieved at the station level is somewhat lower than that of hurricane track predictions, it remains close to the performance level of ADCIRC+SWAN simulations [37]. For hurricane path and attribute predictions, the model utilizes a multi-layer CNN architecture, demonstrating high predictive accuracy.

4. Conclusions

This study successfully developed a surrogate model for the ADCIRC+SWAN hurricane simulation by constructing a spatiotemporal series model capable of forecasting both water levels around the hurricane area and at observation stations, as well as the hurricane tracks. Using Hurricane Harvey as a case study, the model’s accuracy was evaluated through simulation and observational data. The storm surge simulation results were generated using the ADCIRC+SWAN model with hurricane wind velocity and pressure, bathymetry, Manning’s n coefficients, tidal forcing, and wave interactions.
The proposed STERCHA model consists of an embedding layer, feature fusion and extraction layers, and a regression layer, with application-specific variations occurring only in the embedding layer. As a surrogate for ADCIRC+SWAN, STERCHA utilizes up to 10 input variables, including time series, temporal, spatial, and static embeddings, to predict water elevations up to 12 h in advance, visualizing the results as heat maps for different forecast horizons. The predictions closely align with observed water elevations, including high-water marks. An ablation study on input variables revealed that historical water elevations are the most critical predictor, while wind speed, atmospheric pressure, and past water speed also contribute to enhanced accuracy. Meanwhile, adding spatial embeddings and static attributes introduced feature redundancy, as those are already embedded in the water elevation data obtained from the ADCIRC+SWAN model. The study also tested alternative feature fusion and extraction modules, such as STID, ConvNeXt, and ConvCANeXt, confirming that STERCHA outperforms other models, particularly in long-term extreme-value predictions. Evaluating water level forecasts near the hurricane track showed accurate 9 h predictions, although peak values were slightly underestimated.
For hurricane track and attribute predictions, the model directly processed raw input rather than embeddings, using pre-landfall data to predict tracks and storm attributes 12 h into the future. This approach demonstrated good alignment with observed values, except in regions with abrupt data changes. In general, the model can complete training and prediction for large-scale nodes in a relatively short period and is well-suited for various scenarios in hurricane time series prediction, which has significant practical value. However, the study also found that, in such short-term prediction settings, using limited data to extend the forecast horizon as far as possible remains a very challenging problem. Much research remains to be conducted on the trade-offs in the amount of input used for prediction, on further optimization and adjustment of the model structure, and on selecting a reasonable time step; these are the focus of future research.
To establish the surrogate model as a general prediction tool, a large amount of storm surge results and observational data from multiple hurricanes is required, which is beyond the scope of the current study.

Author Contributions

Conceptualization, J.H. and M.K.A.; methodology, J.H.; software, J.H.; validation, J.H., M.K.A., and M.D.S.; formal analysis, J.H., M.K.A.; investigation, J.H., M.K.A.; resources, M.K.A., L.O.; data curation, J.H., M.K.A.; writing (original draft preparation), J.H., M.K.A.; writing (review and editing), J.H., M.K.A.; visualization, J.H.; supervision, M.K.A., M.D.S.; project administration, M.K.A.; funding acquisition, M.K.A., L.O. All authors have read and agreed to the published version of the manuscript.

Funding

This research was partially funded by the (1) National Science Foundation—Excellence in Research (Grant No. 2000283); (2) National Aeronautics and Space Administration (NASA) under Grant No. 80NSSC23M0060, issued through the University Leadership Initiative Program; and (3) Department of Energy—NNSA, award #DE-NA 0004051.

Data Availability Statement

The original contributions presented in this study are included in the article. Further inquiries can be directed to the corresponding author.

Acknowledgments

The authors thank Ouyang for providing access to the High-Performance Computing (HPC) environment.

Conflicts of Interest

The authors declare no conflicts of interest.

References

  1. NOAA’s National Weather Service. Major Hurricane Harvey—25–29 August 2017. Available online: https://www.weather.gov/crp/hurricane_harvey (accessed on 26 January 2025).
  2. Freedman, A. As Texas Flooded, Meteorologists Felt Helpless as Dire Forecasts Were Realized. Mashable. 30 August 2017. Available online: https://mashable.com/article/harvey-flood-left-forecasters-helpless-accurate (accessed on 11 September 2025).
  3. Kubarek, D. Data Assimilation Method Offers Improved Hurricane Forecasting. PennState. 15 August 2019. Available online: https://www.psu.edu/news/research/story/data-assimilation-method-offers-improved-hurricane-forecasting (accessed on 23 March 2025).
  4. Schumacher, R. What Made the Rain in Hurricane Harvey so Extreme? The Conversation. 29 August 2017. Available online: https://theconversation.com/what-made-the-rain-in-hurricane-harvey-so-extreme-83137 (accessed on 23 March 2025).
  5. Jenkinson, A.F. The frequency distribution of the annual maximum (or minimum) values of meteorological elements. Q. J. R. Meteorol. Soc. 1955, 81, 158–171.
  6. Hsu, C.-H.; Olivera, F.; Irish, J.L. A hurricane surge risk assessment framework using the joint probability method and surge response functions. Nat. Hazards 2018, 91, 7–28.
  7. Fleming, J.G.; Fulcher, C.W.; Luettich, R.A.; Estrade, B.D.; Allen, G.D.; Winer, H.S. A Real Time Storm Surge Forecasting System Using ADCIRC. In Estuarine and Coastal Modeling (2007); American Society of Civil Engineers: Reston, VA, USA, 2008; pp. 893–912.
  8. Hope, M.E.; Westerink, J.J.; Kennedy, A.B.; Kerr, P.C.; Dietrich, J.C.; Dawson, C.; Bender, C.J.; Smith, J.M.; Jensen, R.E.; Zijlema, M.; et al. Hindcast and validation of Hurricane Ike (2008) waves, forerunner, and storm surge. J. Geophys. Res. Oceans 2013, 118, 4424–4460.
  9. Jelesnianski, C.P.; Chen, J.; Shaffer, W.A. SLOSH: Sea, Lake, and Overland Surges from Hurricanes; U.S. Department of Commerce’s National Oceanic and Atmospheric Administration’s National Weather Service: Silver Spring, MD, USA, 1992; Volume 48.
  10. Luettich, R.A., Jr.; Westerink, J.J.; Scheffner, N.W. ADCIRC: An Advanced Three-Dimensional Circulation Model for Shelves Coasts and Estuaries, Report 1: Theory and Methodology of ADCIRC-2DDI and ADCIRC-3DL; Dredging Research Program Technical Report DRP-92-6; US Army Engineers Waterways Experiment Station: Vicksburg, MS, USA, 1992; 137p.
  11. Roelvink, J.A.; Van Banning, G. Design and development of DELFT3D and application to coastal morphodynamics. In Hydroinformatics; Balkema: Rotterdam, The Netherlands, 1994; pp. 451–455.
  12. Zhang, Y.J.; Ye, F.; Stanev, E.V.; Grashorn, S. Seamless cross-scale modeling with SCHISM. Ocean Model. 2016, 102, 64–81.
  13. Fernández-Cabán, P.L.; Alford, A.; Bell, M.J.; Biggerstaff, M.I.; Carrie, G.D.; Hirth, B.; Kosiba, K.; Phillips, B.M.; Schroeder, J.L.; Waugh, S.M.; et al. Observing Hurricane Harvey’s Eyewall at Landfall. Bull. Am. Meteorol. Soc. 2019, 100, 759–775.
  14. Wing, O.E.; Sampson, C.C.; Bates, P.D.; Quinn, N.; Smith, A.M.; Neal, J.C. A flood inundation forecast of Hurricane Harvey using a continental-scale 2D hydrodynamic model. J. Hydrol. X 2019, 4, 100039.
  15. Costa, W.; Bryan, K.R.; Stephens, S.A.; Coco, G. A regional analysis of tide-surge interactions during extreme water levels in complex coastal systems of Aotearoa New Zealand. Front. Mar. Sci. 2023, 10, 1170756.
  16. Olabarrieta, M.; Warner, J.C.; Armstrong, B.; Zambon, J.B.; He, R. Ocean–atmosphere dynamics during Hurricane Ida and Nor’Ida: An application of the coupled ocean–atmosphere–wave–sediment transport (COAWST) modeling system. Ocean Model. 2012, 43, 112–137.
  17. Spicer, P.; Huguenard, K.; Ross, L.; Rickard, L.N. High-frequency tide-surge-river interaction in estuaries: Causes and implications for coastal flooding. J. Geophys. Res. Oceans 2019, 124, 9517–9530.
  18. Butler, T.; Altaf, M.U.; Dawson, C.; Hoteit, I.; Luo, X.; Mayo, T. Data Assimilation within the Advanced Circulation (ADCIRC) Modeling Framework for Hurricane Storm Surge Forecasting. Mon. Weather Rev. 2012, 140, 2215–2231.
  19. Gharehtoragh, M.A.; Johnson, D.R. Using surrogate modeling to predict storm surge on evolving landscapes under climate change. NPJ Nat. Hazards 2024, 1, 1–9.
  20. Kim, S.-W.; Melby, J.A.; Nadal-Caraballo, N.C.; Ratcliff, J. A time-dependent surrogate model for storm surge prediction based on an artificial neural network using high-fidelity synthetic hurricane modeling. Nat. Hazards 2014, 76, 565–585.
  21. Ramos-Valle, A.N.; Curchitser, E.N.; Bruyère, C.L.; McOwen, S. Implementation of an Artificial Neural Network for Storm Surge Forecasting. J. Geophys. Res. Atmos. 2021, 126, e2020JD033266.
  22. Krige, D.G. A statistical approach to some basic mine valuation problems on the Witwatersrand. J. S. Afr. Inst. Min. Metall. 1951, 52, 119–139.
  23. Babovic, V.; Keijzer, M. A Gaussian process model applied to prediction of the water levels in Venice lagoon. In Proceedings of the Congress-International Association for Hydraulic Research, Beijing, China, 16–21 September 2001; pp. 509–513.
  24. Jia, G.; Taflanidis, A.A. Kriging metamodeling for approximation of high-dimensional wave and surge responses in real-time storm/hurricane risk assessment. Comput. Methods Appl. Mech. Eng. 2013, 261–262, 24–38.
  25. Yang, K.; Paramygin, V.A.; Sheng, Y.P. A Rapid Forecasting and Mapping System of Storm Surge and Coastal Flooding. Weather. Forecast. 2020, 35, 1663–1681.
  26. Alemany, S.; Beltran, J.; Perez, A.; Ganzfried, S. Predicting Hurricane Trajectories Using a Recurrent Neural Network. Proc. AAAI Conf. Artif. Intell. 2019, 33, 468–475.
  27. Bowes, B.D.; Sadler, J.M.; Morsy, M.M.; Behl, M.; Goodall, J.L. Forecasting Groundwater Table in a Flood Prone Coastal City with Long Short-term Memory and Recurrent Neural Networks. Water 2019, 11, 1098.
  28. Igarashi, Y.; Tajima, Y. Application of recurrent neural network for prediction of the time-varying storm surge. Coast. Eng. J. 2021, 63, 68–82.
  29. Hu, H.; van der Westhuysen, A.J.; Chu, P.; Fujisaki-Manome, A. Predicting Lake Erie wave heights and periods using XGBoost and LSTM. Ocean Model. 2021, 164, 101832.
  30. Chen, R.; Wang, X.; Zhang, W.; Zhu, X.; Li, A.; Yang, C. A hybrid CNN-LSTM model for typhoon formation forecasting. GeoInformatica 2019, 23, 375–396.
  31. Zhen, Z.; Li, Z.; Wang, F.; Xu, F.; Li, G.; Zhao, H.; Ma, H.; Zhang, Y.; Ge, X.; Li, J. CNN-LSTM Networks Based Sand and Dust Storms Monitoring Model Using FY-4A Satellite Data. IEEE Trans. Ind. Appl. 2024, 60, 5130–5141.
  32. Adeli, E.; Sun, L.; Wang, J.; Taflanidis, A.A. An advanced spatio-temporal convolutional recurrent neural network for storm surge predictions. Neural Comput. Appl. 2023, 35, 18971–18987.
  33. Shi, X.; Chen, Z.; Wang, H.; Yeung, D.-Y.; Wong, W.-K.; Woo, W.-C. Convolutional LSTM network: A machine learning approach for precipitation nowcasting. In Proceedings of the NIPS’15: Proceedings of the 29th International Conference on Neural Information Processing Systems, Montreal, QC, Canada, 7–12 December 2015; Volume 28, pp. 802–810.
  34. Yang, J.; Zhang, T.; Zhang, J.; Lin, X.; Wang, H.; Feng, T. A ConvLSTM nearshore water level prediction model with integrated attention mechanism. Front. Mar. Sci. 2024, 11, 1470320.
  35. Tian, Q.; Luo, W.; Guo, L. Water quality prediction in the Yellow River source area based on the DeepTCN-GRU model. J. Water Process. Eng. 2024, 59, 105052.
  36. Zhang, X.; Wang, T.; Wang, W.; Shen, P.; Cai, Z.; Cai, H. A multi-site tide level prediction model based on graph convolutional recurrent networks. Ocean Eng. 2023, 269, 113579.
  37. Shao, Z.; Zhang, Z.; Wang, F.; Wei, W.; Xu, Y. Spatial-temporal identity: A simple yet effective baseline for multivariate time series forecasting. In Proceedings of the 31st ACM international Conference on Information & Knowledge Management, Atlanta, GA, USA, 17–21 October 2022; pp. 4454–4458.
  38. Shamsu, M.; Akbar, M. Understanding the Effects of Wind Intensity, Forward Speed, and Wave on the Propagation of Hurricane Harvey Surges. J. Mar. Sci. Eng. 2023, 11, 1429.
  39. NOAA National Data Buoy Center (NDBC). Historical Meteorological Data Search. Available online: https://www.ndbc.noaa.gov/histsearch.php (accessed on 7 October 2024).
  40. Kenneth, R.; Howard, J.; James, P.; Michael, C.; Carl, J. International Best Track Archive for Climate Stewardship (IBTrACS) Project, Version 4; NOAA National Centers for Environmental Information: Asheville, NC, USA, 2019.
  41. Knapp, K.R.; Kruk, M.C.; Levinson, D.H.; Diamond, H.J.; Neumann, C.J. The International Best Track Archive for Climate Stewardship (IBTrACS): Unifying tropical cyclone best track data. Bull. Am. Meteorol. Soc. 2010, 91, 363–376.
  42. Chu, C.-S.J. Time series segmentation: A sliding window approach. Inf. Sci. 1995, 85, 147–173.
  43. Frank, R.J.; Davey, N.; Hunt, S.P. Time series prediction and neural networks. J. Intell. Robot. Syst. 2001, 31, 91–103.
  44. Girard, A.; Rasmussen, C.; Candela, J.Q.; Murray-Smith, R. Gaussian process priors with uncertain inputs application to multiple-step ahead time series forecasting. In Proceedings of the Advances in Neural Information Processing Systems 15 (NIPS 2002), Vancouver, BC, Canada, 9–14 December 2002; Volume 15.
  45. Liu, S.; Poccia, S.R.; Candan, K.S.; Sapino, M.L.; Wang, X. Robust Multi-Variate Temporal Features of Multi-Variate Time Series. ACM Trans. Multimed. Comput. Commun. Appl. 2018, 14, 1–24.
  46. Stockinger, N.; Dutter, R. Robust time series analysis: A survey. Kybernetika 1987, 23, 3–88.
  47. Zhou, J.; Liu, L.; Wei, W.; Fan, J. Network Representation Learning: From Preprocessing, Feature Extraction to Node Embedding. ACM Comput. Surv. 2022, 55, 1–35.
  48. Pavlyuk, D. Feature selection and extraction in spatiotemporal traffic forecasting: A systematic literature review. Eur. Transp. Res. Rev. 2019, 11, 6.
  49. Yin, Z.; Shen, Y. On the Dimensionality of Word Embedding. arXiv 2018, arXiv:1812.04224. [Google Scholar] [CrossRef]
  50. Haoyu, H.; Mengdi, Z.; Min, H.; Fuzheng, Z.; Zhongyuan, W.; Enhong, C.; Hongwei, W.; Jianhui, M.; Qi, L. STGCN: A spatial-temporal aware graph learning method for POI recommendation. In Proceedings of the 2020 IEEE International Conference on Data Mining (ICDM), Sorrento, Italy, 7–20 November 2020; pp. 1052–1057. [Google Scholar]
  51. Wu, Z.; Pan, S.; Long, G.; Jiang, J.; Zhang, C. Graph WaveNet for Deep Spatial-Temporal Graph Modeling. arXiv 2019, arXiv:1906.00121. [Google Scholar]
  52. Jiang, W.; Zhang, J.; Li, Y.; Zhang, D.; Hu, G.; Gao, H.; Duan, Z. Advancing storm surge forecasting from scarce observation data: A causal-inference based Spatio-Temporal Graph Neural Network approach. Coast. Eng. 2024, 190, 104512. [Google Scholar] [CrossRef]
  53. Hendrycks, D.; Gimpel, K. Gaussian error linear units (gelus). arXiv 2016, arXiv:1606.08415. [Google Scholar] [CrossRef]
  54. Ba, J.L.; Kiros, J.R.; Hinton, G.E. Layer normalization. arXiv 2016, arXiv:1607.06450. [Google Scholar] [CrossRef]
  55. Bastidas, A.A.; Tang, H. Channel attention networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition Workshops (CVPRW), Long Beach, CA, USA, 16–17 June 2019. [Google Scholar]
  56. Hu, J.; Shen, L.; Sun, G. Squeeze-and-excitation networks. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 18–23 June 2018; pp. 7132–7141. [Google Scholar]
  57. Hsiao, T.-Y.; Chang, Y.-C.; Chou, H.-H.; Chiu, C.-T. Filter-based deep-compression with global average pooling for convolutional networks. J. Syst. Archit. 2019, 95, 9–18. [Google Scholar] [CrossRef]
  58. Kingma, D.P.; Ba, J. Adam: A method for stochastic optimization. arXiv 2014, arXiv:1412.6980. [Google Scholar] [CrossRef]
  59. ReduceLROnPlateau Pytorch. Available online: https://docs.pytorch.org/docs/stable/generated/torch.optim.lr_scheduler.ReduceLROnPlateau.html (accessed on 12 April 2025).
  60. Paszke, A. Pytorch: An imperative style, high-performance deep learning library. arXiv 2019, arXiv:1912.01703. [Google Scholar] [CrossRef]
  61. Liu, X.; Xia, J.; Wright, G.; Arnold, L. A state of the art review on High Water Mark (HWM) determination. Ocean Coast. Manag. 2014, 102, 178–190. [Google Scholar] [CrossRef]
  62. Sanghyun, W.; Shoubhik, D.; Ronghang, H.; Xinlei, C.; Zhuang, L.; So, K.I.; Saining, X. Convnext v2: Co-designing and scaling convnets with masked autoencoders. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), Vancouver, BC, Canada, 17–24 June 2023; pp. 16133–16142. [Google Scholar]
  63. Shao, W.; Jin, Z.; Wang, S.; Kang, Y.; Xiao, X.; Menouar, H.; Zhang, Z.; Zhang, J.; Salim, F. Long-term Spatio-temporal Forecasting via Dynamic Multiple-Graph Attention. arXiv 2022, arXiv:2204.11008. [Google Scholar]
  64. Cini, A.; Marisca, I.; Bianchi, F.M.; Alippi, C. Scalable Spatiotemporal Graph Neural Networks. Proc. AAAI Conf. Artif. Intell. 2023, 37, 7218–7226. [Google Scholar] [CrossRef]
  65. Mallick, T.; Balaprakash, P.; Rask, E.; Macfarlane, J. Graph-Partitioning-Based Diffusion Convolutional Recurrent Neural Network for Large-Scale Traffic Forecasting. Transp. Res. Rec. J. Transp. Res. Board 2020, 2674, 473–488. [Google Scholar] [CrossRef]
  66. Li, Y.; Yu, D.; Liu, Z.; Zhang, M.; Gong, X.; Zhao, L. Graph Neural Network for spatiotemporal data: Methods and applications. arXiv 2023, arXiv:2306.00012. [Google Scholar] [CrossRef]
  67. Shao, Y.; Li, H.; Gu, X.; Yin, H.; Li, Y.; Miao, X.; Zhang, W.; Cui, B.; Chen, L. Distributed Graph Neural Network Training: A Survey. ACM Comput. Surv. 2024, 56, 1–39. [Google Scholar] [CrossRef]
Figure 1. Hurricane Harvey storm surge region of interest, hurricane track, and distribution of twelve (12) observation stations.
Figure 2. ADCIRC+SWAN model generated mesh grids: (a) Original fine mesh (267,597 nodes); (b) Outliers removed – coarse mesh (88,901 nodes); (c) 1/10 sampled – coarser mesh (8810 nodes).
Figure 3. Architecture of uniform model (input/embedding layers, feature fusion and extraction layers, regression layer).
Figure 4. Model Spatiotemporal Embeddings with Bottleneck ResNet and Channel Attention (STERCHA).
Figure 5. Heat maps comparing the target with the prediction on the validation set, test set, and high water marks: (a) Target on the validation set (26 August 2017, 00:00:00); (b) 6 h horizon prediction at the same time as (a); (c) Difference between (a,b); (d) Target on the validation set (28 August 2017, 09:00:00); (e) 9 h horizon prediction at the same time as (d); (f) Difference between (d,e); (g) Target on the test set (30 August 2017, 18:00:00); (h) 12 h horizon prediction at the same time as (g); (i) Difference between (g,h); (j) High water marks for all nodes on the entire set; (k) 12 h horizon prediction on the entire set; (l) Difference between (j,k). The model was trained on the training set; the comparisons shown are on the validation set, the test set, and the high water marks of the entire dataset. The left, middle, and right columns show the targets, predictions, and differences in the water elevation heat maps, respectively, as the center of Hurricane Harvey moves to different geolocations.
Figure 6. Relationship between input feature combinations and MAE and MAPE. Input features: 0: Water Elevation; 1: Water Velocity U; 2: Water Velocity V; 3: Wind Velocity U; 4: Wind Velocity V; 5: Atmospheric Pressure; 6: Bathymetry; 7: Manning’s n; 8: Time in Day; *: without spatial feature. The output feature is 0 (water elevation); each digit string denotes a combination of input features.
Figure 7. Regression between target and prediction in all time frames for all nodes on the entire dataset (Conditions: 8810 nodes, all features (012345678), 7 interpolated points, 84 input/84 output time steps). (a) Module: ConvNeXt V2, STID and ConCANext. (b) STERCHA. (c) ConvNeXt V2. (d) STID. (e) ConCANext.
Figure 8. Comparison of the observed, ADCIRC simulation, and model prediction of 12 observation stations.
Figure 9. Comparison of Hurricane Harvey tracks between observation and the 1-, 6-, and 12-horizon predictions.
Figure 10. Hurricane Harvey track and its attributes.
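The digit-string labels and error metrics named in the Figure 6 caption can be illustrated with a short sketch. The index-to-name mapping is copied from the caption; the function names (`decode_combination`, `uses_spatial_feature`, `mae`, `mape`) are hypothetical, and the metric formulas are the standard textbook definitions, assumed here rather than taken from the authors' code.

```python
from typing import Dict, List, Sequence

# Index-to-name mapping copied from the Figure 6 caption.
FEATURES: Dict[str, str] = {
    "0": "Water Elevation",
    "1": "Water Velocity U",
    "2": "Water Velocity V",
    "3": "Wind Velocity U",
    "4": "Wind Velocity V",
    "5": "Atmospheric Pressure",
    "6": "Bathymetry",
    "7": "Manning's n",
    "8": "Time in Day",
}


def decode_combination(code: str) -> List[str]:
    """Expand a label such as '0345' into feature names (the '*' flag is skipped)."""
    return [FEATURES[ch] for ch in code if ch in FEATURES]


def uses_spatial_feature(code: str) -> bool:
    """In the caption, '*' marks a run without the spatial feature."""
    return "*" not in code


def mae(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Mean absolute error (standard definition)."""
    return sum(abs(t - p) for t, p in zip(target, prediction)) / len(target)


def mape(target: Sequence[float], prediction: Sequence[float]) -> float:
    """Mean absolute percentage error in percent; assumes no target value is zero."""
    return 100.0 * sum(abs((t - p) / t) for t, p in zip(target, prediction)) / len(target)


print(decode_combination("0345"))
print(mae([1.0, 2.0, 3.0], [1.0, 2.0, 4.5]))
print(mape([2.0, 4.0], [1.0, 5.0]))
```

Because MAPE divides by the target, near-zero water elevations inflate it sharply; how the authors handle such nodes is not stated in the caption.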
Table 1. National Oceanic and Atmospheric Administration (NOAA) observation stations for the study.
| Station ID | NOAA ID | Lon | Lat | Bathymetry (m) | Name |
|---|---|---|---|---|---|
| 1 | 8775870 | −97.2167 | 27.58 | −3.02 | Bob Hall Pier, TX, USA |
| 2 | 8775296 | −97.39 | 27.812 | −10.46 | USS Lexington, Corpus Christi Bay, TX, USA |
| 3 | 8775241 | −97.0383 | 27.837 | −10.23 | Aransas, Aransas Pass, TX, USA |
| 4 | 8775237 | −97.0733 | 27.841 | −12.05 | Port Aransas, TX, USA |
| 5 | 8774770 | −97.0461 | 28.022 | −2.35 | Rockport, TX, USA |
| 6 | 8774230 | −96.7945 | 28.222 | −0.87 | Aransas Wildlife Refuge, TX, USA |
| 7 | 8773037 | −96.7131 | 28.402 | −0.76 | Seadrift, TX, USA |
| 8 | 8773701 | −96.395 | 28.447 | −3.11 | Port O’Connor, TX, USA |
| 9 | 8772471 | −95.295 | 28.935 | −9.49 | Freeport SPIP, Freeport Harbor, TX, USA |
| 10 | 8771341 | −94.7245 | 29.354 | −10.25 | Galveston Bay Entrance, North Jetty, TX, USA |
| 11 | 8770822 | −93.8417 | 29.69 | −5.3009 | Texas Point, Sabine Pass, TX, USA |
| 12 | 8768094 | −93.3433 | 29.768 | −5.9747 | Calcasieu Pass, LA, USA |
Table 2. Input and output features of the uniform model.
| Model Component | N | C_I | C_O | Input Features | Output Features |
|---|---|---|---|---|---|
| Surrogate model of ADCIRC+SWAN | 88,091 | 10 | 1 | Time series (water elevation, water velocity u, water velocity v, wind velocity u, wind velocity v, atmospheric pressure), static features (bathymetry, Manning’s n), spatial embedding, temporal embedding | Water elevation |
| Water levels of observation stations | 12 | 4 | 1 | Time series (water levels), spatial embedding, temporal embedding | Water levels |
| Hurricane tracks and attributes | 1 | 7 | 7 | Longitude, latitude, wind speed, storm speed, atmospheric pressure, hurricane category, storm direction | Same as input features |
N: number of nodes; C_I: number of input features; C_O: number of output features.
Table 3. Data arrangement for the uniform model.
| Model Component | Data Split Ratio | Training | Validation | Test |
|---|---|---|---|---|
| Surrogate model of ADCIRC+SWAN | 0.3:0.3:0.4 | 54 h; from 23 August 2017, 12:00:00 to 25 August 2017, 18:00:00 | 54 h; from 25 August 2017, 19:00:00 to 28 August 2017, 00:00:00 | 72 h; from 28 August 2017, 01:00:00 to 31 August 2017, 00:00:00 |
| Water levels of observation stations | 0.3:0.3:0.4 | 648 data points; from 23 August 2017, 00:00:00 to 25 August 2017, 16:42:00 | 648 data points; from 25 August 2017, 16:48:00 to 28 August 2017, 09:30:00 | 864 data points; from 28 August 2017, 09:36:00 to 31 August 2017, 23:54:00 |
| Hurricane tracks and attributes | 0.5:0.3:0.2 | 208 data points; from 16 August 2017, 06:00:00 to 24 August 2017, 22:00:00 | 124 data points; from 24 August 2017, 22:00:00 to 30 August 2017, 02:00:00 | 83 data points; from 30 August 2017, 03:00:00 to 2 September 2017, 12:00:00 |
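The split sizes in Table 3 follow from applying the stated ratios to a time-ordered series without shuffling. The sketch below is a minimal illustration of such a chronological split; the function name `chronological_split` and the floor-based rounding rule are assumptions, not the authors' implementation.

```python
from typing import Sequence, Tuple


def chronological_split(
    n_steps: int, ratio: Sequence[int] = (3, 3, 4)
) -> Tuple[range, range, range]:
    """Split indices 0..n_steps-1 into contiguous train/validation/test blocks.

    `ratio` is given in integer parts (3:3:4 stands for 0.3:0.3:0.4) so that
    the block sizes are computed without floating-point rounding. The training
    and validation sizes are floored; the test block takes the remainder.
    """
    total = sum(ratio)
    n_train = n_steps * ratio[0] // total
    n_val = n_steps * ratio[1] // total
    train = range(0, n_train)
    val = range(n_train, n_train + n_val)
    test = range(n_train + n_val, n_steps)
    return train, val, test


# Surrogate model row: 54 + 54 + 72 = 180 hourly steps.
train, val, test = chronological_split(180)
print(len(train), len(val), len(test))  # 54 54 72

# Station water-level row: 648 + 648 + 864 = 2160 data points.
train, val, test = chronological_split(2160)
print(len(train), len(val), len(test))  # 648 648 864
```

Keeping the blocks contiguous in time means the validation and test periods lie strictly after the training period, so no future information leaks into training. For the hurricane-track row (415 points at 0.5:0.3:0.2), this floor-based rule gives 207/124/84 rather than the tabulated 208/124/83, so the rounding used in the paper evidently differs from the one assumed here.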

Share and Cite

MDPI and ACS Style

Hou, J.; Akbar, M.K.; Samad, M.D.; Ouyang, L. Spatiotemporal Deep Learning to Forecast Storm Surge Water Levels and Storm Trajectory: Case Study Hurricane Harvey. J. Mar. Sci. Eng. 2025, 13, 1780. https://doi.org/10.3390/jmse13091780
