Flood Forecasting Using Hybrid LSTM and GRU Models with Lag Time Preprocessing

Climate change and urbanization have increased the frequency of floods worldwide, resulting in substantial casualties and property loss. Accurate flood forecasting can offer governments early warnings about impending flood disasters, giving them a chance to evacuate and save lives. Deep learning is used in flood forecasting to improve the timeliness and accuracy of flood water level predictions. While various deep learning models similar to Long Short-Term Memory (LSTM) have achieved notable results, they have complex structures with low computational efficiency, and often lack generalizability and stability. This study applies a spatiotemporal Attention Gated Recurrent Unit (STA-GRU) model for flood prediction to increase the model's computing efficiency. Another salient feature of our methodology is the incorporation of lag time during data preprocessing before the training of the model. Notably, for 12-h forecasting, the STA-GRU model's R-squared (R²) value increased from 0.8125 to 0.9215. Concurrently, the model showed reduced root mean squared error (RMSE) and mean absolute error (MAE). For the longer 24-h forecasting, the R² value of the STA-GRU model improved from 0.6181 to 0.7283, accompanied by lower RMSE and MAE values. Seven typical deep learning models are compared for water level prediction: the LSTM, the Convolutional Neural Networks LSTM (CNNLSTM), the Convolutional LSTM (ConvLSTM), the spatiotemporal Attention Long Short-Term Memory (STA-LSTM), the GRU, the Convolutional Neural Networks GRU (CNNGRU), and the STA-GRU. The comparative analysis showed that the use of the STA-GRU model and the application of the lag time preprocessing method significantly improved the reliability and accuracy of flood forecasting.


Introduction
As urbanization and climate change intersect, flood risks escalate [1][2][3][4][5][6][7]. One primary reason is the swifter water runoff from surfaces that are impervious to water absorption [8]. This phenomenon is closely tied to land-use patterns, which play a pivotal role in flood predictions. The surge in urbanization contributes to the proliferation of these impervious surfaces, amplifying rainwater runoff. In tandem, factors like the dwindling of vegetation and forests, shifts in agricultural land management, and modifications to rivers and wetlands all influence the volume and velocity of water flow. Given these interdependencies, accurate flood prediction requires an integrative approach that takes these land-use dynamics into account alongside other pertinent data [9][10][11]. Flooding events are more than mere natural phenomena; they pose grave threats to human safety and can cause significant economic damage, especially in regions more susceptible to inundation [12][13][14]. Recognizing the gravity of these threats, governments have invested heavily in early flood warning and forecasting systems [15]. These systems do more than signal potential dangers; they are critical assets in both safeguarding lives and substantially reducing property damage by facilitating the timely implementation of protective measures such as sandbags [16,17].
Traditional methods of assessing flood risks, while foundational, are no longer adequate on their own. The paradigm has shifted towards predictive models that can proactively alert communities about impending flood threats [18,19]. An essential feature of these systems is the provision of varying lead times, which are invaluable for both managing and preemptively addressing the risks associated with imminent flood events and other related disasters [20,21].
But how do these systems work, and what makes them so effective? They are designed to provide insights into the expected scale, onset, locale, and potential repercussions of a flood event [22][23][24]. These predictions are not based on guesswork; they are underpinned by data collected throughout the year from strategically placed sensors in water basins, including lakes and rivers, as well as from flood deterrent structures like dams, dikes, and embankments. Moreover, purpose-built infrastructure for flood prediction and monitoring plays a pivotal role in data collection, and the efficacy of a forecasting model depends directly on the quality of its dataset [25,26].
In the realm of flood prediction, three variables stand out in their significance: precipitation, river flow, and water levels. Rainfall data offer insights into intensity and duration, which in turn affect the volume of water flowing into the river system [27][28][29]. Concurrently, the river's current water level acts as a barometer for its capacity to accommodate incoming water surges. An accurate flood prediction hinges on a nuanced understanding of the dynamics between these two factors. As soon as the soil's moisture levels or the river's capacity reach critical thresholds, flood risks amplify [30][31][32][33]. Through continuous monitoring and data analysis of both precipitation and river water levels, forecasting models can identify and highlight patterns indicative of potential flood events.

The Flood Prediction Models and Lag Time Preprocessing
Long Short-Term Memory (LSTM) is a form of Recurrent Neural Network (RNN) intended to address the long-term dependency issue encountered by RNNs when processing extensive sequence data [34,35]. In recent years, LSTM has achieved considerable success across multiple domains, including natural language processing, speech recognition, and time series prediction [36].
LSTM exhibits significant potential in flood forecasting, a prototypical time series prediction problem [37][38][39]. This process requires the handling and understanding of continuous meteorological and hydrological data (such as precipitation, river water levels, and soil moisture), forming the basis for future flood prediction. The unique internal structure of LSTM, capable of processing and memorizing long-term sequential dependencies, makes it an ideal candidate for solving such problems [40][41][42][43][44].
The application of the LSTM model in flood prediction continues to evolve. Initial research primarily focused on employing LSTM to model and predict rainfall and river water levels at individual sites [45]. As deep learning technology advanced, researchers began exploring more complex models, such as integrating Convolutional Neural Networks (CNN) with LSTM, to handle meteorological and hydrological data across multiple geographical locations, thereby further enhancing the accuracy and timeliness of flood predictions [46][47][48].
It is crucial to note that flood prediction is not only a data-driven problem, but also requires understanding and consideration of various complex influencing factors such as geography, climate, and human activities [49]. Currently, hybrid models are receiving significant attention because they enhance the generalizability and stability of single models. Although LSTM, the Convolutional Neural Networks LSTM (CNNLSTM), the Convolutional LSTM (ConvLSTM), and other deep learning models have shown tremendous potential in flood prediction, ongoing optimization and improvement of these models are needed in practice to better address the various challenges inherent in flood prediction [47][50][51][52][53]. Moreover, since the flood prediction dataset contains not only time series but also spatial series, with a large amount of data over a long span of time, the attention mechanism can help the model deal with long-term dependencies more effectively. The attention mechanism can also help the model extract useful information from the input data more efficiently, thereby improving the accuracy of prediction. The spatiotemporal Attention LSTM (STA-LSTM) model has been used in flood forecasting and has achieved good results, as demonstrated in Table 1 [46,49,54].
Prediction models should not only ensure accuracy but also strive for the highest possible computational efficiency. Developed in 2014, the Gated Recurrent Unit (GRU) is a prediction model founded on principles similar to those of the LSTM model [56]. While GRU and LSTM research findings share similarities, key distinctions exist between the two models. GRU, for example, has been reported to capture and retain essential information effectively over longer sequences [57,58]. This ability is vital for tasks involving long-term dependencies, where the model must take past information into account to make precise predictions.
The GRU model incorporates a gating mechanism that enables the model to selectively update its hidden state based on the input data [59]. In particular, GRU employs an update gate, combining the roles of the LSTM forget and input gates. This combination simplifies the architecture and reduces the number of parameters, leading to computational efficiency and quicker training times. Furthermore, GRU's streamlined design, merging the cell state and hidden state, fosters efficient information flow within the model. This architecture enables the GRU to capture relevant information and discard unnecessary details, making it particularly suitable for tasks involving sequential data analysis. In the last two years, both the GRU and the Convolutional Neural Networks GRU (CNNGRU) models have been explored for their utility in flood prediction. The GRU model has proven to be more effective than LSTM for short-term flood forecasts [57]. While the CNNGRU model has shown promise in flood prediction, enhancements in its performance for long-term forecasting are still necessary [60,61].
In hydrology, the lag time is the catchment response time between the rainfall and the runoff response [62]. As urban land takes over previously rural land, infiltration rates can decrease, with adverse effects on flood risk for people living in the vicinity of a flood zone. Accurate modeling of flood events contributes to improved watershed management and mitigation of potential flood hazards [63,64].
The lag time is defined as the delay between the time a rainfall event over a watershed begins and the time runoff reaches its maximum peak [65,66]. The lag time of a catchment indicates the speed at which the river will react to increased precipitation and can be influenced by several parameters, including the slope, length, and roughness of the flow path, the size of the basin, the soil type, and the land use [67]. Lag time can be estimated either empirically using formulas or from hydrological data [68]. The latter method uses data from an upstream precipitation station and a downstream flow monitoring site: the lag time of the stream is ascertained from the time difference between the peak precipitation and the peak runoff. Various studies have proposed the use of both hydrological data and empirical equations and have achieved success [69].
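The peak-to-peak estimate described above can be sketched in a few lines of Python. This is a minimal illustration rather than the procedure used in the study: the function name `peak_lag_hours`, the hourly sampling, and the event values are all hypothetical.

```python
from datetime import datetime, timedelta


def peak_lag_hours(times, rainfall, runoff):
    """Hours between the rainfall peak and the following runoff peak of one event."""
    rain_peak = max(range(len(rainfall)), key=rainfall.__getitem__)
    # Only look for the runoff peak at or after the rainfall peak.
    flow_peak = max(range(rain_peak, len(runoff)), key=runoff.__getitem__)
    return (times[flow_peak] - times[rain_peak]).total_seconds() / 3600.0


# Hypothetical hourly record of a single flood event (made-up values).
start = datetime(2020, 1, 1)
times = [start + timedelta(hours=h) for h in range(8)]
rainfall = [0.0, 2.0, 9.0, 4.0, 1.0, 0.0, 0.0, 0.0]   # upstream precipitation
runoff = [1.0, 1.1, 1.5, 2.4, 3.8, 5.2, 4.1, 3.0]     # downstream flow

lag = peak_lag_hours(times, rainfall, runoff)
print(f"estimated lag: {lag:.1f} h")
```

Restricting the runoff search to samples after the rainfall peak avoids picking up a leftover peak from an earlier event, which is one reason event-by-event estimation is usually preferred over a single pass across the whole record.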
Furthermore, in the field of flood forecasting, the integration of spatiotemporal data is commonly adopted to enhance prediction accuracy. Within this context, the temporal delay between upstream and downstream hydrological stations emerges as a critical factor. This delay is attributed to a combination of factors, including the river's natural flow rate, channel morphology and length, topographic gradients, and human interventions [70,71]. In the preprocessing of spatiotemporal data, the specific delay between upstream and downstream stations can be determined by examining historical data, focusing on the time difference between peak values observed at each upstream station and at the target downstream station [66,72,73]. The lag time of a catchment plays a significant role in stream flow model performance. When lag time is added to stream flow-driven applications, the accuracy of the models' travel time estimates improves significantly [67].

Contribution
When employing deep learning methods, gradient vanishing is a common issue, which can be even worse when dealing with long sequence data. It leads to minute weight updates, causing the network to learn very slowly or even fail to learn. LSTM and GRU models are a good choice for mitigating this issue. While the STA-LSTM model has been successfully employed in flood forecasting and has yielded satisfactory results, it demands considerable computational effort for training and is often time-consuming.
Based on the above observation, our research augmented time series prediction with spatial information to improve forecasting capabilities. To overcome challenges in handling spatiotemporal datasets, the data are preprocessed before training: the lag time between the rainfall volumes and the target station is determined, as is the lag time between each hydrological station and the target station. To better extract the features of data that contain both temporal and spatial information, the attention mechanism is used to deal with long-term dependencies effectively. Then, based on the STA-LSTM model, the spatiotemporal Attention GRU (STA-GRU) model is constructed to reduce the model complexity and improve the computational efficiency. Flood forecasting models with high computational efficiency can provide more timely warnings, thereby facilitating the faster implementation of emergency measures and mitigating the impact of disasters. Compared with the STA-LSTM, the STA-GRU has similar mechanisms in data processing but a much simpler model architecture, and therefore comparable performance can be achieved with less computational effort. Finally, the performance of seven models is compared: LSTM, Convolutional Neural Networks LSTM (CNN-LSTM), Convolutional LSTM (Conv-LSTM), spatiotemporal Attention LSTM (STA-LSTM), GRU, Convolutional Neural Networks GRU (CNN-GRU), and spatiotemporal Attention GRU (STA-GRU). The hybrid models among them combine the strengths of their individual components, aiming to capture the spatiotemporal dynamics inherent in flood prediction.

Materials and Methods
Originating from Orangeville, the Credit River winds its way through the landscapes of southern Ontario, Canada, meandering through towns such as Brampton, before merging with Lake Ontario in Mississauga. Complementing the river's natural allure, the surrounding areas boast multi-functional parks and verdant open spaces, inviting enthusiasts for activities ranging from fishing and hiking to wildlife observation. However, with all its serene beauty, the Credit River is not without its perils. In times of torrential rain or during the spring melt, its tranquil waters can surge, posing flood threats. As a cautionary note, those residing or venturing near its banks are advised to be vigilant, heeding local weather updates and flood advisories.
Located in Mississauga, Credit River's station 02HB029 plays a pivotal role in flood forecasting for this bustling metropolitan area. Given that the Credit River courses directly through the heart of downtown Mississauga, accurately predicting the discharge in the southern part of the river is vital for safeguarding both lives and property. Although the real-time rainfall monitoring network in the Credit River watershed is limited, one precipitation monitor is situated near station 02HB025. To strike a balance between simplicity and precision in the flood forecasting system, we have incorporated the data from this precipitation monitor into our study. Ideally, an early flood forecasting system would harmonize the objectives of governmental bodies, affected residents, and the insurance sector, facilitating a shared understanding of flood loss implications. Considering the escalating trend of insured catastrophic losses annually, it is imperative that a highly accurate early flood forecasting system is available.
Figure 1 shows that stations 02HB025, 02HB018, 02HB001, 02HB013, and 02HB031 are located in the headwaters of the Credit River Watershed. These are positioned upstream of the vital station 02HB029, which lies in the flood-sensitive regions of downtown Mississauga, close to the basin of the Credit River watershed. Rainfall station 25 is located near station 02HB025, while rainfall station 18 is situated close to water station 02HB018. Both of them are upstream of station 02HB029. Our hydrological prediction models have exhibited exceptional performance on spatiotemporal data, prompting our endeavor to further enhance their capabilities. Recognizing the notable success of the STA-LSTM in flood prediction using this type of data, the research attempted to further improve the model's generalization capability and computational speed. To gain a comprehensive understanding, we compared the performance of STA-LSTM against other models, namely LSTM, CNN-LSTM, ConvLSTM, GRU, CNNGRU, and STA-GRU, specifically for flood forecasting with spatiotemporal data. The rationale behind spatially coupling LSTM and GRU-based models lies in their superior proficiency in handling spatiotemporal data sequences, particularly when contrasted with their traditional counterparts.
Taking into account the urbanization levels and the expanse of the Credit River watershed, the catchment's response time generally ranges between three and eight hours. This variance depends on the nature and duration of the rainfall event, ranging from abrupt yet intense summer thunderstorms to more prolonged rainfalls paired with snowmelt during spring. Flood warnings for the Credit River watershed cater to diverse users and objectives. Among these are the mobilization of operational teams and emergency responders, alerting the public about the specifics of the impending event, and, in severe instances, initiating evacuation and emergency protocols. In light of these requirements, our models were trained and tested for both 12-h and 24-h forecast scenarios, with subsequent evaluations of their accuracy.
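To make the 12-h and 24-h forecast scenarios concrete, the sketch below shows one common way to turn a time series into supervised samples for a fixed forecast horizon. It is an illustration under stated assumptions: hourly sampling, the 6-step input window, and the helper name `make_windows` are hypothetical and are not taken from the study.

```python
def make_windows(features, target, window, horizon):
    """Pair each `window`-step input block with the target `horizon` steps after it."""
    inputs, labels = [], []
    last_start = len(features) - window - horizon
    for start in range(last_start + 1):
        end = start + window                        # exclusive end of the input block
        inputs.append(features[start:end])
        labels.append(target[end - 1 + horizon])    # value `horizon` steps after the block
    return inputs, labels


# Hypothetical hourly series: 30 time steps, value equals its time index.
series = list(range(30))

# Assumed 6-h input window and a 12-h forecast horizon.
inputs, labels = make_windows(series, series, window=6, horizon=12)
print(len(inputs), inputs[0], labels[0])
```

A longer horizon leaves fewer usable samples from the same record and asks the model to bridge a larger gap between the last observed step and the predicted one.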

The Correlation of Water Level, Discharge and Precipitation
The variables under consideration present a distinct positive correlation, as illustrated in Figure 2. The correlation coefficients, ranging from 0 to 1, further underline this observation. Such a trend indicates a deep-seated interconnectedness and mutual influence among the watershed stations, suggesting that changes or events at one station might resonate at others. This interrelation is not merely an interesting observation but holds practical implications. Precisely because of this pronounced correlation, these data are well suited as test or benchmark datasets for evaluating model performance. A model that can predict accurately under such conditions of high interrelatedness is likely to be robust and reliable.
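For readers who want to reproduce this kind of station-to-station comparison, a Pearson correlation coefficient can be computed as sketched below. The text does not state which coefficient underlies Figure 2, so Pearson is an assumption here, and the two series are made-up values.

```python
import math


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length series."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    spread_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    spread_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (spread_x * spread_y)


# Hypothetical water levels at an upstream and a downstream station.
upstream = [1.0, 2.0, 3.0, 4.0, 5.0]
downstream = [2.1, 3.9, 6.2, 8.0, 9.8]

r = pearson(upstream, downstream)
print(f"r = {r:.3f}")
```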

The Water Level Lag Time between Each Station
The lag time between the upstream rainfall station and the target water station is ascertained by plotting the lag time graphs for nine different flood events. The average lag time between the rainfall events and runoff responses is used in this study. Similarly, the lag time between each upstream water level station and the target water level station is also determined. Additionally, the distances between the upstream water level stations and the target water level station are measured, as shown in Table 2. As the Euclidean distance increases, the lag time also tends to increase. This data preprocessing approach aims to ensure optimal correlation between each upstream station and the target station 02HB029, enhancing the predictive model's performance.
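One plausible way to apply such an average lag during preprocessing is to shift each upstream series so that its values line up with the downstream response they are assumed to cause. The sketch below illustrates that idea only; the exact alignment used in the study is not spelled out in the text, and the function names and all numbers are hypothetical.

```python
def mean_lag(event_lags):
    """Average lag over several flood events, rounded to whole time steps."""
    return round(sum(event_lags) / len(event_lags))


def align_with_lag(upstream, target, lag):
    """Pair upstream[t] with target[t + lag], dropping the unmatched ends."""
    if lag < 0 or lag >= len(target):
        raise ValueError("lag must be between 0 and len(target) - 1")
    usable = len(target) - lag
    return list(zip(upstream[:usable], target[lag:]))


# Hypothetical lags (in hours) read off nine flood events.
event_lags = [3, 4, 5, 4, 3, 4, 5, 4, 4]
lag = mean_lag(event_lags)

# Hypothetical hourly series at an upstream station and at the target station.
upstream = [0, 1, 5, 9, 4, 2, 1, 0, 0, 0]
target = [1, 1, 1, 1, 2, 3, 7, 11, 6, 4]

pairs = align_with_lag(upstream, target, lag)
print(lag, pairs)
```

After the shift, the upstream peak and the downstream peak fall in the same row, which is the alignment that should maximize the correlation between the two stations.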

Theoretical Background of the Models and Performance Metrics
In deep learning research for flood forecasting, common models like LSTM, GRU, CNNLSTM, ConvLSTM, and CNNGRU have been widely adopted. These models integrate temporal characteristics with convolutional features to process spatiotemporal data. LSTM and GRU emphasize capturing long-term sequence patterns, while CNNLSTM and CNNGRU combine the feature extraction capabilities of convolutional neural networks with the temporal modeling strengths of recurrent networks. In contrast, the STA-LSTM and STA-GRU models, which are more intricate in structure and specifically designed to capture spatiotemporal relationships, have not yet been extensively utilized. To cater to our spatiotemporal dataset, we have made adaptive modifications to the existing STA-LSTM and STA-GRU models, enhancing their efficacy in flood prediction.

STA-LSTM Model
The STA-LSTM model is tailored for spatiotemporal analyses. It is adept at processing datasets that intertwine time and spatial elements. While maintaining the foundational LSTM elements such as the forget, input, and output gates shown in Figure 3, the STA-LSTM integrates additional structures such as convolutional layers or attention mechanisms to discern spatial patterns more effectively, as shown in Figure 4. In the main output of the STA-LSTM model, $\beta_t$ is the result of the temporal attention part, $z$ represents the summation of $h_1$ to $h_t$, $y$ is the output of the model, and $W_t$ is the weight. The Leaky ReLU [74] activation function is used before the output.
In the Temporal Attention (TA) part, the attention weights $\beta_1, \ldots, \beta_t$ are computed from $H$, the concatenation of the hidden states $h_1$ to $h_t$, with weight $W_{TA}$. The softmax activation function is defined as $\mathrm{softmax}(x)_i = e^{x_i} / \sum_{j=1}^{n} e^{x_j}$, and the ReLU activation function is defined as $\mathrm{ReLU}(x) = \max(0, x)$.
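The softmax and ReLU definitions can be illustrated with a small attention-pooling sketch. This is not the study's implementation: the order of operations (ReLU applied to the scores, then softmax, then a weighted sum of the hidden states) is an assumption, and the `scores` argument stands in for the output of the learned $W_{TA}$ layer, which is not reproduced here.

```python
import math


def softmax(scores):
    """softmax(x)_i = exp(x_i) / sum_j exp(x_j), shifted by max(x) for stability."""
    shift = max(scores)
    exps = [math.exp(s - shift) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def relu(x):
    """ReLU(x) = max(0, x)."""
    return max(0.0, x)


def attention_pool(hidden_states, scores):
    """Weight each hidden state by softmax(ReLU(score)) and sum them (assumed form)."""
    weights = softmax([relu(s) for s in scores])
    size = len(hidden_states[0])
    pooled = [sum(w * h[k] for w, h in zip(weights, hidden_states)) for k in range(size)]
    return weights, pooled


# Two hypothetical 2-dimensional hidden states with equal scores.
weights, pooled = attention_pool([[1.0, 0.0], [0.0, 1.0]], [0.0, 0.0])
print(weights, pooled)
```

Because the softmax weights are non-negative and sum to one, the pooled vector is always a convex combination of the hidden states, so time steps with larger scores contribute more to the summary.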
In the Spatial Attention (SA) part, $W_{SA}$ is the weight and $\alpha_t$ represents the result after applying the softmax operation. The softmax and tanh activation functions are used; because tanh maps its input to the range $(-1, 1)$, the resulting $S_t$ lies between $-1$ and $1$.
For the LSTM cell, $x_t$ denotes the input matrix, $\tilde{x}_t$ denotes the modulated $x_t$, $\odot$ denotes the Hadamard product, $c_t$ denotes the cell state (long memory), and $h_t$ denotes the hidden state (short memory).
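For orientation, the widely used textbook form of the LSTM cell is sketched below in terms of the symbols just defined. It is offered as a reference formulation only and may differ in convention from the original equations: the gate names $f_t$, $i_t$, $o_t$, the candidate state $\hat{c}_t$, the weights $W_f$, $W_i$, $W_o$, $W_c$, the biases, and the concatenation $[h_{t-1}, \tilde{x}_t]$ are assumptions.

```latex
\begin{aligned}
f_t &= \sigma\!\left(W_f\,[h_{t-1}, \tilde{x}_t] + b_f\right) \\
i_t &= \sigma\!\left(W_i\,[h_{t-1}, \tilde{x}_t] + b_i\right) \\
o_t &= \sigma\!\left(W_o\,[h_{t-1}, \tilde{x}_t] + b_o\right) \\
\hat{c}_t &= \tanh\!\left(W_c\,[h_{t-1}, \tilde{x}_t] + b_c\right) \\
c_t &= f_t \odot c_{t-1} + i_t \odot \hat{c}_t \\
h_t &= o_t \odot \tanh(c_t)
\end{aligned}
```

Here $\sigma$ is the logistic sigmoid. The forget gate $f_t$ scales the previous long memory, the input gate $i_t$ scales the new candidate, and the output gate $o_t$ controls how much of the cell state is exposed as the short memory $h_t$.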

STA-GRU Model
The STA-GRU model is also designed for spatiotemporal data processing and is suited to complex datasets with overlapping spatial and temporal attributes. It retains the fundamental GRU mechanisms, notably the reset and update gates, ensuring effective sequence dependency tracking. Moreover, to improve its spatial pattern comprehension, STA-GRU may integrate elements such as fully connected (FC) layers or attention frameworks, as shown in Figure 5.
Additionally, methods such as Grid Search and Random Search were employed for the optimization of the model's hyperparameters, to further enhance the model's performance and reliability. In the STA-GRU model, a GRU cell is used in place of the LSTM cell from STA-LSTM. The structure of the GRU cell is illustrated in Figure 6.
In the GRU cell, $u_t$ is the result of the update gate, $r_t$ is the result of the reset gate, and $W_u$, $W_r$ and $W$ are the weights of the update gate, reset gate and cell state, respectively. The input data for each GRU cell comprise a 1 × 14 vector: the water level and discharge at stations 01, 13, 18, 25, 29, and 31, and the precipitation at stations 18 and 25.
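A common textbook formulation of the GRU cell that is consistent with the symbols $u_t$, $r_t$, $W_u$, $W_r$ and $W$ is sketched below for reference. It should be read as an assumed form, not as the original equations: the bias terms $b_u$, $b_r$, $b$, the candidate state $\tilde{h}_t$, and the concatenation $[h_{t-1}, x_t]$ are not given in the text.

```latex
\begin{aligned}
u_t &= \sigma\!\left(W_u\,[h_{t-1}, x_t] + b_u\right) \\
r_t &= \sigma\!\left(W_r\,[h_{t-1}, x_t] + b_r\right) \\
\tilde{h}_t &= \tanh\!\left(W\,[r_t \odot h_{t-1}, x_t] + b\right) \\
h_t &= (1 - u_t) \odot h_{t-1} + u_t \odot \tilde{h}_t
\end{aligned}
```

Some references swap the roles of $u_t$ and $1 - u_t$ in the last line; the two conventions are equivalent up to relabeling. Compared with the LSTM cell, there is no separate cell state and one fewer gate, which is the source of the smaller parameter count.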

Performance Metrics
Evaluating the performance of flood prediction models involves a crucial decision in selecting the appropriate metrics. The combination of Root Mean Square Error (RMSE), Mean Absolute Error (MAE), and R-square provides a comprehensive model assessment [75].

1. RMSE emphasizes large errors by squaring the differences, making the model sensitive to significant deviations in predicting flood quantities, thus ensuring robustness and accuracy. The formula of RMSE is given as
$$\mathrm{RMSE} = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(O_i - P_i\right)^2}$$
where $O_i$ is the observation value, $P_i$ is the prediction value, and $n$ is the number of observations/predictions.
2. MAE assigns equal weight to each error, aiding in evaluating the model's average predictive precision in general scenarios. The MAE can be represented by the following equation
$$\mathrm{MAE} = \frac{1}{n}\sum_{i=1}^{n}\left|O_i - P_i\right|$$
3. R-square offers a measure of how well the model explains the variability in flood flow, where higher R-square values indicate better capability to account for observed fluctuations, enhancing the model's interpretability and reliability. R-square is defined by
$$R^2 = 1 - \frac{\sum_{i=1}^{n}\left(O_i - P_i\right)^2}{\sum_{i=1}^{n}\left(O_i - \bar{O}\right)^2}$$
where $\bar{O}$ is the average of the observation values.
Taken together, these three metrics provide complementary information: RMSE and MAE focus on error magnitude and mean accuracy, while R-square emphasizes the model's explanatory power. The result is a well-rounded evaluation that helps accurately gauge and refine the performance of flood prediction models.
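The three metrics are straightforward to compute. The sketch below uses plain Python and assumes that R-square is the coefficient of determination, that is, one minus the ratio of the residual sum of squares to the total sum of squares; the sample values are made up for illustration.

```python
import math


def rmse(observed, predicted):
    """Root mean squared error."""
    n = len(observed)
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / n)


def mae(observed, predicted):
    """Mean absolute error."""
    n = len(observed)
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / n


def r_square(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


# Hypothetical observed and predicted water levels.
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]

print(rmse(observed, predicted), mae(observed, predicted), r_square(observed, predicted))
```

In this example the two errors of 0.5 give an RMSE (about 0.354) that is larger than the MAE (0.25), which shows how squaring weights the larger deviations more heavily.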

Description of Validation Case
For model training, the test data comprise 20% of the total data and the training data 80%, with the validation data making up 10% of the training data. The batch size is set to 128, the learning rate ranges from 0.001 to 0.0001, and the number of epochs is 200. The training time for the STA-GRU model averages about 3 s per epoch, in contrast to the STA-LSTM model, which takes roughly 5 s per epoch. This result indicates a notable improvement in computational efficiency when employing GRU models. As shown in Figure 7, loss trajectories were used to examine the effects of the two data processing strategies on the performance of the different models. The graphs illustrate that, before the 'lag time' preprocessing, the training loss curve of the STA-GRU model shows better numerical stability than that of the STA-LSTM model, and the validation loss curve of the STA-GRU model shows a better fit. Moreover, after 'lag time' preprocessing, both the training loss and validation loss curves exhibit superior performance in comparison to data not subjected to this preprocessing. Both training loss and validation loss are pivotal metrics in evaluating the proficiency of machine learning models, with lower loss values indicating better predictive accuracy and generalization capabilities.
The 'lag time' preprocessing might have captured spatiotemporal dependencies or other salient features within the data, enabling the model to learn the data's inherent structures and patterns more effectively. In contrast, data not subjected to this preprocessing may lack these essential cues, leading to challenges in model fitting and consequently to higher loss values during both the training and validation phases. In summary, 'lag time' preprocessing appears to give the model a richer and more accurate data representation, thereby improving its fitting and generalization.
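The split proportions described above can be sketched as follows. The text does not say whether the split is chronological or random; a chronological split is assumed here because it is the usual choice for time series, and the helper name `split_dataset` is hypothetical.

```python
def split_dataset(samples, test_frac=0.2, val_frac=0.1):
    """Chronological split: last 20% for testing, last 10% of the rest for validation."""
    n_test = int(round(len(samples) * test_frac))
    cut = len(samples) - n_test
    train_all, test = samples[:cut], samples[cut:]

    n_val = int(round(len(train_all) * val_frac))
    val_cut = len(train_all) - n_val
    train, val = train_all[:val_cut], train_all[val_cut:]
    return train, val, test


# Hypothetical dataset of 1000 time-ordered samples.
samples = list(range(1000))
train, val, test = split_dataset(samples)
print(len(train), len(val), len(test))
```

With 1000 samples this yields 720 training, 80 validation, and 200 test samples. Keeping the splits in time order prevents information from later flood events leaking into training.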

Discussion of Results
In this study, we have employed a range of advanced sequential models for the task of time series forecasting. These models include LSTM, GRU, CNNLSTM, CNNGRU, ConvLSTM, STA-LSTM, and STA-GRU, and their performance has been evaluated across various prediction time intervals using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE), and the coefficient of determination (R-square).
In light of the collated results, a discernible trend emerges irrespective of whether the data undergo lag time preprocessing. As the prediction time interval lengthens, both RMSE and MAE values increase progressively, whereas the R-square values decline. This pattern shows that, over extended prediction horizons, the predictive efficacy of the models tends to wane, leading to larger prediction errors.
From a holistic perspective, as seen in Table 3, the STA-GRU and STA-LSTM models consistently excel in longer-term forecasts, with the majority of their R² values comfortably surpassing the 0.8 benchmark. For initial performance, STA-LSTM, ConvLSTM, and STA-GRU emerge as front runners for the 6-h forecast, all with an R² value of 0.93-0.94. This suggests that these models capture the immediate temporal dependencies in the data with high precision. At the 12-h midpoint, the STA-GRU, STA-LSTM, and ConvLSTM continue to lead with R² values of 0.81, 0.81, and 0.80, respectively, which underscores their stability in medium-term forecasting. Extending the forecast to 24 h, STA-GRU remains the best performer with an R² of 0.62, the highest value among the evaluated models. At the other end of the spectrum, CNNLSTM lags, registering the lowest R² of 0.54. This positions the STA-GRU model as a relatively stable long-term forecaster. Moreover, the most pronounced dip in performance occurs between the 12-h and 24-h forecasts, a decline of about 0.2. This might hint at challenges the models face in accommodating certain temporal shifts or cyclic patterns beyond the 6-h mark. As the forecast horizon extends, certain models show a steeper loss of performance. For example, the CNNLSTM's R² value drops from 0.90 (at 6 h) to 0.55 (at 24 h), a descent markedly steeper than the STA-GRU's slide from 0.93 to 0.61. Typically, a slower decline in a model's R² over the forecast period reflects better generalization and stability. Judging from the data at hand, STA-GRU and STA-LSTM emerge as frontrunners in this regard. In addition, the foundational GRU and LSTM models show perceptible performance disparities compared to their advanced counterparts, STA-GRU and STA-LSTM. This indicates the tangible benefits brought about by features such as spatial attention.
In flood prediction models based on our spatiotemporal dataset, accounting for the lag time between upstream hydrological and precipitation stations and downstream target stations is crucial to further enhance the model's long-term predictive accuracy. This stems from the intrinsic spatiotemporal dynamics of hydrological processes, wherein a clear time lag exists between precipitation events and subsequent river level elevations. By accounting for this lag time between upstream and downstream stations, the model can achieve a more precise data alignment, better capture causal relationships, factor in the influences of terrain and soil conditions, and represent the dynamic characteristics of flood events. Furthermore, the inclusion of lag time gives the model richer spatiotemporal sequence features, facilitating a deeper contextual understanding and thereby significantly enhancing prediction accuracy, as shown in Table 4. After lag time preprocessing, as the prediction horizon extends from 6 h to 12 h, and further to 24 h, the R² value of each predictive model decreases by approximately 0.1 less than that of models without lag time preprocessing. [49,53,55]. However, a salient discovery of this study is the significant enhancement in prediction accuracy and model performance observed after applying lag time preprocessing to spatiotemporal data, even when operating under identical forecast durations. Furthermore, in a bid to improve computational efficiency and extend the prediction horizon to 24 h, our developed STA-GRU model demonstrated superior performance compared to existing models documented in the literature. These findings not only affirm the pivotal role of lag time preprocessing in improving the precision of spatiotemporal data predictions but also highlight the potential of the STA-GRU model in flood forecasts.
In our research, we employed a bar chart to compare the effects of the two data processing methodologies on the R² values of our models. In Figure 8, the orange bars represent data subjected to 'lag time' preprocessing, while the blue bars represent data that were not processed in this manner. The R², or the coefficient of determination, is a statistical metric used to quantify the goodness of fit of a regression model, with its value ranging between 0 and 1. A value closer to 1 indicates better predictive performance. As the prediction timeline extended, the R² values derived from the 'lag time' preprocessed data consistently surpassed those from the non-preprocessed data, with this disparity widening over time. This suggests that 'lag time' preprocessing not only enhances the overall goodness of fit of the model but also increases its advantages in long-term forecasting scenarios. This supports the use of 'lag time' preprocessing in future work, indicating that in certain applications it could be a pivotal step, especially when extended forecasting is required.
In summary, the STA-GRU model exhibited superior performance over other models in each predictive time frame on data without lag time preprocessing. However, following the lag time preprocessing of the data, the STA-GRU model not only sustained its comparative advantage but also achieved a higher performance with reduced forecast error statistics. This demonstrates the STA-GRU model's outstanding adaptability and efficiency when dealing with spatiotemporal data collected from a network of real-time hydrometric stations for rapid response flood early warning applications.
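One structural reason a GRU-based model can be cheaper to train and run than an LSTM-based one is that a standard GRU cell has three weight blocks (two gates and a candidate state), whereas a standard LSTM cell has four. The sketch below counts the trainable parameters of a single recurrent layer under the textbook formulation with one bias vector per block. The input and hidden sizes are hypothetical and do not describe the architecture used in this study, and some frameworks add a second bias vector to the GRU, which changes the counts slightly.

```python
def recurrent_block_params(input_size, hidden_size):
    """Parameters of one gate/candidate block: input weights, recurrent weights, and one bias."""
    return hidden_size * (input_size + hidden_size) + hidden_size


def lstm_params(input_size, hidden_size):
    """Standard LSTM cell: input, forget, and output gates plus the candidate cell state."""
    return 4 * recurrent_block_params(input_size, hidden_size)


def gru_params(input_size, hidden_size):
    """Standard GRU cell: update and reset gates plus the candidate hidden state."""
    return 3 * recurrent_block_params(input_size, hidden_size)


# Hypothetical sizes: 8 input features per time step, 64 hidden units.
n_features, n_hidden = 8, 64
lstm_count = lstm_params(n_features, n_hidden)
gru_count = gru_params(n_features, n_hidden)
print(lstm_count, gru_count, gru_count / lstm_count)
```

Under these assumptions the GRU layer carries three quarters of the parameters of the LSTM layer regardless of the chosen sizes, which is consistent with, though not by itself proof of, the efficiency advantage reported for the GRU-based models.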

Conclusions
Flood forecasting has benefited greatly from the integration of deep learning techniques, which have emerged as transformative tools for enhancing prediction accuracy. In this context, our study examined the performance of several deep learning models for flood forecasting. These models included Long Short-Term Memory (LSTM) and its spatial derivatives such as CNNLSTM, ConvLSTM, and STA-LSTM, as well as the Gated Recurrent Unit (GRU) and its associated models, CNNGRU and STA-GRU. The models were methodically compared in terms of their capacity to process complex hydrological data and forecast floods. Given the geographic and climatic influences on floods, a comprehensive approach to data analysis and modeling is essential. By harnessing spatial information and integrating it with time series data, we developed a more holistic flood prediction model. Among our key results, we found that models incorporating the spatiotemporal attention mechanism, such as the STA-LSTM and STA-GRU, exhibit an enhanced ability to manage long-term dependencies. In particular, the STA-GRU model improves computational efficiency while maintaining prediction performance at a level not lower than that of the STA-LSTM model. Computational efficiency is crucial in flood forecasting, as it allows the predictive system to rapidly process and analyze extensive datasets, thereby enabling swift responses and real-time surveillance of flood incidents. This not only contributes to the prompt issuance of warnings but also facilitates the timely updating of the predictive models, enhancing the accuracy of the alerts. Furthermore, when the datasets are preprocessed with lag time, the R2 value of the STA-GRU model increases from 0.6181 to 0.7232, RMSE decreases from 0.1220 to 0.1039, and MAE reduces from 0.0625 to 0.0534. These results indicate that the prediction

Figure 1. The network of real-time hydrometric monitoring stations in the Credit River Water.

Figure 2. Matrix plot of correlation between the precipitation, water level and discharge.

Figure 8. Comparing the R-square of each model before and after handling lag time.

Table 1. The performance of flood prediction models.

Table 2. The lag time between each upstream water station and station 29.

Table 3. The proposed models' performance statistics before the lag time.

Table 4. The proposed models' performance statistics after the lag time.
Upon comparing the performance metrics of various flood prediction models with and without lag time preprocessing, it becomes evident that preprocessing substantially bolsters the efficacy of all models. Our initial findings, prior to the implementation of lag time preprocessing, indicated that our predictive model exhibited RMSE and MAE values comparable to those reported by Liu et al. (2023), Dehghani et al. (2023), and Ding et al. (2020)