1. Introduction
The Laurentian Great Lakes basin is the largest surface freshwater system on Earth, with a drainage area spanning the U.S.–Canada border [1]. This region contains roughly 20% of the planet's surface freshwater and is home to Lake Superior, Lake Michigan, Lake Huron, Lake Erie, and Lake Ontario. These five lakes play a critical role in supplying drinking water to more than 15 million people throughout North America [2,3]. Effective simulation of this large transboundary freshwater system is crucial for water resource management [4]. Despite its importance, developing a comprehensive and reliable model for the Great Lakes region remains challenging due to data heterogeneity and scarcity [5].
The Great Lakes region has a long tradition of process-based hydrological modeling. Early efforts include GLERL's Large Basin Runoff Model (LBRM) for simulating runoff from watersheds draining to the lakes [6,7]. More recently, community systems such as MESH-SVS-Raven and GEM-Hydro-Watroute have been applied across the basin [1,8]. These models represent snow, soil moisture, and lake–atmosphere exchanges, and they typically require calibration to observed flows. While local calibration yields good fits at tuned gauges, performance often degrades when models are applied beyond those basins. Furthermore, large-area calibration is computationally intensive, and evaluation can be confounded by input-data dependencies [8,9]. Case studies echo these challenges; for example, SWAT struggled with snowmelt/baseflow trade-offs under contrasting conditions, and MESH faced subgrid and coupling limitations in complex terrain [10,11]. In parallel, hybrid and physics-guided ML approaches have emerged to blend mechanistic insight with data-driven skill, including LSTM postprocessing of the National Water Model [12], DL-hydrodynamic coupling for Great Lakes forecasts [13], and broader physics-guided ML frameworks [14]. Our work complements this literature by evaluating a single, cross-border EA-LSTM for basin-wide daily streamflow without basin-specific calibration.
In recent years, machine learning (ML), particularly deep learning models such as Long Short-Term Memory (LSTM) networks, has emerged as a powerful alternative for hydrological simulation. LSTM models are well suited to capturing the non-linear, temporal patterns in rainfall–runoff relationships, and they excel at learning from large datasets [15,16]. Numerous studies have shown that LSTM networks can outperform traditional hydrologic models in predicting river discharge. Kratzert et al. [17] trained an LSTM network on a dataset of 531 basins (the CAMELS dataset) and achieved unprecedented accuracy in ungauged basins. The single LSTM, trained on 30 years of data per basin, obtained a superior median Nash–Sutcliffe Efficiency (NSE) of 0.69 on test basins not used in training, compared to 0.64 for a calibrated conceptual model (SAC-SMA) and 0.58 for the NOAA National Water Model [17]. Sabzipour et al. [18] directly compared an LSTM neural network with a physically based hydrological model and demonstrated that the LSTM model delivered significantly lower forecast errors, with a median MAE of 25 m³/s on day 1 (versus 115 m³/s for the conceptual model) and higher KGE values for lead times of up to 7–9 days, without the need for explicit data assimilation. Kratzert et al. [19] also found that an LSTM network, by efficiently learning long-term hydrological dependencies, including crucial storage effects like snow accumulation and melt, significantly outperformed the physically based SAC-SMA + Snow-17 model in rainfall–runoff simulations across a large regional sample. Gauch et al. [20] emphasized that a single LSTM can learn to predict runoff in hundreds of catchments and surpass state-of-the-art conceptual models in benchmarks.
Complementary to LSTM-based rainfall–runoff studies, several ANN pipelines have emphasized input preprocessing and structure-aware signals, including decomposition with Fisher's ordered clustering and MESA [21], flow-pattern recognition for daily forecasting [22], and long-term forecasting with preprocessing-only inputs [23]. These works underscore the value of careful data conditioning and sequence structure, while our contribution focuses on a single, basin-wide deep learning model for daily streamflow across the bi-national Great Lakes.
The Great Lakes region has begun to see applications of these data-driven techniques. For example, Xue et al. [13] integrated LSTM with a hydrodynamic model to forecast the spatiotemporal distribution of lake surface temperature (LST) across all five Great Lakes, and Kurt [16] trained an LSTM model to simulate the mean water level of the five lakes. Despite the success of these data-driven approaches, a model for simulating streamflows of the Great Lakes basins remains elusive due to data heterogeneity and scarcity. Previous works found that LSTM networks perform best when trained on large, diverse datasets [17,18,19,20]. Since the Great Lakes basin is a transboundary system spanning the U.S. and Canada, collecting sufficient data governed by multiple agencies is the main challenge for developing an LSTM model for streamflow prediction in the region, and the scarcity of properly measured data over such a large area makes data gathering even harder. Consequently, previous works trained models on datasets like CAMELS, which comprises U.S. basins only [17], or developed models targeting Canadian catchments [18]. We address these issues by compiling data from 975 basins across the entire Great Lakes region to build a unified LSTM-based hydrological model. To our knowledge, while initiatives such as GRIP-GL [8] provided valuable basin-wide comparisons and other studies have applied deep learning to lake variables [13,16,24], our work is the first to train and evaluate a single deep learning model for daily river discharge across the entire bi-national basin. A summary comparison of these modeling paradigms is provided in Table 1.
In this paper, we first describe the study area in Section 2, providing a detailed overview of the Laurentian Great Lakes region where the hydrological processes are examined. Section 3 then outlines the data collection, processing, and LSTM modeling setup. A detailed analysis of the model results is presented in Section 4 and discussed further in Section 5. Section 6 concludes the paper and suggests future research directions derived from this work.
2. Study Area
We studied the Laurentian Great Lakes basin, the most extensive system of surface freshwater on Earth, situated across the transboundary region of the United States and Canada. This investigation focuses on the hydrological responses of 975 gauged river sub-basins that contribute runoff to the five Great Lakes: Superior, Michigan, Huron, Erie, and Ontario. These sub-basins were selected to represent the diverse physiographic, climatic, and land-use characteristics inherent to this complex region. The overall geographical extent of these combined sub-basins defines the study domain boundary, as illustrated by the red dashed line in Figure 1a. Land cover across the basins is diverse, as depicted by the dominant land cover type for each basin in Figure 1a. Agriculture is the dominant land cover in approximately 37% of these basins, primarily concentrated in the southern portions. Forest is dominant in 35% of the basins, largely found in the northern regions. Open water (representing lakes within the sub-basins) is dominant in 20%, urban land cover in 6%, and wetlands in approximately 1% of the basins. This mosaic of land cover significantly influences hydrological processes, including evapotranspiration, infiltration, and runoff generation throughout the region [25,26].
The physiography of the Great Lakes basin is predominantly a legacy of Pleistocene glaciation, which shaped its topography of generally low to moderate relief, influencing drainage patterns and soil development [27,28].
The climate is predominantly humid continental [29]. Based on the studied basins, mean annual air temperatures range from approximately −18.4 °C to 38.2 °C, with a median of 19.0 °C and a basin-set average of 17.7 °C. Mean annual precipitation totals for these sub-basins vary from 718 mm/year to 1311 mm/year, with a median of 873 mm/year and a basin-set average of 879 mm/year. A notable spatial gradient in precipitation is observed, generally increasing from west/northwest to east/southeast across the basin. Snowpack accumulation and subsequent melt are critical components of the hydrological cycle in many of these catchments [30].
The drainage areas of the basins vary significantly, ranging from 4.1 km² to 16,388 km², with a median area of 304 km² (Figure 1b). This scale heterogeneity is mirrored in their streamflow regimes. Mean annual discharge, calculated from the average of each basin's daily discharge time series (m³/s), ranges from as low as 0.038 m³/s in small headwater catchments to over 204 m³/s in larger river systems. The median of these mean annual discharges across the basins is 3.60 m³/s, while the average of these means is 14.15 m³/s (Figure 1c).
3. Method
3.1. Data Collection
To support our analysis of hydrologic behavior across the Great Lakes basin—a region notable for its bi-national extent spanning both the United States and Canada—we compiled a comprehensive dataset comprising streamflow observations, meteorological forcings, and static catchment attributes. Streamflow data were collected from a total of 975 gauge stations: U.S. gauges were obtained from the United States Geological Survey (USGS), while Canadian gauges were sourced from Environment and Climate Change Canada (ECCC). This dual-sourced dataset provides consistent and coordinated coverage across the international boundary and spans the period from 1 January 1980 to 31 December 2023. For each gauge station, a corresponding sub-basin was delineated, resulting in a total of 975 sub-basins simulated in this study.
Meteorological forcing data were derived from the Daymet v4 dataset [31], which offers daily gridded meteorological variables at 1 km spatial resolution. The variables include daily precipitation, minimum and maximum temperature, solar radiation, vapor pressure, day length, and snow water equivalent (SWE). Areal averages of these variables were extracted for each sub-basin and temporally aligned with the streamflow records, resulting in a coherent set of dynamic input–output time series for each basin. Table 2 summarizes the dynamic variables used in this study for model training and testing.
To capture spatial variability in basin characteristics, we incorporated static catchment attributes from the HydroATLAS database [32], which integrates information from BasinATLAS, RiverATLAS, and LakeATLAS. A diverse suite of hydro-environmental variables was selected to represent key aspects of hydrology, climate, topography, land cover, geology, soils, and anthropogenic influence. These attributes were spatially aggregated at the sub-basin level to construct a standardized static feature set. Together, these data support a robust and interpretable modeling framework that reflects both the physical and human-induced heterogeneity across this large, bi-national watershed.
3.2. Data Preprocessing
Streamflow gauge observations across a large and complex region like the Great Lakes basin often suffer from missing or inconsistent records. For our streamflow dataset, any dates with missing discharge values were excluded from model training, and basins lacking discharge measurements altogether were omitted from the analysis. To further improve data quality, we implemented anomaly detection and removal procedures at both the point-wise and basin-wise levels.
At the point level, we flagged and removed periods with prolonged constant discharge values, which are typically indicative of sensor errors or data reporting issues. To detect instrument flat-lining, we treated runs of identical discharge values longer than a threshold of k days as anomalous and removed them. This follows standard quality-control practice for attenuated/flat signals in hydrometric time series [33,34]. We assessed sensitivity by varying k and computing the fraction of non-missing rows excluded; the exclusion rate varied by less than one percentage point across the tested thresholds (Appendix Table A1). Given this marginal variation, the chosen threshold balances removing flat-line artifacts against retaining data. Alternative detectors (e.g., seasonal low-variance tests or change-point procedures) could be substituted with minimal changes [35,36,37,38,39]; because they also target prolonged quasi-constant segments, similar exclusions are expected in practice.
To flag implausible magnitudes at the basin scale, we computed the mean discharge-to-surface-area ratio for each basin (in m/s) and applied an interquartile-range (IQR) rule: a basin was flagged if its ratio fell below Q1 − 1.5·IQR or above Q3 + 1.5·IQR, where IQR = Q3 − Q1, and Q1 and Q3 denote the first and third quartiles, respectively. The empirical distribution and exact thresholds are reported in Appendix Table A2. Using the resulting lower and upper cutoffs, we discarded 49 extreme-ratio basins during training.
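A minimal sketch of this basin-level screen, assuming the conventional Tukey fences of 1.5 × IQR (`iqr_outlier_basins` is an illustrative helper name):

```python
import numpy as np

def iqr_outlier_basins(ratios):
    """Flag basins whose mean discharge-to-area ratio falls outside
    [Q1 - 1.5*IQR, Q3 + 1.5*IQR]. Returns (mask, lower, upper)."""
    r = np.asarray(ratios, dtype=float)
    q1, q3 = np.percentile(r, [25, 75])   # first and third quartiles
    iqr = q3 - q1
    lower, upper = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return (r < lower) | (r > upper), lower, upper
```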
After anomaly filtering, discharge values were normalized by basin area to reduce scale differences and then log-transformed to address skewness. A similar log transformation was applied to precipitation and snow water equivalent (SWE). All remaining dynamic climate variables were standardized using the global mean and variance computed from the training set only. This approach ensures generalizability to ungauged basins, where local statistics are unavailable. Model predictions were made in the transformed space and then back-transformed to obtain physical units. The preprocessing steps for each dynamic variable are summarized in Table 3.
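These transforms can be sketched as below. The exact offset inside the log transform is an assumption (log1p here, which tolerates zero flows), and the helper names are illustrative rather than code from our pipeline.

```python
import numpy as np

def transform_discharge(q_m3s, area_km2):
    """Area-normalize discharge, then log-transform to reduce skew.
    log1p is an assumed variant that handles zero-flow days."""
    specific = q_m3s / area_km2          # m^3/s per km^2
    return np.log1p(specific)

def inverse_transform_discharge(y, area_km2):
    """Back-transform model output to physical units (m^3/s)."""
    return np.expm1(y) * area_km2

class Standardizer:
    """Global z-score using statistics from the training set only,
    so the same transform applies unchanged to ungauged basins."""
    def fit(self, x_train):
        self.mean = np.mean(x_train)
        self.std = np.std(x_train)
        return self
    def transform(self, x):
        return (x - self.mean) / self.std
```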
3.3. LSTM Architecture and Implementation
Recurrent Neural Networks (RNNs) are powerful for modeling sequential data due to their ability to retain information from previous inputs [40,41]. However, traditional RNNs struggle to capture long-term dependencies because of the vanishing gradient problem [19,41]. To overcome this, Long Short-Term Memory (LSTM) networks are employed. An LSTM replaces the conventional RNN hidden layer with a memory cell that uses three gates (forget, input, and output) to regulate information flow [42].
In this study, we employed both a standard LSTM and the Entity-Aware LSTM (EA-LSTM) of Kratzert et al. [43] to predict streamflow in the Great Lakes basin. The EA-LSTM architecture retains the essential gating mechanisms of the traditional LSTM model (Figure 2) and processes inputs using the same fundamental gating equations. However, it distinguishes between static inputs x_s and dynamic inputs x_d[t] at time t, enabling a more efficient handling of these different input types.
Here, i[t], f[t], and o[t] represent the input, forget, and output gates, respectively. The vector x_s refers to static inputs (invariant with respect to time t), and the vector x_d[t] refers to dynamic inputs that change over time. Notice that in the EA-LSTM, the input gate exclusively receives the static inputs x_s, while the forget gate f[t] and the output gate o[t] operate on the dynamic inputs x_d[t] together with the recurrent hidden state h[t−1]. Consequently, even though the forget and output gates are computed primarily from dynamic inputs at time t, they are implicitly influenced by static inputs through the recurrent hidden state. The input gate, on the other hand, is completely independent of the dynamic inputs. All models used in the experiments in this study were implemented in Python 3.12.3 and trained and evaluated with the open-source library NeuralHydrology (v1.12.0) developed by Kratzert et al. [44].
3.4. Model Configuration and Hyperparameters
In this study, we evaluate model performance using two data-splitting strategies: a time-based (temporal) split and a basin-based (spatial) split.
For the time-based split, the dataset is divided chronologically into a training period (1 January 1980–31 December 2012) and a test period (1 January 2013–31 December 2023). Rather than reserving a fixed validation window, validation is performed at each training epoch by randomly sampling a subset of basins and evaluating their performance over the test period. This strategy allows the model to continually learn from the full range of training years while still assessing performance on a diverse set of catchments. For the basin-based split, the full observation record (1 January 1980–31 December 2023) is used, but basins are partitioned into training and test sets. To prevent large basins from dominating the evaluation, all basins are first grouped into five equipopulated bins based on drainage area. From each bin, 20% of basins are randomly selected to form the test set, and the remaining basins are used for training. This stratified sampling ensures that all basin size classes are proportionally represented in both the training and test sets.
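The stratified basin split described above can be sketched as follows; the helper name and the quantile-based binning details are illustrative assumptions, not code from our pipeline.

```python
import numpy as np

def stratified_basin_split(areas, test_frac=0.2, n_bins=5, seed=0):
    """Split basin indices into train/test sets, stratified by
    drainage area: basins are grouped into n_bins equipopulated
    bins, and test_frac of each bin is sampled for testing."""
    areas = np.asarray(areas, dtype=float)
    rng = np.random.default_rng(seed)
    # bin edges at area quantiles -> roughly equipopulated bins
    edges = np.quantile(areas, np.linspace(0, 1, n_bins + 1))
    bins = np.clip(np.searchsorted(edges, areas, side="right") - 1,
                   0, n_bins - 1)
    test = []
    for bin_id in range(n_bins):
        members = np.flatnonzero(bins == bin_id)
        n_test = int(round(test_frac * len(members)))
        test.extend(rng.choice(members, size=n_test, replace=False))
    test = np.sort(np.array(test))
    train = np.setdiff1d(np.arange(len(areas)), test)
    return train, test
```

Sampling within each bin keeps every size class proportionally represented in both sets.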
Hyperparameters were selected through a targeted random search, with choices guided by common practices in hydrological deep learning. The search covered the key hyperparameters of hidden size, output dropout rate, and learning rate. The Adam optimizer and the Mean Squared Error (MSE) loss function were used for all experiments. The final model configuration, listed in Table 4, was chosen as the set of hyperparameters that yielded the highest median Nash–Sutcliffe Efficiency (NSE) on the validation set. During training, we employed an early stopping protocol with a patience of 5: training was halted if the validation loss did not improve for five consecutive epochs, and the model with the best validation performance was retained.
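The early stopping protocol with patience 5 can be sketched as below. This is an illustrative stand-in for the procedure described, not the implementation used in our experiments.

```python
class EarlyStopping:
    """Stop training when validation loss fails to improve for
    `patience` consecutive epochs; remember the best epoch's state."""
    def __init__(self, patience=5):
        self.patience = patience
        self.best_loss = float("inf")
        self.best_state = None
        self.bad_epochs = 0

    def step(self, val_loss, state=None):
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best_loss:
            self.best_loss = val_loss
            self.best_state = state   # snapshot of the best model
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience
```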
We set the input sequence length to 365 days to span one full hydrological year, so the network can resolve slow storage components (snowpack, soil moisture, groundwater) and the seasonal hysteresis that drives delayed runoff. This choice is common and effective for daily rainfall–runoff LSTM networks, including studies that use a 365-day window explicitly [20,45,46]. Longer daily windows increase computational cost and are typically handled via multi-timescale architectures rather than by extending a single daily sequence indefinitely [20]. Conversely, shorter input windows may truncate the hydrologic memory contained in snowpack and soil moisture, which can persist for months to years, and thereby reduce skill, particularly in snow-affected basins [19].
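Constructing training samples with a 365-day lookback can be sketched as follows; `make_sequences` is a hypothetical helper illustrating the windowing, not the library's data loader.

```python
import numpy as np

def make_sequences(forcings, discharge, seq_len=365):
    """Build (sample, target) pairs: each sample is the preceding
    seq_len days of forcings; the target is the discharge on the
    last day of that window. forcings: (T, F); discharge: (T,)."""
    X, y = [], []
    for t in range(seq_len - 1, len(discharge)):
        X.append(forcings[t - seq_len + 1 : t + 1])  # one full year
        y.append(discharge[t])
    return np.stack(X), np.asarray(y)
```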
To assess model stability, the final temporal-split experiment was repeated with three different random seeds (2004, 2025, 142589). The median NSE across the 632 test basins remained consistent across seeds, confirming that the model's high performance is robust to initialization. The main experiment seed (142589) produced the best median NSE and is used for all subsequent analyses in this paper.
5. Discussion
This study represents the first time a comprehensive, cross-border hydrometeorological dataset has been compiled for the Laurentian Great Lakes basin and used to develop a deep learning model for regional streamflow prediction. Leveraging data from 975 U.S. and Canadian catchments, we trained an Entity-Aware Long Short-Term Memory (EA-LSTM) network that consistently outperformed both the operational NOAA National Water Model (NWM) and a standard LSTM architecture (Section 4). Notably, this high performance was achieved without basin-specific calibration, demonstrating strong generalization across the large, heterogeneous, and politically divided Great Lakes basin.
The EA-LSTM's ability to integrate heterogeneous datasets from two countries and produce consistent predictions across the entire basin addresses a long-standing challenge in transboundary water management. These results align with a growing body of literature showing that deep learning models can outperform traditional calibrated hydrological models in large-sample settings [20,43], but they extend this evidence to one of the world's most complex freshwater systems.
5.1. Model Robustness and Generalization
To ensure the model performs well beyond a single training dataset and to address potential overfitting, we evaluated the model’s generalization capabilities using multiple spatial cross-validation experiments. In addition to our primary spatial split, which was stratified by drainage area, we conducted two further experiments where the test basins were selected by stratifying across key climatic regimes: mean annual precipitation and mean annual temperature.
The model demonstrated consistent performance across these varied splits, as detailed in Table 10. The median NSE for the primary area-based split was 0.569, while the splits stratified by precipitation and temperature yielded highly comparable median NSE values of 0.527 and 0.524, respectively. This stability, with median NSE scores consistently in the 0.52–0.57 range, confirms that the model generalizes well across catchments with different hydroclimatic characteristics and that the reported high performance is not an artifact of a single, arbitrary train–test configuration. Furthermore, the model architecture incorporates an output dropout rate of 0.3, a standard and effective regularization technique that mitigates overfitting by preventing the co-adaptation of neurons during training, further enhancing the model's robustness.
5.2. Comparison with Process-Based Approaches
The outperformance of the NWM by the EA-LSTM in this study is noteworthy, suggesting that data-driven approaches could provide substantial improvements in predictive accuracy for operational forecasting [12]. This aligns with findings from other regional intercomparisons where LSTM networks have matched or exceeded the performance of suites of traditional models [8]. Narrowing to our study domain, the Great Lakes Runoff Intercomparison Project Phase 4 (GRIP-GL) evaluated 13 models, including physically based and conceptual models. Many process-based models in GRIP-GL performed well in calibration but degraded when applied regionally, particularly under the strong lake–atmosphere feedback and across national borders [8]. For instance, during the most challenging spatiotemporal validation, the best locally calibrated process-based models (Blended-lumped and Blended-Raven) achieved a median KGE of 0.59, and the best regionally calibrated model (WATFLOOD-Raven) achieved a median KGE of 0.53 [8]. In contrast, our EA-LSTM maintained high accuracy across all basins, with a median KGE of 0.685, outperforming the top GRIP-GL models by a margin of ΔKGE ≈ 0.095–0.155. This indicates superior transferability without costly, expert-led recalibration for each sub-basin, a critical factor in vast, complex systems like the Great Lakes.
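For reference, the two skill scores discussed throughout can be computed as follows. These are the standard definitions (NSE after Nash and Sutcliffe; KGE in its 2009 form), sketched here for clarity rather than taken from our evaluation code.

```python
import numpy as np

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 means the model
    is no better than predicting the mean of the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((sim - obs) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency: combines correlation r, variability
    ratio alpha, and bias ratio beta; 1 is a perfect score."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)
```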
A common concern with deep learning is that, unlike process-based models, it does not explicitly enforce physical laws. Consequently, hybrid approaches that integrate the strengths of both data-driven and process-based models have been proposed as future pathways [14]. However, recent work by Klotz et al. [50] found that LSTM internal cell states can correlate with physical hydrological stores such as soil moisture and snowpack. This finding suggests that a well-trained LSTM model can learn meaningful hydrological dynamics. In our case, the EA-LSTM's forecasts were often more physically plausible than those of the NWM when compared with observed streamflows, indicating that, despite the absence of explicit process constraints, the model captured realistic hydrodynamic behavior.
5.3. Why LSTM for This Application?
As time-series forecasting is a popular topic in the deep learning community, numerous architectures have been proposed to capture the complex patterns of time-series data [51]. While LSTM remains a popular choice for tasks like time-series forecasting and rainfall–runoff modeling, other recurrent models such as the vanilla RNN and the Gated Recurrent Unit (GRU) have also been applied in the hydrology domain [52]. Additionally, recent work on transformer-based time-series architectures has often reported superior performance over RNN-based methods [51,53]. Zhang et al. [52] simulated reservoir operation with vanilla RNN, GRU, and LSTM algorithms and, in a direct comparison on the same task, showed that the LSTM outperformed the other RNN variants. Waqas and Wannasingha Humphries [54] also compared LSTM networks to different RNN and GRU architectures and demonstrated the advantage of LSTM for hydrological time-series data. The picture is similar for the more advanced transformer architecture. Originally proposed for Natural Language Processing (NLP), the transformer's attention-based design effectively captures sequential patterns in data, making it a promising candidate for hydrological forecasting. However, without a sufficient amount of data to fit such a complex architecture, the simpler design of the LSTM proves easier to generalize and thus tends to produce preferable predictions [51]. Consistent with this limitation, Liu et al. [55] compared transformers to LSTM networks on streamflow prediction with the CAMELS dataset and found that the vanilla transformer failed to match the predictive skill of an LSTM model, particularly for high-flow metrics. Nevertheless, by carefully redesigning the attention mechanism and the internal layers, transformer-based models can indeed outperform LSTM-based models [24,55].
Given our goal of creating the first basin-wide Great Lakes dataset and the need for a robust, generalizable baseline model, the LSTM architecture offered the best balance between complexity and predictive performance. Future research could investigate whether transformer-based models, potentially adapted with hydrology-specific attention mechanisms, can further improve predictions for this transboundary system.
5.4. Is the Entity Awareness Advantageous?
Our results indicate the benefit of making entity awareness explicit in the LSTM by handling static attributes separately. However, Heudorfer et al. [56] reported that adding static features gave no out-of-sample benefit in experiments with 108 German groundwater wells. Similarly, Heudorfer et al. [57] showed that deep learning hydrological models are entity-aware by themselves, but that the main driver of their entity awareness is the dynamic variables, not the static attributes. The source of this inconsistency with our work requires further investigation, but we suggest two possible explanations. First, 108 samples is arguably small compared to the 975 basins in our work, and the benefit of static attributes may only become evident when there are sufficient training data to learn the complex patterns of basins with a variety of characteristics. Although the CAMELS dataset consists of more training samples, its diversity is still behind that of our dataset. Second, the entity awareness gained from static features might not be significantly advantageous when applied to out-of-sample basins. Heudorfer et al. [57] argued that when EA models were tested on catchments unseen during training, the superior performance was primarily driven by meteorological data, while the contribution of static features was limited. This observation is consistent with our finding that the spatial-split EA-LSTM underperforms the temporal-split EA-LSTM. However, the fact that our spatial EA-LSTM performed comparably to the temporal LSTM still supports the utility of the EA-LSTM's dedicated input gate for static attributes [43]. Nevertheless, learning the relationship between static inputs and dynamic response from a large sample of gauged basins is practically beneficial, since the model can then generate more skillful predictions for locations without historical streamflow data [20,45].
While these comparisons highlight the challenges in isolating the contribution of static information, our additional ablation experiments make it clear that static attributes are indeed significant for achieving high predictive performance. When we removed all static inputs, the standard LSTM degraded sharply, with mean NSE dropping to negative values (Table 11), confirming that static descriptors provide essential contextual information for basin heterogeneity. The EA-LSTM architecture cannot function with zero static attributes because of its dedicated static-input gate; therefore, longitude and latitude were retained as minimal static descriptors in the no-static setting. Under this constrained setup, the EA-LSTM showed less severe degradation than the standard LSTM, suggesting that its architecture can systematically leverage even weak static signals to improve prediction. Importantly, the strongest overall performance was observed when static attributes were fully available, underscoring that static descriptors are indispensable for reaching the high skill scores reported in our main experiments. These findings support our broader claim that entity-aware architectures provide a principled way to disentangle and exploit static versus dynamic information, enabling more transferable predictions across heterogeneous basins. Future work should investigate whether certain categories of static features (e.g., topographic vs. land cover vs. climatic indices) drive disproportionate gains, which would help clarify the mechanisms of entity awareness beyond what we present here.
5.5. Strengths, Limitations, and Future Directions
Both our spatial- and temporal-split EA-LSTM models performed well across diverse hydrological regimes, from snowmelt-dominated northern catchments to rainfall-driven southern systems. The performance of our spatial-split model is especially noteworthy, as the spatially split training strategy makes the EA-LSTM particularly relevant for Prediction in Ungauged Basins (PUB). However, consistent with other studies, we observed reduced predictive skill in very small catchments and for extreme flow events [46,58]. Small basins often exhibit rapid, localized responses not fully captured by daily inputs, while extreme events, being rare, are challenging for models optimized on overall performance metrics. These limitations highlight the need for careful model application and potentially specialized modeling approaches for such conditions. Challenges also remain in heavily managed basins where anthropogenic influences, not captured by standard inputs, dominate flow regimes [18]. Investigating the factors driving this performance gap is a clear direction for future research.
While many works, including ours, demonstrate the advantage of deep learning for hydrologic modeling, deep learning is enabled only when there are sufficient data to power the model. In practice, obtaining high-quality data to train a well-performing deep learning model is often the biggest challenge, and the creation of this first integrated, cross-border Great Lakes hydrometeorological dataset was a non-trivial effort. Looking ahead, the proliferation of large-sample datasets [8,59] and techniques such as transfer learning and self-supervised learning offer pathways to improve predictions even with limited local data [60].
6. Conclusions
This study developed and evaluated the first comprehensive Entity-Aware Long Short-Term Memory (EA-LSTM) model for streamflow prediction across the Laurentian Great Lakes basin, using an integrated cross-border dataset of 975 U.S. and Canadian catchments. The unified EA-LSTM framework consistently outperformed both the operational NOAA National Water Model and a standard LSTM baseline, demonstrating the potential of entity-aware architectures to serve as a basin-wide data-driven forecasting tool.
Despite these advances, several limitations remain. Model skill was reduced in small and highly urbanized basins, as well as during extreme events, underscoring challenges in representing rapid local hydrological responses. In addition, benchmark comparisons with process-based models highlight trade-offs between physical interpretability and predictive accuracy that warrant further investigation.
Looking forward, this work provides a foundation for the next generation of hydrological forecasting research:
Developing and sharing unified cross-border datasets and using EA-LSTM as a strong, generalizable baseline model for large-scale streamflow prediction;
Addressing limitations through enhanced treatment of small and urban catchments, improved extreme-event prediction, and continued benchmarking against conceptual and physically based models;
Advancing model design by integrating physics-guided deep learning, transfer learning for data-sparse regions, hydrology-aware transformer architectures, and explicit inclusion of anthropogenic and urban covariates.
Together, these directions outline a pathway toward more adaptive, interpretable, and resilient data-driven hydrological prediction for transboundary freshwater systems.