1. Introduction
The Laurentian Great Lakes (Superior, Michigan, Huron, Erie, and Ontario) form the largest unfrozen surface freshwater system in the world. With a collective surface area of around 244,000 km², roughly equal to that of the United Kingdom, the Great Lakes contain 84% of North America’s surface freshwater [1]. The Great Lakes are also sometimes referred to as inland seas as they exhibit many sea-like characteristics, such as distant horizons, great depths, and rolling waves.
Given their sheer size, and because open water has a larger thermal inertia and a lower surface friction and albedo than the surrounding watershed, the Great Lakes exert a significant influence on the regional climate [2,3,4]. Lake surface temperature (LST) in particular plays a key role in defining the regional climate through various lake–atmosphere interactions [5,6,7]. For example, during the cold season, the LST affects the magnitude of lake-effect snowfall by influencing moisture transfer from the warmer lake surface to the cooler overlying atmosphere. The spatial variability in LST has also been shown to affect lake-effect snowfall through changes in the surface wind convergence and local vertical motions [8]. During the warm season, the cooler lake surface relative to the overlying air promotes atmospheric stability, resulting in diminished cloud cover and increased surface downward shortwave radiation flux [4]. Great Lakes LST also plays an important role in the spatial distribution of the convective summer precipitation over the Great Lakes region [6]. The evaporation of the Great Lakes, a significant component in the Great Lakes water balance and associated water level variation, is also a function of LST [9,10]. Furthermore, in the nearshore regions of the Great Lakes, the difference in LST and land temperature creates a sharp water–land gradient in the temperature and density of near-surface air, resulting in a strong lake breeze circulation [11]. Therefore, an accurate historical record of the Great Lakes LST is crucial for understanding the Great Lakes climate and weather events.
The need for an accurate LST estimate is made even more pressing by the fact that Regional Climate Models (RCMs) encompassing the Great Lakes require accurate LST as the overlake surface boundary condition to resolve the complex lake–atmosphere dynamics of the Great Lakes [6,7,12,13]. For example, Wang et al. [6] demonstrated the sensitivity of the climate to LST by showing that two different LST datasets, when used as overlake surface boundary conditions, produced significantly different simulations of summer precipitation over the Great Lakes region through differences in the moisture transport and convective environment. Hence, from a modeling standpoint, an accurate dataset for the Great Lakes LST is vital to serve as a boundary condition for climate simulations.
Globally, multiple studies have made efforts to synthesize various sources of LST to produce a comprehensive LST dataset on a global scale [14] and on sub-global scales such as for the lakes located in Europe [15], the United States (US) [16], France [17], the Tibetan Plateau [18,19], the North Slave region [20], the Alpine region [21], and the sub-Alpine region [22]. The objective for creating these datasets was to offer the scientific community an alternative, extensive, and reliable LST dataset that could be used for multiple purposes such as enhancing the understanding of lake limnology and weather events, evaluating the effects of climate change, and validating models [14,16]. Some datasets were derived exclusively from satellite-borne sensors such as the Advanced Very High Resolution Radiometer (AVHRR), which was used to derive the LST for European lakes [15,21] and Tibetan Plateau lakes [18], while other datasets used a combination of satellite and in situ measurements, such as the global LST dataset produced by Sharma et al. [14], which provides LST for 291 lakes during 1985–2009.
However, as of this writing, the historical LST of the Great Lakes has a limited observation record. A widely used historical LST dataset for the Great Lakes is the satellite-based Great Lakes Surface Environmental Analysis (GLSEA) from the National Oceanic and Atmospheric Administration (NOAA) Great Lakes Environmental Research Laboratory (GLERL) [23]. With a horizontal spatial resolution of ~1.3 km, it is considered to be one of the most accurate available gridded datasets for the Great Lakes and has been used for lake heatwave analysis [24], model validation [7,25], and overlake surface boundary conditions by several Great Lakes studies [6,8,12]. However, GLSEA only dates back to 1995 and, with fewer than three decades of data available, it is not long enough to support climate simulations or LST analysis at climatic timescales spanning several decades to centuries. For example, it is not long enough to calculate the climate normal of LST. A climate normal is a baseline value used to assess climate variability and, as recommended by the World Meteorological Organization (WMO), is calculated as the 30-year average of a variable’s observations. The Optimum Interpolation Sea Surface Temperature dataset version 2.1 (OISST) [26] from NOAA is another gridded dataset that contains the LST for the Great Lakes. OISST is a global sea surface temperature (SST) dataset and, compared to GLSEA, covers a longer period: 1982 to present. OISST combines a wide range of SST measurements, including in situ SST measurements from ships and buoys from the International Comprehensive Ocean–Atmosphere Data Set Release 3.0.2 (ICOADS R3.0.2) [27] and Argo floats [28], as well as remotely sensed AVHRR observations from Pathfinder v5.0, Pathfinder v5.1, and the US Navy [29], which are later bias-corrected using ship, buoy, and Argo float observations. OISST provides daily LST of the Great Lakes, but its global scope limits it to a much coarser horizontal spatial resolution (0.25° × 0.25°), and it suffers from noticeable LST biases across the Great Lakes [30]. The European Centre for Medium-Range Weather Forecasts atmospheric reanalysis of the global climate, version 5 (ERA5) [31] also provides a gridded estimate for the Great Lakes LST but suffers from significant cold (warm) biases during pre-2014 winter (summer) due to erroneous treatment of data [6,32]. Furthermore, buoys are sparsely deployed in the Great Lakes, with nine moored buoys from the National Data Buoy Center (NDBC) deployed across the lakes. These moored buoys have intermittent observation records extending as far back as 1979 but, by nature, only provide LST at a few deployed locations, which is insufficient to create a spatial map of the LST. Thus, an accurate historical estimate of the Great Lakes LST that spatially covers the lakes at a high spatial resolution and has a longer temporal coverage of over 30 years is greatly needed by the Great Lakes scientific community.
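The record-length argument above can be made concrete with a short sketch. It computes a climate normal as a 30-year mean and returns nothing when the record does not cover the full window. The function name `climate_normal`, the 1991–2020 window, and the synthetic annual values are illustrative assumptions; they are not taken from GLSEA or any other dataset discussed here.

```python
from statistics import mean


def climate_normal(annual_values, start_year, end_year):
    """Mean of annual values over [start_year, end_year].

    Returns None when the window is shorter than 30 years or when any
    year in the window is missing from the record.
    """
    years = range(start_year, end_year + 1)
    if len(years) < 30 or any(year not in annual_values for year in years):
        return None
    return mean(annual_values[year] for year in years)


# Synthetic annual-mean LST values in deg C (hypothetical, for illustration only).
long_record = {year: 8.0 + 0.02 * (year - 1979) for year in range(1979, 2021)}

# The same series truncated to start in 1995, mimicking a shorter record.
short_record = {year: value for year, value in long_record.items() if year >= 1995}

normal_long = climate_normal(long_record, 1991, 2020)    # full 30-year window available
normal_short = climate_normal(short_record, 1991, 2020)  # 1991-1994 missing, so None
```

With the truncated record the function returns `None`, which mirrors the point that a record starting in 1995 cannot yet supply a full 30-year baseline.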
Hydrodynamic and climate modeling communities have made continuous efforts to address the lack of a long-term, high-resolution Great Lakes LST dataset by generating simulated LST data. From simple extrapolation of ocean surface temperature [33] to widely used one-dimensional (1D) lake models, such as the freshwater lake model FLake [34], to more recent and more accurate three-dimensional (3D) hydrodynamic lake models [7,12,35], representations of the Great Lakes within RCMs have steadily improved over the years. However, 1D lake models suffer from significant LST biases, which have been extensively discussed in past studies [12,13]. 3D lake models, although an improvement upon the 1D models, are computationally expensive and require significant effort to implement within the RCM framework. Therefore, an approach that can effectively and efficiently reconstruct the long-term LST at a high spatial resolution is highly desired.
In recent years, in parallel with the advancement of complex physics-based hydrodynamic modeling, deep learning, a specialized subset of machine learning (ML) based on neural networks, has become an attractive alternative to estimate the SST and LST in both marine and freshwater systems [16,36,37,38,39,40,41,42,43,44,45,46,47,48,49,50,51]. Deep learning models based on Multi-layer Perceptron Neural Network (MLPNN), Physics-Guided Recurrent Neural Networks (PGRNN), Long Short-Term Memory (LSTM), Gated Recurrent Unit (GRU), and Transformer have all been used to predict the LST in various locations around the world such as Poland [37,48], the US [16,38,39,49], China [51], and Turkey [47]. For instance, by introducing a PGRNN model that incorporated physical laws into an LSTM, Jia et al. [39] focused on Lake Mendota in the US and demonstrated that training the PGRNN model with seven meteorological variables resulted in superior LST prediction performance compared to a physics-based model. On a larger scale, Willard et al. [16] produced a daily LST dataset for 185,549 lakes located within the US using deep learning. More specifically, Willard et al. [16] adapted an LSTM that was trained using static lake-specific variables (location, elevation, and surface area) and dynamic lake-specific variables (air temperature, wind speed, and radiation) to produce daily LST during 1980–2020 for lakes larger than 40,000 m² within the conterminous US. It should be noted that the dataset of Willard et al. [16] does not provide LST for the Great Lakes, likely due to the difficulty in capturing the spatial LST variability of such a large system. Deep learning techniques, especially those built on LSTM, which is adept at learning long-term dependencies in a time series, have also been widely used to predict SSTs in the East China Sea [42,43], Korean Sea [46], Indian Ocean [44,45], and the tropical Pacific Ocean region [50].
While deep learning has been extensively used in various regions around the world, its implementation in the context of the Great Lakes remains relatively underdeveloped within the scientific community. One important reason is that the Great Lakes exhibit sea-like characteristics and strong inter-lake and intra-lake LST variabilities. Only a small number of studies have employed deep learning or other ML techniques in the context of the Great Lakes. These studies utilized techniques such as LSTM, Extreme Gradient Boosting (XGBoost), and MLPNN to simulate streamflow [52], waves [53,54], and pollutant concentrations [55] within the Great Lakes. With regard to LST, Xue et al. [56] showed that an LSTM model, trained on just 14 sampling locations per lake, can effectively and efficiently reproduce the spatiotemporal evolution of the entire Great Lakes LST. Their LSTM model was trained using seven meteorological features from global reanalysis data and LST from GLSEA, showcasing the potential of LSTM to generate a comprehensive LST dataset specific to the Great Lakes.
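To illustrate how an LSTM maps a sequence of meteorological inputs to a temperature estimate, the following is a minimal sketch of a single-layer LSTM forward pass in plain Python. It is not the model of Xue et al. [56] or of this study: the hidden size, the random untrained weights, the random inputs, and the names `init_params`, `lstm_step`, and `predict_sequence` are all assumptions made for illustration. Only the count of seven input features follows the text.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def matvec(matrix, vector):
    """Multiply a list-of-rows matrix by a vector."""
    return [sum(w * v for w, v in zip(row, vector)) for row in matrix]


def init_params(n_in, n_hidden, seed=0):
    """Random, untrained weights for the four LSTM gates and a linear output."""
    rng = random.Random(seed)

    def mat(rows, cols):
        return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

    # i: input gate, f: forget gate, g: candidate cell state, o: output gate
    params = {name: {"W": mat(n_hidden, n_in + n_hidden), "b": [0.0] * n_hidden}
              for name in "ifgo"}
    params["out"] = {"w": [rng.uniform(-0.1, 0.1) for _ in range(n_hidden)], "b": 0.0}
    return params


def lstm_step(params, x, h, c):
    """One LSTM time step: returns the new hidden state and cell state."""
    z = x + h  # concatenate current input with previous hidden state

    def gate(name, activation):
        pre = matvec(params[name]["W"], z)
        return [activation(p + b) for p, b in zip(pre, params[name]["b"])]

    i = gate("i", sigmoid)
    f = gate("f", sigmoid)
    g = gate("g", math.tanh)
    o = gate("o", sigmoid)
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def predict_sequence(params, sequence, n_hidden):
    """Run the LSTM over a sequence and emit one scalar output per time step."""
    h = [0.0] * n_hidden
    c = [0.0] * n_hidden
    outputs = []
    for x in sequence:
        h, c = lstm_step(params, x, h, c)
        y = sum(w * hk for w, hk in zip(params["out"]["w"], h)) + params["out"]["b"]
        outputs.append(y)
    return outputs


N_FEATURES = 7  # seven meteorological features, as in the text
N_HIDDEN = 4    # hypothetical hidden size
params = init_params(N_FEATURES, N_HIDDEN)

# Ten days of made-up, normalized daily inputs (not real reanalysis data).
rng = random.Random(1)
sequence = [[rng.uniform(-1.0, 1.0) for _ in range(N_FEATURES)] for _ in range(10)]
predictions = predict_sequence(params, sequence, N_HIDDEN)
```

Because the weights are untrained, the outputs carry no physical meaning. The sketch only shows how the cell state carries information from earlier days forward, which is the property that makes LSTMs suited to time series such as daily LST.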
Therefore, in this study, we employed a deep learning model based on the LSTM neural network to reconstruct a daily gridded LST dataset for the Great Lakes that spans from 1979 to 2020. Our new LST dataset, which is available at an open-access data repository [57], has a high spatial resolution (~1 km) and dates back 16 years prior to the oldest GLSEA record. It therefore supplements the current datasets and gives the Great Lakes scientific community a unique and viable alternative source of LST for overlake surface boundary conditions in RCMs, lake model validation, and long-term spatiotemporal LST analysis.
The remainder of this paper is organized as follows: Section 2 gives an overview of the study area, the Great Lakes, and elaborates on the methodology used to predict the LST using the LSTM model as well as the development, training, and validation of the LSTM model. Section 3 presents the comparison between the LSTM-predicted LST and the LST from currently available datasets. Section 4 delves into a comprehensive discussion that highlights the significance of the present study, explores uncertainties associated with LST datasets, and discusses potential areas for future improvements. Finally, Section 5 provides a summary and concluding remarks.