Opportunities and Limits of Using Meteorological Reanalysis Data for Simulating Seasonal to Sub-Daily Water Temperature Dynamics in a Large Shallow Lake

In lakes and reservoirs, physical processes control temperature dynamics and stratification, which are important determinants of water quality. In large lakes, even extensive monitoring programs leave some of the patterns undiscovered and unresolved. Lake models can complement measurements in higher spatial and temporal resolution. These models require a set of driving data, particularly meteorological input data, which are compulsory to the models but at many locations not available at the desired scale or quality. It remains an open question whether these meteorological input data can be acquired in a sufficient quality by employing atmospheric models. In this study, we used the European Centre for Medium-Range Weather Forecasts’ (ECMWF) ERA-Interim atmospheric reanalysis data as meteorological forcing for the three-dimensional hydrodynamic General Estuarine Transport Model (GETM). With this combination, we modelled the spatio-temporal variation in water temperature in the large, shallow Lake Chaohu, China. The model succeeded in reproducing the seasonal patterns of cooling and warming. While the model did predict diurnal patterns, these patterns were not precise enough to correctly estimate the extent of short stratification events. Nevertheless, applying reanalysis data proved useful for simulating general patterns of stratification dynamics and seasonal thermodynamics in a large shallow lake over the year. Utilising reanalysis data together with hydrodynamic models can, therefore, inform about water temperature dynamics in the respective water bodies and, by that, complement local measurements.


Introduction
Shallow lakes are widely distributed across the globe [1], which is reflected in the low global average depth of 3.5 m for lakes in the smallest size class of 0.1-1 km² and generally low average depths across different continents [2]. Given their low average water depth, the water column of shallow lakes heats up faster than that of a deeper lake with the same surface area in the same climatic region [3,4]. This results in both larger diurnal temperature fluctuations and larger seasonal temperature ranges in shallow lakes. The morphological characteristics make shallow lakes [...] whole time period considered, resulting in a historically coherent dataset that is independent of methodological changes [48]. In addition, with the reanalysis approach being spatial, global coverage can be achieved [49]. The European Centre for Medium-Range Weather Forecasts (ECMWF, [48]) provides the ERA-Interim reanalysis consisting of a wide range of meteorological variables with a global coverage on a grid of 0.75 degrees (approx. 80 km) resolution. Only a few lake-modelling studies have made use of reanalysis data. Layden et al. [50] used data from ERA-Interim to drive a global model application and estimate lake surface temperatures with the model FLake. Schmid et al. [51] used data from the NOAA NCEP-NCAR CDAS-1 reanalysis project to simulate CO₂ concentrations and temperature dynamics in a lake that is located in a data-scarce region. Piccolroaz and Toffolon made use of the consistency of reanalysis products and ran a long-term simulation for Lake Baikal [52]. Xue et al. [53] applied wind input from different sources as driving data for a three-dimensional model of Lake Superior. They found that wind input from a weather forecast model or from reanalysis data produced better modelling results than wind derived from local observations. This was because spatial wind patterns, especially at the shore of the lake, were not captured by the observations [53].
In this study, we try to find an approach for modelling and analysing thermal and stratification dynamics in areas where only insufficient data is available. Sparse measurements often limit the validation of simulated overall circulation. This approach, therefore, requires some pragmatism in deriving information. It is our aim to explore the opportunities as well as the limits of using freely available reanalysis data for modelling seasonal to sub-daily temperature variability in large shallow lakes. The realistic reproduction of temperature dynamics and stratification periods is used as a criterion for evaluating the model performance. Analysing the hydrodynamic processes causing the stratification patterns remains out of the scope of the present study. As an exemplary test, we simulate the fifth-largest lake in China, Lake Chaohu, with the 3D hydrodynamic model GETM and spatially uniform atmospheric forcing extracted from the ERA-Interim reanalysis data set. We compare simulated with measured water temperatures at several locations within the lake over the course of one year.

Study Site
Lake Chaohu is located in the lower Yangtze River basin (31°43′-31°72′ N, 117°29′-117°85′ E, Figure 1). Its area covers 780 km² with a maximum length of 54.5 km from east to west and a maximum width of 21.0 km from south to north. Its average depth is approximately 3 m, and its maximum depth is 6 m [54]. Lake Chaohu is a major water body in the province of Anhui: it provides drinking water to approximately 250,000 people [55] and serves for fisheries, transportation of goods, and as a recreational site. The two cities located at the shore of Lake Chaohu are Hefei in the west with approximately 7.6 million inhabitants [56] and Chaohu City in the east with approximately 1 million inhabitants [57].
Lake Chaohu is in a eutrophic state, and each year from spring until late summer large cyanobacterial blooms develop in the lake and form surface scums, deteriorating the lake's water quality and leading to insecure drinking water supply and economic losses [58]. Lake Chaohu has 10 main tributaries that drain directly into the lake. Most of the inflows are located in the western part of the lake (Figure 1), and have small catchments and intermittent flows (for more information see Kong et al. [59]). Since the construction of a dam at the outflow in the eastern part of the lake in the early 1960s [59], the residence time in Lake Chaohu has varied between 160 and 210 days [59,60].

Figure 1. Location of Lake Chaohu in China (left) and model bathymetry for Lake Chaohu, including positions of the thermistor chains (right) and the main inflows; thermistor chains "E" and "K" are not shown because they were lost after a very short time of measurement and were not replaced; "mO" is the local meteorological station, "mR" is the grid point of the reanalysis; the outflow close to station J in the east of Lake Chaohu is regulated by a dam.

Lake Model
For the present study, we used the hydrostatic version of GETM to simulate Lake Chaohu on a spherical grid with approximately 500 m horizontal resolution. In the vertical, 7 terrain-following (sigma) layers with zooming towards the surface and bottom were utilised, resulting in a maximum layer thickness of 0.64 m. This provided sufficient vertical resolution to resolve the shear in boundary layers and the stratification throughout the water column, both required by GOTM to calculate vertical diffusivities from a well-calibrated k-epsilon closure. In order to support the accurate simulation of stratified water columns, the second-order TVD-Superbee scheme was used to reduce spurious numerical mixing of the thermocline [61]. Because of missing discharge data, the lake was treated as a closed basin of constant water volume. Heat loss through evaporation, however, was included in the simulation. We realised that this was a rather crude approximation, but it was our intention to follow a pragmatic approach that is easily applicable and also excludes uncertainties arising from poorly quantified or biased input data. A previous study by Chen and Liu [62] showed the limited influence of the inflows and the outflow on the whole-lake thermodynamics and currents within the lake. In their modelling study, the outflow mainly influenced currents close to the dam in the eastern bay close to Chaohu City, while inflows had a negligible influence on the lake's hydrodynamics. Another study by Huang et al. [60] found that wind had a larger effect on the lake's hydrodynamics than changing in- and outflows. Additionally, while inflows could certainly affect nutrient dynamics in the lake, which were not considered here, their effect on thermodynamics and water temperatures was expected to be negligible, as energy exchange with the atmosphere had far more impact than energy exchange by in- and outflows.
For the calculation of longwave radiation, the equation of Idso and Jackson [63] was implemented into GETM. This equation had been used in other lake models (e.g., the General Lake Model (GLM; [64])) and showed a better model fit compared to a simulation with GETM's standard equation by Clark et al. [65] (see Table S1). Sensible and latent heat fluxes were calculated via bulk formulae according to Kondo [66].
Attenuation of shortwave radiation in the water column due to algal biomass and suspended matter was described by Jerlov coefficients that equal an extinction coefficient of 4.0 or an equivalent Secchi depth of about 0.36 m, which was the annual average in Lake Chaohu [67].
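To illustrate what an extinction coefficient of 4.0 implies for light penetration, the following minimal sketch applies a single-band exponential (Beer-Lambert) decay. It is a simplification and not GETM's implementation: the unit m⁻¹ is assumed (the text gives the value without a unit), and the function name `light_fraction` is hypothetical. Note that the stated values satisfy 4.0 × 0.36 = 1.44.

```python
import math

K_EXT = 4.0  # extinction coefficient from the text; the unit m^-1 is an assumption


def light_fraction(depth_m: float, k_ext: float = K_EXT) -> float:
    """Fraction of surface shortwave radiation remaining at depth_m.

    Single-band Beer-Lambert decay; a simplified stand-in for the
    Jerlov-type attenuation described in the text.
    """
    return math.exp(-k_ext * depth_m)


# Depths chosen for illustration: Secchi depth (0.36 m), maximum layer
# thickness (0.64 m) and the approximate mean lake depth (3 m).
for depth in (0.36, 0.64, 3.0):
    print(f"{depth:4.2f} m: {light_fraction(depth):.4f}")
```

Under these assumptions, roughly 24% of the surface irradiance would remain at the Secchi depth of 0.36 m, which suggests that shortwave heating is concentrated in the uppermost model layers.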

Atmospheric Forcing Data
Meteorological input data were obtained from the global reanalysis project ERA-Interim, developed and maintained by the European Centre for Medium-Range Weather Forecasts (ECMWF, [48]). Data were downloaded from the public dataset web interface (http://apps.ecmwf.int/datasets/). ERA-Interim generates data with a resolution of 0.75 degrees (approximately 80 km horizontal resolution). Based on spatial interpolation, a resolution of 0.125 degrees (approximately 14 km horizontal resolution) is also available. These interpolated data allowed the derivation of the meteorological states at locations between the original grid nodes and permitted us to obtain data from one location in the centre of the lake (31°30′ N and 117°30′ E). For this study, the provision of non-uniform meteorological input data was not considered appropriate because the comparatively coarse spatial resolution of ERA-Interim, compared to the dimensions of the lake (ca. 30 × 50 km), could not properly resolve the orographic features around the lake. Spatially heterogeneous meteorological input data would, thus, increase the complexity while introducing additional uncertainty, e.g., by neglecting the "urban heat island effect" [68] of the large cities Hefei and Chaohu City that are located in the west and east of the lake. We therefore decided to test the feasibility of a uniform atmospheric forcing and applied data from the single grid point over the whole model domain. The downloaded data included the variables air temperature (K), dewpoint temperature (K), 10 m U and V wind components (m s⁻¹), total cloud cover (-) and mean sea-level pressure (Pa). The data were available at 6-hourly time steps from the analysis (00:00, 06:00, 12:00 and 18:00) and 6-hourly time steps as forecast values (at 03:00, 09:00, 15:00 and 21:00), resulting in a temporal resolution of 3 h. Preceding the simulation, these meteorological data were linearly interpolated to hourly values.
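The preprocessing described above can be sketched as follows. This is an illustrative example under stated assumptions, not the authors' actual workflow: the function names are hypothetical, the input is assumed to be a gap-free 3-hourly series, and the conversion of the U and V components to speed and direction is added only for illustration.

```python
import math


def interpolate_to_hourly(values_3h):
    """Linearly interpolate a gap-free 3-hourly series to hourly values.

    values_3h[i] is assumed valid at hour 3 * i; the result runs from
    hour 0 to the time of the last input value.
    """
    if not values_3h:
        return []
    hourly = []
    for left, right in zip(values_3h, values_3h[1:]):
        for step in range(3):
            hourly.append(left + (right - left) * step / 3.0)
    hourly.append(values_3h[-1])
    return hourly


def wind_speed_and_direction(u, v):
    """Convert wind components (m/s) to speed and meteorological direction.

    The direction is where the wind blows FROM, in degrees clockwise
    from north (0 = north, 90 = east).
    """
    speed = math.hypot(u, v)
    direction = math.degrees(math.atan2(-u, -v)) % 360.0
    return speed, direction


# Hypothetical 3-hourly air temperatures (K) at 00:00, 03:00 and 06:00.
print(interpolate_to_hourly([280.0, 283.0, 286.0]))
# Hypothetical wind components: air moving westward, i.e. an easterly wind.
print(wind_speed_and_direction(-5.0, 0.0))
```

In this sketch each variable is interpolated separately. Interpolating the U and V components, rather than speed and direction, avoids problems with the 0°/360° wrap-around of the direction.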
Air pressure values were taken as pressure at mean sea level, which complied with the low altitude of the lake of 8.4 m above sea level [59].
For a comparison of reanalysis data with locally measured data, daily averaged air temperature was downloaded from the NOAA database (station Hefei, 31°52′ N, 117°11′ E; time period 2014-2016, [69]) and measured wind speed and direction were available from one station close to the lake (station Hefei, 31°52′ N, 117°11′ E) for part of the year 2015. Wind was measured at 10 m above ground, but gaps existed in the months May and August, and September was missing completely. Both air temperature and wind data were available at daily resolution.
Reanalysis data were compared to local measurements when available. For this comparative analysis, the downloaded reanalysis data, with the time step of 3 h, had to be averaged to daily data in order to achieve the same temporal resolution as the observed, daily averaged data.
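A minimal sketch of this aggregation step is given below. The function name `daily_means` and the sample values are hypothetical, and days are formed from the calendar date of the time stamps as given; any offset between UTC reanalysis times and the local station day is ignored here.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def daily_means(timestamps, values):
    """Average a sub-daily series to daily means, keyed by calendar date."""
    buckets = defaultdict(list)
    for stamp, value in zip(timestamps, values):
        buckets[stamp.date()].append(value)
    return {day: sum(vals) / len(vals) for day, vals in sorted(buckets.items())}


# Hypothetical 3-hourly air temperatures (K) for two days, eight values per day.
start = datetime(2015, 1, 1)
stamps = [start + timedelta(hours=3 * i) for i in range(16)]
temps = [275.0 + (i % 8) for i in range(16)]

for day, mean_temp in daily_means(stamps, temps).items():
    print(day, mean_temp)
```

Arithmetic averaging of this kind is suitable for temperature and wind speed. Wind direction is a circular quantity, so a daily direction would have to be derived from averaged U and V components instead.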

In-Lake Measurements
From 1 November 2015 to 31 December 2016, thermistor chains were deployed and distributed evenly across the lake (Figure 1, Table S2). Stations K and L were deployed later in the project. Stations E and K (not shown in the map, Figure 1) were lost very soon after deployment and were not replaced. Data from these two stations were, therefore, not considered in our analyses. This led to a dataset of up to 10 locations at the same time. Each thermistor string consisted of three loggers (HOBO Onset®, TidbiT; accuracy ±0.2 °C), located at a depth of 0.3 m below the surface as well as 0.2 and 1.5 m above the bottom. The measurement interval was set to 15 min. Due to the loss of loggers, large data gaps existed. However, missing loggers in the thermistor chain were replaced with new loggers as soon as possible; sometimes the whole thermistor chain had to be replaced. Due to the large data gaps, we focused most of our analysis on those stations that had surface and bottom temperatures available over several weeks at one time (stations A, B, D, F and J).

Analysis
The quality of the model results was quantified by calculating the Nash-Sutcliffe efficiency (NSE), the root mean square error (RMSE), and the mean absolute error (MAE), as:

$$\mathrm{NSE} = 1 - \frac{\sum_{i=1}^{n} (y_i - \hat{y}_i)^2}{\sum_{i=1}^{n} (y_i - \bar{y})^2}$$

$$\mathrm{RMSE} = \sqrt{\frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2}$$

$$\mathrm{MAE} = \frac{1}{n} \sum_{i=1}^{n} \left| y_i - \hat{y}_i \right|$$

where $y_i$ was the observed and $\hat{y}_i$ the simulated water temperature at time $i$; $\bar{y}$ was the mean observed water temperature; and $n$ was the overall number of samples. Sub-daily temperature measurements at nine different locations in the lake were used for validation, because a good agreement between model and observation at different sites in the lake could only be reached by correctly simulating heat exchange across the surface as well as advection and diffusion within the lake.

Stratification duration in observed as well as simulated data was quantified by first calculating the temperature difference T_diff between surface and bottom temperature per station. We defined the water column as being stratified whenever this temperature difference surpassed 0.5 K (compare [70]). The stratification duration was then simply the accumulated time with T_diff above 0.5 K. We referred to the time between mixed conditions as a 'stratification event'. The calculations were done for both observation and simulation only at times where both surface and bottom data were available from observations. Observations and simulation results were compared for stratification duration and for the number of stratification events that lasted longer than 1 day.
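The fit metrics and the stratification criterion described above can be sketched in a few lines of Python. This is an illustrative example under stated assumptions, not the code used in the study: all function names are hypothetical, T_diff is taken as surface minus bottom temperature, the series are assumed to be gap-free with a constant time step, and the sample values are invented.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency of simulated versus observed values."""
    mean_obs = sum(obs) / len(obs)
    num = sum((o - s) ** 2 for o, s in zip(obs, sim))
    den = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - num / den


def rmse(obs, sim):
    """Root mean square error."""
    return math.sqrt(sum((o - s) ** 2 for o, s in zip(obs, sim)) / len(obs))


def mae(obs, sim):
    """Mean absolute error."""
    return sum(abs(o - s) for o, s in zip(obs, sim)) / len(obs)


def stratification_events(t_surface, t_bottom, dt_hours, threshold=0.5):
    """Durations (hours) of consecutive runs with T_diff above the threshold.

    T_diff is assumed to be surface minus bottom temperature (K).
    """
    events = []
    run = 0
    for ts, tb in zip(t_surface, t_bottom):
        if ts - tb > threshold:
            run += 1
        elif run:
            events.append(run * dt_hours)
            run = 0
    if run:  # close an event that lasts until the end of the series
        events.append(run * dt_hours)
    return events


# Hypothetical 15-min surface and bottom temperatures (degrees C).
surface = [10.0, 10.8, 11.0, 10.2, 10.9, 10.1]
bottom = [10.0] * 6
events = stratification_events(surface, bottom, dt_hours=0.25)
print(events, sum(events))  # individual event durations and total stratified time
```

The total stratified time is the sum of the event durations, and events longer than 24 h can be counted by filtering the returned list. As a quick consistency check on any reported fit statistics, the MAE of a data set can never exceed its RMSE.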

Meteorological Data
Reanalysis air temperatures (T_rean) matched well with observed ones (T_obs) (Figure 2, T_rean = 0.98 × T_obs + 0.54, r² = 0.9828, p-value: <0.001). Wind speed from reanalysis data overestimated measured data on average by 0.27 m s⁻¹, i.e., 12%. Locally measured wind direction showed a higher variability than the reanalysed data. For the year 2015, reanalysis data showed the main wind direction from ENE, while local measurements did not display a main wind direction (Figure 2).

Field Observations
The observed water temperatures followed the seasonal course in air temperatures (Figure 3). Over the whole measurement period water temperature ranged between 0.9 °C and 29.7 °C. It should be noted that no data were taken in midsummer, i.e., the maximum water temperature could be higher. Cooling and warming of the lake did not occur continuously. Instead, water temperature dropped or increased stepwise by several Kelvin over a few days (e.g., around day of the year 274 and day of the year 300, Figure 3 and 14th-18th of April, Figure 4). Exemplarily, the maximal gradients in the observation were a 3.2 K decrease over two days (1. [...]

Although the lake is shallow, vertical stratification was observed, which could last for a few days (e.g., 8-11 April and 12-14 April, Figure 4; station F was chosen as an example since it had both surface and bottom data available over a longer period both in spring and in autumn). The maximum difference between bottom and surface temperature was observed at station B with 6.9 K on 9 April 2016. With a temperature range of up to 6.5 K, surface temperatures showed stronger daily fluctuations compared to bottom temperatures, which had a maximum daily temperature range of 3.7 K.
During the year 2016, a maximum of 10 stratification events where the lake remained stratified for more than one day was observed at a single station (station D, Figure 5A). The longest observed stratification duration was 5.6 days (in April at station B). The median of the observed stratification duration was 3.5 h. Most observed stratification events had a short duration, i.e., 0.5-2 h (Figure 5B). During the time observed in the year 2016, the lake was stratified on average for 22% of all observations (30-min measurement interval, Table 1). Note that this number is relative to the available measurements and cannot be related to the whole year 2016. Large data gaps exist, especially in summer.
Figure 5. [...] Distribution of the length of stratification events lasting shorter than one day (B); cumulative sum of stratified time for those stratification events (C); for panels B and C, data were taken from stations A, B, D, F and J. Note that the difference between stations in panel A can result from a different amount of data available from those stations.

Table 1. Overall duration of water temperature observations at different stations in Lake Chaohu; duration of stratified hours derived from observation and simulation; difference in prediction of stratified conditions. Data gaps exist and the numbers do not cover a complete year. The comparison below was done for those times when observations were available, i.e., only a subset of the simulation was taken into account.

Most of the time observed, surface water temperatures differed between stations across the lake (Figure 6), demonstrating the heterogeneity of the water body in the horizontal direction. During the observation, horizontal temperature differences reached a maximum of 4.9 K at the surface, over a distance of 21.5 km between stations H and J, and a maximal difference of 3.7 K at the bottom over a distance of 24.7 km between stations D and F.

Simulation Results
Overall, the simulation of water temperatures showed reasonable agreement with observations (Figure 7, Tables S1 and S3, Figures S1-S3). Surface temperatures were slightly underestimated in winter and overestimated in summer. In October and November 2016, simulated temperature was on average 1.6 K lower than the observed temperature. In June 2016 (the summer month with available data), simulated temperatures were on average 0.5 K higher than observed ones. This bias can also be seen from the linear regression for the complete simulation run (surface temperatures: T_meas = 1.04 × T_sim − 1.35, r² = 0.948, NSE = 0.92, RMSE = 1.61, MAE = 2.61, n = 71,541, Table S1 and Figure S1). Bottom temperatures were generally slightly underestimated, on average 1. [...] Table S1 and Figure S1).
The model reproduced the seasonal trend of surface and bottom temperature (Figures 7 and 8; here we exemplarily show station B, which had the fewest data gaps of all stations) as well as observed cooling events that lasted over a few days (Figure 8). Simulated temperatures showed diurnal stratification, too. However, the simulated stratification was more pronounced, occurred more frequently, and showed a larger temperature difference between bottom and surface temperature compared to observations (Figure 4). While the number of longer stratification events showed a good fit for the western part of the lake (Figure 5A, stations A, B and D), the fit was poor for the eastern part of the lake (Figure 5A, stations F and J). Combining information from all five stations, the simulated stratification events lasted longer (Figure 5B) and thus did not show the high number of very short stratification events (0.5-4 h) that was observed (Figure 5B). Overall, the model overestimated the number of stratified hours in the lake by 11% (Figure 5C, Table 1). This comparison showed a good fit for stations B and J, whereas the weakest fit was obtained at station F (Table 1). This result is reflected in the model fits separated by station (Table S3), where the linear model for bottom temperatures at station F appeared not to be significant.
Water 2018, 10, x FOR PEER REVIEW 10 of 17

Discussion
In this paper, a three-dimensional coastal ocean model was applied to simulate thermal and stratification dynamics of a large shallow lake in the Yangtze River basin in China. The model was driven by ERA-Interim reanalysis data. Measured by common model fit metrics (NSE, RMSE, MAE), the seasonal dynamics in surface and bottom temperature were reproduced well. The combination of data and model succeeded in reproducing synoptic-scale cooling and warming events. However, diurnal stratification patterns predicted by the model were too regular compared to those observed: stratification occurred more often in the model and, on average, lasted longer compared to observations. Stratification in shallow lakes is heavily affected by wind and by heat fluxes at the water surface [71]. A comparison of wind data obtained from the reanalysis data and local measurements showed that wind speed was overestimated in the reanalysis data (0.27 m s⁻¹, i.e., 12%) and that wind directions diverged from each other. However, wind is unlikely to be responsible for the mismatch in stratification. An overestimated wind speed would weaken rather than intensify stratification in the model. Although wind directions in the reanalysis data differed from measured directions, they were similar to what has been denoted as the main wind direction in Chaohu by others [54,60,62]. Huang et al. [60] denote an eastern wind as the main wind direction between 2011 and 2013, which is comparable to the reanalysis from 2015 (Figure 2). Chen and Liu [62] state the main wind direction as south-east in summer. Dividing the reanalysis per month led to main wind directions either from south-south-west or east-north-east during the months June-August of the years 2014-2015 (data not shown).
In winter, wind direction was more variable in the reanalysis, with a stronger tendency towards the north-east as the main wind direction compared to the north-west stated as the main wind direction in winter by Chen and Liu [62]. In general, wind was found to be a meteorological variable that is well represented by reanalysis data [53], although the spatial resolution of the reanalysis does not account for local orography at small scales. It has to be noted that Xue et al. [53] used reanalysis data to simulate Lake Superior, which is more than 10 times the size of Lake Chaohu. Whether a bias in wind data from the reanalysis is causing the observed mismatch of longer stratification events in the eastern part of the lake (Figure 5A, stations F and J) can only be clarified through local measurements at various stations on and around the lake. The paper by Zhang et al. [54] hints at local differences in wind direction between Hefei and Chaohu City.
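A wind comparison of this kind can be sketched as follows. The sketch assumes that the relative bias is expressed against the mean measured wind speed and that a mean direction is obtained by averaging unit vectors; both are assumptions about the procedure, and the function names and sample values are hypothetical.

```python
import numpy as np

def wind_speed_bias(reanalysis, measured):
    """Mean bias (m/s) and bias relative to the mean measured speed.

    Expressing the relative bias against the measured mean is an assumption.
    """
    re = np.asarray(reanalysis, dtype=float)
    ob = np.asarray(measured, dtype=float)
    valid = ~(np.isnan(re) | np.isnan(ob))  # use paired, non-missing values only
    bias = np.mean(re[valid] - ob[valid])
    return bias, bias / np.mean(ob[valid])

def circular_mean_direction(directions_deg):
    """Mean wind direction in degrees, averaged on the unit circle."""
    rad = np.deg2rad(np.asarray(directions_deg, dtype=float))
    mean = np.arctan2(np.mean(np.sin(rad)), np.mean(np.cos(rad)))
    return np.rad2deg(mean) % 360.0

# Hypothetical daily mean wind speeds (m/s) and directions (degrees).
bias, relative_bias = wind_speed_bias([2.5, 3.0, 2.0], [2.0, 3.0, 1.0])
mean_direction = circular_mean_direction([80.0, 100.0])
```

Averaging on the unit circle avoids the artefact of an arithmetic mean, which would place directions of 350° and 10° at 180° instead of close to north.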
A difficulty in correctly simulating stratification in lakes arises from the surface-energy budget. Preliminary simulations revealed a high sensitivity of the model towards the equation used for calculating net longwave radiation fluxes. Furthermore, the model results showed a large diurnal fluctuation of surface water temperatures, which points to problems in the sub-daily energy budget. It is possible that cloud cover, which is part of the heat budget estimation, contributes a large error, since it is hard to simulate within global climate and forecast models [72].
A direct assessment of sub-daily patterns in reanalysis data could not be achieved since local measurements were only available as daily averages. Furthermore, it has to be kept in mind that local weather conditions can be very heterogeneous around the lake, leading to further complications. Daily land-lake wind patterns can develop due to different warming and cooling rates of land and water surfaces, and evaporation from the large surface area can have a buffering effect on local temperatures as well as changing local cloud cover and thus incoming irradiance. These processes are not included in the reanalysis model; accounting for them would improve model performance [73].
Due to the unavailability of local data, we were not able to quantify the lake's water balance and neglected in- and outflows in our simulation. This simplification is supported by previous simulation studies, which have shown the negligible effect of discharge on the lake's hydrodynamics [60,62]. Nevertheless, we cannot completely exclude local effects of river inflows on stratification. An assessment of small-scale patterns would need reliable discharge data and several thermistor chains deployed close to the main inflows. Changes in water depth can potentially affect stratification. The water level in Lake Chaohu rises mainly in summer due to the rainy season (see e.g., [60]). If this had an effect, we would expect a strong seasonal bias in the model fit, which we did not observe. Other factors that could contribute to the spatial mismatch of longer stratification events are potential errors in the bathymetry used for the simulations, the spatial resolution of the model, the operation of the dam, and even ship traffic. The lake is deeper in the eastern part. Inaccuracies in the bathymetry could have caused the mismatch at station F. Station J is located in a narrow bay, close to Chaohu City. A higher model resolution, as well as data on the dam operation, could result in a better prediction at this location. Finally, Lake Chaohu is heavily used for the transportation of goods, leading to continuous ship traffic that causes turbulence in the water column. Station J in particular could be affected by ship and boat traffic, since it is located in a bay of the lake near Chaohu City.

Applicability of Reanalysis Data in Hydrodynamic Lake Simulations
Within an initiative of the International Association of Hydrological Sciences (IAHS), the project Predictions in Ungauged Basins (PUB) was launched in 2003 (http://iahs.info/pub/index.php). PUB aimed at developing methods to increase the process-understanding and reduce the uncertainty of hydrological predictions in ungauged basins. The group mainly evolved from the need to develop models that are capable of providing reliable and transferable prognoses for future changes and for areas with little measurement activity [74]. Furthermore, new possibilities for data acquisition, e.g., remote sensing, were supposed to be explored [75]. In essence, the lake modelling community is confronted with similar problems: several surface water resources around the globe are at risk concerning water quality and quantity. Where standards in environmental monitoring are below the input data requirements of the established models, an alternative approach enabling model applications to those lakes and reservoirs is needed.
Despite the restrictions mentioned above, the model results described seasonal and synoptic-scale patterns of water temperatures well, both at the surface and at the bottom of the lake. The lake morphometry implies a strong influence of meteorology on water temperature dynamics, so that changes in meteorological conditions induce a fast thermodynamic response of the lake. Due to the shallow water depth, the effect of thermal inertia in spring and autumn, as observed in deep lakes with a large water volume, is strongly reduced. A potential buffering effect may arise from heat storage in the sediments or from groundwater intrusions leading to cooling in early summer and warming in autumn. Our simulated and observed temperatures, however, did not provide evidence that such buffering effects are important in Lake Chaohu. Also, inflow and outflow dynamics of the lake appear to be negligible for the thermodynamic budget, since our model was able to simulate temperature dynamics accurately without including in- and outflows. In conclusion, the direct effect of local meteorological conditions is the main driver of the lake's thermal and stratification dynamics. The good fit between observation and simulation regarding seasonal patterns showed that reanalysis data are suitable for simulations unless sub-daily dynamics are of interest. Our approach of applying reanalysis data has the benefits of easy transferability and the potential for global applicability since reanalysis data are available worldwide.
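For readers who wish to repeat such a comparison for another lake, the fit metrics referred to in the Discussion (NSE, RMSE, MAE) can be computed from paired observed and simulated temperatures. The following is a minimal sketch using the standard definitions of these metrics; the function name `fit_metrics` and the sample values are hypothetical and do not reproduce results of this study.

```python
import numpy as np

def fit_metrics(observed, simulated):
    """Nash-Sutcliffe efficiency, root mean square error and mean absolute error."""
    obs = np.asarray(observed, dtype=float)
    sim = np.asarray(simulated, dtype=float)
    valid = ~(np.isnan(obs) | np.isnan(sim))  # drop pairs with missing values
    obs, sim = obs[valid], sim[valid]
    err = sim - obs
    rmse = np.sqrt(np.mean(err ** 2))
    mae = np.mean(np.abs(err))
    # NSE = 1 means a perfect fit; NSE = 0 means no better than the observed mean.
    nse = 1.0 - np.sum(err ** 2) / np.sum((obs - obs.mean()) ** 2)
    return {"NSE": nse, "RMSE": rmse, "MAE": mae}

# Hypothetical water temperatures (deg C).
metrics = fit_metrics([10.0, 12.0, 14.0, 16.0], [11.0, 12.0, 13.0, 16.0])
```

Because a strong seasonal cycle inflates the variance of the observations, NSE can remain high even when sub-daily deviations are substantial, so RMSE and MAE are worth inspecting alongside it.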
Currents in the lake are an important quantity for water managers. An accumulation of cyanobacterial scums in the western part of the lake has been observed in several studies (e.g., [54,76,77]). It would be of interest to assess the suitability of reanalysis data for simulating current patterns in lakes. However, we argue that a full assessment of currents in the lake requires local measurements for validation in order to indicate the reliability of those simulation results.

Water Temperature Data
Through our measurements of water temperatures at the surface and bottom of the lake, we showed that the large polymictic Lake Chaohu is frequently stratified. This contrasts with Huang et al. [60], who assumed that the lake does not stratify. We observed that the lake was stratified in 22% of all observations on average. It has to be stressed that this number does not relate to the whole year, since large data gaps exist in our dataset. It is probable that the percentage per year is higher, because the largest data gap existed in summer, when the water column is most likely to stratify. Stratification sometimes lasted over several days. As the lower layer of a stratified water body does not have direct contact with atmospheric oxygen, the huge oxygen demand of the sediment favors anoxic conditions in the bottom layer of the lake. The longer the stratification, the more likely the lake system is to develop anoxic bottom water. A precise simulation of stratification and its timing is, thus, mandatory for a realistic simulation of the bio-geochemistry of Lake Chaohu.
The frequent alternation between mixing and stratification could even increase the release of nutrients from the sediment to the overlying water. While the lake is stratified, nutrients may accumulate in the bottom layers. When the lake is mixed again, these nutrients will be diluted within the whole water column and can readily be taken up by primary producers. This "nutrient pump" can generate high pulses of nutrients if the bottom waters approach anoxic conditions and low redox potentials persist [78]. In several studies, the occurrence of large anoxic zones, so-called "black blooms", in the shallow lakes of the Yangtze basin has been identified [12]. A three-dimensional modelling study would be useful for analysing hydrodynamic processes leading to these phenomena. However, this will require additional information from local meteorological and probably hydrological measurements in combination with a well-calibrated bio-geochemical model. Coupling GETM to a biogeochemical model is straightforward via the Framework for Aquatic Biogeochemical Models (FABM; [79]). The limiting factor of such a study remains the availability of water quality data to validate the bio-geochemical model thoroughly.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/10/5/594/s1, Table S1: Overall model fit. Figure S1: Model fit for the overall simulation. Table S2: Overview of data availability. Table S3: Model fit separated for stations. Figure S2: Water temperature at the surface. Figure S3: Water temperature at the bottom.
Author Contributions: M.F., B.B., W.H. and K.R. conceived and designed the study; M.F., P.H. and K.K. performed the modelling; M.F. analysed the data; W.H., Z.P. and J.Z. contributed data; M.F., K.K., Z.P. and K.R. wrote the first draft of the paper. All authors contributed to reviewing and editing the manuscript. Except for the lead author and last author, authors are listed in alphabetical order.