ERA5 and ERA-Interim Data Processing for the GlobWat Global Hydrological Model

The reproducibility of computational hydrology is gaining attention among hydrologists. Reproducibility requires open and reusable code and data, allowing users to verify results and process new datasets. The creation of input files for global hydrological models (GHMs) requires complex high-resolution gridded dataset processing, limiting the model's reproducibility to groups with advanced programming skills. GlobWat is one such GHM; it was developed by the Food and Agriculture Organization (FAO) to assess irrigation water use. Although the GlobWat code and sample input data are available, the methods for pre-processing model inputs are not. Here, we present a set of open-source Python and YAML scripts within the Earth System Model Evaluation Tool (ESMValTool) that provide a formalized technique for developing and processing GlobWat model weather inputs. We demonstrate the use of these scripts with the ERA5 and ERA-Interim datasets from the European Centre for Medium-Range Weather Forecasts (ECMWF). To demonstrate the advantage of using these scripts, we ran the GlobWat model for 30 years for the entire world. The focus of the evaluation was on the Urmia Lake Basin in Iran. The validation of the model against the observed discharge in this basin showed that the combination of ERA5 and the De Bruin reference evaporation method yields the best GlobWat performance. Moreover, the scripts allowed us to examine the causes behind the differences in model outcomes. The results illustrate that the GlobWat model is extremely sensitive to precipitation: generally, a 10% increase in precipitation results in a 38% increase in discharge.


Introduction
The reproducibility of computational hydrology is gaining attention among hydrologists [1][2][3][4][5][6]. Several definitions of reproducibility have been proposed. The concept of these definitions includes access to all data, models, code, and the research environment. Access must be such that a new user can precisely reproduce the results given by the main study and that a set of new data can also be processed [7,8]. Nevertheless, a recent study showed that, on average, only 1.1% of the publications in hydrology and water resources are fully reproducible [8]. This is because most researchers either do not share their code and data or, when they do, the details and dependencies are omitted [9]. Reproducibility is important to validate previous results and to advance earlier research [10].
Global hydrological models (GHMs) are rapidly evolving as a result of developments in computational capacity and data storage as well as developments in remote sensing, the availability of gridded observation datasets, and meteorological forcing data [11]. Creating input files for GHMs becomes exceedingly challenging as their resolution improves [12]. Each GHM reads input data in a specific format. As a result, raw gridded datasets have to be pre-processed before being used by a GHM. These modifications require programming skills.
Table 1. Grid specification (data from the GlobWat user manual [36]).

Eref Methods and Data Processing
In this study, we provide a recipe (a YAML script) and a diagnostic (a Python script) for processing precipitation and Eref for use in the GlobWat hydrological model. The recipe is used to specify the input data required by the diagnostic as well as to set optional processing. The diagnostic is where calculations related to Eref are performed, and necessary changes are made to the data in the format used in the model.
The recipe and diagnostic were developed within the ESMValTool. The ESMValTool is a community-based open-source software package for the comprehensive analysis of Earth system models (ESMs) [18]. To assess the performance of a model, the ESMValTool uses both diagnostics and performance metrics. The ESMValTool has preprocessing functions for adjusting the data before applying diagnostics or metrics. The ESMValTool's most typical data adjustments are the calculation of non-directly available variables, vertical interpolation, land-sea or ice masking, re-gridding, multi-model statistics, temporal and spatial manipulations, missing value masking, and unit conversion [37]. Sections 2.2.1 and 2.2.2 describe the Eref methods used in this study and the processing steps taken.

Eref Methods
Eref is an ERA5 parameter described in the ECMWF parameter details [38]. However, Eref is not included in ERA-Interim. In addition to Eref from ERA5, we calculated the Eref for ERA5 and ERA-Interim. Eref at the global scale for use in GlobWat can be calculated based on the De Bruin and Langbein methods.

De Bruin Method
The De Bruin method is a straightforward algorithm that uses a thermodynamic technique to calculate the Eref of a well-watered grass field that closely resembles the FAO reference grass. This method calculates Eref as a function of the two-meter air temperature, mean sea level pressure, surface solar radiation downwards, and top of atmosphere (TOA) incident solar radiation. The results of the De Bruin method have been validated using Cabauw in situ observations in the Netherlands [39]. A Python script has been provided for the calculation of Eref using the De Bruin method in the ESMValTool [40] and the eWaterCycle [41].

Langbein Method
The Langbein method to calculate Eref is an empirical relationship between the average annual temperature and Eref, based on data from 20 catchments across the United States. These catchments vary from humid to arid and from cold to warm [42,43]. This equation has been used in Eref analyses in various regions, including arid, semi-arid, and humid regions [44][45][46][47][48]. The Langbein calculation is shown in Equation (1), where Eref is in mm and T is the average annual temperature in °C.
The Langbein method has been added to the ESMValTool to address the interests of researchers who intend to use or have used this method and want to compare their results with ERA5 and ERA-Interim. The Langbein temperature-based Eref method is mainly useful in those basins where only the observed temperature is available. ERA5, ERA-Interim, and GlobWat operate at a global scale, allowing researchers to choose any location for further assessments.
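As an illustration of a Langbein-type calculation, the sketch below assumes the commonly quoted quadratic form of the Langbein curve, Eref = 325 + 21 T + 0.9 T², with Eref in mm per year and T in °C. The coefficients and the function name `langbein_eref` are assumptions of this sketch, not taken from the scripts described here, and should be checked against the cited sources [42,43] before use.

```python
def langbein_eref(mean_annual_temp_c):
    """Return annual reference evaporation (mm/yr) from mean annual temperature (deg C).

    Assumed form of the Langbein curve: Eref = 325 + 21*T + 0.9*T**2.
    The coefficients are an assumption of this sketch.
    """
    t = float(mean_annual_temp_c)
    return 325.0 + 21.0 * t + 0.9 * t ** 2


if __name__ == "__main__":
    # 12 deg C is the mean annual temperature quoted for the Urmia Lake Basin.
    print(f"Eref at 12 degC: {langbein_eref(12.0):.1f} mm/yr")
```

Because the relationship depends on temperature alone, the same function can be applied to a station record or to each cell of a gridded annual mean temperature field.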

Processing Steps
We provided a recipe [49] and a diagnostic [50] for processing the precipitation and Eref for use in the GlobWat model. The GlobWat recipe includes monthly and daily data processing of the ERA5 and ERA-Interim datasets. When the ESMValTool runs the GlobWat recipe, it finds the data and runs the processing function. It also executes the diagnostic script that involves a GlobWat-specific analysis to extract Eref and the precipitation, and stores the provenance details to ensure transparency and reproducibility. The processing flowchart is shown in Figure 1a.
In this paper, we use monthly data. However, the recipe can be run using daily data as well. The download and CMORization (Climate Model Output Rewriter) of input data are the first steps of data processing. The script to download ERA-Interim data is available at the ESMValTool GitHub repository [51], and ERA5 can be downloaded using era5cli [52].
CMORization is the process of standardizing data to ensure that it is CF (Climate and Forecast) compliant and follows the CMOR tables (standard_name, units, long_name, etc.) [5,34]. For example, the ERA-Interim CMORizer changes the precipitation units from meters of water per day to kg of water per m² per day, converts the ERA-Interim 6-hourly frequency to daily, and sets standard names for variables. The ERA-Interim and ERA5 CMORizer Python scripts can be found in the ESMValTool GitHub repository [53,54].
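The unit change mentioned above is a multiplication by the density of water. The following minimal sketch shows only that arithmetic; it is not the ESMValTool CMORizer code, and the density of 1000 kg m⁻³, the function name, and the sample values are assumptions.

```python
WATER_DENSITY_KG_M3 = 1000.0  # assumed density of liquid water


def metres_to_kg_per_m2(depth_m):
    """Convert a water depth in metres to a mass per unit area in kg m-2."""
    return depth_m * WATER_DENSITY_KG_M3


if __name__ == "__main__":
    # Hypothetical daily precipitation totals in metres of water per day.
    daily_precip_m = [0.0, 0.0025, 0.012]
    daily_precip_kg_m2 = [metres_to_kg_per_m2(p) for p in daily_precip_m]
    print(daily_precip_kg_m2)
```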
CMORization is performed only once for each dataset. The CMORized data can then be used in any ESMValTool recipe. Our recipe uses the CMORized ERA5-daily/monthly and ERA-Interim-daily/-monthly data. Then, Eref is calculated based on the De Bruin or Langbein methods.
Another variable required by GlobWat is precipitation, which is available in both datasets. However, the interpolation of cumulative precipitation fields often results in negative values [55]. To prevent NaN values in the GlobWat outputs, these negative values are set to zero. Next, a unit conversion is applied to both the precipitation and Eref to convert units from kg m⁻² s⁻¹ to mm per time step (month or day).
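The two steps described above (zeroing negative values, then converting a flux to a depth per time step) can be sketched as follows. This is an illustration rather than the diagnostic's code: the function name is hypothetical, and the conversion assumes that 1 kg of water over 1 m² corresponds to a 1 mm layer.

```python
import calendar

SECONDS_PER_DAY = 86400


def flux_to_monthly_depth(flux_kg_m2_s, year, month):
    """Convert a mean flux in kg m-2 s-1 to a monthly total in mm.

    Negative fluxes are set to zero before the conversion.
    """
    days = calendar.monthrange(year, month)[1]  # number of days in the month
    return max(flux_kg_m2_s, 0.0) * SECONDS_PER_DAY * days


if __name__ == "__main__":
    # Hypothetical monthly mean fluxes, including one small negative artifact.
    for flux in (-1.0e-6, 0.0, 1.0e-5):
        print(flux, "->", flux_to_monthly_depth(flux, 2004, 1), "mm")
```

For daily data, the same conversion applies with a single day instead of the number of days in the month.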

Subsequently, to match the grid size of the datasets with that of the GlobWat model, re-gridding is performed. The re-gridding scheme is set to 'area weighted' to keep the volume of water consistent before and after re-gridding. In this method, the data for each target grid are generated as a weighted mean of all cells from the source grid.
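To illustrate the idea of an area-weighted mean, the sketch below coarsens a regular latitude-longitude grid by an integer factor, weighting each source cell by the cosine of its latitude (proportional to cell area on such a grid). It is a simplified stand-in, not the ESMValTool re-gridding scheme, which also handles partially overlapping cells; the function name and the integer-factor restriction are assumptions.

```python
import math


def area_weighted_coarsen(field, lats_deg, factor):
    """Average blocks of factor x factor source cells into one target cell.

    field: list of rows, one row per latitude; lats_deg: row-centre latitudes.
    Each source cell is weighted by cos(latitude).
    """
    nlat, nlon = len(field), len(field[0])
    if nlat % factor or nlon % factor:
        raise ValueError("grid size must be a multiple of factor")
    weights = [math.cos(math.radians(lat)) for lat in lats_deg]
    coarse = []
    for i0 in range(0, nlat, factor):
        row = []
        for j0 in range(0, nlon, factor):
            weighted_sum = weight_sum = 0.0
            for i in range(i0, i0 + factor):
                for j in range(j0, j0 + factor):
                    weighted_sum += weights[i] * field[i][j]
                    weight_sum += weights[i]
            row.append(weighted_sum / weight_sum)
        coarse.append(row)
    return coarse


if __name__ == "__main__":
    # Hypothetical 2 x 2 field: dry row at 60 degrees N, wet row at the equator.
    print(area_weighted_coarsen([[0.0, 0.0], [10.0, 10.0]], [60.0, 0.0], 2))
```

Because the weights follow cell area, the area-integrated amount of water is the same on the source and target grids, which is the property the text refers to.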
Finally, the data are saved in ASCII format and are ready for use as inputs in the GlobWat model. An example of processing ERA-Interim precipitation for the year 2004 for January is presented in Figure 1b.
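The text does not specify the layout of the ASCII files. As a hedged illustration, the sketch below writes a small grid using an Esri-style ASCII raster header; the header keywords, the no-data value, and the function name are assumptions and may differ from what GlobWat expects.

```python
import os
import tempfile


def write_ascii_grid(path, rows, xllcorner, yllcorner, cellsize, nodata=-9999):
    """Write a 2-D list (northernmost row first) as an Esri-style ASCII raster."""
    header = [
        f"ncols {len(rows[0])}",
        f"nrows {len(rows)}",
        f"xllcorner {xllcorner}",
        f"yllcorner {yllcorner}",
        f"cellsize {cellsize}",
        f"NODATA_value {nodata}",
    ]
    with open(path, "w", encoding="ascii") as handle:
        handle.write("\n".join(header) + "\n")
        for row in rows:
            # Missing cells (None) are written as the no-data value.
            cells = [str(nodata) if v is None else f"{v:.2f}" for v in row]
            handle.write(" ".join(cells) + "\n")


if __name__ == "__main__":
    # Hypothetical 2 x 2 precipitation grid in mm per month.
    with tempfile.TemporaryDirectory() as tmp:
        target = os.path.join(tmp, "precip_2004_01.asc")
        write_ascii_grid(target, [[12.3, None], [0.0, 45.6]], -180.0, -90.0, 90.0)
        with open(target, encoding="ascii") as handle:
            print(handle.read())
```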
We provide more details about the scripts on the ESMValTool documentation page [56].

Data

ERA5 and ERA-Interim
ERA5 and ERA-Interim are global atmospheric reanalysis datasets provided by the ECMWF [28,57]. ERA-Interim covers January 1979 to 31 August 2019, and ERA5 covers January 1950 to the present [29,58]. ERA5 and ERA-Interim have spatial resolutions of 30 km and 79 km and temporal resolutions of 1 hour and 6 hours, respectively [29]. Details about the ERA5 and ERA-Interim datasets can be found on the ECMWF website [59,60]. ERA-Interim and ERA5 are often used for hydrological modeling [4,5]. Recently, the impact of using ERA5 instead of ERA-Interim in hydrological models has been studied [4,5,22,61]. We retrieved the precipitation, two-meter air temperature, mean sea level pressure, surface solar radiation downwards, and top of atmosphere (TOA) incident solar radiation from 1986 to 2016.

Observed Hydro-Climatic Variables
The observational data were obtained from D. Moshir Panahi et al. [62]. They gathered data from synoptic meteorological stations for precipitation (P) and temperature as well as discharge (Q) data from hydrometric stations in the Urmia Lake Basin. The annual basin precipitation and temperature were estimated using Thiessen polygons, with synoptic stations from neighboring basins included to obtain precise estimates. The annual discharge was calculated by dividing the measured discharge values by the corresponding basin area. Water storage changes (DS) were determined based on the groundwater storage in the aquifer below the basin. There was no direct measurement of the actual evaporation (Eact) at the basin scale. Therefore, they calculated Eact from the measured P, Q, and DS and a water balance equation:
$$E_{act} = P - Q - DS \tag{2}$$
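A minimal sketch of this water-balance estimate, assuming the standard closure in which evaporation is the residual of precipitation, discharge, and storage change (all in mm/yr, with DS positive when storage increases). The function name and the sample values are hypothetical.

```python
def water_balance_eact(precip, discharge, storage_change):
    """Return actual evaporation as the residual of the basin water balance."""
    return precip - discharge - storage_change


if __name__ == "__main__":
    # Hypothetical annual values in mm/yr; storage is being depleted (negative DS).
    print(water_balance_eact(precip=300.0, discharge=40.0, storage_change=-10.0))
```

Since Eact is a residual, any error in P, Q, or DS passes directly into the estimate.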

The Urmia Lake Basin, an Example Case Study
The Urmia Lake Basin is located in northwestern Iran. The elevation of the Urmia Lake Basin varies between 1204 and 3804 m. It has been classified as a biosphere reserve by UNESCO [63,64]. Urmia Lake is the world's second-largest hypersaline endorheic lake [65]. Rain-fed agriculture and manufacturing are the two main economic sectors in the Urmia Lake Basin, which is home to more than 6.5 million people [66]. Figure 2 depicts the location of the Urmia Lake Basin. The area of the basin is 51,440 km² [67]. The Urmia Lake Basin, with a mean annual temperature of 12 °C and precipitation of 303 mm/yr, has a cold and semi-arid climate [67,68].

Evaluation Statistics
Four statistics were used in the assessment to evaluate the ERA5/ERA-Interim precipitation and observed precipitation and the performance of the GlobWat model in estimating the observed discharge over the Urmia Lake Basin during the modeling period. The statistics include the correlation coefficient (CC), standard deviation (SD), root-mean-square error (RMSE), and Nash-Sutcliffe efficiency (NSE), which are defined as follows:

Correlation Coefficient (CC)
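Equation (3) gives the correlation coefficient (R) between the observed values and those estimated by GlobWat (Q and Eact) or datasets (ERA5/ERA-Interim P and T); it ranges from −1 to 1.
$$R = \frac{\sum_{i=1}^{n}\left(Y_{est,i}-\bar{Y}_{est}\right)\left(Y_{obs,i}-\bar{Y}_{obs}\right)}{\sqrt{\sum_{i=1}^{n}\left(Y_{est,i}-\bar{Y}_{est}\right)^{2}}\,\sqrt{\sum_{i=1}^{n}\left(Y_{obs,i}-\bar{Y}_{obs}\right)^{2}}} \tag{3}$$
Here, $Y_{est}$ is a variable estimated by GlobWat or a dataset variable, $Y_{obs}$ is the observed variable, overbars denote means, and n is the number of values. The closer R is to one, the better the model reproduces the observed variability.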

Standard Deviation (SD)
The standard deviation is a measure of how far values deviate from the mean. It is calculated as the square root of the sum of the squared differences from the mean divided by the size of the dataset (Equation (4)).
$$SD = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{i}-\bar{Y}\right)^{2}} \tag{4}$$

Root-Mean-Square Error (RMSE)
The RMSE, which ranges from 0 to infinity and expresses how close the model's estimation or dataset variable is to the observed variable, is calculated with Equation (5).
$$RMSE = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(Y_{est,i}-Y_{obs,i}\right)^{2}} \tag{5}$$
The closer the RMSE gets to 0, the more accurate the model's estimate or dataset value is.

Nash-Sutcliffe Efficiency (NSE)
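The NSE is described by Equation (6) and ranges from negative infinity to 1.
$$NSE = 1 - \frac{\sum_{i=1}^{n}\left(Q_{obs,i}-Q_{est,i}\right)^{2}}{\sum_{i=1}^{n}\left(Q_{obs,i}-\bar{Q}_{obs}\right)^{2}} \tag{6}$$
Here, $Q_{obs,i}$ is the observed discharge, $Q_{est,i}$ is the discharge estimated by the model, and $\bar{Q}_{obs}$ is the mean observed discharge. NSE = 1 indicates that the estimated discharge matches the observed discharge, and a negative NSE indicates that the mean of the observed discharge is a better predictor than the model.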
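The four statistics can be sketched in plain Python as follows. This is an illustration using the standard textbook definitions (Pearson correlation, population standard deviation, RMSE, and NSE); the function names and the sample series are hypothetical and do not come from the scripts described in this paper.

```python
import math


def mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def std_dev(values):
    """Population standard deviation (squared deviations divided by n)."""
    m = mean(values)
    return math.sqrt(sum((v - m) ** 2 for v in values) / len(values))


def correlation(est, obs):
    """Pearson correlation coefficient between estimated and observed series."""
    m_est, m_obs = mean(est), mean(obs)
    cov = sum((e - m_est) * (o - m_obs) for e, o in zip(est, obs))
    norm = math.sqrt(sum((e - m_est) ** 2 for e in est)) * math.sqrt(
        sum((o - m_obs) ** 2 for o in obs)
    )
    return cov / norm


def rmse(est, obs):
    """Root-mean-square error between estimated and observed series."""
    return math.sqrt(sum((e - o) ** 2 for e, o in zip(est, obs)) / len(obs))


def nse(est, obs):
    """Nash-Sutcliffe efficiency: 1 minus residual variance over observed variance."""
    m_obs = mean(obs)
    residual = sum((o - e) ** 2 for e, o in zip(est, obs))
    variance = sum((o - m_obs) ** 2 for o in obs)
    return 1.0 - residual / variance


if __name__ == "__main__":
    observed = [10.0, 20.0, 30.0, 40.0]   # hypothetical annual discharge
    estimated = [12.0, 18.0, 33.0, 41.0]  # hypothetical model output
    print("CC:", correlation(estimated, observed))
    print("SD:", std_dev(estimated))
    print("RMSE:", rmse(estimated, observed))
    print("NSE:", nse(estimated, observed))
```

An estimate equal to the observed mean at every step gives NSE = 0, which is why negative values indicate a model that is worse than that constant predictor.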

Results
We compared the precipitation and temperature estimates for our study period (1986-2016) from the ERA5 and ERA-Interim datasets to observations from the Urmia Lake Basin. Figure 3 shows a comparison of the ERA5 and ERA-Interim precipitation and temperature values with the observed values. ERA5 and ERA-Interim overestimated precipitation and temperature over the Urmia Lake Basin. ERA5 overestimated precipitation less than ERA-Interim, while both datasets had similar temperature estimates. The patterns of the ERA5 and ERA-Interim datasets resembled those of the observations, and in many cases the observed increases and decreases in precipitation and temperature were consistent with those in ERA5 and ERA-Interim. However, the precipitation estimates from ERA5 and ERA-Interim from 2006 to 2016 deviated more strongly from the observed data. As a result, we examined three periods (1986-2006, 2006-2016, and 1986-2016) in our analysis to better understand the impact of this deviation on GlobWat's discharge estimations.
The Taylor diagram, which is based on the CC, normalized SD (NSD), and RMSE, provides a graphical summary of the agreement between the observed and estimated data [69]. In order to display multiple periods in a single figure, the standard deviations of the reference datasets were normalized to 1.0 based on the observed standard deviations. The closer a dataset is to the black star marker, the more accurate the estimated data are. Figure 4 shows that the ERA5 and ERA-Interim datasets performed relatively well in estimating the precipitation and temperature for the periods 1986-2006 and 1986-2016. However, both datasets agreed least with the observed precipitation from 2006 to 2016. The datasets performed equally well for temperature in all periods, while ERA5 performed better than ERA-Interim for precipitation. For 1986-2006, precipitation was slightly better represented in both ERA5 and ERA-Interim than for 1986-2016, with lower RMSEs, whereas for 2006-2016, precipitation was poorly represented and had the highest RMSEs. In general, the ERA5 and ERA-Interim temperatures performed equally in all periods, with almost equal RMSEs (see Tables S1 and S2 in the Supplementary Materials).
We calculated Eref for use in the GlobWat model with the ERA5, ERA-Interim, and observed datasets using the De Bruin and Langbein methods. Since the ERA5 dataset contains Eref, we also retrieved Eref from the ERA5 dataset. Figure 5 depicts the calculated Eref values and the Eref retrieved from ERA5. Compared with the Eref calculated using the De Bruin and Langbein methods, the ERA5 Eref had the highest values for the Urmia Lake Basin. However, it followed the same patterns as the Eref calculated by the De Bruin and Langbein methods.
In general, the method used to calculate Eref appears to have a greater effect on the resulting Eref values than the dataset used for the calculation. For example, the Eref values calculated using the De Bruin method with ERA5 and ERA-Interim were close to each other. Likewise, the Eref estimates from the two datasets using the Langbein method were close. Because observed data were limited, Eref from observations was computed with the Langbein method, which requires only observed temperature.
The GlobWat model was used to estimate the discharge and Eact for each basin of the world for 30 years, from 1986 to 2016. GlobWat was run five times, each time with a different combination of Eref method and reanalysis product. Three runs used ERA5 in combination with the De Bruin method, the Langbein method, and the Eref retrieved from the ERA5 dataset (ERA5 Eref). The two remaining runs used ERA-Interim in combination with the De Bruin and Langbein Eref methods. The scripts developed for downloading and processing the ERA5 and ERA-Interim data for use in GlobWat, as well as their documentation, are accessible online. Appendix B describes the steps and online resources required to run the GlobWat model using the scripts developed in this study.
GlobWat produces results for each of the world's major basins. In this study, we analyzed the results calculated for the Urmia Lake Basin. Figure 6 shows a comparison between the estimated discharges using the De Bruin and Langbein methods. For the whole study period, in many cases, the estimated discharge patterns closely resembled that of the observation data. The estimated discharge represents the same downward trend as the observed discharge. To compare the estimated and observed discharge at the Urmia Lake Basin over time, we used a Taylor diagram together with the RMSE and NSE to evaluate the GlobWat model performance.
Figure 7 shows that the model had similar performances during 1986-2016 and 1986-2006. However, the model's performance in representing the discharge was lower from 2006 to 2016. In all periods, the model performed best with the ERA5 weather dataset and the De Bruin Eref method, which gave the highest NSE and lowest RMSE values. The statistics (e.g., CC, SD, NSD, and RMSE) between the observed discharge and the discharge estimated by GlobWat are shown in Table S3 of the Supplementary Materials.
From 2006 to 2016, the model did not perform as well as it did from 1986 to 2006. This could be due to the model's sensitivity to precipitation, as the differences between the precipitation datasets and the observed precipitation were larger from 2006 to 2016 than from 1986 to 2006. Therefore, we plotted the discharge versus precipitation to determine the sensitivity of the estimated discharge to the input precipitation. Figure 8 illustrates that the GlobWat model is extremely sensitive to precipitation. Generally, a 10% increase in precipitation results in a 38% increase in discharge.
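The sensitivity quoted above can be expressed as a precipitation elasticity of discharge, the ratio of the relative change in discharge to the relative change in precipitation. The sketch below works through that arithmetic; the function names are hypothetical, and extrapolating the ratio linearly to other precipitation changes is an assumption, since the model response need not be linear.

```python
def precipitation_elasticity(rel_change_precip, rel_change_discharge):
    """Ratio of the relative discharge change to the relative precipitation change."""
    if rel_change_precip == 0:
        raise ValueError("relative precipitation change must be non-zero")
    return rel_change_discharge / rel_change_precip


def discharge_change(rel_change_precip, elasticity):
    """First-order (linear) estimate of the relative discharge change."""
    return elasticity * rel_change_precip


if __name__ == "__main__":
    # Values from the text: +10% precipitation gives +38% discharge.
    elasticity = precipitation_elasticity(0.10, 0.38)
    print(f"elasticity: {elasticity:.1f}")
    # Hypothetical 5% precipitation overestimate, under the linear assumption.
    print(f"discharge change: {discharge_change(0.05, elasticity):.1%}")
```

An elasticity well above one means that a modest precipitation bias in a reanalysis product is amplified in the simulated discharge.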
Water 2022, 14, x FOR PEER REVIEW
[...] [72]. Additionally, from 2006 to 2016, the Urmia Lake Basin provided a yearly average of 206.6 MCM of water for drinking and industrial purposes [73]. Water transfer projects in the basin also reduced the amount of observed discharge in the basin. For example, there was a large transfer of 157 MCM per year of potable water from Zarrineh Rood to Tabriz City in the Urmia Lake Basin in 1999 [67]. Table S4 in the Supplementary Materials provides an overview of the basin's water and irrigation projects.
The Urmia Lake Basin is endorheic. Thus, human interventions in the basin affect the water volume stored in Urmia Lake. One of the human interventions in the basin was the expansion of irrigated areas. We therefore plotted the changes in the Urmia Lake area and the irrigated areas between 1986 and 2016 to see when human interference occurred at a faster rate. Figure 9 shows the lake gradually shrinking from 1986 to 2006, followed by rapid shrinking from 2006 to 2016. In comparison, the irrigated land area increased more slowly from 1986 to 2006 and more rapidly from 2006 to 2016. In general, an increase in the amount of irrigated land correlates with a decrease in the lake area.
GlobWat estimates Eact by multiplying Eref by a crop or land use factor, which is assumed to be equal to the maximum evaporation of a land use or vegetation type [16]. We analyzed the Eact results for the Urmia Lake Basin, which were calculated by GlobWat for each basin around the world. The observation-based Eact was estimated using the observed data and a water balance equation (Equation (2)) because it could not be measured directly in the Urmia Lake Basin [62]. Figure 10 shows the Eact for the Urmia Lake Basin, as estimated by GlobWat and based on the observed data. In all experiments, GlobWat underestimated Eact.
Figure 10. Estimated actual evaporation (Eact) calculated by the GlobWat model using different weather datasets and reference evaporation (Eref) methods against the observed data over Urmia Lake, Iran.
Table 2 shows the CC and RMSE between the estimated and calculated Eact. GlobWat Eact estimates had a low correlation with the observed Eact calculated using the water balance equation (Equation (2)). When the De Bruin method was used, the lowest RMSEs were obtained for all periods. D. Moshir Panahi et al. [62] estimated the mean Eact to be 275 mm/yr from 1986 to 2016, which is close to the mean Eact values estimated by GlobWat using the De Bruin method with ERA-Interim (244 mm/yr) and with ERA5 (234 mm/yr).
In general, GlobWat performed better in estimating Eact when using the De Bruin method and ERA-Interim. We should consider that D. Moshir Panahi et al. [62] calculated Eact using a water balance equation. This means that the errors in the other water balance terms all accumulate in the Eact estimate, which can partially explain the large variation in the Eact calculated by D. Moshir Panahi et al. [62].

Development of Scripts for Processing Weather Input Data for GlobWat
The need to automate the complex and time-consuming process of converting climate datasets into GlobWat input motivated the development of ESMValTool Python and YAML scripts for processing ERA5 and ERA-Interim data into the GlobWat input data format. An important aspect of this study is that the scripts were reviewed by the ESMValTool software development community to ensure that they follow best coding practices. In addition to this technical review, before the scripts were published in the ESMValTool repository, ESMValTool scientific reviewers checked the science behind the scripts to ensure that correct inputs are created for use in the GlobWat model. The history of script development and reviews can be found in the ESMValTool pull requests [74]. Because the scripts are published in the ESMValTool repository, users can obtain guidance from the developers in the ESMValTool community, and the scripts can benefit from future maintenance. The first goal of this article is to introduce the scientific community to these scripts.
The scripts reduce the amount of time and effort required to prepare input data for the GlobWat model and ensure the creation of proper weather input data. Hydrologists can now allocate less time to data extraction and formatting. As a result, they can move from concept to experimentation with GlobWat much more quickly. Python and YAML scripting-based data processing improves the GlobWat model's reproducibility and repeatability, as the same script can be reused. When used in a new study, the scripts only need a few user inputs (e.g., the study period, Eref method, and dataset). The provenance of the data processing steps is recorded by the Python and YAML scripts. For example, the provenance stores information about which raw input files were used to create which output files. This ensures reproducibility because researchers will be able to repeat the experiment and obtain the same results [75]. Another important aspect of this study is that the Python and YAML input data processing approach can be transferred to other grid-based GHMs.

Comparison of ERA5 and ERA-Interim Precipitation and Temperature with Observed Data in Urmia Lake Basin
In comparison to ERA-Interim, ERA5 provided a better representation of precipitation in the Urmia Lake Basin. Our findings are consistent with previous studies that showed that ERA5 precipitation correlates better with observed data than ERA-Interim [76][77][78]. ERA5 has several improvements over ERA-Interim, including higher spatial resolution, a larger number of observation datasets for the assimilation system, and the inclusion of satellite precipitation estimates [79]. These improvements could explain why ERA5 performs better in precipitation representation. When ERA5 and ERA-Interim temperatures were compared to the observed temperatures in the Urmia Lake Basin, ERA5 did not outperform ERA-Interim. Our findings are in agreement with those of A. Delhasse et al. [80] and L. Liu et al. [81].
Previous studies compared the accuracy of ERA5 to observed data from several basins in Iran, indicating that ERA5 can be used as a potential source of weather data in basins with low densities of meteorological stations [82,83]. We also recommend using ERA5 as a source of weather data in the Urmia Lake Basin, which is consistent with the findings of M. Habibi et al. [84].
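A comparison of this kind can be sketched in a few lines of Python. The example below is a minimal illustration, not the evaluation code used in this study: it assumes that the Pearson correlation coefficient and the percent bias are the statistics of interest, and the function names and the monthly precipitation values are hypothetical.

```python
import math


def pearson_r(sim, obs):
    """Pearson correlation coefficient between two equally long series."""
    n = len(sim)
    mean_sim = sum(sim) / n
    mean_obs = sum(obs) / n
    cov = sum((s - mean_sim) * (o - mean_obs) for s, o in zip(sim, obs))
    var_sim = sum((s - mean_sim) ** 2 for s in sim)
    var_obs = sum((o - mean_obs) ** 2 for o in obs)
    return cov / math.sqrt(var_sim * var_obs)


def percent_bias(sim, obs):
    """Total over- (positive) or underestimation as a percentage of the observed total."""
    return 100.0 * (sum(sim) - sum(obs)) / sum(obs)


# Hypothetical monthly precipitation (mm) for one station and its grid cell.
observed = [30.0, 45.0, 60.0, 20.0, 5.0, 10.0]
reanalysis = [36.0, 50.0, 66.0, 26.0, 8.0, 12.0]

print(f"r = {pearson_r(reanalysis, observed):.2f}")
print(f"percent bias = {percent_bias(reanalysis, observed):.1f}%")
```

A high correlation combined with a positive percent bias would indicate a dataset that follows the observed seasonal pattern while overestimating the totals.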

GlobWat Discharge and Eact Estimates for the Urmia Lake Basin
As weather input data, GlobWat requires precipitation and Eref. We used ERA5 and ERA-Interim precipitation. We calculated the reference evaporation from the ERA5 and ERA-Interim meteorological variables with two Eref methods, De Bruin and Langbein. The De Bruin method is radiation-based, whereas the Langbein method is an empirical temperature-based method. Given the cold and semi-arid climate of the Urmia Lake Basin, De Bruin estimated a higher Eref for the Urmia Lake Basin in comparison to Langbein. Our results show that the model performed best in estimating the discharge when the ERA5 dataset and the De Bruin Eref method were used as model inputs. One reason is that GlobWat's discharge is highly sensitive to precipitation, and ERA5 represents precipitation significantly better than ERA-Interim. According to some studies, the choice of Eref method has little effect on hydrological modeling results and the estimated discharge [85,86]. However, in the case of the Urmia Lake Basin, the Eref method had a major impact on the discharge estimates. Part of the reason why discharge based on De Bruin was closer to the observed discharge than discharge based on Langbein is that ERA5 and ERA-Interim overestimated precipitation, especially from 2006 onwards (see next section). This means that the high Eref values that De Bruin produced may offset the overestimated precipitation.
GlobWat's discharge estimates did not match the observed discharge pattern from 2006 to 2016. The main reason for this disparity was the low correlation between ERA5 and ERA-Interim precipitation and the observed precipitation during this period. The GlobWat model is highly sensitive to precipitation, and the overestimation of precipitation caused an overestimation of discharge. Another reason for the overestimation could be human intervention in the Urmia Lake Basin. Human interventions in the Urmia Lake Basin have increased significantly since 2006, resulting in low discharge to the lake and a smaller lake area. As a global-scale model, the GlobWat model is incapable of capturing these changes. Because direct measurements are lacking, the actual evaporation was estimated using the water balance equation (water-balance Eact) [62]. The water-balance Eact varies greatly over time because all errors in the other terms accumulate in the estimated evaporation.
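The accumulation of errors in the water-balance Eact can be illustrated with a short Python sketch. It assumes the common basin-scale form of the water balance, Eact = P - Q - ΔS, which may differ in detail from the formulation in [62]; the function name and all numbers are hypothetical and serve only as an illustration.

```python
def water_balance_eact(precip, discharge, storage_change=0.0):
    """Actual evaporation as the residual of the water balance.

    Assumed form: Eact = P - Q - dS, with all terms in the same
    units (e.g., mm per year over the basin area).
    """
    return precip - discharge - storage_change


# Hypothetical annual basin totals in mm.
precip = 350.0
discharge = 60.0
storage_change = -10.0  # negative: the basin lost stored water

eact = water_balance_eact(precip, discharge, storage_change)

# The same calculation with precipitation overestimated by 10%.
eact_biased = water_balance_eact(1.1 * precip, discharge, storage_change)

# The whole precipitation error ends up in the residual term.
print(f"Eact = {eact:.1f} mm, with biased P = {eact_biased:.1f} mm")
print(f"relative error in Eact = {100.0 * (eact_biased - eact) / eact:.1f}%")
```

Because Eact is the residual, an absolute error in any other term is transferred to it in full, so the relative error in Eact exceeds the relative error in precipitation whenever Eact is smaller than P.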
Our results showed that ERA5 can be used as an input for hydrological modeling in the Urmia Lake Basin, but caution is needed with respect to precipitation in recent years (from 2006 onwards).

Practical Implications of This Study
The ESMValTool scripts for processing input data for GlobWat can be used in global hydrological modeling with GlobWat to assess water use in agriculture, renewable water resources, and the amount of blue water (incremental evaporation) and green water (evaporation from in situ rainfall) [16]. The assessments can be global or local. Data manipulation for use in the GlobWat model was simplified with the help of the developed scripts, paving the way for future research. Examples are using the scripts for calculating Eref with other methods or using datasets with higher spatial and temporal resolutions. The assessment of the optimal resolution or Eref methods will be part of future research that will benefit from GlobWat's automated and repeatable model input creation. The lack of data on model-estimated variables, such as Eact, was one of the study's challenges.

Conclusions
Global hydrological models (GHMs) are increasingly using high-resolution datasets, which makes data processing complex and time-consuming, limiting GHM use to groups with the required programming and technical skills. GlobWat, developed by the FAO to assess water use in agriculture, is one of these high-resolution GHMs. A set of Python and YAML scripts was created within the ESMValTool to convert ECMWF ERA5 and ERA-Interim weather data to the format required by GlobWat. This study develops and demonstrates a streamlined, reproducible process for creating input data for the GlobWat GHM. These scripts automate downloading and standardizing raw data, calculating reference evaporation, unit conversion, and re-gridding to provide data input for GlobWat. Although these scripts are intended to create GlobWat input requirements from ERA5 and ERA-Interim, they could be adapted to process other datasets for use in GlobWat or to create input data for other grid-based GHMs that use a similar input data format. The usefulness of the scripts was demonstrated through the application of 30 years of processed ERA5 and ERA-Interim data in GlobWat. As an example case study, the results of GlobWat were evaluated for the Urmia Lake Basin in Iran. Based on our overall evaluation and the results from the Urmia Lake Basin, the major findings are as follows:

• Using developed scripts within the ESMValTool allowed the rapid analysis of the underlying causes of observed trends.
• The provenance made reproducibility easier by storing information from previous analyses. For example, in our study, we were able to look back and see which raw input files were used to create which files and so on.
• When compared to its previous version, ERA-Interim, the ERA5 representation of temperature has not improved significantly over the Urmia Lake Basin. However, the ERA5 precipitation representation has significantly improved over the Urmia Lake Basin.
• In basins with low densities of meteorological stations, such as the Urmia Lake Basin, ERA5 can be a good source of weather data.
• In the Urmia Lake Basin, the De Bruin radiation Eref method outperformed the Langbein temperature-based method when evaluated in terms of the modeled and measured discharge.
• The GlobWat discharge estimates are extremely sensitive to precipitation in (semi-) arid regions. Therefore, ERA5 would be preferable over ERA-Interim due to the better representation of precipitation.
• The GlobWat model does not capture human interactions in the basin.
In this light, we have demonstrated that, when using the provided framework, rapid and reproducible comparisons can be made. On the ESMValTool website [49,50], all curated code is freely available for use by interested researchers. Any researcher should be able to replicate the findings and conduct new research.
Supplementary Materials: The following supporting information can be downloaded at: https://www.mdpi.com/article/10.3390/w14121950/s1, Table S1: The comparison of evaluation statistics for ERA5 and ERA-Interim precipitation in three time periods; Table S2: The comparison of evaluation statistics for ERA5 and ERA-Interim temperature in three time periods; Table S3: Calculated statistics between GlobWat estimated and observed discharge in three time periods; Table S4: Major Dams of Urmia Lake Basin.

Data Availability Statement: GlobWat data processing scripts, details on how to access the sample data, the GlobWat model, and ERA5 and ERA-Interim datasets are available on the ESMValTool documentation and GitHub page: https://docs.esmvaltool.org/en/latest/recipes/recipe_hydrology.html?highlight=hydrology (accessed on 17 June 2022) and https://github.com/ESMValGroup/ESMValTool/tree/main/esmvaltool (accessed on 17 June 2022). Please cite the GlobWat recipe and diagnostic using the citation information available at https://doi.org/10.5281/zenodo.3974591 (accessed on 17 June 2022).

1. Download the GlobWat model and its input data from the FAO's AquaMaps website [17].
2. Follow the instructions on the ESMValTool Website [87] to install the ESMValTool on your system.
3. Using the reference method, select the variables needed for processing data for use in the GlobWat model. More information can be found in the ESMValTool Documentation [88].
4. Using download_era_interim.py [51], download the variables you require from ERA-Interim for the time period you require.
5. Using era5cli [52], download the variables you need for the desired time period.
6. At recipe recipe_globwat.yml [49], select the time period and reference method with which you would like to process the data.
8. Save the newly processed precipitation and reference evaporation to the GlobWat model input folder.
10. Run the GlobWat model as described in How_to_use_GlobWatv1.0rev2.pdf [36].
11. Using any CSV or ASCII reader, extract the results for your desired location.
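
The final step, extracting results for one location, can be sketched in Python. The example assumes, purely for illustration, that a model output map is available as an ESRI-style ASCII grid with a six-line header (ncols, nrows, xllcorner, yllcorner, cellsize, NODATA_value); the actual GlobWat output layout may differ. The function names, coordinates, and grid values are hypothetical.

```python
import io


def read_ascii_grid(handle):
    """Read an ESRI-style ASCII grid (assumed format) into a header dict and rows."""
    header = {}
    for _ in range(6):  # assumes exactly six "key value" header lines
        key, value = handle.readline().split()
        header[key.lower()] = float(value)
    rows = [[float(v) for v in line.split()] for line in handle if line.strip()]
    return header, rows


def value_at(header, rows, lon, lat):
    """Return the cell value at (lon, lat), or None for a no-data cell."""
    cell = header["cellsize"]
    n_rows, n_cols = int(header["nrows"]), int(header["ncols"])
    col = int((lon - header["xllcorner"]) // cell)
    # The first data row is the northernmost one, so count rows from the top.
    row = n_rows - 1 - int((lat - header["yllcorner"]) // cell)
    if not (0 <= row < n_rows and 0 <= col < n_cols):
        raise ValueError("location lies outside the grid")
    value = rows[row][col]
    return None if value == header["nodata_value"] else value


# Hypothetical 3 x 2 grid of 1-degree cells; values are made up.
sample = io.StringIO(
    "ncols 3\nnrows 2\nxllcorner 44.0\nyllcorner 36.0\n"
    "cellsize 1.0\nNODATA_value -9999\n"
    "1 2 3\n"
    "4 5 -9999\n"
)
header, rows = read_ascii_grid(sample)
print(value_at(header, rows, lon=45.5, lat=37.5))
```

For a real output file, the `io.StringIO` object would be replaced by an open file handle, and the lookup repeated for each time step of interest.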