Evaluating the Potential of GloFAS-ERA5 River Discharge Reanalysis Data for Calibrating the SWAT Model in the Grande San Miguel River Basin (El Salvador)

Hydrological modelling requires accurate climate data with high spatial-temporal resolution, which is often unavailable in certain parts of the world, such as Central America. Numerous studies have demonstrated that global weather reanalysis data provides a viable alternative to observed data in hydrological modelling. However, calibrating and validating models requires observed discharge data, which is also frequently unavailable. Recently, global-scale applications have been developed based on weather data from reanalysis; these applications allow streamflows with satisfactory resolution to be obtained. An example is the Global Flood Awareness System (GloFAS), which uses the fifth generation of reanalysis data produced by the European Centre for Medium-Range Weather Forecasts (ERA5) as input. It provides discharge data from 1979 to the present with a resolution of 0.1°. This study assesses the potential of GloFAS for calibrating hydrological models in ungauged basins. For this purpose, the quality of data from ERA5, from the Climate Hazards Group InfraRed Precipitation with Stations (CHIRPS) and Temperature with Stations (CHIRTS) datasets, and from the Climate Forecast System Reanalysis (CFSR) was analysed. The focus was on flow simulation using the Soil and Water Assessment Tool (SWAT) model. The models were calibrated using GloFAS discharge data. Our results indicate that all the reanalysis datasets displayed an acceptable fit with the observed precipitation and temperature data. The correlation coefficient (CC) between the reanalysis data and the observed data indicates a strong relationship at the monthly level at all of the analysed stations (CC > 0.80). The Kling-Gupta Efficiency (KGE) also indicated acceptable performance of the calibrated SWAT models (KGE > 0.74). We concluded that GloFAS data has substantial potential for calibrating hydrological models that estimate the monthly streamflow in ungauged watersheds. This approach can aid water resource management.


Introduction
Hydrological models are commonly used to understand changes in hydrological processes due to changes in climate or land use [1,2]. Such changes in land use and climatic conditions are especially important in Central America. Recent studies have highlighted deforestation as the main land-use change in this area [3]. However, climate change can also strongly affect the hydrological cycle by altering the timing and intensity of rainfall, recharge and runoff. This change has intensified the mid-summer drought characteristic of Central America's weather [4].
In addition to forecasting and estimating the quantity and quality of water for decision-makers, hydrological models can assist local authorities in forecasting the effects of natural and anthropogenic changes on water resources. Furthermore, they can characterise the temporal and spatial availability of water resources to enable the design of appropriate strategies to mitigate water-related hazards. These include droughts, floods and the discharge of pollutants.
Several conceptual and semi-distributed models have been applied at grid scale in tropical climates. Srivastava et al. (2017) [5] successfully implemented the variable infiltration capacity (VIC) model for the Kangsabati River Basin and obtained satisfactory evapotranspiration estimates at the monthly scale. Srivastava et al. (2020) [6] compared two models, namely VIC and the model for the identification of unit hydrographs and component flows from rainfall, evapotranspiration and streamflow (IHACRES). They concluded that IHACRES is a very useful model for data-scarce regions. Paul et al. (2018) [7] similarly reported the successful implementation of a modified time-variant spatially distributed hydrograph technique integrated into the satellite-based hydrological model (SHM) for the Kabini River Basin.
The distributed Soil and Water Assessment Tool (SWAT) model has also been widely used in tropical basins [8]. Darbandsari and Coulibaly (2020) [9] demonstrated the usefulness of lumped hydrological models for simulating hydrological processes in data-scarce watersheds. However, in the current study, the distributed SWAT model is used, because once calibrated, it allows further analyses related to land-use changes. SWAT is one of several models employed to assess the influence of land use and land management changes on water resources [10].
Accuracy in simulating a basin's water resources fundamentally depends on the input data used for modelling and on the capability of the hydrological model. Primary input data are meteorological and geographical data (e.g., precipitation and temperature as well as data from digital elevation models and land-use and soil maps). In recent years, several ready-to-use global-scale maps have been developed that provide good results and make the SWAT model application easier [11].
The application of hydrological models is usually limited by the sparse distribution of rainfall observation stations. In most watersheds, the actual density of a rainfall network is notably lower than the values recommended by the World Meteorological Organization. Ground-based precipitation observation is also unevenly distributed in many countries due to economic constraints [12], and this issue can affect the performance of model streamflow estimates. Missing values in rainfall data negatively affect the quality of hydrological modelling. Tan and Yang (2020) [13] demonstrated that missing values of more than 20% significantly affected the streamflow simulation for tropical climates. To overcome limitations arising from the scarcity of data or from poor-quality observations, numerous studies have compared gridded rainfall datasets with local datasets. The aim is to assess the suitability of those datasets in various hydrological models [14][15][16][17] for watersheds around the world.
The influence of temperature data on hydrological balance and discharge in simulated river basins has rarely been analysed (Tan et al., 2021).
In Southeast Asia, Tan et al. (2017) [18] recommended combining the Asian Precipitation Highly Resolved Observational Data Integration Towards Evaluation (APHRODITE) dataset [19] with the maximum and minimum temperatures from the Climate Forecast System Reanalysis (CFSR) dataset [20]. The objective was to model ungauged or gauge-limited catchments. In Ethiopia, Duan et al. (2019) [21] recommended the use of Climate Hazards Group Infrared Precipitation with Stations (CHIRPS) data [22] together with temperature data from CFSR for hydrological modelling.
Due to advances in satellite technology, many satellite weather products have been developed to monitor weather conditions on a global scale. Some are called reanalysis products; they combine satellite data with observed data to improve weather representation. An example is the CFSR dataset. It is widely used for hydrological modelling in the SWAT model because, in addition to precipitation, it includes other meteorological variables that are easily downloadable from the SWAT website [23]. Additional reanalysis datasets that include precipitation and temperature data for simulations have recently been launched. For example, the CHIRPS precipitation dataset has recently been complemented with temperature data to yield the Climate Hazards Center InfraRed Temperature with Stations (CHIRTS) dataset [24]. This data is available at the global scale at a spatial resolution of 0.05°.
Another recently launched dataset that includes precipitation and temperature is the ERA5 global reanalysis dataset [25]. It provides data from 1950 to the present at a spatial resolution of 31 km. It was released recently and has not yet been tested in hydrological modelling for several areas of the world. However, Tarek et al. (2020) [26] tested the potential of ERA5 in hydrological modelling across North America. Their results highlighted many advantages over the previous dataset, ERA-Interim, and demonstrated a level of efficiency similar to that obtained in hydrological models that use observed data for most of the territory analysed. Kolluru et al. (2020) [27] concluded that ERA5 is efficient for detecting rainfall patterns, whereas CHIRPS displays better flow simulation. Jiang et al. (2021) [28] obtained highly varying results depending on the regions analysed and identified a general underestimation of extreme rainfall.
Model calibration and validation are key steps for obtaining accurate estimates of streamflow from hydrological models. These steps are generally performed using observed data [29]. However, in situ flow data are commonly unavailable for much of the land area, especially in developing countries, and the number of operational stations is decreasing rapidly. The recent availability of global-scale remote sensing climate products (such as those discussed above) has led to the development of hydrological models that provide discharge estimates at a global scale [30][31][32].
One such application is the Global Flood Awareness System (GloFAS), developed by the European Centre for Medium-Range Weather Forecasts (ECMWF) in collaboration with the University of Reading and the Joint Research Centre of the European Commission. This system couples the Hydrology Tiled ECMWF Scheme for Surface Exchanges over Land (HTESSEL) [33] and LISFLOOD models [34]; it provides streamflow estimates at a global scale from 1979 to the present, using ERA5 as the climatological input data. Global hydrological models are powerful tools for reconstructing components of the water balance because they generate continuous data, which can be used in applications such as hydrological model calibration [35]. Given the recent release of GloFAS, its potential has not been fully explored.
Central America is an area in which remotely sensed data can be highly useful for hydrological modelling to improve estimates of water resources [36]. Tan et al. (2021) [37] reviewed 123 articles regarding the use of alternative climate products in SWAT modelling. The authors found only one study conducted in Mexico and no precedents of this type of study for Central America.
In light of the above, this work may be of interest to stakeholders who model watersheds located in Central America. We selected the Grande San Miguel (GSM) River Basin as a case study, because many of the problems discussed above occur there. These include a low density of stations that provide precipitation and temperature records, a substantial percentage of missing data, and difficulty in obtaining streamflow data to enable model calibration. Using monthly flow data provided by the Ministry of Environment and Natural Resources in El Salvador for the period 2005-2010, we explored the potential of the GloFAS-ERA5 river discharge reanalysis dataset for calibrating hydrological models in ungauged watersheds. The use of remotely sensed rainfall data for hydrological simulation is common in recent literature [37]. However, the use of globally generated flow data derived from remotely sensed data for calibrating hydrological models is very novel because the release of these products is so recent [38,39]. This paper addresses the following objectives: (1) to evaluate the performance of precipitation and temperature variables using satellite reanalysis data such as CFSR, CHIRPS and ERA5 throughout the GSM River Basin; and (2) to assess the GloFAS-ERA5 river discharge reanalysis dataset's potential for calibrating hydrological models and its relation to precipitation and temperature reanalysis data used as input data. To date, few studies [21,40] have analysed the effectiveness of reanalysis data that includes temperature for simulating hydrological processes in a watershed. Most studies have considered only precipitation data.

Study Area
The GSM River Basin is located in the east of El Salvador; it covers 2377 km² up to the outlet control point (Figure 1). The basin is among the largest in El Salvador. The city of San Miguel is situated at its core and is El Salvador's second most populous city. The basin is ecologically sensitive and contains internationally protected areas, such as the protected zones of Tecapa-San Miguel and Jiquilisco Bay. Tecapa-San Miguel is known for its coffee plantations, coastal plain wetlands, and volcanic craters; the area includes several lagoons listed under the Ramsar Convention on Wetlands. Since 2005, Jiquilisco Bay, which is located at the mouth of the Grande de San Miguel River, has been designated as a Ramsar site and a UNESCO biosphere reserve. Polluted water and the potential need for agricultural water are the two most pressing challenges in the GSM River Basin [41]. To propose effective governance methods to mitigate the effect of these stress factors, the precise simulation of hydrological processes at the basin scale is crucial.

In Situ Rainfall and Temperature Data
This region's climate is tropical, with high annual precipitation rates. However, the intra-annual distribution is uneven, with 90% of precipitation falling during the rainy season between May and October and scattered showers occurring during the dry season between November and April [36,42]. According to weather station measurements, the average annual precipitation is 1700 mm. The wettest months are from May to October and the driest months are from November to April. The basin is occasionally crossed by hurricanes, especially in September and October, which cause substantial flooding. Maximum temperatures are as high as 37 °C, and minimum temperatures drop to 17 °C. The altitude ranges from sea level to higher than 2000 m at the San Miguel Volcano.
Andosols, phaeozems, and regosols are the three most common soil types in the area (Figure 1c). The andosols that cover the area around the San Miguel Volcano are volcanic soils, which are highly permeable and have ideal agricultural qualities [43]. Regosols are unconsolidated materials with fine granulometry, common in mountainous areas. This is the dominant soil type at the northern boundaries. By contrast, phaeozem soils are abundant in the eastern part of the basin; they accommodate wet grasslands and forest regions because they are porous and fertile, and they provide excellent agricultural land (FAO, 2008). Grassland and pasture (43%), crops (32%), and forest (17%) are the most common land uses. The land-use map of the basin is shown in Figure 1d.
Figure 1 shows the spatial distribution of rainfall stations in the GSM River Basin. Most of the existing weather stations are located in the lowlands, between 100 m and 200 m above sea level. As indicated in Table 1, three of the four available meteorological stations had more than 20% of data missing during the period under study (2005-2010). According to Tan and Yang (2020) [13], missing data of more than 20% significantly affects the simulation of flows in tropical climates. Given this fact and the low density of available stations, we used observed data to analyse the performance of the rainfall and temperature reanalysis data, but we did not simulate flows based on observed data.

ERA5 Reanalysis Dataset
The ECMWF's most advanced reanalysis output is ERA5. This output was recently released with a resolution of roughly 30 km and can be used to compute many atmospheric variables from January 1950 to near real-time [25]. In the current study, the ERA5 hourly rainfall and temperature were extracted from the toolbox available on the Copernicus Climate Data Store website (https://cds.climate.copernicus.eu, accessed on 1 April 2021) and aggregated to the daily time step.
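As an illustration of the aggregation step, the following minimal Python sketch converts hourly values into daily precipitation totals and daily maximum and minimum temperatures. The tuple layout, the units (mm and °C), the function name, and the use of calendar days as given by the timestamps are assumptions made for this example; they do not describe the study's actual processing chain.

```python
from collections import defaultdict
from datetime import datetime, timedelta


def aggregate_hourly_to_daily(records):
    """Aggregate hourly (timestamp, precip_mm, temp_c) tuples to daily values.

    Returns {date: (precip_sum_mm, tmax_c, tmin_c)}.
    The record layout and units are assumptions for this sketch.
    """
    by_day = defaultdict(list)
    for timestamp, precip_mm, temp_c in records:
        by_day[timestamp.date()].append((precip_mm, temp_c))

    daily = {}
    for day in sorted(by_day):
        precips = [p for p, _ in by_day[day]]
        temps = [t for _, t in by_day[day]]
        # Precipitation is summed; temperature is reduced to max and min.
        daily[day] = (sum(precips), max(temps), min(temps))
    return daily


# Synthetic two-day example (values are invented, not ERA5 data).
start = datetime(2005, 1, 1)
hourly = [
    (start + timedelta(hours=h), 0.5, 20.0 + (h % 24) * 0.5)
    for h in range(48)
]
daily = aggregate_hourly_to_daily(hourly)
```

In practice, the day boundary (UTC versus local time) would need to match the convention of the rain gauge records being compared.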

CHIRPS and CHIRTS
The CHIRPS dataset is the result of a collaboration between the United States Geological Survey and the University of California. It consists of a rainfall grid with a geographical resolution of 0.05° that combines data from satellites with data from on-site rainfall stations. The dataset was created using the following sources [22]:
• the Tropical Rainfall Measuring Mission (TRMM) 3B42 product from NASA
• the monthly precipitation climatology (CHPClim)
• atmospheric model rainfall fields from the National Oceanic and Atmospheric Administration (NOAA) Climate Forecast System version 2 (CFSv2)
• quasi-global geostationary thermal infrared (IR) satellite observations from two NOAA sources
• in situ rainfall observations
More recently, a temperature dataset with the same spatial resolution as CHIRPS has been developed on a daily scale. It entailed merging the monthly CHIRTS and disaggregating the monthly data using daily temperatures from ERA5 [24]. On the Climate Hazards Group website (https://www.chc.ucsb.edu/data/, accessed on 5 April 2021), users can obtain daily CHIRPS v2.0 and CHIRTS v1.0 data.

CFSR
The CFSR product was developed by the National Centers for Environmental Prediction (NCEP) [44]. It uses advanced data-assimilation methods and data from a global network of weather stations and satellite-based products; it also draws on coupled atmospheric, oceanic, and surface modelling components, with a resolution of 0.30° and coverage of any land location in the world [20]. CFSR data is available for 1979 to 2014 and can be downloaded from the SWAT website (https://globalweather.tamu.edu/, accessed on 5 April 2021).

GloFAS River Discharge Reanalysis Dataset
GloFAS is part of the Copernicus Emergency Management Service (CEMS). The dataset was developed through collaboration between the ECMWF, the Joint Research Centre of the European Commission and the University of Reading (www.globalfloods.eu, accessed on 22 March 2021). The GloFAS river discharge reanalysis dataset is a product of CEMS and is produced by coupling surface and subsurface runoff data from the HTESSEL land surface model, forced by ERA5 reanalysis data [25], with the Distributed Water Balance and Flood Simulation (LISFLOOD) hydrological and channel routing model [34].
The model was calibrated using more than a thousand flow stations located in 66 different countries. It achieved a median Kling-Gupta efficiency (KGE) value of 0.67 and a correlation value of 0.80 [35]. The river discharge reanalysis, with daily time steps and 0.1° spatial resolution, is freely available to download for the period from 1979 until near present through the Copernicus Climate Data Store (https://cds.climate.copernicus.eu, accessed on 1 April 2021).

Materials and Methods
The methodological approach followed in this study is illustrated in Figure 2. It consisted of two main steps: (1) a comparison of rainfall and temperature data from reanalysis products with observed weather gauge data; and (2) an evaluation of the applicability of the flow data available in GloFAS for the calibration of the SWAT hydrological model on a monthly scale. In the latter step, the weather input data used were ERA5, CHIRPS-CHIRTS and CFSR.
To perform the evaluation, streamflows were first assessed using each of the reanalysis products as input values; the monthly streamflows were simulated from the default values of the parameters in the SWAT model. Second, each simulation was calibrated independently using the GloFAS data as the observed data. Finally, the accuracy of the GloFAS-calibrated models for reproducing the observed monthly flows was assessed.

SWAT Model Description
The SWAT model is a physically based, distributed, continuous-time model. It is used to model rainfall runoff at the basin scale [10]. Several global studies have applied the SWAT model to investigate hydrological and water quality processes [45][46][47], the impact of human pressure on water resources [48][49][50], and the consequences of climate change [36][51][52][53]. The model's GIS interface [54] allows for simple and quick data processing, such as watershed delineation and spatial and tabular data handling.
A watershed is divided into multiple sub-watersheds by SWAT. These are further subdivided into hydrological response units (HRUs), each of which has homogeneous land use, soil, and land slope. Water balance components, sediment flow, plant development and nutrient loss are some of the major processes that the model can replicate. To simulate the water balance components, SWAT solves the following equation:
$$SW_t = SW_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - E_a - W_{seep} - Q_{gw} \right)$$
where $SW_t$ is the final soil water content (mm), $SW_0$ is the initial soil water content (mm), $t$ is the time in days, $R_{day}$ is the precipitation (mm), $Q_{surf}$ is the surface runoff (mm), $E_a$ is the evapotranspiration (mm), $W_{seep}$ is the percolation (mm) and $Q_{gw}$ is the return flow (mm). Neitsch et al. (2012) [55] provide more information on the operation of the SWAT hydrological model.
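The water balance described above, in which the soil water content changes each day by precipitation minus surface runoff, evapotranspiration, percolation and return flow, can be illustrated with a short Python sketch. The function name and the daily values are hypothetical; the sketch only accumulates the balance terms and does not reproduce how SWAT computes each term internally.

```python
def soil_water_content(sw0, daily_terms):
    """Accumulate the daily soil water balance (all quantities in mm).

    daily_terms: iterable of (r_day, q_surf, e_a, w_seep, q_gw) tuples,
    one per day. Returns the final soil water content.
    """
    sw = sw0
    for r_day, q_surf, e_a, w_seep, q_gw in daily_terms:
        # Inputs add water; runoff, evapotranspiration, percolation
        # and return flow remove it.
        sw += r_day - q_surf - e_a - w_seep - q_gw
    return sw


# Hypothetical two-day example: one wet day followed by one dry day.
days = [
    (10.0, 2.0, 3.0, 1.0, 0.5),  # net gain of 3.5 mm
    (0.0, 0.0, 2.5, 0.5, 0.5),   # net loss of 3.5 mm
]
sw_t = soil_water_content(100.0, days)
```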

SWAT Model Setup
We used the QGIS interface for SWAT, namely QSWAT version 3 [54], to build the model with publicly available information. In this study, the spatial data for the SWAT model includes a digital terrain model, land-use map, and soil map. For basin delineation, we acquired the Advanced Spaceborne Thermal Emission and Reflection Radiometer Global Digital Elevation Model (ASTER GDEM) from the official website, with a resolution of 30 m (Figure 1b). The Harmonized World Soil Database, published by the United Nations Food and Agriculture Organization (using a grid size of 1 km × 1 km) was used to extract soil data (Figure 1c). El Salvador's Ministry of Environment and Natural Resources provided the land-use map (Figure 1d). Potential evapotranspiration rates were calculated using the Hargreaves method [56] because it requires only temperature data.
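To show why a temperature-only method is attractive when other meteorological variables are scarce, the sketch below evaluates a commonly cited form of the Hargreaves equation, PET = 0.0023 · Ra · (Tmean + 17.8) · (Tmax − Tmin)^0.5. The extraterrestrial radiation Ra (expressed here as equivalent evaporation in mm/day) is supplied as an input and the numbers are illustrative; this is an assumption-based example and is not claimed to match SWAT's internal implementation in every detail.

```python
import math


def hargreaves_pet(t_max, t_min, ra_mm):
    """Potential evapotranspiration (mm/day) from a Hargreaves-type formula.

    t_max, t_min: daily maximum and minimum air temperature (deg C).
    ra_mm: extraterrestrial radiation as equivalent evaporation (mm/day);
           in practice it depends on latitude and day of year.
    """
    t_mean = (t_max + t_min) / 2.0
    t_range = max(t_max - t_min, 0.0)  # guard against inconsistent inputs
    pet = 0.0023 * ra_mm * (t_mean + 17.8) * math.sqrt(t_range)
    return max(pet, 0.0)


# Illustrative values only (not taken from the study).
pet = hargreaves_pet(t_max=32.0, t_min=20.0, ra_mm=15.0)
```

Because the daily temperature range enters directly, a bias in the maximum or minimum temperature of a reanalysis product would propagate into the estimated evapotranspiration.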

SWAT Model Calibration
To evaluate the remote-sensing precipitation and temperature data and the monthly flow simulation, the periods 2005-2008 and 2009-2010 were selected as the calibration and validation periods, respectively. Precipitation and temperature data from ERA5, CHIRPS-CHIRTS, and CFSR were available for a longer period, which allowed us to use a three-year warm-up period (2002-2004) to drive the SWAT model to a steady state. Twelve regularly used flow calibration parameters and their ranges were chosen, based on past experiences with similar watersheds [36], to integrate the components of surface runoff, groundwater, and soil data. The SWAT Calibration and Uncertainty Program (SWATCUP) [57] includes the Sequential Uncertainty Fitting Procedure version 2 (SUFI-2) optimisation method. We used this to perform monthly automatic calibration. The Nash-Sutcliffe model efficiency coefficient (NSE) was employed as the objective function, and 2000 simulations were performed. Table 2 shows the list of adjusted SWAT parameters. The range of variation and the final values were determined after calibration, as a function of the gridded dataset.

Performance Evaluation of the Reanalysis Datasets and Simulated Streamflow
Our aim was to quantitatively compare the ERA5, CHIRPS, and CFSR reanalysis datasets with the rain gauge observations. The following statistical indices for validation were used: the correlation coefficient (CC), mean (M), standard deviation (SD), mean error (ME), root-mean square error (RMSE), and relative bias (BIAS). The linear correlation is indicated by CC, the average difference between the reanalysis precipitation and observed rain gauge data is shown by ME, and the average error magnitude is shown by RMSE. The systematic bias of the satellite precipitation is described by BIAS.
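The continuous indices named above can be sketched in a few lines of Python. The definitions used here (ME as the mean of reanalysis minus observed, BIAS as the summed difference relative to the summed observations, in percent) are common conventions assumed for illustration; the exact formulas used in the study are those of the reference cited for the indices, and the sample values are invented.

```python
import math


def continuous_scores(obs, sim):
    """Return CC, ME, RMSE and relative BIAS (%) for paired series.

    obs: observed gauge values; sim: reanalysis values at the same times.
    """
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    errors = [s - o for o, s in zip(obs, sim)]
    return {
        "CC": cov / math.sqrt(ss_obs * ss_sim),
        "ME": sum(errors) / n,
        "RMSE": math.sqrt(sum(e * e for e in errors) / n),
        "BIAS": 100.0 * sum(errors) / sum(obs),
    }


# Invented values in mm, for illustration only.
scores = continuous_scores(obs=[0.0, 5.0, 10.0, 20.0],
                           sim=[1.0, 6.0, 12.0, 25.0])
```

With these conventions, a positive ME or BIAS indicates that the gridded product overestimates the gauge record.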
Rainfall detection capability was analysed using three categorical statistical indices: (1) the probability of detection (POD); (2) the false alarm ratio (FAR); and (3) the critical success index (CSI). The POD is also known as the hit rate. This is the fraction of observed rainfall events that are successfully recognised by the reanalysis datasets. The FAR indicates the fraction of falsely warned rainfall events among all warnings. The most balanced and accurate detection statistic is the CSI, which is a function of POD and FAR [58]. The POD, FAR, and CSI scores range between 0 and 1, with 1 being a perfect score for POD and CSI and 0 for FAR. The formulas and further details about the indices appear in Jiang et al. (2018) [59].
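A minimal Python sketch of the three categorical indices follows, based on the usual contingency counts of hits, misses and false alarms. The rain/no-rain threshold of 1 mm/day and the sample values are assumptions for this example and are not taken from the study.

```python
def detection_scores(obs, sim, threshold=1.0):
    """Return (POD, FAR, CSI) for paired daily rainfall series.

    A day counts as wet when rainfall >= threshold (mm/day); the
    threshold value is an assumption of this sketch.
    """
    hits = misses = false_alarms = 0
    for o, s in zip(obs, sim):
        obs_wet = o >= threshold
        sim_wet = s >= threshold
        if obs_wet and sim_wet:
            hits += 1
        elif obs_wet:
            misses += 1
        elif sim_wet:
            false_alarms += 1

    nan = float("nan")
    pod = hits / (hits + misses) if hits + misses else nan
    far = false_alarms / (hits + false_alarms) if hits + false_alarms else nan
    total = hits + misses + false_alarms
    csi = hits / total if total else nan
    return pod, far, csi


# Invented daily rainfall values (mm): 2 hits, 1 miss, 1 false alarm.
pod, far, csi = detection_scores(
    obs=[0.0, 2.0, 5.0, 0.0, 8.0, 0.2],
    sim=[1.5, 3.0, 0.0, 0.0, 6.0, 0.0],
)
```

The dependence of CSI on the other two scores can be checked with the identity 1/CSI = 1/POD + 1/(1 − FAR) − 1, which holds for the counts in this example.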
To assess the SWAT model's accuracy, we used the coefficient of determination (R²), the Nash-Sutcliffe efficiency (NSE), percentage bias (PBIAS), the ratio of the RMSE to the SD of the observed data (RSR), and the Kling-Gupta efficiency (KGE). These statistics are extensively used in hydrological research [60]. At the monthly scale, the model's performance is considered adequate when the absolute PBIAS is below 25%, the NSE and KGE are above 0.5, and the RSR is below 0.7 [61,62].
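The streamflow metrics and the adequacy thresholds quoted above can be expressed as a short Python sketch. The formulas follow widely used definitions (PBIAS computed as observed minus simulated, so positive values indicate underestimation; KGE in its original three-component form); these conventions, the function names and the flow values are assumptions for illustration rather than the study's own code.

```python
import math


def flow_metrics(obs, sim):
    """Return NSE, PBIAS (%), RSR and KGE for paired flow series."""
    n = len(obs)
    mean_obs = sum(obs) / n
    mean_sim = sum(sim) / n
    sq_err = sum((o - s) ** 2 for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))

    r = cov / math.sqrt(ss_obs * ss_sim)   # linear correlation
    alpha = math.sqrt(ss_sim / ss_obs)     # variability ratio
    beta = mean_sim / mean_obs             # mean (bias) ratio
    return {
        "NSE": 1.0 - sq_err / ss_obs,
        "PBIAS": 100.0 * sum(o - s for o, s in zip(obs, sim)) / sum(obs),
        "RSR": math.sqrt(sq_err) / math.sqrt(ss_obs),
        "KGE": 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2),
    }


def is_adequate(m):
    """Apply the monthly-scale thresholds stated in the text."""
    return (abs(m["PBIAS"]) < 25.0 and m["NSE"] > 0.5
            and m["KGE"] > 0.5 and m["RSR"] < 0.7)


# Invented monthly flows (m3/s), for illustration only.
metrics = flow_metrics(obs=[10.0, 20.0, 30.0, 40.0],
                       sim=[12.0, 18.0, 33.0, 37.0])
adequate = is_adequate(metrics)
```

Under these definitions RSR equals the square root of (1 − NSE), so the two criteria are not independent, whereas KGE and PBIAS add information on variability and volume errors.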

Comparison between Observed and Reanalysis Datasets
Precipitation data from the three reanalysis datasets (CFSR, ERA5, and CHIRPSv2.0) were directly compared to precipitation data from rainfall stations in the GSM River Basin. Daily precipitation data was collected from the reanalysis data grid cells closest to the available weather stations; days with no observed data were omitted from the comparative analysis. To enable conclusions regarding the flow simulation, we used the same period to evaluate the accuracy of the precipitation data as the period for which the flow data was available (2005-2010).
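The pairing procedure described above (nearest grid cell, then omission of days without observations) can be sketched as follows. The coordinates, function names and values are hypothetical and chosen only to make the example self-contained; they are not the positions of the stations used in the study.

```python
import math


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two points given in degrees."""
    radius = 6371.0
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * radius * math.asin(math.sqrt(a))


def nearest_cell(station, cell_centres):
    """Return the (lat, lon) grid-cell centre closest to the station."""
    lat0, lon0 = station
    return min(cell_centres, key=lambda c: haversine_km(lat0, lon0, c[0], c[1]))


def paired_series(obs, grid):
    """Keep only days on which the gauge reported a value."""
    return [
        (o, g) for o, g in zip(obs, grid)
        if o is not None and not math.isnan(o)
    ]


# Hypothetical station and 0.1-degree cell centres.
station = (13.44, -88.16)
cells = [(13.4, -88.2), (13.5, -88.1), (13.4, -88.1), (13.5, -88.2)]
closest = nearest_cell(station, cells)

# Missing gauge days are marked with None or NaN in this sketch.
pairs = paired_series(obs=[1.0, None, 3.0, float("nan")],
                      grid=[0.8, 2.0, 3.5, 1.0])
```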
The validation statistics for the GSM River Basin are presented in Table 3. Among the three reanalysis datasets, CHIRPS was the most accurate; it yielded low ME values together with higher CC and CSI values. Hence, it performed best in both accuracy and detection capability. The results obtained from ERA5 and CFSR were also acceptable. In the case of ERA5, the correlation with observed data was slightly lower than that yielded by CHIRPS. Of the three reanalysis datasets, ERA5 achieved a monthly SD most similar to that of the observed data. However, ERA5 presented the highest BIAS of the three reanalysis datasets analysed, overestimating the rainfall values at some weather stations by more than 40%. The higher amount of rainfall explains why ERA5 yielded relatively high POD and FAR values.
The CFSR yielded a smaller correlation with the observed data than CHIRPS and ERA5. Conversely, the BIAS was lower than that shown by ERA5, which signified overestimation or underestimation of the rainfall depending on the station analysed. The lower BIAS value was related to a lower POD and FAR compared to the results obtained with ERA5. On average, CSI was similar for both CFSR and ERA5, which implies a similar detection capability.  Figure 3 shows the probability density function of rainfall events on a daily scale. It is evident that all the remote-sensing data we analysed missed some rain events and CHIRPS was most similar to the observed data in this regard. ERA5 clearly overestimated the amount of light and medium rainfall events (where 'light' refers to daily rainfall of 1-5 mm) and medium refers to daily rainfall of 5-20 mm). CFSR, despite overestimating these rainfall events, was the reanalysis dataset that most closely reflected the observed data for medium rainfall events. Among the three reanalysis datasets, CHIRPS best represented light rainfall events, but it significantly overestimated medium rainfall events. Regarding the highest intensity events (daily rainfall over 20 mm), the three reanalysis datasets yielded similar performances.  As evident in Figure 4, monthly observed rainfall and variations in rainfall patterns were also analysed. In the left column, violin plots combine box plots and a kernel density plot to simultaneously represent the data distribution and probability density. Except for MIG, the density distribution displayed a consistently more accurate adjustment when using the CHIRPS data. The median prediction is shown as a white dot in the graphs, and significant differences were detected. In general, ERA5 overestimated the median value, except at the CHA station (located at the highest altitude), where the reanalysis data resulted in an underestimated median value. 
Similarly, ERA5, CHIRPS and CFSR adequately reflected the inter-annual variation in precipitation; they indicated the existence of a dry period from November to April and a wet period from May to October.
A characteristic aspects of the climate in the study area is a maximum monthly rainfall that occurs between June and September, interrupted by a typical mid-summer drought during the month of July [36,64]. This pattern was detected by all the products we assessed. In addition, unlike CFSR, the ERA5 and CHIRPS datasets overestimated the average monthly rainfall reported during the rainy season. We also found that the scatterplots suggested a higher performance for the CHIRPS data, with an overall closer fit with the observations. This finding was supported by the calculated CC values. Using the The general overestimation of the number of rainfall events and the volume of rainfall may be due to the ability of satellites to detect strong convective events while having more difficulty in detecting shallow and warm rains. In addition, the bias correction techniques generally used to correct satellite data often inflate the volume of rainfall in the detected events to compensate for the missed events [63].
As evident in Figure 4, monthly observed rainfall and variations in rainfall patterns were also analysed. In the left column, violin plots combine box plots and a kernel density plot to simultaneously represent the data distribution and probability density. Except for MIG, the density distribution displayed a consistently more accurate adjustment when using the CHIRPS data. The median prediction is shown as a white dot in the graphs, and significant differences were detected. In general, ERA5 overestimated the median value, except at the CHA station (located at the highest altitude), where the reanalysis data resulted in an underestimated median value. Similarly, ERA5, CHIRPS and CFSR adequately reflected the inter-annual variation in precipitation; they indicated the existence of a dry period from November to April and a wet period from May to October.  The observed monthly temperatures were compared to data from ERA5, CHIRTS and CFSR, as discussed in the previous section ( Figure 5). Although the shape of the density distribution and the monthly variations showed a good fit, we noted a significant overestimation of CHIRTS temperatures by 2-3 °C, depending on the month considered over the year. For CFSR, an overestimation of the monthly mean temperature was also detected for all months, which was far lower than that observed in CHIRTS. At the MIG A characteristic aspects of the climate in the study area is a maximum monthly rainfall that occurs between June and September, interrupted by a typical mid-summer drought during the month of July [36,64]. This pattern was detected by all the products we assessed. In addition, unlike CFSR, the ERA5 and CHIRPS datasets overestimated the average monthly rainfall reported during the rainy season. We also found that the scatterplots suggested a higher performance for the CHIRPS data, with an overall closer fit with the observations. This finding was supported by the calculated CC values. 
Using the CHIRPS data, the CC values for the tested weather stations ranged from 0.86 to 0.93. By contrast, for the CFSR and ERA5 datasets, the CC values ranged from 0.84 to 0.86 and 0.80 to 0.86, respectively.
The observed monthly temperatures were compared to data from ERA5, CHIRTS, and CFSR, as discussed in the previous section (Figure 5). Although the shape of the density distribution and the monthly variations showed a good fit, CHIRTS significantly overestimated temperatures, by 2-3 °C depending on the month. For CFSR, an overestimation of the monthly mean temperature was also detected for all months, although it was far lower than that observed for CHIRTS. At the MIG station, which was the only station for which temperature data were available, the data from ERA5 provided the best fit.

Model Performance before Calibration
When data is missing from observations, the performance of an uncalibrated model is an important indicator of how well the model performs [65]. The main purpose for which the SWAT model was conceived was to model ungauged rural watersheds [10]. The suitability of the different reanalysis datasets was evaluated by simulating flows within the SWAT model framework using default parameters. Figure 6 shows the observed and simulated monthly runoff in the GSM River Basin for the period 2005-2010. The criteria for evaluating the model performance are indicated in Figure 5, from which it is evident that CFSR yielded the best results. This was as expected, since this dataset contained the least biased reanalysis data. However, it is important to note that on a monthly scale, all the reanalysis datasets yielded adequate CCs, which ranged between 0.64 and 0.74. These results suggest that after calibrating the most sensitive parameters, the overall performance of the models may be acceptable.

Model Calibration Using GloFAS Discharge Data
As shown in Figure 2, we first compared the observed rainfall and temperature data and the reanalysis data. Then we evaluated the performance of three datasets as inputs for the SWAT model to simulate the observed flow in the GSM River Basin. For this purpose, the SWAT model was calibrated for each of the reanalysis products, including both precipitation and temperature. The SUFI-2 algorithm included in the SWAT-CUP software was used to optimise 12 SWAT parameters. Parameter selection was based on previous studies in nearby catchments in El Salvador [36], as mentioned in Section 3.3.
In addition, a sensitivity analysis was conducted to determine the most sensitive parameters, using each of the reanalysis datasets as input data and performing 500 model runs. As shown in Table 3, regardless of the reanalysis data used, CN2 was the most sensitive parameter, followed by GWQMN and ESCO; these parameters obtained the lowest p-values. The p-value for each parameter represents a test of the null hypothesis that the regression coefficient is equal to zero. According to Abbaspour et al. (2007) [66], the more sensitive the parameter, the smaller the p-value.
Table 4 shows the parameter ranges and the final calibrated values for each of the reanalysis products. As demonstrated by the sensitivity analysis, CN2 was one of the most sensitive of the calibrated parameters, as it is directly related to runoff generation [67,68]. We thus expected that the calibrated CN2 values would be substantially reduced to correct the overestimation of precipitation detected in the reanalysis data, with the expected reduction being between 11.7% and 19.9%. In addition, ESCO was reduced from the default value of 0.95 to values between 0.80 and 0.86, which represents an increase in the evaporation generated by the model. These ESCO values are in line with those obtained in other tropical areas [36,69]. The largest discrepancies between the fitted values and the reanalysis data were noted for the groundwater parameters (ALPHA_BF, GWQMN, GW_REVAP, RCHRG_DP and REVAPMN). This result might be attributable to the inherent complexity of the volcanic aquifers in Central America; these aquifers display high permeability and fissure flows, which makes them difficult to study [70]. However, ALPHA_BF varied from 0.24 (for CFSR) to 0.85 (for ERA5). The latter value indicates a faster recharge response [71], which is consistent with the volcanic aquifers in the study area.
The performance of the calibrated model for each of the input datasets is summarised in Table 5. The statistics show that the SWAT model simulated the GloFAS discharge flows reasonably well for both the calibration (2005-2008) and validation (2009-2010) periods. This result was independent of the reanalysis data, since all the simulations had a CC ranging between 0.76 and 0.85, an NSE greater than 0.50, and a KGE value between 0.84 and 0.86. As expected, the best results were obtained using data from ERA5, which is used to obtain the global-scale flows in GloFAS.
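The three goodness-of-fit statistics mentioned above (CC, NSE and KGE) can be sketched in a few lines of Python. This is a minimal illustration, not the code used in the study: the function names are hypothetical, and the KGE is written in its original three-component form (correlation, variability ratio and bias ratio), which is an assumption because the exact KGE variant is not specified in this section.

```python
import math


def _mean(values):
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


def pearson_cc(obs, sim):
    """Pearson correlation coefficient (CC) between two equal-length series."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    cov = sum((o - mean_obs) * (s - mean_sim) for o, s in zip(obs, sim))
    ss_obs = sum((o - mean_obs) ** 2 for o in obs)
    ss_sim = sum((s - mean_sim) ** 2 for s in sim)
    return cov / math.sqrt(ss_obs * ss_sim)


def nse(obs, sim):
    """Nash-Sutcliffe efficiency: 1 minus squared error over observed variance."""
    mean_obs = _mean(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def kge(obs, sim):
    """Kling-Gupta efficiency (assumed original three-component form)."""
    mean_obs, mean_sim = _mean(obs), _mean(sim)
    std_obs = math.sqrt(sum((o - mean_obs) ** 2 for o in obs) / len(obs))
    std_sim = math.sqrt(sum((s - mean_sim) ** 2 for s in sim) / len(sim))
    r = pearson_cc(obs, sim)          # linear correlation
    alpha = std_sim / std_obs         # variability ratio
    beta = mean_sim / mean_obs        # bias ratio
    return 1.0 - math.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)


if __name__ == "__main__":
    # Hypothetical monthly flows (m3/s); the reference series stands in for GloFAS.
    reference = [3.0, 4.0, 12.0, 30.0, 42.0, 25.0]
    simulated = [2.5, 5.0, 10.0, 33.0, 38.0, 27.0]
    print("CC :", round(pearson_cc(reference, simulated), 3))
    print("NSE:", round(nse(reference, simulated), 3))
    print("KGE:", round(kge(reference, simulated), 3))
```

In such a sketch, the reference series would be the GloFAS discharge during calibration and the gauged discharge during the final evaluation; all three statistics equal 1 for a perfect simulation.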

Evaluation of the Simulated Monthly Streamflows for Various Scenarios
Finally, the simulated monthly scale flows obtained from the GloFAS calibration were compared with the observed flows. The simulations performed using CHIRPS-CHIRTS data showed the best fit, as evident in Figure 7. Nonetheless, all three simulations performed using ERA5, CHIRPS-CHIRTS, and CFSR data showed an acceptable fit with the observed streamflows.
These results demonstrate that when the ERA5 reanalysis data show an adequate fit, GloFAS discharge data could potentially be used to simulate the hydrological processes of ungauged catchments at the monthly scale. This would allow the use of distributed hydrological models such as SWAT to analyse fundamental aspects of water resource management, such as the impact of changes in land use or the climate. Similar to our findings, Eini et al. (2019) [72] reported that when precipitation reanalysis data represented the observed precipitation well (R² higher than 0.6) in a semi-arid basin in Iran, the result was reasonable simulations of river discharge. However, these results should be viewed with caution, as they depend on the quality of the GloFAS fit to the observed flows. In this regard, Harrigan et al. (2020) [30] demonstrated that the quality of the GloFAS fit increased substantially with the size of the catchment. We thus recommend that the methodology followed in our study be replicated in larger catchments.

Limitations and Future Research Directions
This study demonstrates that when remotely sensed weather data are accurate with respect to observed climatological data, flow simulation is often accurate. Hence, the use of discharge data, such as GloFAS, contributes to the correct simulation of the hydrological processes in a basin. However, several limitations need to be considered. First, data from a single flow-gauging station at the outlet of the basin were used to calibrate the model. This raises the possibility of equifinality issues, with some parameters having optimal values that are physically unrealistic. Future research should include additional calibration with other variables that are available through remote sensing, such as evapotranspiration.
Second, NSE was used as the objective function. Because it is based on squared errors, this coefficient tends to be weighted towards higher flows. The use of other objective functions would return different results, and it would be interesting to study the effect of the selected objective function on the results obtained.
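The weighting of NSE towards high flows can be illustrated with a small sketch. The monthly flows below are hypothetical values chosen only for illustration, and the log-transformed NSE is included as one commonly used alternative, not as an objective function applied in this study. The same relative error (+20%) is placed once on the peak month and once on the driest month.

```python
import math


def nse(obs, sim):
    """Nash-Sutcliffe efficiency computed on the raw flows."""
    mean_obs = sum(obs) / len(obs)
    residual = sum((o - s) ** 2 for o, s in zip(obs, sim))
    spread = sum((o - mean_obs) ** 2 for o in obs)
    return 1.0 - residual / spread


def log_nse(obs, sim):
    """NSE on log-transformed flows (requires strictly positive flows)."""
    return nse([math.log(o) for o in obs], [math.log(s) for s in sim])


# Hypothetical monthly flows (m3/s) with a dry season and a wet season.
obs = [2.0, 2.0, 3.0, 10.0, 30.0, 40.0, 25.0, 35.0, 45.0, 20.0, 5.0, 3.0]

# +20% error on the peak month only (45.0 -> 54.0).
sim_peak_error = list(obs)
sim_peak_error[8] = 54.0

# +20% error on the driest month only (2.0 -> 2.4).
sim_low_error = list(obs)
sim_low_error[0] = 2.4

if __name__ == "__main__":
    print("NSE, peak-month error    :", nse(obs, sim_peak_error))
    print("NSE, low-flow error      :", nse(obs, sim_low_error))
    print("log-NSE, peak-month error:", log_nse(obs, sim_peak_error))
    print("log-NSE, low-flow error  :", log_nse(obs, sim_low_error))
```

With these values, NSE penalises the peak-month error more strongly than the low-flow error, whereas the log-transformed variant treats the two equal relative errors alike.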
Third, only the SWAT model was employed to test the methodological approach used in this work. Future research and performance testing with different hydrological models could help to clarify the limitations and strengths of our methodological approach. Finally, if observed data are available, future studies could assess the performance of GloFAS discharge data on a daily and sub-daily basis.

Conclusions
This study evaluates the application of GloFAS discharge data in model calibration in El Salvador, Central America, a country in which climatological input data and observed flow data for calibrating hydrological models are scarce or unavailable. GloFAS determines the streamflow by applying a global-scale hydrological model that uses ERA5 reanalysis data as the input data. This work tested whether the streamflow data from GloFAS provided a suitable option for calibrating hydrological models in ungauged catchments, as long as there is a good fit between reanalysis precipitation and temperature data and observed climatological data. Climatological reanalysis data (CHIRPS-CHIRTS and CFSR) were also evaluated. The following conclusions are presented:
(1) The statistical indicators (CC, RMSE, ME, and BIAS) allowed the accuracy of the reanalysis data to be quantitatively evaluated. We found that CHIRPS performed best in reproducing the observed precipitation, despite consistently overestimating the rainfall.
(2) In terms of rain detection ability, CHIRPS (CSI ranging from 0.52 to 0.63) displayed the greatest daily accuracy in detecting precipitation occurrences. The next best were ERA5 and then CFSR. However, all three reanalysis datasets showed acceptable rainfall detection capability.
(3) Among the three temperature reanalysis products, CHIRTS was the least accurate; it repeatedly overestimated the mean temperature by 2-3 °C. By contrast, ERA5 and CFSR presented excellent agreement with the observed data.
(4) Models that were calibrated using GloFAS data as the observed data, independently of the precipitation and temperature data (ERA5, CHIRPS-CHIRTS and CFSR), showed acceptable model performance. This point was evident in the KGE values, which ranged from 0.74 to 0.79, and the R² values of between 0.57 and 0.78.
Overall, these findings demonstrate that reanalysis rainfall products can improve hydrological process modelling for Central American watersheds, where poorly gauged or ungauged watersheds are common. This research also highlights the GloFAS dataset's potential for model calibration in catchments where the availability of streamflow data is limited. The availability of a calibrated hydrological model that adequately reflects the hydrological processes of a basin provides decision-makers with a tool to quantify the availability of water resources. The model also provides the basis for estimating the impact of land use changes or climate change on water resources.