Calibration and Validation of SWAT Model by Using Hydrological Remote Sensing Observables in the Lake Chad Basin

Abstract: Model calibration and validation are challenging in poorly gauged basins. We developed and applied a new approach to calibrate hydrological models using distributed geospatial remote sensing data. The Soil and Water Assessment Tool (SWAT) model was calibrated using only twelve months of remote sensing data on actual evapotranspiration (ETa) geospatially distributed in the 37 sub-basins of the Lake Chad Basin in Africa. Global sensitivity analysis was conducted to identify influential model parameters by applying the Sequential Uncertainty Fitting Algorithm, version 2 (SUFI-2), included in the SWAT-Calibration and Uncertainty Program (SWAT-CUP). This procedure is designed to deal with spatially variable parameters and estimates either multiplicative or additive corrections applicable to the entire model domain, which limits the number of unknowns while preserving spatial variability. The sensitivity analysis led us to identify fifteen influential parameters, which were selected for calibration. The optimized parameters gave the best model performance on the basis of the high Nash–Sutcliffe Efficiency (NSE), Kling–Gupta Efficiency (KGE), and determination coefficient (R²). Four sets of remote sensing ETa data products were applied in model calibration, i.e., ETMonitor, GLEAM, SSEBop, and WaPOR. Overall, the new approach of using remote sensing ETa for a limited period of time was robust and gave a very good performance, with R² > 0.9, NSE > 0.8, and KGE > 0.75 applying to the SWAT ETa vs. the ETMonitor ETa and GLEAM ETa. The ETMonitor ETa was finally adopted for further model applications. The calibrated SWAT model was then validated during 2010–2015 against remote sensing data on total water storage change (TWSC) with acceptable performance, i.e., R² = 0.57 and NSE = 0.55, and remote sensing soil moisture data with R² and NSE greater than 0.85.


Introduction
The sustainable management of river and lake basins requires mitigating the vulnerability of the eco-hydrological systems against environmental stressors. This is widely considered a crucial priority by organizations such as the United Nations, national and international water resources bureaus, and research institutions. The challenge is augmented by the growth in population, leading to rapid and extensive changes in land use under severe climate variability [1]. These two phenomena are more complicated and severe in arid and semi-arid regions, especially in developing countries. Due to its location, the African Sahel is most vulnerable to simultaneous changes in land use/land cover (LULC) and climate. In addition to the considerable changes in LULC, the arid and semi-arid climate in the Sahel is characterized by scarce rainfall as well as high temperature. These conditions have a large impact on the environment in general and particularly on the surface water resources [2]. The United Nations articulated the Sustainable Development Goals (SDGs) to identify shared priorities towards a better future for all. Specifically, SDG 6 addresses water security through a detailed hierarchy of Tasks and Targets. A better understanding of the impacts of LULC and climate variability is needed to develop effective paths towards sustainable water security regarding both quantity and quality [3,4].
To reveal the hydrological processes that occur in a changing environment, distributed hydrological models such as the Soil and Water Assessment Tool (SWAT) [5] are most useful to assess the impacts of land/water management and climate variability on water quantity and quality. The SWAT model has been applied to understand the hydrological system and explain the impacts of land and water resource management scenarios [6,7]. Such studies based on different climate and land-use scenarios require accurate model calibration and validation, which are usually performed with a traditional method based on ground-observed data, e.g., discharge. However, in many regions (e.g., in some basins in Africa), such data are very rare or totally absent; novel calibration and validation procedures are therefore needed.
In Africa, several studies have been conducted by applying the SWAT model in different basins [8–13]; the main limitation of these studies was the spatiotemporal coverage of data used for calibration and validation. These previous studies on SWAT applications in Africa documented some variability in model performance, although all aimed at improving model performance in data-scarce catchments. The Lake Chad Basin was selected as a study area because it is considered an ideal study case in terms of ground observation data scarcity.
Due to the scarcity of ground-observed data used for calibration and validation, many studies found that retrievals of hydrological variables from remote sensing data may help to improve model performance. Multiple remote sensing actual evapotranspiration (ETa) products have recently become available with extended temporal coverage [14–17]. These data have proven useful to calibrate and validate hydrological models, especially in data-scarce basins [18]. Recently, Ha et al. (2018) [19] used three years of remote sensing ETa in monthly time steps to calibrate the SWAT model for a tributary of the Red River in Vietnam. The calibration yielded high performance metrics. Poméon et al. (2018) [20] applied the SWAT model in the Niger, Volta, and Senegal River Basins; they used the traditional calibration method based on observed discharge data, and they validated the model using time series of remote sensing data in monthly time steps. They used actual evapotranspiration, soil moisture, and total water storage derived from remote sensing data to evaluate model performance after calibration. The multi-objective validation using remote sensing evapotranspiration, total water storage change, and soil moisture revealed good agreement between model estimates and observations. Odusanya et al. [21] applied the SWAT model in the Ogun catchment in Southwestern Nigeria. The SWAT ETa estimates using the Hargreaves equation performed well against the GLEAM ETa data. These studies have used several years of satellite-based data to calibrate and validate the SWAT model. They concluded that remote sensing data could potentially be used for SWAT calibration and validation. The weak point of these studies is that they did not sufficiently emphasize the benefit of the geospatial distribution of remote sensing retrievals, which could resolve the problem of the lack of ground observation time series, e.g., discharge.
This study aims to provide a novel calibration approach of the SWAT model based on limited time series of earth observation data in the data-scarce Lake Chad Basin. This study has two objectives: (1) to evaluate the performance of the SWAT model after a limited calibration period (one year) using multiple satellite remote sensing ETa products, which is the novelty of this study; and (2) to validate the model using remote sensing ETa, total water storage, and soil moisture in a distributed manner in the whole Lake Chad Basin.

Study Area
The Lake Chad Basin (LCB), located in the center of the African Sahel between 5.19° N–25.29° N latitude and 6.85° E–24.45° E longitude (Figure 1), is one of the most important catchments in the Sahel due to the dramatic decrease of water level and extent during the last decades. The Lake Chad Basin is the largest endorheic basin in the world, with an initial area of 2,500,000 km² (about 8% of Africa) [22–24]. The Lake Chad Basin is shared by eight African countries, i.e., Algeria, Cameroon, Central African Republic, Chad, Libya, Niger, Nigeria, and Sudan. In terms of location, 44% of the lake area is in Chad, and 29% is in Niger. The mean annual precipitation in the Lake Chad Basin is about 415 mm, varying from 1215–1600 mm in the south-western parts of the basin (Central African Republic) to 20–150 mm in the northern region such as Algeria [25,26]. In 1964, the Lake Chad Basin area was reduced to about 20% (427,000 km²) of the initial area [27,28]. In the same year, 1964, the Lake Chad Basin Commission (LCBC) was founded, representing four countries, i.e., Nigeria, Cameroon, Niger, and Chad. According to Policelli et al. [29], the water area continued to decrease until it reached an annual peak area estimated at 14,700 km² in 2017. After the dramatic decrease, Lake Chad became divided into a dry and hydrologically disconnected northern part and a southern part that is active [30]. The northern part of the catchment belongs to the Sahara desert and does not generate runoff that reaches Lake Chad [31]; thus, we focused on the southern part of the Lake Chad Basin (Figure 1).

Data
We used two types of data in this study: (a) the configuration and climate forcing data to run the SWAT model; and (b) remote sensing data used to calibrate and validate the model. The datasets cover the period from 2009 to 2015. They were resampled to 250 m spatial resolution using the ArcGIS Resample tool to align them with the LULC data, except for the remote sensing actual evapotranspiration products, which were used at their original resolutions. A detailed description of the data used in this study is given in the following sections. The forcing data are described in Section 2.2.1 (Table 2), while the data used for calibration and validation of the SWAT model are described in Section 2.2.2 (Table 3) and Section 2.2.4 (Table 1), respectively.

Minimum and maximum daily temperature, wind speed, solar radiation, and relative humidity at 2 m height were taken from the fifth generation of the European Centre for Medium-Range Weather Forecasts reanalysis (ECMWF ERA5). These data are an improvement on the first generation of the ECMWF reanalysis [32]. The variables were used at the daily time step, as required by the SWAT model.

Precipitation
Climate Hazards Group Infrared Precipitation with Station data (CHIRPS) were used in this study. The data are available from 1981 to the present in daily, monthly, and yearly time steps. Compared to other precipitation products, the main characteristic of CHIRPS is its fine spatial resolution of 5 km [33]. CHIRPS uses several data sources, such as the monthly precipitation climatology from the climate hazard center's precipitation climatology (CH-Pclim), infrared measurements from geostationary satellites, and information from the TRMM Multi-satellite Precipitation Analysis (TMPA) 3B42 product. The precipitation estimates are merged with in situ gauge data from several archives, including the sparse World Meteorological Organization's Global Telecommunication System (GTS), to reduce biases [34]. In this study, the precipitation data were used at the daily time step.

Remote Sensing ETa Data
Due to the scarcity of ground observations in the Lake Chad Basin, particularly of river discharge in the study area, four ETa products derived from remote sensing observations were used to calibrate the SWAT model. These four ET datasets include the global ET products from ETMonitor, the Simplified Surface Energy Balance for operational application (SSEBop_v4, version 4), the FAO WAter Productivity through Open access of Remotely sensed derived data (WaPOR_1.1, version 1.1, covers only Africa), and the Global Land Evaporation Amsterdam Model (GLEAM_3.3a, version 3.3a) (Table 3). The details of the four ETa datasets are described in Text A1. The selection of these ETa products was based on free availability, spatiotemporal coverage, and resolution. GLEAM and WaPOR have been validated in several countries in Africa; SSEBop and ETMonitor have been validated globally using data from ground observation sites.

Soil Moisture
The soil moisture product used in this study is version 5.2 of the European Space Agency Climate Change Initiative combined soil moisture data product (ESA CCI SM v5.2), released in 2019 [39–41]. It was generated by blending active and passive microwave soil moisture retrievals from data acquired by C-band scatterometers and multi-frequency radiometers. The data are available at a daily temporal resolution from 1978 to 2019 and a spatial resolution of 0.25°, and they represent only the topsoil layer (0–5 cm).

Total Water Storage Change
The Gravity Recovery and Climate Experiment (GRACE) mission comprises twin satellites following each other, 220 km apart. It was launched by the National Aeronautics and Space Administration (NASA) and Deutsches Zentrum für Luft- und Raumfahrt (DLR) to determine the global Earth gravity field every 30 days [42]. The time-dependent distance between the two satellites is converted to a local Earth gravity field. Retrievals at different times can be applied to compute the mass changes in units of equivalent water height [43]. Different "GRACE solutions," i.e., retrievals of local gravity, have been reported; for more information about GRACE, see, for example, [44]. The GRACE product used in this study is the global "mascons" solution [45]. The data were downloaded from http://www2.csr.utexas.edu/grace (accessed on 13 January 2021). GRACE provides the total water storage anomaly (TWSA) at a monthly time step, which corresponds to the sum of all water mass variations at the continent's surface and in the soil [46]. The study period (2009–2015) spans a total of 84 months, of which 15 months are missing in the GRACE record (i.e., about 18% of the total data). GRACE has a spatial resolution of 300 km; in this study, it was resampled to 1 km spatial resolution using the bilinear interpolation method without filling the gaps of the 15 missing months. The months missing in GRACE were also excluded from the SWAT-based TWSC when the two datasets were compared.
The monthly total water storage change (TWSC) was calculated as the difference between the values estimated in two subsequent months [47]:

$$TWSC_t = TWSA_t - TWSA_{t-1}$$

where $TWSC_t$ is the total water storage change at time step $t$; $TWSA_t$ and $TWSA_{t-1}$ are the total water storage anomalies at steps $t$ and $t-1$, respectively.
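The month-to-month differencing described above can be sketched in a few lines of Python. This is a minimal illustration, not the processing chain used in the study: the function name `twsc_from_twsa` and the sample values are hypothetical, and treating any difference that touches a missing month as undefined is an assumption made here to mirror the statement that GRACE gaps were not filled.

```python
from typing import List, Optional


def twsc_from_twsa(twsa: List[Optional[float]]) -> List[Optional[float]]:
    """Return month-to-month storage change; None marks a missing month."""
    twsc: List[Optional[float]] = [None]  # the first month has no predecessor
    for previous, current in zip(twsa, twsa[1:]):
        if previous is None or current is None:
            # Assumption: no change is computed across a gap in the record.
            twsc.append(None)
        else:
            twsc.append(current - previous)
    return twsc


# Made-up monthly anomalies in mm of equivalent water height.
example_twsa = [10.0, 12.5, None, 9.0, 7.5]
example_twsc = twsc_from_twsa(example_twsa)
print(example_twsc)
```

With this convention a single missing month removes two TWSC values (the change into the gap and the change out of it), which is worth keeping in mind when counting the months available for validation.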

Auxiliary Data
The annual land use and land cover dataset (LULC) used in this study was produced by Tsinghua University based on Moderate Resolution Imaging Spectroradiometer (MODIS) data at a spatial resolution of 250 m. More details about the dataset can be found in [32]. The LULC dataset is available from 2000 to 2015. The LULC dataset was used at its original spatial resolution of 250 m, and the other datasets were resampled to this spatial resolution to be used in the SWAT model.
The soil data used in this study were obtained from the Digital Soil Map of the World (DSMW) version 3.6 produced by the Food and Agriculture Organization (FAO) at 1 km resolution. The physical properties were extracted for each soil texture class from the Map Window (MW) interface of the SWAT database (http://swat.tamu.edu/software/mwswat/; accessed on 21 October 2020), which is compiled from FAO world soil data. The MWSWAT interface was developed by the WaterBase project (http://www.waterbase.org; accessed on 21 October 2020) of the United Nations University [48].
The Digital Elevation Model (DEM) of LCB was clipped out of the Shuttle Radar Topography Mission (SRTM) 30 m resolution Digital Elevation Data (www.earthexplorer.usgs.gov; accessed on 6 December 2020).

Methodology
The conceptual framework of this study is given in Figure 2. The overall approach used in this study is based on the hydrological SWAT model, which was run to estimate water balance components at each Hydrological Response Unit (HRU) to be aggregated to each of the 37 delineated sub-basins. Actual evapotranspiration (ETa), as a major component of water balance and terrestrial water cycle, is defined as the quantity of water that is actually removed from the land surface by the processes of evaporation (soil or water) and transpiration (vegetation). In this study, the ETa of the HRUs calculated by SWAT was combined with the spatially distributed ETa based on satellite remote sensing observations to calibrate the SWAT model parameters. Four satellite-observation-based ETa data products were evaluated, and the one with the best performance was used for further analysis. The SWAT model calculates the ETa using the potential evapotranspiration (ETp), defined as the measure of the ability of the atmosphere to remove water from the land surface through both evaporation and transpiration. In this study, three available ETp equations (Hargreaves, Priestley-Taylor, and Penman-Monteith) were used to configure the SWAT model to estimate the ETa (Figure 2a). In contrast to previous traditional calibration methods, a new calibration approach based on a limited record length of calibration data (i.e., monthly data for one year), but having rich spatial information from the spatially distributed remote sensing ETa data, was applied using the SUFI-2 algorithm in the SWAT-CUP tool-kit for each sub-basin. To accomplish these steps, four satellite-based ETa datasets were tested to evaluate the performance of the new calibration approach (Figure 2b). For validation, the results from the calibrated SWAT were assessed by comparing with the satellite-based observations of surface layer soil moisture and terrestrial water storage change (Figure 2c).

Model Description
The eco-hydrological model Soil and Water Assessment Tool (SWAT) is an open-source, process-based, and semi-distributed model [5]. The model uses daily meteorological data, i.e., precipitation, wind speed, air temperature, relative humidity, and solar radiation, in addition to topography, soil data, and LULC, to simulate different water balance components in a watershed. The model can also describe the water quality and soil erosion in the basin. The SWAT model describes watershed hydrology in two stages: (a) sediments, nutrients, and water flow into the main channel of each sub-basin; and (b) transport of sediments, nutrients, and water through the network of channels to the outlet of the main watershed. The catchment water balance calculation in SWAT is based on the principle of conservation of mass as follows:

$$SWC_t = SWC_0 + \sum_{i=1}^{t} \left( R_{day} - Q_{surf} - ET_a - W_{sep} - Q_{gw} \right)$$

where $SWC_t$ is the final soil water content of the simulation period for the entire soil profile (mm), $SWC_0$ is the initial soil water content on day $i$ (mm), $t$ is the simulation time (day), $R_{day}$ is the precipitation on day $i$ (mm), $Q_{surf}$ is the surface runoff on day $i$ (mm), $ET_a$ is the actual evapotranspiration on day $i$ (mm), $W_{sep}$ is the water entering the vadose zone from the soil profile on day $i$ (mm), and $Q_{gw}$ is the return flow on day $i$ (mm).
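As a rough illustration of the mass balance just described, the following Python sketch accumulates the daily fluxes onto an initial soil water content. It is only a bookkeeping example under stated assumptions: the function name `final_soil_water`, the tuple layout, and the numbers are hypothetical, and SWAT itself computes each flux internally rather than receiving it as input.

```python
from typing import Iterable, Tuple

# One day of fluxes, all in mm:
# (precipitation, surface runoff, actual ET, seepage to vadose zone, return flow)
DailyFluxes = Tuple[float, float, float, float, float]


def final_soil_water(swc_0: float, days: Iterable[DailyFluxes]) -> float:
    """Add the daily net flux to the initial soil water content (mm)."""
    swc = swc_0
    for r_day, q_surf, et_a, w_sep, q_gw in days:
        swc += r_day - q_surf - et_a - w_sep - q_gw
    return swc


# Two made-up days: a wet day that stores 3.5 mm, a dry day that loses 3.5 mm.
example_days = [
    (10.0, 2.0, 3.0, 1.0, 0.5),
    (0.0, 0.0, 2.5, 0.5, 0.5),
]
example_swc_t = final_soil_water(100.0, example_days)
print(example_swc_t)
```

The example makes the role of ETa explicit: it is one of only four loss terms, so an error in ETa propagates directly into the simulated soil water, which is why calibrating against ETa constrains the rest of the balance.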
The soil dataset used in this study was the Digital Soil Map of the World (DSMW), which provides physical characteristics of the soil profile split into two layers: 0–300 mm and 300–1000 mm. The SWAT model calculates the soil moisture (SWAT SM) as plant-available water (PAW) [46], specifically as the mean values for the soil layers at 0–10 mm as part of the first layer (septic layer), 0–300 mm, and 300–700 mm. The PAW is calculated as:

$$PAW_{ly} = SWC_{ly} - WP_{ly}$$

where $PAW_{ly}$ is the plant-available water in a layer "ly" (mm), $SWC_{ly}$ is the total soil water content in the same layer (mm), and $WP_{ly}$ is the soil water content at the wilting point (mm) (i.e., permanent wilting point).
The permanent wilting point (WP) is the amount of water held in the soil at a tension of 1.5 MPa. The WP in each soil layer was estimated from soil texture [46] using the clay fraction and the mean bulk density in each HRU as:

$$WP_{ly} = 0.40 \times \frac{f_{clay,ly} \times Bd_{ly}}{100}$$

where $Bd_{ly}$ is the bulk density of the layer (kg/m³) and $f_{clay,ly}$ is the percentage of the clay content in the layer.
Since the soil dataset used in this study does not have a 50 mm soil layer, the SWAT model could not compute the soil moisture at 50 mm to be compared to ESA CCI SSM (50 mm). Therefore, the SWAT SM of the 0–300 mm soil layer was converted into volumetric soil moisture by dividing it by the depth of 300 mm and adding the soil water at the wilting point of the same soil layer (0–300 mm), and the result was multiplied by 50 mm to obtain the soil moisture at 50 mm, as shown in the following equation:

$$SM_{50mm} = \left( \frac{PAW_{300mm}}{300} + WP_{300mm} \right) \times 50$$

where $SM_{50mm}$ is soil moisture at 50 mm soil depth (mm), $PAW_{300mm}$ is plant-available water at 300 mm soil depth (mm), and $WP_{300mm}$ is wilting point of the same soil layer (mm).
The $SM_{50mm}$ values obtained were compared with the ESA CCI SSM multiplied by 50 mm.
The SWAT model does not directly calculate the TWSC as output. However, it simulates the shallow aquifer and the deep aquifer as a function of recharge and percolation, as described in [49,50]. The groundwater (shallow and deep aquifer storages) is given by the SWAT model as output in the "output.hru" file. The total water storage was therefore estimated as the sum of the soil water content (SWC), deep aquifer (DA), and shallow aquifer (SA). Then, the deviation of monthly TWSC was estimated according to Poméon et al. [20] as:

$$\Delta TWS_t = \left( DA_t + SA_t + SWC_t \right) - \overline{\left( DA + SA + SWC \right)}$$

where $\Delta TWS_t$ is the total water storage change at time step $t$ estimated by SWAT, $DA_t$ is deep aquifer water storage, $SA_t$ is shallow aquifer water storage, and $SWC_t$ is soil water content. The overbar indicates the long-term average and applies to the period 2009–2015.
All variables are expressed in mm.
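The deviation from the long-term mean described above amounts to summing the three storages per month and subtracting the average of that sum. The sketch below is a hedged illustration only: `tws_anomaly` and the three-month sample series are hypothetical, and the mean is taken over whatever series is passed in, whereas the study uses the 2009–2015 period.

```python
from typing import List, Sequence


def tws_anomaly(
    deep_aquifer: Sequence[float],
    shallow_aquifer: Sequence[float],
    soil_water: Sequence[float],
) -> List[float]:
    """Monthly total storage minus its mean over the supplied period (mm)."""
    total = [
        da + sa + swc
        for da, sa, swc in zip(deep_aquifer, shallow_aquifer, soil_water)
    ]
    long_term_mean = sum(total) / len(total)
    return [value - long_term_mean for value in total]


# Made-up monthly storages in mm.
example_anomaly = tws_anomaly(
    deep_aquifer=[100.0, 100.0, 100.0],
    shallow_aquifer=[50.0, 55.0, 60.0],
    soil_water=[20.0, 30.0, 10.0],
)
print(example_anomaly)
```

By construction the anomalies sum to zero over the averaging period, which puts the model output on the same footing as the GRACE anomalies it is compared with.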
The thresholds, i.e., the fraction of area of a specific LULC class in each sub-basin ($F_{LULC}$), the fraction of area of a specific soil type in that LULC class area ($F_{LULC\_soil}$), and the fraction of a specific slope class in the area of that land cover type and that soil type ($F_{LULC\_soil\_slope}$), are employed to select the LULC classes, soil types, and slope classes that need to be taken into account in a given sub-basin. We carried out experiments by comparing the SWAT model results obtained with different combinations of thresholds (i.e., $F_{LULC}$, $F_{LULC\_soil}$, $F_{LULC\_soil\_slope}$) with the results from the standard delineation of HRUs, which is carried out using zero thresholds (i.e., $F_{LULC}$, $F_{LULC\_soil}$, and $F_{LULC\_soil\_slope}$ were all >0% sequentially). The experiments showed that the best agreement was obtained with thresholds for LULC class, soil type, and slope class of 5%, 20%, and 20%, respectively, which were used in this study to run SWAT. The LULC classes, soil types, and slope classes below the thresholds were reapportioned into the other qualified land covers, soil types, and slope classes, respectively. If no HRU in a sub-basin can be defined by the thresholds (meaning all are below the defined thresholds), the model selects the combination that has the largest fraction of areas of land cover, soil type, and slope class, respectively, as a single HRU. A similar procedure can be found in other studies, though they used different threshold values for their study cases [52,53].
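The thresholding logic can be illustrated for a single level of the hierarchy (here the LULC level). This is a sketch under assumptions, not the SWAT interface code: the function `apply_threshold` and the class names are hypothetical, and redistributing the dropped area in proportion to the remaining classes is an assumption about how the reapportioning is done. The same function could be applied in turn to soil types within each retained LULC class and to slope classes within each retained soil type.

```python
from typing import Dict


def apply_threshold(fractions: Dict[str, float], threshold: float) -> Dict[str, float]:
    """Drop classes below the threshold and rescale the rest to sum to 1."""
    kept = {name: frac for name, frac in fractions.items() if frac >= threshold}
    if not kept:
        # No class reaches the threshold: keep only the dominant class.
        dominant = max(fractions, key=fractions.get)
        return {dominant: 1.0}
    total = sum(kept.values())
    # Assumption: dropped area is shared out proportionally among kept classes.
    return {name: frac / total for name, frac in kept.items()}


# Made-up LULC area fractions in one sub-basin, with the 5% LULC threshold.
example_lulc = {"cropland": 0.60, "grassland": 0.36, "urban": 0.04}
example_kept = apply_threshold(example_lulc, 0.05)

# Fallback case: a threshold that no class reaches.
example_fallback = apply_threshold({"a": 0.5, "b": 0.3, "c": 0.2}, 0.6)
print(example_kept, example_fallback)
```

Raising the thresholds reduces the number of HRUs and therefore the run time of each simulation, at the cost of merging minor classes into dominant ones.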
In this study, we used annual land use data, i.e., the land use changed for each year from 2009 to 2015. This change in land use led to a change in the number of HRUs for each year, as shown in Table 4. After the delineation of the HRUs, the weather data (precipitation, temperature, wind speed, relative humidity, and solar radiation) were averaged for each HRU in the Lake Chad Basin catchment.
The SWAT-simulated ETa was used for the calibration of model parameters as described in Section 3.3. Literature and our own experiments showed that the SWAT estimates of ETa are highly sensitive to the choice of the equation applied to estimate ETp. We could not select one equation among the three equations provided in SWAT a priori; therefore, we designed the calibration experiments to shed light on model performance in the estimation of ETa. Our numerical experiments on SWAT calibration and validation showed that the best performance was achieved by using the ETMonitor ETa combined with SWAT ETa calculated based on the Hargreaves equation for ETp; the results from this combination are used for further analysis.

Calibration Procedure
The calibration of the SWAT model was performed using the semi-automated multisite and inverse modeling algorithm SUFI-2 in the SWAT-CUP tool-kit [54,55]. The parameters to be calibrated were initially preselected (about 40 parameters) on the basis of literature [11,18,19,21,55–58]. Furthermore, a global sensitivity analysis was carried out to identify the parameters most influential on ETa by applying the multiple regression method following the study by Abbaspour et al. (2015) [54]. An iteration included 1000 simulations for all HRUs, and the parameters were ranked on the basis of the p-values and the t-test values. The latter provide information about the influence of the parameter on ETa, with a larger absolute value indicating more influence on ETa, while the former indicates the significance of the influence, i.e., a value close to zero indicates highly significant influence. Based on the global sensitivity analysis, 15 parameters were selected to be determined by calibration with SUFI-2 (Figure 2b).
The traditional approach to calibrating hydrological models such as SWAT relies on discharge measurements at hydrometric stations. This requires multiple stations distributed in a river basin and a long time series of discharge data, in which the calibration period generally is longer than the validation period. In our study, we used raster-based spatially distributed retrievals of hydrological variables from multiple remote sensing observations. The spatial coverage of such data, which provided rich information in space, made it possible to limit the temporal coverage of the calibration experiments to one year at a monthly time step. Specifically, the calibration was performed using satellite retrievals of actual evapotranspiration (ETa), i.e., ETMonitor, GLEAM, SSEBop, and WaPOR. Monthly observations for each of the 37 delineated sub-basins in the watershed were applied for each yearly calibration or validation, adding up to 444 data points (37 sub-basins × 12 months) to estimate the 15 selected parameters.
The parameter values are assigned to each HRU, and two types of parameters are considered: (1) heterogeneous, i.e., specific for each soil type, slope range, or LULC class, which have a different value in each HRU; (2) homogeneous, i.e., constant in the entire watershed. SWAT-CUP allows three types of updates on the selected model parameters through the iterations: (a) by a multiplicative factor "(1 + α)"; (b) by adding a constant "β"; and (c) by assigning a parameter a new value "γ". The default parameter value is replaced with a new candidate value in each iteration by:

$$P_{new} = \begin{cases} P \times (1 + \alpha) & \text{type (a)} \\ P + \beta & \text{type (b)} \\ \gamma & \text{type (c)} \end{cases}$$

where $P_{new}$ is the updated parameter value in each iteration and $P$ is the value in the last iteration, with the initial value taken from the "default value" in SWAT. In this study, we have applied corrections of types (a) and (c). The former is generally applicable to parameters having a well-defined spatial distribution, such as soil properties assigned to each soil type in a soil map. The calibration procedure preserves the soil spatial pattern within the watershed. The latter is applicable to parameters that are process- rather than location-related, such as the maximum stomatal conductance. The initial ranges of different parameters were estimated on the basis of literature [49,51,54]. The 15 parameters are listed in Table 5, which also specifies which type of correction was selected for each parameter.
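The three update types can be written as one small function, which also shows why the multiplicative correction preserves a spatial pattern. This is an illustrative sketch: `update_parameter`, the mode labels, and the sample soil values are hypothetical and are not the SWAT-CUP implementation (SWAT-CUP selects the update type through a prefix on the parameter name in its input files).

```python
from typing import List


def update_parameter(current: float, mode: str, value: float) -> float:
    """Apply one of the three update types described in the text."""
    if mode == "relative":   # type (a): multiply by (1 + alpha)
        return current * (1.0 + value)
    if mode == "absolute":   # type (b): add a constant beta
        return current + value
    if mode == "replace":    # type (c): assign a new value gamma
        return value
    raise ValueError(f"unknown update mode: {mode}")


# Made-up soil property that differs between three soil types.
example_soil: List[float] = [0.10, 0.20, 0.40]

# One relative correction (alpha = 0.10) is applied to every soil type.
example_relative = [update_parameter(p, "relative", 0.10) for p in example_soil]

# One replacement value (gamma = 0.25) makes the parameter uniform.
example_replaced = [update_parameter(p, "replace", 0.25) for p in example_soil]
print(example_relative, example_replaced)
```

After the relative update the ratios between soil types are unchanged, so a single unknown α adjusts the whole map without flattening it, whereas the replacement update suits parameters that are meant to be uniform in the watershed.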
The calibration was performed at the sub-basin level, i.e., the corrections detailed in Table 5 are estimated for each sub-basin. A full SWAT run is carried out, and new monthly values of ETa are obtained and compared with ETa from satellite-based observations. The accuracy of the SWAT model estimate of ETa is evaluated using the Nash-Sutcliffe efficiency (NSE) [59,60]. The calibration is carried out by evaluating the candidate correction values iteratively for the selected parameters to maximize the NSE value. The performance is considered satisfactory when the objective function reaches a certain threshold, e.g., NSE = 0.5, and the iterations are terminated. The values of the parameter corrections in the last iteration are the final result of a calibration experiment.

Evaluation of Calibration, Validation, and Uncertainty Analysis
Uncertainties in the input data, the observed variable used for calibration, and the calibration parameters lead to uncertainty in the output results [61]. The uncertainty of the outputs caused by the propagation of the uncertainty of the calibration parameters was analyzed using SUFI-2. These uncertainties are expressed as the 95% prediction uncertainty (95PPU). In this stage, SUFI-2 depicts the 95PPU of the model estimates compared with the ETa retrievals. The 95PPU was computed at the 2.5% and 97.5% levels of the cumulative distribution of the model ETa generated by the propagation of the uncertainties using Latin Hypercube Sampling. The uncertainties were evaluated based on two factors: the P-factor and the R-factor [62,63]. The P-factor is the fraction of observations enveloped by the model estimates (95PPU) and varies from 0 to 1, where 1 indicates 100% bracketing of the observations within model simulations. The R-factor is the thickness of the 95PPU envelope, with values ranging between zero and infinity. The ideal situation is when the P-factor is close to 1 and the R-factor is close to 0.
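To make the two uncertainty factors concrete, the sketch below computes a 95PPU band from an ensemble of simulations and then the P-factor and R-factor. It is a minimal illustration under assumptions: all function names are hypothetical, the percentile uses simple linear interpolation, and the R-factor is normalized by the population standard deviation of the observations, which follows the usual SUFI-2 definition rather than anything stated in the text above.

```python
import statistics
from typing import List, Sequence, Tuple


def percentile(sorted_values: Sequence[float], q: float) -> float:
    """Linearly interpolated percentile of sorted data, with q in [0, 1]."""
    position = q * (len(sorted_values) - 1)
    low = int(position)
    high = min(low + 1, len(sorted_values) - 1)
    weight = position - low
    return sorted_values[low] + weight * (sorted_values[high] - sorted_values[low])


def ppu_band(ensemble: Sequence[Sequence[float]]) -> Tuple[List[float], List[float]]:
    """ensemble[k][t] is simulation k at time t; returns the 2.5% and 97.5% levels."""
    lower, upper = [], []
    for values_at_t in zip(*ensemble):
        ordered = sorted(values_at_t)
        lower.append(percentile(ordered, 0.025))
        upper.append(percentile(ordered, 0.975))
    return lower, upper


def p_and_r_factor(
    obs: Sequence[float], lower: Sequence[float], upper: Sequence[float]
) -> Tuple[float, float]:
    """P-factor: share of observations inside the band; R-factor: relative band width."""
    inside = sum(1 for o, lo, up in zip(obs, lower, upper) if lo <= o <= up)
    p_factor = inside / len(obs)
    mean_width = sum(up - lo for lo, up in zip(lower, upper)) / len(obs)
    # Assumption: normalize by the population standard deviation of the observations.
    r_factor = mean_width / statistics.pstdev(obs)
    return p_factor, r_factor


# Made-up observations and a made-up band that misses the third observation.
example_obs = [1.0, 2.0, 3.0, 4.0]
example_lower = [0.5, 1.5, 3.5, 3.5]
example_upper = [1.5, 2.5, 4.5, 4.5]
example_p, example_r = p_and_r_factor(example_obs, example_lower, example_upper)

# Band from a tiny made-up ensemble of three simulations at one time step.
example_band = ppu_band([[1.0], [2.0], [3.0]])
print(example_p, example_r, example_band)
```

The two factors pull in opposite directions: widening the parameter ranges raises the P-factor but also the R-factor, so a calibration is judged by how many observations are bracketed for a given band width.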
The performance metrics recommended by Moriasi et al. [59,60], i.e., NSE, PBIAS, and R², are usually applied to the calibration and validation of model estimates of monthly sediment and nutrient transport and runoff. In this study, the calibration was performed using satellite retrievals of actual evapotranspiration. We applied evaluation methods that are well documented in the literature [18,19,21,64,65], where calibration and validation of hydrological models were carried out by applying the NSE, PBIAS, and R² metrics according to Moriasi et al. [59,60].
In this study, the simulation period (2009–2015) was split into one year for calibration (2009) and six years for validation (2010–2015). The calibration experiments have been described in Section 3.2 on model setup and in Section 3.3. According to the user manual of SWAT-CUP [61], the recommended number of simulations in each iteration is 500, but if the calibration is too slow, it could be even less (200–300). SWAT-CUP was run for five iterations, each one comprising 500 simulations. The validation was performed by running the SWAT model using the calibrated values of the 15 model parameters. As in the calibration, the validation was performed by comparing the model ETa with each of the four remote sensing ETa datasets, here for the period 2010–2015.
We applied four performance metrics: the R squared coefficient (R 2 ), the Nash-Sutcliffe Efficiency (NSE), Kling-Gupta Efficiency (KGE), and the percent bias (PBIAS) to evaluate the SWAT calibration and the validation, as performed by Odusanya et al. [21].The R 2 is a metric of the strength of the relationship between the data and the fitted regression line.The range of R 2 is from 0 to 1, where the value close to 1 means less error variance.
The NSE is used to compare the relative magnitudes of the residual ("noise") and the variance of the observations [62].NSE varies from −∞ to 1, where NSE > 0.5 means a good agreement [59,60] between model estimates and observations.NSE close to 1 is the optimal value.
The KGE goodness-of-fit metric was developed by [66], and it measures the relative importance of different metrics, i.e., correlation, bias, and variability, in hydrologic simulations.The range of KGE is from −∞ to 1, with KGE = 1 being the optimal value.
The PBIAS is the relative bias, which indicates whether the model data are larger or smaller than the observations.The ideal value is zero, i.e., a value closer to zero indicates better model accuracy.A negative value indicates that the model overestimates observations, and vice versa if the value is positive [67].The performance metrics equations are all described in Table 6.
Table 6. Metrics used to evaluate the calibration and validation.

Performance Metrics Equations Descriptions
Coefficient of determination
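For readers who wish to reproduce the evaluation, the four metrics can be sketched as follows. The formulas used here are the standard textbook definitions and are an assumption as far as this study is concerned: the KGE follows the commonly used correlation, variability ratio, and bias ratio form, and the PBIAS sign convention is chosen so that negative values indicate overestimation, consistent with the text above. The function names are hypothetical.

```python
import numpy as np

def r_squared(obs, sim):
    """Squared Pearson correlation between observations and simulations."""
    return np.corrcoef(obs, sim)[0, 1] ** 2

def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus residual variance over observed variance."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)

def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation, variability ratio, and bias ratio."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()    # variability ratio
    beta = sim.mean() / obs.mean()   # bias ratio
    return 1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2)

def pbias(obs, sim):
    """Percent bias; negative when the model overestimates the observations."""
    obs, sim = np.asarray(obs, float), np.asarray(sim, float)
    return 100.0 * np.sum(obs - sim) / np.sum(obs)

# Example: a simulation that is 10% too high everywhere.
obs = np.array([1.0, 2.0, 3.0, 4.0])
sim = 1.1 * obs
print(r_squared(obs, sim), nse(obs, sim), kge(obs, sim), pbias(obs, sim))
```

In this example the correlation is perfect, so R² equals 1 even though the simulation is biased, which is why the bias-sensitive metrics (NSE, KGE, PBIAS) are reported alongside it.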

Results
SWAT-CUP was run for five iterations comprising 500 simulations each. The values of the fifteen corrections estimated in each calibration experiment are given in Table A1.

Sensitivity Study
To explain the effectiveness of the ETa-based vs. the discharge-based calibration, we determined the sensitivity of the simulated ETa and discharge to the selected parameters. SWAT was run for 1000 simulations using the initial ranges of the 15 selected parameters described in Table 5. The variation of the parameter values across the simulations was used to determine the sensitivity of the simulated ETa and discharge to each parameter. Two linear regression projects were implemented: the first evaluated the sensitivity of the simulated ETa to the 15 parameters, while the second evaluated the sensitivity of the simulated discharge to the same 15 parameters. The results are shown in Table 7. The higher the absolute value of the t-stat, the higher the parameter sensitivity, provided the p-value is lower than 5%. The absolute t-stat values of the parameters for ETa are much higher than those for discharge (e.g., CN2 t-stat = −26.91 for ETa while it is 2.93 for discharge). Moreover, the number of parameters with very high sensitivity for ETa (p-value << 0.05) is higher than the number with high sensitivity for discharge (eight parameters vs. three parameters).
The configuration using the Hargreaves ETp (Figure 3) achieved the best performance compared to the configurations using the other ETp equations (Figure A1), so only these results are presented in detail (Table 8). The P-factor varied between 0.2 and 0.3, with the highest value achieved with the GLEAM retrievals. Note that a P-factor of 0.29 means that 29% of the observations were bracketed by the 95PPU. The lowest P-factor, 0.21, was obtained with SSEBop, while ETMonitor gave 0.27. The R-factor varied between 0.07 and 0.29, where the lowest value was achieved with the ETMonitor ETa and the highest with WaPOR. The ETMonitor and the GLEAM ETa data gave the best performance, i.e., the highest P-factor values and the lowest R-factor values (Table 8). The validation experiments gave higher P-factor and R-factor values than the calibration with all the ETa retrievals (Table 9). The P-factor varied between 0.25 (WaPOR) and 0.65 (GLEAM). The R-factor varied between 0.16 (ETMonitor) and 0.96 (WaPOR). The highest P-factor and the lowest R-factor were achieved with the GLEAM and ETMonitor data, respectively.
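The regression-based sensitivity ranking used in this section can be sketched as a multiple linear regression of a model output (e.g., mean simulated ETa per run) on the sampled parameter values, from which a t-stat and a p-value are obtained for each parameter. The sketch below is illustrative only: the function name `regression_sensitivity` and the synthetic data are hypothetical, and the p-value uses a normal approximation to the t-distribution, which is an assumption that is reasonable for about 1000 simulations.

```python
import math
import numpy as np

def regression_sensitivity(params, response):
    """t-stat and approximate two-sided p-value for each parameter.

    params   : array (n_simulations, n_parameters) of sampled values
    response : array (n_simulations,) of a model output per simulation
    """
    params = np.asarray(params, dtype=float)
    response = np.asarray(response, dtype=float)

    # Design matrix with an intercept column.
    X = np.column_stack([np.ones(len(params)), params])
    beta, *_ = np.linalg.lstsq(X, response, rcond=None)

    # Residual variance and standard errors of the coefficients.
    resid = response - X @ beta
    dof = X.shape[0] - X.shape[1]
    sigma2 = resid @ resid / dof
    std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))

    t_stat = beta / std_err
    # Two-sided p-value, normal approximation (adequate for large dof).
    p_val = np.array([math.erfc(abs(t) / math.sqrt(2.0)) for t in t_stat])
    return t_stat[1:], p_val[1:]  # drop the intercept

# Synthetic example: output depends strongly on parameter 0, not on parameter 1.
rng = np.random.default_rng(0)
params = rng.uniform(size=(1000, 3))
response = 5 * params[:, 0] - 2 * params[:, 2] + rng.normal(scale=0.1, size=1000)
t_stat, p_val = regression_sensitivity(params, response)
```

Parameters are then ranked by the absolute t-stat, and only those with a p-value below 0.05 are regarded as sensitive.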

Calibration Results
The performance achieved in the calibration and validation was further evaluated by considering the R², NSE, KGE, and PBIAS metrics (Tables 8 and 9). The ETMonitor ETa gave the best performance with R² = 0.91, NSE = 0.83, and KGE = 0.79, indicating very good performance. The second-best performance (Table 8) was achieved with the GLEAM ETa data, i.e., R² = 0.88, NSE = 0.8, and KGE = 0.78. The PBIAS was lowest in absolute value with ETMonitor and GLEAM, i.e., within ±5% (2.33 and −4.18, respectively), while the highest values were obtained with the SSEBop and WaPOR ETa, i.e., 7.59 and 30.44, respectively. SWAT underestimated ETa by 2.33% compared to ETMonitor, by 7.59% compared to SSEBop, and by 30.44% compared to WaPOR, while it overestimated ETa by 4.18% compared to the GLEAM data. Each scatter plot in Figure 4 shows, for a given sub-basin, the 12 monthly values of the SWAT estimates of ETa (based on the Hargreaves ETp equation, which returned the highest performance) against the ETMonitor retrievals of ETa (which also yielded the highest performance among the remote sensing ETa products). The high performance of the SWAT ETa estimates against the ETMonitor retrievals was confirmed by this comparison disaggregated across the 37 sub-basins, which gave a positive regression slope close to 1 in most sub-basins (Figure 4).
The calibration was performed at a monthly time step for one year (the 12 months of 2009) for each of the 37 sub-basins; Figure 3 and Table 8 show the performance metrics for the configuration in which the SWAT ETa was based on the Hargreaves ETp. In Figure 3, the spatial distribution of the performance metrics of SWAT configured using the Hargreaves ETp against the four remote sensing ETa products (ETMonitor (a), GLEAM (b), WaPOR (c), and SSEBop (d)) within the Lake Chad watershed provides further insights into the sensitivity of model performance to parameter setting. ETMonitor and GLEAM clearly give better performance than WaPOR and SSEBop. The maps of the spatial distribution of the metrics for the SWAT ETa calculated with the other two ETp equations (Penman-Monteith and Priestley-Taylor) against the four remote sensing ETa products are shown in Figure A1.

Validation Results
Likewise, the performance metrics were calculated for each year of the validation period (Table 9 and Figure 5). The model validation experiments gave higher R² values than the calibration with the ETMonitor data (Table 9), i.e., R² = 0.94 on average across sub-basins. Good agreement with slightly lower values of R² was also achieved with the GLEAM data, i.e., mean R² = 0.88, varying between 0.85 and 0.92 across sub-basins. Performance was lower against the WaPOR and the SSEBop data, with mean R² = 0.67 and 0.62, respectively. The NSE metric provided similar indications, i.e., higher values were obtained against ETMonitor in the validation experiments, with NSE = 0.86, compared to the calibration (see above). A lower NSE = 0.79 was obtained with the GLEAM data, while NSE was negative with the WaPOR and SSEBop data. Likewise, with the ETMonitor ETa, the mean KGE = 0.79, i.e., the same value as in the calibration experiments, while KGE = 0.76 with GLEAM and KGE = 0.2 with WaPOR, lower than in the calibration experiments in both cases. The validation using SSEBop gave a higher KGE than the corresponding calibration experiment (0.56). The PBIAS values for the validation using both the ETMonitor and GLEAM datasets were promising (<10%). In general, the SSEBop and WaPOR products gave high PBIAS values, beyond ±15%. Overall, the performance achieved in the SWAT calibration and validation was rather high and consistent when using the ETMonitor and GLEAM data. The highest performance, with R², NSE, PBIAS, and KGE values of 0.94, 0.86, 0.43, and 0.79, respectively, was achieved with the ETMonitor data, while the corresponding metric values with the GLEAM data were 0.88, 0.79, −2.35, and 0.76, indicating slightly worse performance. The performance with SSEBop and WaPOR was much worse, with very low metric values.
Taking into account the overall performance achieved in the calibration (Table 8 and Figure 3) and validation (Table 9), the results of SWAT configured using the Hargreaves ETp equation and evaluated against ETMonitor are shown for the calibration (Figure 3a) and for the validation over the years 2010 to 2015 (Figure 5a-f). R² was higher than 0.7 for all the sub-basins within the study area in the Lake Chad Basin, and more than 80% of the 37 sub-basins gave NSE and KGE between 0.7 and 1. This shows very good performance of the SWAT estimates of ETa, with ETp computed with the Hargreaves equation, against the ETa from ETMonitor.
The comparison of calibrated and uncalibrated SWAT ETa (Figure 6) confirms these findings. SWAT ETa prior to calibration was in good agreement with the ETMonitor ETa during the transitions from the dry to the wet season and vice versa. In the first few months of every year, the uncalibrated SWAT ETa curve (red color) shows a second peak in ETa (Figure 6), which was successfully corrected by the model calibration. Without calibration, SWAT underestimated the maximum ETa in summer, while the SWAT ETa was much closer to the ETa retrievals after calibration. Overall, the calibration gave R² > 0.9 and NSE > 0.8 every year.

Validation of SWAT Soil Moisture
During the dry months, there was good agreement between SWAT SWC and the ESA CCI SSM (Figure 7a,c). Larger differences were observed during the wet months (June to September) (Figure 7b,d), which is also the growing season, probably due to a potential deficiency of the SWAT model in the ETa parameterization [68].
The time series of SWAT SWC and ESA CCI SSM in the top 50 mm soil layer gave better agreement throughout the entire period 2009-2015 (Figure 8a). Contrary to the top 10 mm soil layer (Figure 7), the maximum SWAT SWC during the wet months was consistently higher than the ESA CCI SSM. This suggests that the model estimates and the surface soil moisture data product capture the vertical redistribution of rainwater in different ways. The R², NSE, and KGE were all higher than 0.85 (Figure 8b).

Validation of SWAT Estimates of Changes in Total Water Storage
Overall, the SWAT estimates appear to capture the seasonality in TWSC correctly: the total water storage change estimated by the SWAT model (SWAT TWSC) was in good agreement with the GRACE retrievals, although the differences in the wet months were at times large and not systematic (Figure 9). The monthly GRACE TWSC data showed rather large and rapid fluctuations in the dry period, leading to large and variable differences with the SWAT TWSC estimates. The performance metrics were satisfactory, but not very high, i.e., R² = 0.56 and NSE = 0.55 (Figure 9b).
The resampling of the GRACE data to 1 km did not affect the signal. Both at 1 km (Figure 9(a1,b1)) and at 300 km (Figure 9(a2,b2)), the GRACE data provide a comparable signal, as shown in particular by the comparable regression coefficients in Figure 9(a1,a2). The resampling was performed to apply the same spatial sampling as the ETMonitor ETa, which was used for calibration (1 km sampling grid size). In addition, according to [69], it is possible to capture a meaningful signal from GRACE data for catchments larger than 200,000 km². In this study, the area of the catchment is approximately 10⁶ km², i.e., >200,000 km². The results at 1 km and 300 km resolution (Figure 9) are almost the same, with R² = 0.57.

Water Balance
The calibration had clear impacts on different water balance components besides ETa, which was the variable used to calibrate the model (Figure 10). Generally, all the water balance components showed an increase compared to the first year of simulation (2009). The lowest values occurred in 2009, and the maximum values occurred in 2012. The mean annual values of actual evapotranspiration (ETa), soil water (SW), percolation (PERC), surface runoff (SURQ), groundwater recharge (GW_Q), water yield (WYLD), and lateral runoff (LAT_Q) are 533.88 mm/year, 7.15 mm/year, 62.15 mm/year, 51.9 mm/year, 43.15 mm/year, 98.01 mm/year, and 0.59 mm/year, respectively.

Discussion
Both the calibration and validation with the ETMonitor and GLEAM ETa retrievals showed very good performance when the model was configured using the Hargreaves ETp equation to simulate ETa. The SSEBop and WaPOR ETa data gave the lowest calibration and validation performance across the different ETp equations used to estimate ETa.
According to Moriasi et al. [59,60] and Kouchi et al. [70], good model performance is indicated by the PBIAS values (PBIAS within ±15%) for the calibration and validation of the SWAT model with monthly remote sensing ETa from ETMonitor and GLEAM (Tables 8 and 9). Positive values of PBIAS show that the SWAT model underestimated monthly ETa in the LCB, whereas negative values of PBIAS indicate that SWAT overestimated the ETa. The differences between the RS ETa products and the SWAT ETa estimates are mainly due to the differences in forcing data and to the algorithms used to estimate the ETa. The PBIAS values achieved in the calibration and validation using ETMonitor and GLEAM (<15%) were lower than the values in Odusanya et al. [21], who calibrated and validated SWAT using MOD16 and GLEAM (<25%) in the Ogun River Basin in Nigeria. The PBIAS achieved during calibration and validation of the model with ETMonitor broadly agreed with the results of Poméon et al. [20], who validated SWAT in West Africa using MOD16. The PBIAS values obtained with the WaPOR and SSEBop data do not agree with the findings of Weerasinghe et al. [71], who evaluated different remote sensing ETa products in Africa and found that the highest-ranked products were WaPOR and SSEBop, while the GLEAM dataset attained the lowest rank.
Our results on the calibration with ETMonitor suggested an underestimation by SWAT, but in the validation experiments, SWAT generally overestimated ETa. All performance metrics indicated a very good performance of the SWAT ETa, configured using the Hargreaves ETp equation, against both ETMonitor and GLEAM. These metric values were better than those found by other studies [21]. The best SWAT calibration and validation performance is related to the choice of the Hargreaves ETp equation. This equation was applied to actual observations of precipitation and of maximum and minimum temperature to compute the ETa. On the other hand, the other two ETp equations (Penman-Monteith and Priestley-Taylor) were calculated with model (reanalysis) meteorological data on solar radiation, relative humidity, and wind speed, due to the absence of the required observations. The differences among the remote sensing ETa datasets are due to their input and forcing data [72] and to the different remote sensing ETa retrieval algorithms. Our findings agree with those of Lopez et al. [18] in Morocco, who reported that the GLEAM ETa gave a satisfactory performance for the calibration of a large-scale hydrological model set-up. Moreover, our results agree with the findings of Odusanya et al. [21] in the Ogun River Basin in Nigeria, who reported that the calibration of the SWAT model using the GLEAM dataset showed a satisfactory performance.
Furthermore, the four remote sensing ETa products were compared, and we found that the WaPOR values were on average higher than SSEBop, ETMonitor, and GLEAM by 80 mm, 240 mm, and 300 mm, respectively. The SSEBop values are on average higher than ETMonitor and GLEAM by 160 mm and 220 mm, respectively, while the difference between ETMonitor and GLEAM is about 60 mm (Figure A2). Moreover, the calibration was performed using the original spatial resolutions of the four products, and, as shown in Figure 3, the spatial distribution of the performance metrics indicated that there is no relation between the calibration performance and the sub-basin extent (size). Furthermore, the calibration with the ETMonitor data achieved the highest performance, while SSEBop gave a lower performance at the same spatial resolution (Figure 3 and Table 8), which indicates that the performance is not related to the spatial resolution of the products.
The SWAT estimates of SWC for the top 10 mm soil layer were compared with the ESA CCI SSM for a dry year, i.e., 2009, and a wet year, i.e., 2012 (Figure 7). To further refine this analysis, the results are also presented separately for the dry months, i.e., January-May and November-December, and the wet months, i.e., June-October. During the dry months, there was good agreement between SWAT SWC and ESA CCI SSM. Larger differences were observed during the wet months, suggesting limited sensitivity of the SWAT SWC to precipitation.
The time series of ESA CCI SSM at a depth of 50 mm and SWAT SWC in the top 50 mm gave better agreement throughout the entire period 2009-2015 (Figure 8b). Contrary to the top 10 mm soil layer, the maximum SWAT SWC during the wet months was consistently higher than the ESA CCI SSM. This suggests that the model estimates and the SSM data product capture the vertical redistribution of rainwater in different ways. The R², NSE, and KGE were all higher than 0.85. This confirms what was reported by Poméon et al. [20] and Odusanya et al. [21], i.e., that the dynamics of the SWAT SM fit very well with the ESA CCI soil moisture (%) in the upper few centimeters of the soil profile in most of the basin at a monthly time step.
Overall, the SWAT estimates appear to capture the seasonality in TWSC correctly (Figure 9b): the SWAT TWSC was in good agreement with the GRACE retrievals, although the differences in the wet months were at times large and not systematic. The monthly GRACE TWSC data showed rather large and rapid fluctuations in the dry period, leading to large and variable differences with the SWAT TWSC estimates. The performance metrics were satisfactory, but not very high, i.e., R² = 0.56 and NSE = 0.55. Grippa et al. [73] observed similar outcomes in the Sahel and West Africa when they compared the GRACE TWSC retrievals with estimates based on nine hydrological models. They concluded that the most important difference between the two TWSC datasets is the noticeable decline of TWSC during dry months, which they attributed to incorrectly simulated evapotranspiration during the dry season. Ndehedehe et al. [74] reported similar results, and they assumed that these might have been due to anthropogenic influences intensifying land surface processes that were not properly captured by hydrological models. Poméon et al. [20] also suggested that the lack of observations to be used in model calibration could lead to a biased estimation of soil water outflow, which causes erroneous estimates of TWSC. The multi-variable validation results show that the SWAT model performs satisfactorily in the study area.
Several studies have been performed to evaluate the water balance of the Lake Chad Basin [2,25,30,75-79]. Some of these studies are summarized in Table 10. The first two studies evaluated the water budget when the Lake Chad Basin was in hydrological equilibrium and the lake was not shrinking. Almost all the previous studies simulated the runoff only for the Chari-Logone River, such as the studies conducted by Vuillaume [79], Olivry et al. [77], Odada et al. [2], Zhu et al. [25], and Mahamat Nour et al. [30]. In our study, the runoff was estimated for the whole southern Lake Chad Basin. Note that all these studies were carried out for different years and different parts of the Lake Chad Basin, so that an exact comparison is unrealistic. In general, the runoff simulated in our study is comparable to that of the other studies for recent years (i.e., after 1970), notwithstanding the different temporal and spatial coverage of the studies. We also compared the time series of our estimates of yearly runoff with other available time series, either from observations or from modeling [80,81] (Figure 11). All the model studies and the observed runoff confirmed the increase in runoff in the comparison period starting from 2009, and followed similar trends and fluctuations. The difference in some values between our study and the two other studies is probably due to the different spatial coverage: the studies by Zhu et al. [81] and Mahmood and Jia [80] simulated the runoff only for the Chari-Logone River Basin, which is a part of our study area. In these two studies, the observed runoff data were collected only at the outlet of the Chari-Logone River [80,81], while our study simulated the runoff for the whole southern Lake Chad Basin. The hydro-meteorological conditions differ between those two studies, the observations, and our study, which is probably the main reason for the high value in 2012 in our study; however, our simulated runoff showed higher agreement with the runoff observations shown in Figure 11.

Conclusions
This study demonstrated that it is feasible to calibrate the semi-distributed regional hydrological model SWAT for the entire LCB, notwithstanding the scarcity of hydrological data, by using remote sensing data products of ETa. An innovative aspect was to limit the calibration to one year, i.e., 2009, by designing the calibration experiments to reduce the number of unknown parameters. Our results demonstrated that this new limited calibration approach did improve model performance for the entire period after calibration, i.e., 2010–2015. A broader set of forty potentially influential parameters was defined based on the literature. Using the global sensitivity analysis tool in SWAT-CUP, fifteen parameters were selected for calibration. The 15 correction values used to modify the selected parameters, which are assigned by HRU, were computed using Latin hypercube sampling (LHS) in SUFI-2 in the SWAT-CUP package. Only one correction value per parameter was calculated for the entire LCB, using twelve monthly observations of ETa in each of the 37 sub-basins, i.e., 444 data points in the calibration experiments. The final values of the corrections were applied in the validation experiments and subsequent analyses by SWAT.
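The sampling and correction step summarized above can be sketched as follows. This is a minimal illustration, not the SWAT-CUP implementation: the function names, the ±20% bounds, the number of samples, and the example HRU values are assumptions made for the example, and the multiplicative correction is assumed to take the relative form v·(1 + c).

```python
import numpy as np


def latin_hypercube(n_samples, bounds, rng):
    """Draw n_samples parameter sets by Latin hypercube sampling.

    Each parameter range is split into n_samples equal strata and every
    stratum is sampled exactly once.
    """
    bounds = np.asarray(bounds, dtype=float)
    n_params = bounds.shape[0]
    # One random point inside each stratum, in unit-interval coordinates.
    strata = np.arange(n_samples)[:, None]
    u = (strata + rng.random((n_samples, n_params))) / n_samples
    # Shuffle the strata independently for each parameter.
    for j in range(n_params):
        u[:, j] = rng.permutation(u[:, j])
    # Rescale from [0, 1) to the requested bounds.
    return bounds[:, 0] + u * (bounds[:, 1] - bounds[:, 0])


def apply_correction(hru_values, correction, kind):
    """Apply a single basin-wide correction to HRU-level parameter values."""
    hru_values = np.asarray(hru_values, dtype=float)
    if kind == "relative":   # multiplicative: v * (1 + c)
        return hru_values * (1.0 + correction)
    if kind == "additive":   # additive: v + c
        return hru_values + correction
    raise ValueError(f"unknown correction type: {kind}")


rng = np.random.default_rng(42)
bounds = [(-0.2, 0.2)] * 15                 # hypothetical ranges, 15 corrections
candidates = latin_hypercube(100, bounds, rng)

# One correction value changes every HRU, but the spatial pattern is preserved.
hru_curve_numbers = [60.0, 75.0, 82.0]      # hypothetical HRU-level values
corrected = apply_correction(hru_curve_numbers, candidates[0, 0], "relative")

n_observations = 12 * 37                    # months x sub-basins = 444
```

Because each correction is a single number for the whole basin, the number of unknowns stays at 15 regardless of how many HRUs the model contains.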
The calibration experiments were designed to improve the accuracy of the SWAT estimates of monthly ETa against four different ETa datasets based on remote sensing observations. The model was configured to use three equations to calculate potential evapotranspiration (ETp), i.e., Hargreaves, Priestley-Taylor, and Penman-Monteith, because the choice of equation affects the estimated ETa. Combining the four satellite-observation-based ETa products with the SWAT estimates from the three ETp equations gave twelve calibration experiments aiming to improve model performance in the LCB. Overall, the results of the limited calibration (one year on a monthly timescale) show that remote sensing products are useful to calibrate and validate the SWAT model in arid to semi-arid poorly gauged basins, even though the temporal coverage of the calibration was limited. The best performance was obtained when the SWAT ETa estimates were calibrated against the ETMonitor ETa.
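The experiment design described above is a simple cross product of reference products and ETp equations. A minimal sketch of how the twelve runs could be enumerated (the dictionary keys and the numbering are illustrative assumptions, not the naming used by the authors):

```python
from itertools import product

eta_products = ["ETMonitor", "GLEAM", "SSEBop", "WaPOR"]
pet_methods = ["Hargreaves", "Priestley-Taylor", "Penman-Monteith"]

# One calibration experiment per (reference ETa product, ETp equation) pair.
experiments = [
    {"id": i + 1, "reference_eta": ref, "pet_method": pet}
    for i, (ref, pet) in enumerate(product(eta_products, pet_methods))
]

for exp in experiments:
    print(exp["id"], exp["reference_eta"], exp["pet_method"])
```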
A statistical study was performed to evaluate the model calibration limited to one year. Differences across the remote sensing ETa products were significant, consistent with the different algorithms used to estimate the ETa. The statistical analysis of both calibration and validation results indicated that ETMonitor and GLEAM led to a better SWAT performance than SSEBop and WaPOR. The variations in ETa performance across sub-basins are due to the non-linearity of the algorithms used to generate the four data products and of the SWAT model. This non-linearity produces a complex response of actual evapotranspiration to the spatial variability in the main input data, such as weather, land use, and soil moisture. Addressing this issue properly requires a dedicated, in-depth study of how these factors interact to determine the spatially variable performance in estimating ETa.
SWAT estimates of soil water content and total water storage change were compared with satellite data products. Overall, the agreement was good, further confirming the usefulness of the proposed limited calibration in our data-scarce study area. The limited calibration of a hydrological model using remote sensing data is one solution to the scarcity of hydrological data. It also requires much less computational time and resources than a calibration performed over several years.

WaPOR
Version 1.1 of the Water Productivity (WaPOR) ETa product was used in this study. The WaPOR evapotranspiration dataset is produced by the Food and Agriculture Organization of the United Nations. Evaporation (E) and transpiration (T) are calculated with the ETLook model described in [82], which uses the Penman-Monteith (P-M) equation adapted to remote sensing input data. The Penman-Monteith equation [83] estimates the rate of total evapotranspiration from commonly measured meteorological data (solar radiation, air temperature, vapor pressure, and wind speed). Calculating evapotranspiration requires seven input data components. Solar radiation and precipitation were obtained from ground stations; the other weather data (wind speed, minimum and maximum temperature, and relative humidity), estimated using models, are daily inputs. Soil moisture stress, NDVI, and surface albedo are dekadal (10-day) inputs. Readers can refer to FAO (2018) [38] for further details. The WaPOR ETa product has been available since 2009.
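To illustrate how the Penman-Monteith equation combines the meteorological inputs listed above, the sketch below implements the widely used FAO-56 reference-crop form. It is not the ETLook formulation used for WaPOR; the function names, the use of mean air temperature for the saturation vapor pressure, and the default pressure are simplifying assumptions made for the example.

```python
import math


def saturation_vapor_pressure(t_c):
    """Saturation vapor pressure (kPa) at air temperature t_c (deg C)."""
    return 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))


def penman_monteith_et0(t_mean, rn, u2, ea, g=0.0, pressure=101.3):
    """FAO-56 reference evapotranspiration (mm/day).

    t_mean   : mean air temperature (deg C)
    rn, g    : net radiation and soil heat flux (MJ m-2 day-1)
    u2       : wind speed at 2 m height (m/s)
    ea       : actual vapor pressure (kPa)
    pressure : atmospheric pressure (kPa)
    """
    es = saturation_vapor_pressure(t_mean)
    # Slope of the saturation vapor pressure curve (kPa per deg C).
    delta = 4098.0 * es / (t_mean + 237.3) ** 2
    # Psychrometric constant (kPa per deg C).
    gamma = 0.000665 * pressure
    radiation_term = 0.408 * delta * (rn - g)
    aerodynamic_term = gamma * 900.0 / (t_mean + 273.0) * u2 * (es - ea)
    return (radiation_term + aerodynamic_term) / (delta + gamma * (1.0 + 0.34 * u2))


# Hypothetical daily inputs for a mild, moderately windy day.
et0 = penman_monteith_et0(t_mean=16.9, rn=13.28, u2=2.078, ea=1.409, pressure=100.1)
print(f"ET0 = {et0:.2f} mm/day")
```

The radiation term responds to available energy, and the aerodynamic term responds to wind speed and the vapor pressure deficit; both are needed to capture evaporative demand in arid and semi-arid settings.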

GLEAM
The Global Land Evaporation Amsterdam Model (GLEAM) was developed in 2011 [17] and was revised and updated in 2017 [16]. In this study, the latest version of this product (GLEAM_v3.3a) was downloaded from www.gleam.eu (accessed on 28 October 2020). The forcing variables used to produce GLEAM_v3.3a are detailed in [16]. GLEAM uses the Priestley-Taylor equation to compute the potential evapotranspiration (mm/day) from surface net radiation and near-surface air temperature observations. Actual evapotranspiration is estimated by multiplying the potential evapotranspiration by the evaporative stress factor "S", which is calculated from microwave observations of the vegetation optical depth, remote sensing retrievals of soil moisture, and simulations of root-zone soil moisture. More details about GLEAM can be found in [16,84].
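The two-step logic described above (Priestley-Taylor potential evapotranspiration scaled by a stress factor) can be sketched as follows. This is a simplified illustration, not the GLEAM code: the function names, the constant values (α = 1.26, γ = 0.066 kPa/°C, λ = 2.45 MJ/kg), and the assumption that S lies between 0 and 1 are choices made for the example.

```python
import math


def priestley_taylor_pet(t_c, rn, g=0.0, alpha=1.26, gamma=0.066, lam=2.45):
    """Potential evapotranspiration (mm/day) by the Priestley-Taylor equation.

    t_c   : near-surface air temperature (deg C)
    rn, g : net radiation and ground heat flux (MJ m-2 day-1)
    alpha : Priestley-Taylor coefficient (assumed value)
    gamma : psychrometric constant (kPa per deg C, assumed value)
    lam   : latent heat of vaporization (MJ/kg, assumed value)
    """
    es = 0.6108 * math.exp(17.27 * t_c / (t_c + 237.3))   # kPa
    delta = 4098.0 * es / (t_c + 237.3) ** 2              # kPa per deg C
    return alpha * delta / (delta + gamma) * (rn - g) / lam


def actual_et(pet, stress):
    """Scale potential ET by an evaporative stress factor S in [0, 1]."""
    if not 0.0 <= stress <= 1.0:
        raise ValueError("stress factor must lie between 0 and 1")
    return stress * pet


pet = priestley_taylor_pet(t_c=20.0, rn=10.0)   # hypothetical inputs
eta = actual_et(pet, stress=0.5)                # hypothetical stress factor
print(f"PET = {pet:.2f} mm/day, ETa = {eta:.2f} mm/day")
```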

Figure 1. The African Sahel, the location of the Lake Chad Basin, the study area (Southern Lake Chad Basin), and the 37 delineated sub-basins.

Figure 2. The conceptual framework of this study: (a) the SWAT flowchart (and uncalibrated model outputs), (b) the SWAT-CUP flowchart (parameter selection and calibration), and (c) the validation schemes (using the calibrated model).

where $ET_{Rs}$ represents the satellite-based ETa values; $ET_{s}$ represents the simulated ETa values; $\overline{ET}_{Rs}$ represents the mean of the satellite-based ETa values; $\overline{ET}_{s}$ represents the mean of the simulated ETa values; $r$ is the Pearson product-moment correlation coefficient between the satellite-based ETa and the simulated ETa; $\alpha$ is the ratio of the standard deviation of the simulated ETa to the standard deviation of the satellite-based ETa; and $\beta$ is the ratio of the mean simulated ETa to the mean satellite-based ETa.
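Using the quantities defined above, the three performance metrics can be computed as in the following sketch. It uses the standard definitions of R², NSE, and KGE; the function names, the argument names `obs` (satellite-based ETa) and `sim` (simulated ETa), and the sample values are assumptions made for the example.

```python
import numpy as np


def r_squared(obs, sim):
    """Coefficient of determination: squared Pearson correlation."""
    r = np.corrcoef(obs, sim)[0, 1]
    return r ** 2


def nse(obs, sim):
    """Nash-Sutcliffe Efficiency: 1 minus error variance over observed variance."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    return 1.0 - np.sum((obs - sim) ** 2) / np.sum((obs - obs.mean()) ** 2)


def kge(obs, sim):
    """Kling-Gupta Efficiency from correlation (r), variability ratio (alpha),
    and bias ratio (beta)."""
    obs = np.asarray(obs, dtype=float)
    sim = np.asarray(sim, dtype=float)
    r = np.corrcoef(obs, sim)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return 1.0 - np.sqrt((r - 1.0) ** 2 + (alpha - 1.0) ** 2 + (beta - 1.0) ** 2)


# Hypothetical monthly ETa values (mm/month) for one sub-basin.
satellite_eta = np.array([12.0, 15.0, 30.0, 55.0, 80.0, 95.0])
simulated_eta = np.array([10.0, 18.0, 28.0, 60.0, 75.0, 90.0])

print(r_squared(satellite_eta, simulated_eta),
      nse(satellite_eta, simulated_eta),
      kge(satellite_eta, simulated_eta))
```

All three metrics equal 1 for a perfect match. NSE drops to 0 when the simulation is no better than the observed mean, and KGE penalizes errors in correlation, variability, and bias separately.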

Figure 3. Spatial distribution of performance metrics (R², NSE, and KGE) of SWAT_Hargreaves when calibrated in 2009 against (a) ETMonitor, (b) GLEAM, (c) WaPOR, and (d) SSEBop in the study area in the Lake Chad Basin.

Figure 4. Comparison between ETMonitor ETa and SWAT-calibrated ETa for all sub-catchments in the LCB.

Figure 6. Time series of monthly uncalibrated and calibrated ETa simulated by SWAT and ETa from ETMonitor in 2009–2015 in the study area in the Lake Chad Basin.

Figure 7. Seasonal comparison between monthly averaged SWAT and ESA CCI SSM at 1 cm in 2009, the driest year: (a) dry months, (b) wet months; and in 2012, the wettest year: (c) dry months, (d) wet months.

Figure 8. Comparison of monthly averaged SWAT SWC (black line) and ESA CCI SM (red line) at 50 mm during 2009–2015 in the study area in the Lake Chad Basin: (a) scatter plot, (b) time series.
Validation of SWAT Estimates of Changes in Total Water Storage

Figure 9. Comparison between monthly TWSC averaged over the study area in the Lake Chad Basin for 2009–2015, SWAT estimates vs. GRACE data product, at 1 km resolution: (a1) scatter plot, (b1) time series; and at 300 km resolution: (a2) scatter plot, (b2) time series.

Figure 11. Comparison of simulated runoff in the Lake Chad Basin by different studies.

Figure A2. The annual mean of different remote sensing evapotranspiration products in Lake Chad.

Table 1. Satellite-observation-based data products of surface soil moisture and total water storage change used for validation of the SWAT model.

Table 2. Description of forcing data used in the SWAT model in this study.

Table 3. Satellite-observation-based actual evapotranspiration products used for calibration of the SWAT model (E: soil evaporation; T: transpiration; I: rainfall interception).

Table 4. Number of HRU changes with the annual LULC changes.

Table 5. Calibration: the 15 most influential parameters; type and range of the corrections and initial values assigned according to the SWAT database.

Table 7. Parameter sensitivity to ETa and discharge.

Table 8. Performance metrics of SWAT_Hargreaves when calibrated using remote sensing ETa products for the year 2009.

Table 9. Performance metrics of ETa from SWAT_Hargreaves when validated against remote sensing ETa products.

Table 10. Comparison of water budget components estimated in the Lake Chad Basin in different studies.

Table A1. The 15 parameters with fitted values for the 12 calibration projects.