Hydrological Alteration Index as an Indicator of the Calibration Complexity of Water Quantity and Quality Modeling in the Context of Global Change

: Modeling is a useful way to understand human and climate change impacts on the water resources of agricultural watersheds. Calibration and validation methodologies are crucial in forecasting assessments. This study explores the best calibration methodology depending on the level of hydrological alteration due to human-derived stressors. The Soil and Water Assessment Tool (SWAT) model is used to evaluate hydrology in South-West Europe in a context of intensive agriculture and water scarcity. The Index of Hydrological Alteration (IHA) is calculated using discharge observation data. A comparison of two SWAT calibration methodologies are done; a conventional calibration (CC) based on recorded in-stream water quality and quantity and an additional calibration (AC) adding crop managements practices. Even if the water quality and quantity trends are similar between CC and AC, water balance, irrigation and crop yields are di ﬀ erent. In the context of rainfall decrease, water yield decreases in both CC and AC, while crop productions present opposite trends ( + 33% in CC and − 31% in AC). Hydrological performance between CC and AC is correlated to IHA: When the level of IHA is under 80%, AC methodology is necessary. The combination of both calibrations appears essential to better constrain the model and to forecast the impact of climate change or anthropogenic inﬂuences on water resources.


Introduction
In the last few decades, the increase of the world population, the living standard improvement, the increase of consumption patterns and the expansion of irrigated agriculture have driven a rising global demand for water [1][2][3]. In Europe, the summer water scarcity impacts ecosystems and affects human management over the entire South-Western Europe [4]; (http://www.aguamod-sudoe.eu/en/). Climate change and anthropogenic activities increase the risk of water quantity and quality deterioration [5,6]. The European Union is currently supporting research programs helping regional and local governments to establish and implement policies addressing such environmental issues (i.e., http://www.aguamodsudoe.eu/en/).
Models have become key tools to support decision-making [7,8]. Modeling is indeed a useful approach to sustainably manage water resources and assess the needs of different users over a given territory [4]. Output generated by agro-hydrological models offer many insights for both stakeholders and the scientific community, concerning water resources management, assessment of nitrate load and contamination sources, determination of ecological function's behavior or forecast of the potential impacts of global and local change, including environmental policies [9]. Deploying a model with an homogeneous resolution at the European level could provide an integrated approach of the territory and answer the European Union's need about the harmonization of transboundary tools and methodologies focused on water resources and global changes. To conduct such cross-boundary water analysis in a context of global changes, the model deployed needs to (i) incorporate climate, (ii) conceptualize different types of water resources such as "blue" (surface and aquifers) and "green" (evapotranspiration and soil moisture) waters (cf. [10,11]), (iii) have a detailed representation of limiting nutrients and sediment dynamics, and (iv) integrate anthropogenic pressures such as agricultural, industrial, and urban inputs. Investing modeling methodology at large scale is also important in order to understand water and nutrient cycles, as well as watershed behavior. The modeling methodology can be applied to any model as soon as the model fulfills large scale criteria and could allow the researchers to work on model uncertainties in order to improve the model conceptualization [12,13].
Fu et al. [8] and Moriasi et al. [13] assessed a large number of existing hydrological and water quality models and their calibration/validation procedures. According to their results, water quality models can be divided into two types regarding modeling based on purpose [8]: (1) Scientific exploration (understanding the sources/fate of biogeochemical cycles and contaminants) and (2) decision support (e.g., assessing the impacts of climate and management changes). The scales of these models can vary from field-scale (APSIM model [14]) to large scale (INCA model [15]) and the time step can vary from minutes, days [16,17], months, years [18,19], to even statics [14].
Over the past 15 years, the most commonly used hydrological model has been the Soil and Water Assessment Tool (SWAT) based on the Scopus environmental field database containing 3282 scientific papers using SWAT [8]. SWAT [16] is an open-source model with a large number of possible applications as seen in various studies [8,20,21]. SWAT can be applied from local to continental scale [9,22]. SWAT incorporates different modules issuing from modified stand-alone models such as QUAL2E and EPIC modules-developed by Brown and Barnwell [23] and Sharpley and Williams [24] respectively to simulate water quality and crop production. Moreover, management practices application is a strength of SWAT through the multiple options of inputs such as fertilizer, tillage practice, irrigation, and management practices. Finally, the SWAT model is also able to integrate urban and point sources inputs.
Developed by the United States Department of Agriculture (USDA), SWAT has been used around the world to simulate water quality and quantity [8,22], for large scale on finer resolution such as the U.S. territory with the "Hydrologic Unit Model for the United States" [25,26] or the whole Europe [9]. Abbaspour et al. [9] demonstrate the feasibility of applying SWAT at the continental scale in Europe with a subbasin resolution and at monthly time steps. At the continental scale, the main restrictions are the lack of harmonized observed data and the lack of modelers' knowledge on the functioning of all the watersheds to calibrate the model [9]. Calibration of hydrological model is based on the water balance within the watersheds. The water balance is the difference between inputs (precipitation) and outputs (discharge and evapotranspiration) at the outlet of the watershed. However, even if the water balance is correctly simulated, it does not mean that the water in the different compartments (soil, shallow and deep aquifers, and the fluxes between these compartments) are corrected [11]. During the last decade, attempts have been made to calibrate the hydrological model on additional hydrological parameters than discharge in order to analyze the water distribution between the different compartments and fluxes inside the watershed [11]. The complexity of such calibration increases with the level of human activity. Human activities affect water resources [27,28] such as water used for irrigation, hydroelectric dam operations, or artificial pond feeding. A model does not initially integrate this level of complexity but the calibration can help to incorporate the human influence into the model. Hence, a dataset and tools to calibrate the model are therefore important if the model is to accurately represent water stocks and fluxes in the different compartments of the hydrological cycle. Such calibration approach has been shown to be necessary in order to enhance the management of water resources especially in the context of global change [29][30][31]. This study provides suggestions on the model set up choice and the calibration and validation procedures according to the level of the Index of Hydrological Alteration (IHA) [32] over a large-scale-transborder-European region. To achieve this goal, South West Europe was chosen as the study area due to the combination of climatic conditions and human stressors leading to recurrent water scarcity episodes [33].
The SWAT model was set up to build a calibrated and validated hydrological and water quality model over South West Europe at subbasin level and monthly time steps. Water quality simulation is evaluated through sediment load, reflecting particulate pollutants and nitrate loads, depicting dissolved pollutants [20]. The objective of this study is to evaluate the need to frame the calibration not only on the water balance knowledge but also in the water and agricultural management practices (irrigation, agricultural inputs). For example, irrigation over South West Europe is important, representing 58% of the water use in Spain and up to 90% in regions with intense agricultural practices such as Extremadura in South West Spain [34]. Several studies pointed out the issue of water scarcity in South-Western Europe [2,33,35]. It is thus essential to have a good representation of the volume of irrigation in the model, as irrigation volume will constrain water balance, especially in climate change and cultural practices impact studies. In the present study, we suggest an additional validation method of water balance through the comparisons of spatialized water irrigated or/and crop yield with national datasets. This additional validation might be useful for studies with limited documentation and a lack of financial support. One possible approach would be exploring the potential correlation between the calibration/validation procedure and the IHA [32]. The IHA characterizes the impact of river regulation from anthropogenic activity such as dams and irrigations on the flow regime compared to the natural regime. IHA is a well-known hydrological indicator used in environmental studies [36][37][38] to scale the anthropogenic disturbance level of a watershed. No study has yet attempted to link the level of complexity of a calibration procedure with the IHA. Therefore, the partial objectives of this work are: (i) To calibrate and validate a multi-countries scale multi-watershed models based on different open-source data available at national and European scale, (ii) to provide a calibration guideline regarding the degree of human activities in the multi-watersheds and (iii) to determine the strength and weaknesses of this guideline.

Study Area
The study area, called SUDOE, covers 773,191 km 2 in South-Western Europe ( Figure 1A). SUDOE is a transnational territory delimited by the European Union in order to develop scientific and institutional collaborations and to build integrative approaches regarding environmental issues. SUDOE encompasses the south west of France, Andorra, and the Iberian Peninsula (Spain, Portugal, and Gibraltar). More than 60 million inhabitants live in the SUDOE territory, including a few of the biggest European cities ( Figure 1A). This territory is the main agricultural region of the European Union with about half of its surface dedicated to this activity according to Corine Land Cover 2012 (CLC) map [39] ( Figure 1B). The rest of the territory is occupied by 46% forests and semi-natural areas, 3% artificial surfaces and 1% wetlands and water bodies. A substantial part of the agricultural surface is irrigated; for instance, about 13.5% are permanently irrigated in Spain and Portugal [34]. The irrigation volume is important in SUDOE territory where water stress is more acute [2,33].
Water 2020, 12, x FOR PEER REVIEW 4 of 32 (CLC) map [39] ( Figure 1B). The rest of the territory is occupied by 46% forests and semi-natural areas, 3% artificial surfaces and 1% wetlands and water bodies. A substantial part of the agricultural surface is irrigated; for instance, about 13.5% are permanently irrigated in Spain and Portugal [34]. The irrigation volume is important in SUDOE territory where water stress is more acute [2,33]. The SUDOE two main climates are (i) Mediterranean over most parts of the Iberian Peninsula and the Mediterranean coast of France and (ii) temperate oceanic from the northern coast of Spain to the west coast and inland part of the south west of France [40]. The average annual precipitation measured between 1993 and 2012 over the area is 785.7 mm [41]. SUDOE territory incorporates major and minor streams directly flowing to the sea/ocean that corresponds to 78.75 km 3 .year −1 of freshwater water resources over the period 1993-2012 [41] with an average nitrate concentration of 1.7 mg.L −1 [4] over the period 1992-2012. Hydrological systems are managed differently by stakeholders in each subsystem depending on the administrative entity ( Figure 1B).
Urban sewage has major impacts within the SUDOE surface water resources where the reduction of streamflow during summer is the main issue for health security [4]. Low streamflows increase pollutant concentrations in rivers by decreasing the dilution effect, affecting aquatic ecosystems and threating the availability of good quality water.

Global Methodology
In order to conceptualize the water and nutrient balance of the entire SUDOE territory, the model has been divided into three zones characterized by a specific pedoclimatic condition: The Pyrenean, the Mediterranean, and the Oceanic zone (pedologic information in Figure 1C and the division of the pedoclimatic regions in Figure 2). The Pyrenean zone is characterized by an oceanic climate and the presence of the Pyrenean mountains; the Mediterranean zone is described by a Mediterranean climate and, the Oceanic zone has been identified in the west part of the territory with an oceanic climate. The SUDOE two main climates are (i) Mediterranean over most parts of the Iberian Peninsula and the Mediterranean coast of France and (ii) temperate oceanic from the northern coast of Spain to the west coast and inland part of the south west of France [40]. The average annual precipitation measured between 1993 and 2012 over the area is 785.7 mm [41]. SUDOE territory incorporates major and minor streams directly flowing to the sea/ocean that corresponds to 78.75 km 3 ·year −1 of freshwater water resources over the period 1993-2012 [41] with an average nitrate concentration of 1.7 mg·L −1 [4] over the period 1992-2012. Hydrological systems are managed differently by stakeholders in each subsystem depending on the administrative entity ( Figure 1B).
Urban sewage has major impacts within the SUDOE surface water resources where the reduction of streamflow during summer is the main issue for health security [4]. Low streamflows increase pollutant concentrations in rivers by decreasing the dilution effect, affecting aquatic ecosystems and threating the availability of good quality water.

Global Methodology
In order to conceptualize the water and nutrient balance of the entire SUDOE territory, the model has been divided into three zones characterized by a specific pedoclimatic condition: The Pyrenean, the Mediterranean, and the Oceanic zone (pedologic information in Figure 1C and the division of the pedoclimatic regions in Figure 2). The Pyrenean zone is characterized by an oceanic climate and the presence of the Pyrenean mountains; the Mediterranean zone is described by a Mediterranean climate and, the Oceanic zone has been identified in the west part of the territory with an oceanic climate. The model set up and calibration/validation procedures were identical for each project. The model was set up, calibrated, and validated on monthly time steps thanks to the collaborations of French, Portuguese, and Spanish stakeholders working in public (Water agencies, non-profit organizations) and private institutions (hydroelectric and dam management companies). These experts shared their data and gave their advice and suggestions on the integration of dams, crop management practices, irrigation volume and crop yield in the modeling calibration framework. The model's set up incorporates dam management, urban sewages, and crop management.
Two methodologies for calibration of the model has been done: (i) A conventional calibration (CC), based on recorded in-stream water quality and quantity integrating auto-fertilization and autoirrigation, i.e., the model estimates the amount of fertilizer and irrigation demanded by plants, which is the most common calibration strategy for water resource modeling and (ii) an additional calibration (AC), based on the same data, but including additional outputs coming from other models' outputs or from national statistics (e.g., irrigation and crop yield, Figure 3). The comparison of these two calibrations allows this study to determine the strengths and weaknesses of the CC and AC methodologies on the representation of water resources. The last step of this study is to define an index that indicates the choice of the calibration/validation procedure.

Modeling Approach: The SWAT Model
In the SWAT model, the simulated watersheds are first divided into subbasins based on river network and topography. The implementation procedure allows us to identify Hydrological Response Units (HRUs) in each subbasin based on a unique combination of soil, land cover and slope information. HRUs are used to compute water balance articulated around four compartments (soil, snow, shallow, and deep aquifer) linked by hydrological processes such as infiltration, runoff, evapotranspiration, lateral flow, and percolation. HRUs water balance is then summed at the subbasin scale and routed to the watershed outlet. Impact on hydrology, sediment and nitrate The model set up and calibration/validation procedures were identical for each project. The model was set up, calibrated, and validated on monthly time steps thanks to the collaborations of French, Portuguese, and Spanish stakeholders working in public (Water agencies, non-profit organizations) and private institutions (hydroelectric and dam management companies). These experts shared their data and gave their advice and suggestions on the integration of dams, crop management practices, irrigation volume and crop yield in the modeling calibration framework. The model's set up incorporates dam management, urban sewages, and crop management.
Two methodologies for calibration of the model has been done: (i) A conventional calibration (CC), based on recorded in-stream water quality and quantity integrating auto-fertilization and auto-irrigation, i.e., the model estimates the amount of fertilizer and irrigation demanded by plants, which is the most common calibration strategy for water resource modeling and (ii) an additional calibration (AC), based on the same data, but including additional outputs coming from other models' outputs or from national statistics (e.g., irrigation and crop yield, Figure 3). The comparison of these two calibrations allows this study to determine the strengths and weaknesses of the CC and AC methodologies on the representation of water resources. The last step of this study is to define an index that indicates the choice of the calibration/validation procedure.

Modeling Approach: The SWAT Model
In the SWAT model, the simulated watersheds are first divided into subbasins based on river network and topography. The implementation procedure allows us to identify Hydrological Response Units (HRUs) in each subbasin based on a unique combination of soil, land cover and slope information. HRUs are used to compute water balance articulated around four compartments (soil, snow, shallow, and deep aquifer) linked by hydrological processes such as infiltration, runoff, evapotranspiration, lateral flow, and percolation. HRUs water balance is then summed at the subbasin scale and routed to the watershed outlet. Impact on hydrology, sediment and nitrate dynamics of urban sewage, agricultural management and dams are taken into consideration at subbasin scale.  The surface runoff volume is computed from a modified version of the Soil Conservation Service (SCS) curve number procedure [42]. Muskingum routing method was selected to calibrate streamflow in the channel [43]. The Hargreaves method [44] was chosen to estimate the potential evapotranspiration, as the Hargreaves method only requires air temperature and precipitation. Hargreaves's formula has been shown to satisfactorily simulate the potential evapotranspiration when applied outside of humid climatic zones [45].
In the SWAT model, sediment transport is simulated both in-land and channel compartments. Over lands, erosion and sediment yield is estimated using the Modified Universal Soil Loss Equation (MUSLE) [46] for each HRU. This equation considers hydrology and land characteristics (more details of the USLE equation factors can be found in Neitsch et al. [47].
Inland and instream nitrogen cycles are also modeled. Soil and atmospheric waters, as well as soil characteristics, drive the nitrogen cycle within each HRU. Nitrogen can be added to the system by fertilizer applications, dry deposition, and rainfall. Instream nitrate processes are based on the QUAL2E model integrated into the SWAT model [23,48]. More details of the SWAT nitrate module can be found in Brown and Barnwell [23] and Neitsch et al. [44].
The plant growth component of SWAT is based on the radiation use efficiency approach with empirical parameters. Plant growth can be inhibited by temperature, water, nitrogen and phosphorus stress. Plant development is based on daily accumulated heat unit values [49,50]. The heat unit states the stage of plant development. It varies from 0 to 1, 0 indicating the sowing time and 1 the optimal moment for the plant to be harvested.
Potential biomass is estimated using a method developed by Monteith et al. [51] depending on leaf development, light interception, and light-biomass ratio conversion. This potential is calculated considering ideal growing conditions with adequate water and nutrient supply and a favorable climate.
When a harvest/kill operation occurs, a portion of plant biomass is removed from the system as yield. Crop yield is calculated from biomass production, harvest efficiency and the fraction of the The surface runoff volume is computed from a modified version of the Soil Conservation Service (SCS) curve number procedure [42]. Muskingum routing method was selected to calibrate streamflow in the channel [43]. The Hargreaves method [44] was chosen to estimate the potential evapotranspiration, as the Hargreaves method only requires air temperature and precipitation. Hargreaves's formula has been shown to satisfactorily simulate the potential evapotranspiration when applied outside of humid climatic zones [45].
In the SWAT model, sediment transport is simulated both in-land and channel compartments. Over lands, erosion and sediment yield is estimated using the Modified Universal Soil Loss Equation (MUSLE) [46] for each HRU. This equation considers hydrology and land characteristics (more details of the USLE equation factors can be found in Neitsch et al. [47].
Inland and instream nitrogen cycles are also modeled. Soil and atmospheric waters, as well as soil characteristics, drive the nitrogen cycle within each HRU. Nitrogen can be added to the system by fertilizer applications, dry deposition, and rainfall. Instream nitrate processes are based on the QUAL2E model integrated into the SWAT model [23,48]. More details of the SWAT nitrate module can be found in Brown and Barnwell [23] and Neitsch et al. [44].
The plant growth component of SWAT is based on the radiation use efficiency approach with empirical parameters. Plant growth can be inhibited by temperature, water, nitrogen and phosphorus stress. Plant development is based on daily accumulated heat unit values [49,50]. The heat unit states the stage of plant development. It varies from 0 to 1, 0 indicating the sowing time and 1 the optimal moment for the plant to be harvested.
Potential biomass is estimated using a method developed by Monteith et al. [51] depending on leaf development, light interception, and light-biomass ratio conversion. This potential is calculated considering ideal growing conditions with adequate water and nutrient supply and a favorable climate.
When a harvest/kill operation occurs, a portion of plant biomass is removed from the system as yield. Crop yield is calculated from biomass production, harvest efficiency and the fraction of the above-ground plant's dry biomass removed as dry yield also called harvest index. Harvest index is calculated daily according to plant stage development and HRU water deficit.

Input Data
A large database compilation has been gathered to study the transnational SUDOE area. Data sources, as well as their spatial and temporal resolutions, are displayed in Table 1. CC and AC are using the same model set up and have the same input data. A global 1-km resolution land surface Digital Elevation Model (DEM), derived from the United States Geological Survey (USGS), was used to described topography and quantify slopes. Such DEM has been selected as a compromise between the large-scale model setup and the computation resources available. Land cover was available from Corine Land Cover map 2012 [39]. Soil characteristics were defined from the Harmonized world soil map that encompasses the dataset of the European Soil Database. Daily precipitation and temperature originate from the MESoscale ANalysis (MESAN) climate reanalyzes spanning Europe for the 1989-2010 period [55,56]. This dataset has been shown to be as good as the reference French meteorological dataset [57], SAFRAN [58]. In Spain, the MESAN reanalysis is not based on a dense network of observations, but a comparison of streamflow outputs forced by SAFRAN [59] and MESAN in different rivers over Spain showed similar results.

Dam Management
The characteristics of dams (volume, year of construction, water release, management rules) were collected for Spain and France (Table 1). Only dams with a total volume superior to 5 Hm 3 were Water 2020, 12, 115 8 of 33 considered to strongly impact the hydrology and taken into consideration. In the case of different dams in the same subbasin, a fictional dam conceptualization integrating the volume equivalent of the sum of dams has been used. A total of 44 dams were implemented ( Figure 2). Management data were used to set minimum and maximum objective volumes or the target storage water volume. In some cases, the maximum and minimum target water release was reconstructed from the observed streamflow downstream of the reservoir.

Urban Sewage
Urban sewage localizations and characteristics (maximum capacity, annual load) were taken from the 2010 European dataset [60] including information about all European cities. Every wastewater treatment plant (WWTP) with more than 50,000 equivalent inhabitants' capacity has been considered. Such as for dams, the WWTPs were aggregated by subbasin. A total of 44 point sources were integrated into the model ( Figure 2). The annual spilled volume is deduced from the number of equivalent-inhabitants of selected WWTPs and the annual water consumption per inhabitant per country (France: 58.04 m 3 year −1 , Portugal: 67.46 m 3 year −1 and Spain: 47.4 m 3 year −1 ; https://www.insee.fr, https://ine/pt, https://ine/es). In order to overcome the lack of nutrient loading from WWTPs at the European scale, a simplified version of the Zessner and Lindter [61] method has been used to estimate the annual nitrate loading with the available information about treatment plant. The daily averaged nitrogen loads that enter the treatment plant is assumed to be 8.8 gN person-equivalent −1 and the average treatment efficiency for the whole South West of Europe is 40% [61]. The following equation determines the load of nitrogen released in the environment: with: X load : The nitrogen pollution in g, X removal : The fraction of efficiency between 0 and 1, Load design : The inhabitant equivalent, X influent load : The daily average amount of nitrogen rejected per inhabitant (g/inhabitant equivalent/d).
This equation has been validated at the outlet of several French and Spanish watersheds, showing a good correlation between observed and simulated nitrogen loads (R 2 = 0.87, p-value < 0.01) (Supplementary Materials Figure S1).

Crop Management
Interannual average of monthly irrigation volumes for the South of France and Spain were given by the Institut national de recherche en sciences et technologies pour l'environnement et l'agriculture (IRSTEA) [62][63][64], the Compagnie d'Aménagement des Coteaux de Gascogne (CACG) and the Spanish Hydrological National Plan [65,66]. Crop management data were obtained from the national database of agriculture statistics from each country. The technical management and fertilizer application for each crop in each region originate from the recommendations provided by the French Chambers of Agriculture (FCA), the French and Spanish Directorate-General for Agriculture and Rural Development (DGADR) and the Spanish Confederations (SC).
Crop management were applied to HRUs according to the recommendations of FCA, DGADR, and SC. The dominant crops (determine by national statistics) of each region and crop technical management were selected and implemented in the model used for both CC and AC calibrations ( Table 2). For example, the dominant irrigated crop is cotton in the South East of Spain whereas in France dominant irrigated crops are corn and wheat. (1) 15-15-15, NPK fertilize, is a combination of nitrogen (N), phosphorus (P) and potassium (K) designed to maximize crop yield and quality. It contains 15%N, 15% P, and 15% K. (2) 15-30-0 is a high phosphate fertilizer. It contains 15%N, 30% P and 0% K. (3) 18-46-0, Diammonium phosphate, is a fertilizer and when applied as food for plants, the soil pH is increased temporarily. It contains 18%N, 46% P 2 O 5 , and 0% K. (4) 21-0-0+24S, the Ammonium Sulfate Fertilizer contains 21%N and 24% of sulfur.
The same crop management has been applied to all crops in the SUDOE territory. The beginning of plant growth and the auto-irrigation module are set to 0.15 heat units. Auto-irrigation activation terms are defined according to the type of crop and the climatic condition. The auto-irrigation can be triggered by two conditions, chosen for each subbasin regarding calibration, based on (1) plant water demand or (2) soil water content. Due to the lack of precise data, choices on the repartition of irrigated and non-irrigated crops have been made: Rice, olive, vineyard, fruit trees, berry plantations, maize, and almonds are considered as irrigated crops, while the rest are considered as non-irrigated. Irrigation is supplied by the nearest dams or main rivers. Fertilizers are applied automatically according to plant needs. The harvest is set at 1.2 heat units. The harvest heat unit is superior to 1 because this study considers that there is always a time lag between the optimized harvest time and the true harvest [47]. In that case, between 1 and 1.2 heat units, the plant consumes energy without producing more.
The model used to calibrate CC and AC has the same setup. For both calibrations, the irrigation was set up as "auto-irrigation". For CC and AC calibration we enter a volume at each auto-application and a maximum volume. The exact volume to apply is determined by the model according to the parametrization (plant water demand or soil water content). However, for the AC, crop management routing, simulated irrigation volume, crop yield, and biomass were calibrated and validated with national statistic datasets (Table 1) whereas, for CC, these model outputs were not calibrated. By default, SWAT yield estimates are based on average yields of US crop production. Some parameters of plant growth were adjusted to South-Western European crops based on Institute of Agriculture recommendations (cf. Supplementary Materials Table S1).

Study Area Discretization
The conceptualized approach is represented in Figure 3. As explained in Section 2.2 Global Methodology, three projects have been set up to be able to depict three contrasting territories: The Pyrenean, the Mediterranean and the Oceanic zones ( Figure 2). The delineation, warm-up and calibration/validation steps were achieved following the same procedure for each project. HRUs have been defined using a total of 24 land classes [39], 34 soil classes [67] and 5 slopes classes (0-2, 2-5, 5-15, 15-25, and >25 • ) [68]. A total of 263 subbasins with 22,997 HRUs have been identified in the SUDOE territory.

The Conventional Calibration (CC)
For the CC calibration, hydrology, sediments and nitrate loads were calibrated. A three year warm-up period from 1997 to 1999, was taken into consideration to set up the initial variables' value. Hydrology and nitrate loads were automatically calibrated based on sensitivity analysis at 47 gauging stations including 19 hydrological stations and 28 physical-chemical stations over the period 2000-2005. p-value (determines the significance of the sensitivity) and t-stat (provides a measure of sensitivity) were calculated to identify the sensitive parameters [69]. The LH-OAT sensitivity analysis tool [70] was used in order to determine the sensitive parameters that are presented in Table S1. The SUFI-2 algorithm of SWAT-CUP [13,69,71] was selected to perform the automatic multi-site calibration. The SUFI-2 algorithm is known to identify an appropriate parameter set in a limited number of iterations [72]. A total of 1500 runs have been done for calibration steps as recommended by Yang et al. [73]. Validation was then conducted for the same stations over the period 2006-2010. The model was run on a monthly time step.
Four standard statistical criteria suggested by Moriasi et al. [13,74] and Gupta et al. [75] were used to evaluate the goodness of hydrological simulations: coefficient of determination (R 2 ), Nash and Sutcliff Efficiency (NSE), percent bias (PBIAS) and Kling-Gupta efficiency (KGE). The NSE is commonly used to evaluate hydrological models. It ranges from minus infinity to 1 (perfect fit). The correlation coefficient (R 2 ) measures the fit between observed and simulated values and varies from 0 to 1 (strong linear association). A perfect fit requires that the regression slope and intercept are equal to 1 and 0 respectively. The PBIAS is a metric indicating and global underestimation (positive value) or an overestimation (negative value) of the simulation compared to observations. According to Moriasi et al. [13], NSE, R 2 , and PBIAS are satisfactory for values higher than 0.5, 0.6, and ±15% respectively for monthly time series. KGE is a hydrological criteria suggested by Gupta et al. [75] and modified by Kling et al. [76], based on "r" Pearson coefficient (similar to R 2 ), model bias and variability. It ranges from minus infinity to 1 (strong linear association). According to Knoben et al. [77], simulation between −0.41 and 1 could be considered as reasonable performance.
The nitrate and sediment load simulations were compared to sampled data collected once or twice a month from 2000 to 2010 ( Table 1). The Load Estimator model (LOADEST) [78] was used for each physicochemical station due to the lack of nitrate and sediment data in order to generate monthly nitrate and sediment load and extend time series, as commonly used to calibrate and validate SWAT outputs [79][80][81]. This software generates water quality parameters through regressions between streamflow and water quality [78]. A statistically-based optimization procedure identifies the most accurate of the eleven regressions. The Spearman's rank correlation coefficient (ρ, [82]), the non-parametric test measuring the dependence between two variables, is used to compare LOADEST simulation with observations. LOADEST simulations give sediment loads (ρ = 0.94 and p-value < 0.001) and nitrate loads (ρ = 0.99 and p-value < 0.001) close to observations ( Figures S2 and S3 in Supplementary Materials), even if the LOADEST model slightly underestimates the sediment load (slope = 0.71). LOADEST provides us interpolated water quality data that were used to calibrate and validate the SWAT model on a monthly scale in addition to the observed data. The R 2 statistic is used to evaluate the agreement of the nitrate and sediment load simulation between LOADEST and SWAT models for calibration (2000)(2001)(2002)(2003)(2004)(2005) and validation periods (2006-2010).

The Additional Calibration (AC)
From the CC procedure, an additional calibration (AC) we made by adding two calibrations steps: (1) the calibration of the "natural streamflow"-the streamflow that should occur without any anthropogenic pressures and (2) the calibration of the anthropogenic pressures (Fertilization application, crop yield, and irrigation volume) in order to improve the evapotranspiration and the uptake of plants. The validation step is the same procedure as CC using SWAT-CUP. Data used to validate AC were the "natural streamflow", the crop management data and the one used during CC procedure. As a reminder, hydrology, sediments and nitrate loads were calibrated (2000)(2001)(2002)(2003)(2004)(2005) and validated (2006)(2007)(2008)(2009)(2010).
To ensure the correct hydrological performance of the SWAT projects ( Figure 2) calibrated with AC methodology, discharge outputs were compared with two different sources in annual and monthly time steps. The first comparison was done with the observed discharge data at the subsystem scale (administrative entities, Figure 1B). The second one was performed with the discharge values simulated with the "Integrated System for Rainfall-Runoff Modeling" model, commonly named SIMPA [83]. The AC "natural" streamflow simulated by SWAT (the simulation where all human pressures were withdrawn) was calibrated by comparing it with SIMPA simulations, which simulated natural flows. SIMPA is a rainfall-runoff model, developed by the Center for Hydrographic Studies of the Spanish Ministry of Environment (CEDEX) to analyze the spatial and temporal hydrological variables and to simulate hydrological processes without anthropogenic pressures. In Spain, SIMPA model is a reference for public and private stakeholders [84,85]. To compare our SWAT model with SIMPA, the CC and AC calibrations were run after deleting all anthropogenic pressures (agriculture, urban inputs and dam management). Inter-annual and monthly "natural" water volumes were compared with SIMPA outputs in the 7 SUDOE main watersheds (Figure 2). At each calibration step, hydrology, sediments, nitrate loads, and dam and crop managements were re-verified and re-calibrated if necessary.
Dam management, fertilization application, crop yield and irrigation volume were calibrated from 2000 to 2005 and validated from 2006 to 2010. Simulated irrigation volumes were compared to the annual national values. In Spain, where data were available, a comparison of irrigated volume at the regional scale has also been conducted. Moreover, a comparison between simulated crop yields, fertilizer amount applied and national statistics was made on an annual basis. To quantify the goodness of fit between simulated and observed mean crop yield, a PBIAS value is calculated, defined by: where n is the number of observations in the period under consideration.

Climate Change and Agricultural Practices Changes
In order to investigate the sensitivity of the model to climate change, the variability of the simulated outputs has been tested for a precipitation decrease of 20% over the period 1980-2010 based on the global precipitation change between 1971 and 2000 and the +2 • C period of Vautard et al. study [86]. CC and AC calibrations with observed precipitation have been compared to outputs where the precipitations have been decreased between 2000 and 2010. Focus has been placed on water yield (mm), fertilization (kg ha −1 ), biomass (t ha −1 ), yield (t ha −1 ), and irrigation volume (Hm 3 ·year −1 ) and a percentage of variation has been calculated for the outputs simulated by the reduced precipitation with respect to the normal precipitation dataset. The water yield is considered as the long-term average flow to convey the idea that only a volume of water is being referred to, as opposed to a hydrograph [87].

Indicators of Hydrological Alteration
More than 170 hydrological indicators have been developed in the past years including the widely used Index of Hydrological Alteration (IHA) [36]. IHA characterizes the impact of river regulation on flow regimes in environmental flow studies [36]. The IHA was developed by the Nature Conservancy organization [32,37,38]. The IHA calculated on a monthly scale is the sum of twelve indicators that reflects, from different statistics (magnitude index, variability index, flood and drought intensity and frequency indexes), the modification of the current human-impacted flow regime, with respect to the natural one [32,37,38]. The IHA used in this study was proposed by Richter et al. [32] and modified by Santa-María [88]. The IHA varies between 0 and 100 with the maximum value meaning that the river flow is not altered.
In order to provide suggestions on the model set up choice and the calibration/validation procedures according to the level of the IHA, an inter-model comparison between AC and CC has been conducted. The IHA has been compared to the deviation of the hydrological criteria KGE (∆KGE) between the CC and AC simulations. KGE has been chosen in this analysis because KGE simultaneously encompasses the model correlation, bias and variability. The ∆KGE were calculated for the 47 gauging stations considered in this study (even if the streamflows have been calibrated on only 19 gauging stations). When the ∆KGE obtain from the comparison of the CC and AC calibrations is close to 0, AC and CC performances (the bias, dispersion and correlations compared to observations) are similar.

Results and Discussions
Calibration is a crucial step on modeling, considering its impacts on model outputs and result interpretations [89]. For hydrological models, only the streamflow and the water balance is usually calibrated based on field records and previous studies [11,20,80,90]. For agronomic models, biomass production and crop yield help to calibrate the model [91]. In this study, the main difficulty regarding the modeling set up and calibration methodology is related to the large scale. Collecting data (biomass production, plant growth stage, water and nitrate runoff) is relatively easy to conduct on a local scale but becomes very complicated with time and budget constraints when a transnational scale is considered. The results and implications of the calibration strategy for water quantity and quality are detailed and discussed below.

Water Quantity Evaluation
To evaluate water quantity, the comparison of streamflow recorded in gauging stations and streamflow simulations is commonly done. This section presents the streamflow evaluation of CC and AC simulations and a comparison of these simulations. Moreover, a comparison of the water yield was conducted in order to validate the hydrological models not only on the streamflow record.

Streamflow Evaluation of Conventional Calibration (CC)
The simulated monthly streamflow from the conventional calibration (CC) (calibrated only on streamflows, nitrate, and sediment loads) is shown in Figure 4. The sensitivity analysis for streamflow identified 15 most influential parameters for the 45 selected (cf. Supplementary Materials Tables S2 and S3). The sensitive parameters ranking change between the different parts of the SUDOE territory. The water quantity modeling is influenced the most by the base flow alpha-factor and the groundwater delay. According to Moriasi et al. [13,74] Table 3). The results are not satisfactory in terms of extreme values, however, the trends were adjusted appropriately. The global behavior of the multiple watersheds model is satisfactory to good, especially during low flow periods (Figure 4) considering the complexity of a multi-watershed model. NSE indicator is poor in some watersheds that are subjected to extreme values (e.g., Duero in 2010, Guadiana in 2001 and 2010) or highly anthropized (e.g., Jucar) ( Figure 4). There are two major reasons for the poor NSE performances in some watershed: (1) reanalysis of MESAN as input climate datasets and (2) level of human stressors. NSE performances at Adour, Duero, Jucar, and Minho could be related to high uncertainties on the MESAN reanalyzes over the Pyrenees and over the South-Western coasts of Europe such as Aquitaine, Basque Country, Tajo, Duero, and Jucar regions [56].
In anthropized watersheds, the SWAT model has the ability to integrate human stressors [21] even if some management practices cannot be reproduced by the model. For example, dam management in the Tajo river watershed significantly changes between the calibration and the validation period. Streamflow does not go above 40 m 3 /s after 2005 due to dam management, where the peak flows are erased, whereas SWAT simulations simulate these peak flows. In addition, the multiple watershed modeling set up presented in this study has required several simplifications leading to an increase in the uncertainties. For instance, the dominant crop of each region and its technical management were selected in order to simplify the model conceptualization. If we compare this study simulation with the simulation of Abbaspour et al. [9], simulations are in the same range or better for our simulation. However, the South of Spain has poor simulation in Abbaspour et al. [9] in Jucar and Guadalquivir outlets (R 2 < 0.1) as much as our study (R 2 equal to 0.58 for Jucar outlet and to 0.67 for Guadalquivir outlet for the calibration period but NSE equal to −0.3 and 0.27 for Jucar and Guadalquivir respectively). Water 2020, 12, x FOR PEER REVIEW 15 of 32

Streamflow Evaluation of Additional Calibration (AC)
The monthly streamflow simulated with the Additional Calibration AC (see Figure 3 for more information about the procedure) are presented in Figure 4 and corresponding statistics in Table 3. The sensitivity parameters are presented in Supplementary Materials Table S2 (parameters sensitivity ranking) and Table S3 (sensitivity analysis). The most sensitive parameters differ depending on the SUDOE regions. However, the base flow alpha factor, the groundwater delay and the soil evaporation compensation factor are the most sensitive over the entire territory. According to Moriasi et al. [13,74] and Gupta et al. [76] standards, the statistical performance over the entire territory is satisfactory to good for calibration and validation periods with respectively, with a range of NSE of 0. Over the SUDOE main watersheds (Figure 2), AC simulated and observed annual irrigation volumes are rather similar (16,917 Hm 3 and 17,537 Hm 3 , respectively). The regional comparison of the simulated volumes of irrigation from the AC calibration in each Spanish region ( Figure 5) indicates also a good regional distribution of irrigated water in the model with R 2 (0.87) and PBIAS (10%).
The calibrated irrigation volumes were obtained by offsetting volumes per hectare, as the model overestimates the irrigated area because of the initial assumption (each type of land use was considered as totally irrigated or non-irrigated-Section 2.4.3). For this reason, the absolute irrigated volume spread by fields by hectares is partially biased with less water applied and a loss of biomass production. However, in order to compensate for the lack of biomass, the plant biomass production parameters were modified (cf. Supplementary Materials Table S1). Water 2020, 12, x FOR PEER REVIEW 17 of 32

Comparison of Streamflow Simulations between Conventional Calibration (CC) and Additional Calibration (AC)
CC and AC streamflow dynamics are similar (Figure 4) whereas the calibration procedures were different-AC procedure is enhanced by calibrating anthropogenic activities such as irrigation volume and crop yield. The parameters used to calibrate irrigation volume and crop yield for AC procedure are presented in Supplementary Materials Table S1. The sensitive parameters ranking according to the significance of the sensitivity (p-value) is introduced on the Supplementary Materials in Table S2 with the most sensitive parameters ranked first. CC and AC calibration are different sensitive parameters for streamflow calibration. However, similar sensitive parameters are found as the base flow alpha factor or the groundwater delay. Previous studies [11,16,92] show similar sensitivity for both of these parameters indicating a direct relationship between the surface runoff and other compartments parameters.
Water balances between CC and AC calibrations are different. The difference can be important ranging from −114% to +39% between compartments for surface runoff (−114%; 28%), lateral flow (-29%; 31%), water yield (−27%; 39%), evapotranspiration (−51%; 4%), and sediment yield (−100%; 10%). The strongest differences are presented during low flow periods with a difference of -18% on average whereas the lowest differences occur during high flow (−9% on average). It also seems that there are fewer differences for watersheds with low anthropogenic pressures (difference of −11%) whereas the difference is more important for watersheds highly anthropized (difference of −39%). However, no data from these hydrological components (lateral flow, evapotranspiration, sediment yield) with a sufficient temporal and spatial resolution can allow the validation of one or the other calibration process regarding the distribution of water fluxes inside the water cycle. Additional validation was conducted with the comparisons of water irrigated and crop yields simulations and national datasets values.

Comparison of Streamflow Simulations between Conventional Calibration (CC) and Additional Calibration (AC)
CC and AC streamflow dynamics are similar (Figure 4) whereas the calibration procedures were different-AC procedure is enhanced by calibrating anthropogenic activities such as irrigation volume and crop yield. The parameters used to calibrate irrigation volume and crop yield for AC procedure are presented in Supplementary Materials Table S1. The sensitive parameters ranking according to the significance of the sensitivity (p-value) is introduced on the Supplementary Materials in Table  S2 with the most sensitive parameters ranked first. CC and AC calibration are different sensitive parameters for streamflow calibration. However, similar sensitive parameters are found as the base flow alpha factor or the groundwater delay. Previous studies [11,16,92] show similar sensitivity for both of these parameters indicating a direct relationship between the surface runoff and other compartments parameters.
Water balances between CC and AC calibrations are different. The difference can be important ranging from −114% to +39% between compartments for surface runoff (−114%; 28%), lateral flow (−29%; 31%), water yield (−27%; 39%), evapotranspiration (−51%; 4%), and sediment yield (−100%; 10%). The strongest differences are presented during low flow periods with a difference of −18% on average whereas the lowest differences occur during high flow (−9% on average). It also seems that there are fewer differences for watersheds with low anthropogenic pressures (difference of −11%) whereas the difference is more important for watersheds highly anthropized (difference of −39%). However, no data from these hydrological components (lateral flow, evapotranspiration, sediment yield) with a sufficient temporal and spatial resolution can allow the validation of one or the other calibration process regarding the distribution of water fluxes inside the water cycle. Additional validation was conducted with the comparisons of water irrigated and crop yields simulations and national datasets values.
The total irrigated volume of AC (20,400 Hm 3 ) is close to observed values (21,100 Hm 3 ) (R 2 = 0.87, p-value < 0.001) compared to the CC simulations with the observed values (R 2 = 0.37, p-value < 0.001) which overestimates irrigated volumes by 36% (27,800 Hm 3 ). Irrigation volume has a substantial impact on water repartition over the watershed and, consequently, on the water balance between compartments (surface runoff (−114%; 28%) and evapotranspiration (−51%; 4%)). In AC calibration, irrigation volume is taken from dams and rivers whereas, in CC calibration, water use for irrigation comes from outside the watershed. In CC water is not limiting and water management is not constrained. The choices of irrigation parameterization have an impact on water balance and hydrological response with the simulated watershed. This study highlights the importance for the SWAT model users to carefully verify the specification and boundaries of crop management and irrigation routines, in order to avoid an over or underestimation of the water volume. A suggestion is to compare the simulated crop management techniques with national statistic values to evaluate the truthfulness of the simulation.

Water Yield Validation with Observations and Model Outputs
A comparison of the water yield obtained from the different calibration methodologies has been conducted in order to validate the hydrological simulation on different sources of data and not just streamflow records. The water yield obtained from AC simulations and CC simulations was first compared with observed annual water volume ( Figure 6A) in order to compare the performance of each calibration methodology. In a second step, the CC and AC projects once calibrated were validated by comparing water volumes under "natural" conditions with water volumes obtained with SIMPA under "natural" conditions as well ( Figure 6B). . Irrigation volume has a substantial impact on water repartition over the watershed and, consequently, on the water balance between compartments (surface runoff (−114%; 28%) and evapotranspiration (−51%; 4%)). In AC calibration, irrigation volume is taken from dams and rivers whereas, in CC calibration, water use for irrigation comes from outside the watershed. In CC water is not limiting and water management is not constrained. The choices of irrigation parameterization have an impact on water balance and hydrological response with the simulated watershed. This study highlights the importance for the SWAT model users to carefully verify the specification and boundaries of crop management and irrigation routines, in order to avoid an over or underestimation of the water volume. A suggestion is to compare the simulated crop management techniques with national statistic values to evaluate the truthfulness of the simulation.

Water Yield Validation with Observations and Model Outputs
A comparison of the water yield obtained from the different calibration methodologies has been conducted in order to validate the hydrological simulation on different sources of data and not just streamflow records. The water yield obtained from AC simulations and CC simulations was first compared with observed annual water volume ( Figure 6A) in order to compare the performance of each calibration methodology. In a second step, the CC and AC projects once calibrated were validated by comparing water volumes under "natural" conditions with water volumes obtained with SIMPA under "natural" conditions as well ( Figure 6B).  Figure 6A shows satisfactory regressions between the annual water volumes observed and simulated with CC (R 2 = 0.91, slope = 1.13) and AC calibrations (R 2 = 0.94, slope = 1.07). The good simulation of annual water yield supports the analysis of an acceptable simulation of discharge as described above. High R 2 was found between SWAT simulated "natural" interannual water volume (the simulation where all human pressures have been removed) and SIMPA simulated water volume (0.88 and 0.91, respectively for CC and AC; Figure 6B). The regression is good for both models but the slopes are different: 0.74 for CC and 0.99 for AC. The slope difference could be explained by the difference in irrigation calibration. To achieve a good calibration, the CC calibration was set up to provide irrigation with water originating from outside the watershed inducing an increase of surface runoff that is compensated with the increase of infiltration into groundwater. For the AC regressions  Figure 6A shows satisfactory regressions between the annual water volumes observed and simulated with CC (R 2 = 0.91, slope = 1.13) and AC calibrations (R 2 = 0.94, slope = 1.07). The good simulation of annual water yield supports the analysis of an acceptable simulation of discharge as described above. High R 2 was found between SWAT simulated "natural" interannual water volume (the simulation where all human pressures have been removed) and SIMPA simulated water volume (0.88 and 0.91, respectively for CC and AC; Figure 6B). The regression is good for both models but the slopes are different: 0.74 for CC and 0.99 for AC. The slope difference could be explained by the difference in irrigation calibration. To achieve a good calibration, the CC calibration was set up to provide irrigation with water originating from outside the watershed inducing an increase of surface runoff that is compensated with the increase of infiltration into groundwater. For the AC regressions (R 2 > 0.9), linear slopes are close to 1 indicating that water volume in SWAT is not under-or overestimated. Water volumes seem correctly distributed between the main watersheds of the SUDOE territory for both "natural" and "anthropogenic" versions of the AC calibration. The double comparison of water volume (SIMPA and observation) reinforces the validation of AC simulations.
Monthly water volumes were then compared by subsystems ( Figure 1B) in order to validate the spatial and temporal variations of the water volume on a smaller spatial scale. The R 2 between SIMPA and AC "natural" simulations is ranging between 0.84 and 0.98 (Figure 7), ascertaining a proper distribution of water volume within each main watershed. distribution of water volume within each main watershed. Figure 8 shows the interannual monthly dynamics of "natural" water volume for the main watersheds between SIMPA, CC, and AC simulations. Two groups of watersheds can be defined: One group with high similarity in term of temporal dynamics of monthly water volumes (Ebro, Duero, Guadiana, Guadalquivir, Tajo) and another group with a significant difference between SWAT and SIMPA water volumes from June to December (Jucar and Segura). These two groups might be related to the level of anthropogenic pressures on the watershed. The Jucar and Segura watersheds, the most important ones in the South East of Spain, characterized by a warm and dry climate [93] with low flow support based on water imported from other watersheds, thus highly anthropized. The SIMPA model (the reference in this study), which modeled upstream streamflow with low pressures, seems to support water volumes almost stable throughout the year whereas SWAT simulation shows a severe decrease of water during low flow ( Figure 8). Overall, most of the watersheds have the same trend between models hence decreasing the water balance uncertainties. Model comparison, with models that had attained a level of familiarity and trust by stakeholders, is efficient when dataset availability is limited and show promise to evaluate scenarios and decisionmaking [94,95].

Water Quality Evaluation
To evaluate water quality, sediment and nitrate loads are commonly used as indicators of the particulate and dissolved in-stream elements, respectively. Modeled sediment and nitrate load time series are compared to observations as well as the impact of cultural practices on CC and AC simulations.  Figure 8 shows the interannual monthly dynamics of "natural" water volume for the main watersheds between SIMPA, CC, and AC simulations. Two groups of watersheds can be defined: One group with high similarity in term of temporal dynamics of monthly water volumes (Ebro, Duero, Guadiana, Guadalquivir, Tajo) and another group with a significant difference between SWAT and SIMPA water volumes from June to December (Jucar and Segura). These two groups might be related to the level of anthropogenic pressures on the watershed. The Jucar and Segura watersheds, the most important ones in the South East of Spain, characterized by a warm and dry climate [93] with low flow support based on water imported from other watersheds, thus highly anthropized. The SIMPA model (the reference in this study), which modeled upstream streamflow with low pressures, seems to support water volumes almost stable throughout the year whereas SWAT simulation shows a severe decrease of water during low flow ( Figure 8). Overall, most of the watersheds have the same trend between models hence decreasing the water balance uncertainties. Model comparison, with models that had attained a level of familiarity and trust by stakeholders, is efficient when dataset availability is limited and show promise to evaluate scenarios and decision-making [94,95].

Water Quality Evaluation
To evaluate water quality, sediment and nitrate loads are commonly used as indicators of the particulate and dissolved in-stream elements, respectively. Modeled sediment and nitrate load time series are compared to observations as well as the impact of cultural practices on CC and AC simulations.

Sediment Loads
The monthly sediment loads from LOADEST and both SWAT versions are shown in Figure 9. The mean R 2 of monthly sediment load is 0.72 ranging from 0.17 to 0.95. Sediment load is difficult to calibrate especially during flood events with greater deviations but the simulation gives a good understanding of the global trend [20,90].

Yield and Fertilization
Observed and AC simulated interannual crop yields over the basin show a good adjustment according to the average PBIAS indicator of +8% (−78; +40) ( Figure 11). The average crop yield over the territory was 7.4 t ha −1 for the AC calibration but only 2.3 t ha −1 for CC. Two hypotheses might explain the CC crop yield underestimations: (1) The fertilization is too low, (2) the irrigation is only triggered when the plant is under stress which can slow down plant growth. The sediment loads between CC and AC calibrations are similar for most of the watersheds such as Aude, Minho, Guadiana, and Segura ( Figure 9). However, CC and AC calibrations do not have the same response in some other watersheds such as the outlet of Garonne, Ebro, and Jucar ( Figure 9A,F,I) where the model tends to overestimate the sediment loads in AC calibration whereas these loads are underestimated with CC calibration. The Garonne, Ebro, and Jucar are all main agricultural watersheds. The sediment loads difference is due to the increase in water volume used in irrigation that increases the soil runoff and erosion coefficient. In the Garonne, Ebro, and Jucar watersheds, the main difference between CC and AC is the localization of the irrigation source. By default, the value is "outside the watershed" when the difference was changed. This irrigation source information could create a substantial difference in the streamflow in a watershed with low water levels and high irrigation such as Jucar watershed. The pedoclimatic conditions have indeed a substantial influence on some parameters impacting sediment load. In fact, the weather, the lithology and the land use have an impact on soil erosion and some parameters such as USLE C factor or the management practice will have an impact on this erosion [47].

Nitrate Loads
The interpolation of nitrate concentration with LOADEST gives satisfactory results (Supplementary Materials Figures S2 and S3) according to Kim et al. [96] standards. LOADEST tends to overestimate extreme values as largely [97], but this work is focusing on the impact of global change on water resources and more precisely on the SUDOE water scarcity issues which mostly occur during low flow periods [33]. Other studies already compared the nitrate loads simulated by LOADEST and SWAT models [97] and showed that the LOADEST software is useful when you have a discrete and large range of values under different hydrological conditions.
The comparison of nitrate loads between SWAT versions and LOADEST at the main river outlets shows an overall good agreement with an R 2 of 0.  Figure 10). The differences between AC and CC calibrations are not significant (p-value > 0.29) but with respect to the absolute performance value, nitrate loads of the CC calibration have lower statistics than the AC calibration with a mean R 2 of 0.50 and 0.70, respectively. For Jucar and Guadiana watersheds, nitrate loads of CC calibration ( Figure 10) are lower than AC which could be explained by a decrease of nitrate runoff correlated to an increase of biomass production produced by the non-limiting irrigation. There are two actions that influence this response. The first is auto-irrigation depending on the plant demand (the production is not limited) and the second is fertilization. Biomass production is high thanks to the presence of fertilizer, applied in the most effective way, decreasing the soluble soil nitrate concentration which leads to a decrease in the nitrate runoff. For other watersheds such as Garonne, Aude, or Segura, nitrate runoff is more important during CC calibration because irrigation is triggered according to soil water saturation, causing excessive irrigation in these basins, which leads to nitrate runoff after fertilizer application. The comparison between observed and simulated crop yields is thus essential to determine nitrate sources and to validate the model. Jégo et al. [98] indicate the importance of crop management and particularly for N application periods on nitrate runoff in the Garonne watershed. Hence, it is therefore important to carefully take into account crop management, as well as crop yield and fertilization in the model.

Yield and Fertilization
Observed and AC simulated interannual crop yields over the basin show a good adjustment according to the average PBIAS indicator of +8% (−78; +40) ( Figure 11). The average crop yield over the territory was 7.4 t ha −1 for the AC calibration but only 2.3 t ha −1 for CC. Two hypotheses might explain the CC crop yield underestimations: (1) The fertilization is too low, (2) the irrigation is only triggered when the plant is under stress which can slow down plant growth. Water 2020, 12, x FOR PEER REVIEW 23 of 32      Table 4 represents the percentage of change of each variable for both calibrations with a −20% decrease in precipitation. A negative impact on variables is expected with a diminishing rainfall such as a reduction in the irrigated volume and crop yields [99]. CC and AC water yields have the same trend with a decrease close to −40%. A streamflow decrease is also highlighted by Stahl et al. [100]. However, the crop management variables are different between CC and AC because CC crop management is not calibrated. Under drier conditions, plant growth will be coerced by the lack of water in the watershed in the AC calibration (−35%) whereas irrigated water is increasing in the CC calibration (+5%). The same opposite trends occurred with crop yield simulation with respectively +23% and −26% for CC and AC calibration. The irrigation source used in the CC is located outside of the watershed. Under such condition, the auto-irrigation is not limited by the water volume available in the river and the plant growth are not impacted by the drought period. That is the reason why the model simulated a light increase of plant growth with a decrease of rainfall. AC outputs are the most realistic calibration considering that plant growth, crop yield and irrigation volume will be coerced by the lack of water under a drier climate. This study strongly recommends calibrating crop management in the water scarcity context in order to have a proper allocation of water over the watershed, especially for studies assessing the potential impacts of climate change or human practices changes. The question is: Can we have an indicator to help us in the choice of calibration/validation procedure for one watershed? Table 4. Percentage of variable differences for conventional calibration (CC) and additional calibration (AC) between the regular precipitation and a decrease of 20% of rainfall from 2000 to 2010. A positive percentage represents an increase of the variable whereas a negative value means a decrease.

Variables
Differences of CC Differences of AC In order to decide which calibration/validation procedure is the most appropriate, this study analyzed the variations of the IHA according to the deviation of KGE (∆KGE). Figure 12 shows a negative correlation between the IHA and ∆KGE for the 47 gauging stations representing the entire range of subbasins over the SUDOE territory. Two groups can be distinguished: (1) Subbasins with IHA above 80% and (2) subbasins under 80%. With an IHA above 80%, the ∆KGE is lower than 0.1, the CC and AC calibrations have similar trends so it might not be useful to calibrate the human activities as dams and crop management. When the IHA is under 80%, the ∆KGE variations are high (0; 0.65)-meaning that in some of these cases, the addition and the calibration of human activities, such as dams and crop management, is necessary to obtain a representative simulation of the reality. Our recommendation will be to apply the AC procedure in the case of when the IHA is under 80%. The IHA index can become a tool to determine the calibration/validation procedure. It is available in the south-west region but could be transferable to regions where water resources are impacted by agriculture and water management (dams, channels). Water 2020, 12, x FOR PEER REVIEW 25 of 32 This study explores an alternative methodology to model water quantity and quality resources at multiple scales, for the first time, by coupling various data sources (agricultural, urban effluents, quality and quantity data) at the SUDOE scale-that has been never approached [101][102][103]. Previous studies made analysis on uncalibrated simulations [9] or calibrated only on streamflow [11]. For hydrological models, only the streamflow and the water balance are usually analyzed based on gauging station records and previous studies [8]. For agronomic models, biomass production and crop yield help to calibrate the model [104]. This study combines hydrological and agronomic calibration approaches in order to improve the model estimation thanks to multiple datasets with different scales and time resolutions. In this study, a suggestion is made: Calibrating the model with different data sources, easily available coming from national open-access sources such as national statistics department or water agency reports and samplings. The multiple and diversified databases used showed that the water balance and human activities were better represented in AC calibration. The multi-watershed model is complex. Sometimes a trade-off of one parameter needs to be found because watershed responses to the parameter could be opposite. The need to cross data for calibrating and validating models is underlined. This study should not be reduced to the SWAT model only but it could be extrapolated to every hydro-agronomic model. Outputs from calibrated models come along with model uncertainties and assumptions. It is up to the analysts to quantify model prediction uncertainty and calibrate with their own field expertise of the watershed. In order to establish this methodology, running a soft calibration first and then launching an optimization procedure is advised. The soft calibration is a manual calibration based on watershed knowledge (characteristics, specificities). The optimization procedure is allowed to refine the parameters calibrated during the soft calibration, as a statistical calibration, for example, SWAT-CUP for SWAT [69,71].
This paper provides some tools and feedbacks on the SWAT model set up in order (1) to understand the importance of datasets choices in the calibration and validation steps, (2) to help future users in the large scale set up and (3) to choose the calibration/validation procedure. So far, the AC calibration is thus the most appropriate simulation to run further research on ecosystem services evaluations, climate change impact or direct human activities impact (e.g., the effect of implementing new water policy).

Limitations and Uncertainties of This Study
We calculated the percentage of rivers of Table 3 the statistics criteria R 2 , NSE, PBIAS and KGE. According to Moriasi [13,74], we considered the performances of the model succeed when R 2 , is This study explores an alternative methodology to model water quantity and quality resources at multiple scales, for the first time, by coupling various data sources (agricultural, urban effluents, quality and quantity data) at the SUDOE scale-that has been never approached [101][102][103]. Previous studies made analysis on uncalibrated simulations [9] or calibrated only on streamflow [11]. For hydrological models, only the streamflow and the water balance are usually analyzed based on gauging station records and previous studies [8]. For agronomic models, biomass production and crop yield help to calibrate the model [104]. This study combines hydrological and agronomic calibration approaches in order to improve the model estimation thanks to multiple datasets with different scales and time resolutions. In this study, a suggestion is made: Calibrating the model with different data sources, easily available coming from national open-access sources such as national statistics department or water agency reports and samplings. The multiple and diversified databases used showed that the water balance and human activities were better represented in AC calibration. The multi-watershed model is complex. Sometimes a trade-off of one parameter needs to be found because watershed responses to the parameter could be opposite. The need to cross data for calibrating and validating models is underlined. This study should not be reduced to the SWAT model only but it could be extrapolated to every hydro-agronomic model. Outputs from calibrated models come along with model uncertainties and assumptions. It is up to the analysts to quantify model prediction uncertainty and calibrate with their own field expertise of the watershed. In order to establish this methodology, running a soft calibration first and then launching an optimization procedure is advised. The soft calibration is a manual calibration based on watershed knowledge (characteristics, specificities). The optimization procedure is allowed to refine the parameters calibrated during the soft calibration, as a statistical calibration, for example, SWAT-CUP for SWAT [69,71].
This paper provides some tools and feedbacks on the SWAT model set up in order (1) to understand the importance of datasets choices in the calibration and validation steps, (2) to help future users in the large scale set up and (3) to choose the calibration/validation procedure. So far, the AC calibration is thus the most appropriate simulation to run further research on ecosystem services evaluations, climate change impact or direct human activities impact (e.g., the effect of implementing new water policy).

Limitations and Uncertainties of This Study
We calculated the percentage of rivers of Table 3 the statistics criteria R 2 , NSE, PBIAS and KGE. According to Moriasi [13,74], we considered the performances of the model succeed when R 2 , is higher to 0.6, NSE to 0.5, and PBIAS ranged between >15% or <−15%. According to the Knoben criteria [77], we considered the performance succeed when the KGE is higher to −0.41. Based on these criteria, the hydrology performances of the model for both calibration and validation period and both conventional and additional calibration are succeeded 100% for KGE, 90% for R 2 , 67% for PBIAS, and 31% for NSE. The performances obtained in this work for some rivers highly anthropized in the South of Europe are similar or better than the ones published by Abbaspour et al. [9]. Previous studies tried to compare observed and modeled continental river discharge and nitrate loads [9,105]. These studies point out the difficulties to calibrate and validate hydrological processes at the continental scale and are rather more complex for nitrate loads. In fact, our study model had to simplify the model set up and the input because of the lack of data related to anthropization. The quality of the outputs is significantly impacted by the degree of simplification, the input data availability and uncertainty [106], the temporal and spatial degree of discretization [107] which also impacts the quality of calibration and validation procedure. In the context of high anthropized systems, the performances of the calibration are related to water resources management (dams, canals, irrigation practices). All these data set should be at the scale of daily time step but unfortunately for dams, canals, and crop practices, we have data set at an annual time scale. In this context, the input data increase the uncertainties of the output. These uncertainties also impact the nitrate and sediment loads and the associated processes. The SWAT model used simplify equations [43,47] for quantifying the sediment and nitrate loads into the river. Therefore, these simplifications can cause deviations for sediment routing and nitrogen modeling. In fact, the uncertainty in model inputs influenced the N soil input such as the fertilizer use or the allocation of animal manure (that we do not consider in this study). The uncertainties in the exchange fractions of different types of N are still not completely known. Improving N leaching and runoff fraction should receive more attention in future studies in order to improve the nitrogen cycle modeling.
In order to optimize the calibration procedure, in this paper, we proposed a method to evaluate the level of the anthropization by the IHA index for large systems and we show that the model performance is directly related to this index (low IHA, high pressure, and low-performance model). In this context, the application of the Moriasi statistical thresholds is not adapted. Finally, our study has some limitations and high uncertainties which should be taking into consideration for further studies.

Conclusions
This study shows for the first time the evaluation of a guideline for calibrating multi-watersheds hydrological models as a complexity according to a level of anthropization in the context of global changes. The IAH is a good index to determine the complexity level of calibration procedure. To determine the most appropriate calibration/validation procedures, the global hydrological alteration index, IHA, demonstrated its feasibility: IHA values under 80% indicate that a complex procedure of calibration/validation would yield better results. Many applications of the model should be done in further studies. An assessment of ecosystem services and ecological functions could be interesting as well as the economic evaluations of these services such as the economic cost of maintaining good water quality. The model might also help to conduct river and land use planning studies in order to help stakeholders. Finally, the model can be used to conduct climate and land-use change impact studies.
Supplementary Materials: The following are available online at http://www.mdpi.com/2073-4441/12/1/115/s1, Figure S1: Comparison between observed nitrogen loads (coming from the UWWTP databases) and calculated one with the Zessner and Lindtner (2005) method, Figure S2: Sediment load regression between observation and LOADEST simulation at daily time step in kg d −1 , Figure S3: Nitrate load regression between observation and LOADEST simulation at daily time step in kg d −1 , Table S1: Parameters used to calibrate water quantity and quality in the SUDOE territory. The parameters to calibrate irrigated volume and crop yield were used only for the AC calibration, Table S1: Parameters sensitivity ranking for streamflow model outputs for each part of SUDOE territory for CC and AC models, Table S2: Sensitivity analysis for CC streamflow models outputs, Table  S3: Sensitivity analysis for AC streamflow models outputs.