1. Introduction
The big data evolution in recent years has paved the way for remote sensing-integrated watershed modeling. To limit models’ perceived tendency to give the “right answers for wrong reasons” [1, 2], and therefore ensure realistic watershed management alternatives [3, 4, 5], the hydrogeoscience community is increasingly using remotely sensed big data. It is already evident that remotely sensed estimates of vertical water fluxes, such as soil moisture and evapotranspiration, offer realistic constraints to watershed models, leading to improved simulation of hydrologic processes [6, 7, 8, 9, 10]. However, such improved simulation is feasible despite misrepresented vegetation dynamics in the model [7, 8]. Because spatiotemporal vegetation dynamics (i.e., when, where, what, and how vegetation grows on the land surface) are a primary driver of land-atmosphere interaction and hence the water-energy exchanges across the watershed [11, 12], achieving the so-called improved hydrologic simulation while overlooking vegetation dynamics is not physically meaningful. Moreover, the implications of misrepresented vegetation dynamics for water quality simulations (e.g., in-stream nutrient load [13]) remain underexplored.
With the advent of advanced earth observation algorithms, remotely sensed Leaf Area Index (LAI; leaf surface area per unit land surface area) has emerged as an effective measure of global vegetation dynamics [11, 14]. Today’s state-of-the-science biophysical models simulating gross/net primary productivity, surface wetness, crop yield, and CO₂ storage often assimilate remotely sensed LAI data as the proxy for real-life vegetation dynamics [12, 15, 16, 17, 18, 19, 20]. However, watershed models simulating hydrology and water quality continue to rely on semiempirical LAI equations (e.g., leaf development curve) and associated biotic/abiotic user inputs (e.g., vegetation types, available solar energy, and crop planting and harvest dates) [21, 22, 23, 24, 25]. This LAI parameterization approach lacks physical realism in many widely used watershed models (e.g., Soil and Water Assessment Tool, Variable Infiltration Capacity model) [21, 22, 23, 24, 25]. In some models, a vegetation/LAI development module does not exist at all (e.g., Hydrologic Engineering Center-Hydrologic Modeling System) [26]. Therefore, remedial measures such as modification of existing LAI equations and calibration of LAI parameters (see, e.g., [13, 22, 23]) are not always applicable. To address these limitations, assimilation of remotely sensed LAI data has been recently introduced in traditional watershed modeling practices [24, 25, 27, 28, 29, 30].
The few recent studies that have assimilated remotely sensed LAI data in traditional watershed modeling practices invariably showed that better LAI representation improves model predictability [24, 25, 27, 28, 29, 30]. Although these results promote widespread use of remotely sensed LAI data in watershed modeling, we identified two major knowledge gaps that require targeted investigations:
- (1)
Previous studies were predominantly focused on hydrologic processes [24, 25, 27, 28, 29]. Although studies involving water quality simulations evaluated the potential effect of improved LAI on sediment yield [30], the extent of such effects on nutrient (e.g., nitrate) loads, the most critical issue in agricultural watersheds from a water quality management standpoint, is still unknown.
- (2)
Evaluations conducted in the previous hydrologic studies were mostly limited to vertical water flux and storage simulations (e.g., evapotranspiration and soil moisture) [24, 27, 28, 29], with very little emphasis on the watersheds’ cumulative response to downstream waters (i.e., streamflow) [25, 30].
In this study, we aimed to fill these knowledge gaps by assimilating Moderate Resolution Imaging Spectroradiometer LAI data (MODIS MCD15A3H) [31] into the Soil and Water Assessment Tool (SWAT) [21]. The specific objective of this study was to quantify the extent of improvements that the assimilation of MODIS LAI data would convey to streamflow, soil moisture, and nitrate load simulations at a daily timescale. Another unique contribution of our study was a new, highly efficient SWAT source code, which can perform multisource, multivariate assimilation of remotely sensed data regardless of watershed size and geolocation.
2. Methodology
We developed two contrasting model configurations to evaluate the effect of assimilating MODIS LAI data on simulated streamflow, soil moisture, and nitrate load:
- (1)
The basic model: LAI was simulated by the model based on input land use and associated biophysical parameters—a common approach in watershed modeling. This was our baseline to measure the degree of improvement in model results in the subsequent configuration.
- (2)
The LAI assimilation model: The same setup as in the basic model, except MODIS LAI was directly inserted by replacing the simulated LAI values in each of the spatial units of the model (e.g., hydrologic response units or subbasins).
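The difference between the two configurations can be illustrated with a minimal Python sketch of direct insertion. This is not the study's SWAT source code: the function name, the array layout (days by subbasins), and the fallback to the simulated value where an observation is missing are all assumptions made for illustration.

```python
import numpy as np


def direct_insertion(simulated_lai, observed_lai):
    """Overwrite simulated LAI with observed LAI, element by element.

    Both inputs are arrays shaped (n_days, n_subbasins). Where the
    observation is NaN, the simulated value is kept; this fallback is an
    assumption of the sketch, not something stated in the text.
    """
    simulated = np.asarray(simulated_lai, dtype=float)
    observed = np.asarray(observed_lai, dtype=float)
    if simulated.shape != observed.shape:
        raise ValueError("simulated and observed LAI must have the same shape")
    return np.where(np.isnan(observed), simulated, observed)


# Two days, two subbasins; one observation is missing.
simulated = [[1.0, 2.0], [3.0, 4.0]]
observed = [[0.5, np.nan], [2.5, 3.5]]
assimilated = direct_insertion(simulated, observed)
```

In the basic model the `simulated` array would be used unchanged; in the LAI assimilation model the `assimilated` array takes its place at every time step.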
Our model testbed, the 16,860-km² Cedar River Watershed (CRW; Figure 1), drains a major portion of eastern Iowa in the United States. This entire region has experienced increased flood frequency in the last two decades [32] as a result of climate and land use changes [33, 34]. Studies have shown that this altered hydrologic dynamic is likely to continue in future years [35, 36]. Furthermore, with more than 70% of the land area used for agricultural purposes (Figure 1a), CRW has long been noted as a water quality hotspot exporting large nitrate loads to the Mississippi River system [3, 37]. Considering these challenges, CRW is frequently used as a testbed in watershed modeling and management studies [3, 35, 37, 38, 39].
Our modeling experiments were conducted using SWAT [21], which is a process-based, semidistributed tool capable of simulating landscape water balances, floods and droughts, nonpoint source water pollution, crop yield, and best management practices across different physiographic settings (see, e.g., [5, 36, 40, 41, 42, 43]). Correspondingly, many previous studies addressing the heterogeneous water issues in Iowa watersheds applied SWAT as their primary research tool [3, 35, 37, 38, 39, 44]. SWAT has also been used as the simulation tool in many data assimilation experiments (e.g., [7, 8]). Therefore, our SWAT-based analyses to understand the effect of assimilating MODIS LAI data on hydrology and water quality simulations will have immediate applications, both to CRW and to other agricultural watersheds worldwide.
In the following subsections, we describe the data and method used to set up the two contrasting model configurations (with and without MODIS LAI data assimilation) and the model calibration-verification procedure.
2.1. The Basic Model
The basic model configuration was based on the CRW model originally developed by Golden et al. [3]. The geospatial inputs required to set up the CRW model, primarily weather, topography, land use, soil properties, and agricultural management data, are summarized in Table 1. In short, a topographic analysis of flow direction and flow accumulation divided the CRW into 95 subbasins, with an average drainage area of ~176 km². The original CRW model by Golden et al. [3] discretized the subbasins into 1860 Hydrologic Response Units (HRUs). However, to reflect the common modeling issues of limited data availability, model infidelity, and computational liability (e.g., [5, 7, 45]), we spatially discretized the CRW model at the subbasin level, meaning that subbasins were the smallest units of process simulation in our modeling experiments. As a result, each of the 95 subbasins functioned like a single HRU (hence, 95 HRUs in total). This approach aggregated land use input data and spatially explicit information on crop types to the subbasin level, and therefore allowed us to assess the full potential of MODIS LAI data for minimizing these deficiencies in the LAI assimilation model.
Representation of landscape vegetation dynamics in our basic model configuration was based on a commonly followed semiempirical leaf development approach (e.g., [21, 22, 23]). Specifically, in our approach, the temporal variation of LAI was determined by plant/crop-specific biophysical parameters primarily related to heat units, radiation use efficiency, vapor pressure deficit, maximum canopy height, and input planting and harvest schedules (for row crops) [21, 22, 23, 24, 25]. The spatial variation of these parameters, and hence the spatial variability of LAI, was directly linked with the level of specificity in the model’s land use representation [7]. Therefore, the best possible spatial representation of LAI in the basic model configuration was at the subbasin level.
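To make the semiempirical approach concrete, the sketch below uses the sigmoid leaf development curve in the form generally documented for SWAT, where the fraction of maximum LAI depends on the fraction of potential heat units accumulated. The function names and the two example calibration points are illustrative assumptions, and the curve is shown in isolation, without the stress, senescence, or canopy height terms that a full model would apply.

```python
import math


def shape_coefficients(fr_phu1, fr_lai1, fr_phu2, fr_lai2):
    """Solve the two shape coefficients so the curve passes through two
    known (heat-unit fraction, LAI fraction) points."""
    a = math.log(fr_phu1 / fr_lai1 - fr_phu1)
    b = math.log(fr_phu2 / fr_lai2 - fr_phu2)
    l2 = (a - b) / (fr_phu2 - fr_phu1)
    l1 = a + l2 * fr_phu1
    return l1, l2


def fraction_of_max_lai(fr_phu, l1, l2):
    """Fraction of the plant's maximum LAI reached at a given fraction
    of potential heat units."""
    return fr_phu / (fr_phu + math.exp(l1 - l2 * fr_phu))


# Illustrative (assumed) points: 5% of maximum LAI at 15% of heat units,
# 95% of maximum LAI at 50% of heat units.
l1, l2 = shape_coefficients(0.15, 0.05, 0.50, 0.95)
early = fraction_of_max_lai(0.15, l1, l2)
late = fraction_of_max_lai(0.50, l1, l2)
```

Because the curve depends only on crop-specific parameters and accumulated heat, every spatial unit with the same land use and weather follows the same LAI trajectory, which is the limitation that MODIS LAI assimilation is meant to address.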
2.2. The LAI-Assimilation Model
The LAI-assimilation model was the same as the basic model configuration, except with the integration of MODIS LAI data across the watershed. Following a well-accepted approach of earth data processing for watershed-scale modeling and analyses [7, 8, 9], we constructed representative LAI time series from the original gridded MODIS product. Briefly, we applied a recently developed semiautomatic web-based tool [51] to account for the heterogeneity in size, shape, and locations of any number of subbasins. The ~500-m-gridded four-day total LAI data (Table 1) [31] were first georeferenced and then assigned to respective subbasin(s) by taking an area-weighted average value from encompassing and/or intersecting MODIS grid(s). In the subsequent step, we performed a linear temporal interpolation (by averaging) [28] to transform the subbasin-level four-day total LAI values into a continuous daily LAI time series. We assimilated this spatially distributed and temporally continuous LAI data into our basic model configuration using a new SWAT source code. At each daily simulation time step, our new source code applied the direct insertion approach (e.g., [24, 25, 30, 52]) to replace the simulated LAI (Section 2.1) in every subbasin with the corresponding MODIS LAI data.
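The two preprocessing steps described above, spatial aggregation and temporal interpolation, can be sketched in Python as follows. This is a simplified illustration rather than the web-based tool's implementation: the function names are hypothetical, the overlap areas between grid cells and a subbasin are assumed to be precomputed, and plain linear interpolation between composite dates is assumed.

```python
import numpy as np


def area_weighted_lai(cell_lai, overlap_area):
    """Subbasin LAI as the average of intersecting grid cells, weighted
    by the area of each cell that falls inside the subbasin."""
    cell_lai = np.asarray(cell_lai, dtype=float)
    overlap_area = np.asarray(overlap_area, dtype=float)
    return float(np.sum(cell_lai * overlap_area) / np.sum(overlap_area))


def to_daily(composite_days, composite_lai, n_days):
    """Linearly interpolate four-day composite values to a daily series.

    composite_days holds the day index of each composite (increasing).
    Days outside the composite range take the nearest end value.
    """
    days = np.arange(n_days)
    return np.interp(days, composite_days, composite_lai)


# One subbasin intersected by two cells (overlap areas in km^2).
subbasin_lai = area_weighted_lai([2.0, 4.0], [1.0, 3.0])

# Three composites, four days apart, expanded to nine daily values.
daily_lai = to_daily([0, 4, 8], [1.0, 3.0, 2.0], 9)
```

Repeating the first step for every subbasin and composite date, then the second step for every subbasin, yields the days-by-subbasins LAI array that the direct insertion step consumes.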
The new SWAT source code is an advanced version of the source code developed by Rajib et al. [7], which was initially equipped with the capacity to assimilate remotely sensed potential evapotranspiration (PET) data. In this study, we added features to this source code so that it was capable of assimilating both PET and LAI data simultaneously, thus allowing a holistic approach for improving SWAT’s process representations. To the best of the authors’ knowledge, this is the first SWAT source code with such functionalities.
2.3. Model Calibration
We followed an identical calibration protocol for both model configurations (basic and LAI-assimilation). The calibration length was four years (2009–2012), with a prior two-year initialization of model state (2007–2008). Both configurations were calibrated at a daily time step, first using streamflow data at five gage stations throughout the watershed and then using nitrate load data at the watershed outlet (Figure 1b; Table 1). It was feasible to conduct our water quality calibration at a daily time step because of the abundant in-situ nutrient concentration data in CRW [53, 54]. To ensure compatibility with model simulations, we converted nitrate concentrations (mg/L) into nitrate loads (kg/day) using the corresponding daily average streamflow data (m³/s) as a multiplier (i.e., load = concentration × streamflow × 3600 × 24/1000) [3].
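The unit conversion can be checked with a short Python sketch; the function and variable names are illustrative. Since 1 mg/L equals 1 g/m³, multiplying by streamflow in m³/s gives g/s, the factor 3600 × 24 converts seconds to days, and dividing by 1000 converts grams to kilograms.

```python
def nitrate_load_kg_per_day(concentration_mg_per_l, streamflow_m3_per_s):
    """Convert a nitrate concentration and a daily mean streamflow into
    a daily nitrate load, following the formula given in the text."""
    return concentration_mg_per_l * streamflow_m3_per_s * 3600 * 24 / 1000


# Example with assumed values: 5 mg/L at 100 m^3/s.
load = nitrate_load_kg_per_day(5.0, 100.0)
```

The combined factor is 86.4, so the example values correspond to a load of 43,200 kg/day.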
Both the basic and LAI assimilation models used an identical set of 42 calibration parameters. We selected these calibration parameters from our previous CRW study [3]. These parameters represent hydrology and water quality (Table A1 and Table A2, respectively) while being indirectly related to vegetation dynamics. Although SWAT is relatively well equipped with many biophysical parameters and process conceptualizations that directly influence vegetation dynamics (e.g., maximum root depth, radiation use efficiency, stomatal conductance, day/nighttime thresholds of vapor pressure deficit) [7, 55], other advanced watershed models are not [23, 24, 56]. Therefore, we excluded such parameters from our calibration (unlike previous studies, e.g., [8, 13]) to isolate the effects of MODIS LAI assimilation into the model and to maintain focus on our study’s hypothesis that assimilated LAI data can serve as a proxy for vegetation dynamics when limited (or completely absent) vegetation-related parameters and processes are available in watershed models.
We used Sequential Uncertainty Fitting version 2 (SUFI-2), which is a semiautomated calibration algorithm available inside the SWAT-CUP calibration platform [57]. A weighted Kling-Gupta Efficiency (KGE) [8, 58] was the objective function to measure the association between daily simulated and measured data at the stream gages. KGE decomposes Nash-Sutcliffe Efficiency and Mean Squared Error into a three-dimensional criteria space (correlation, variability, and bias) and scores a simulation by its Euclidean distance from the ideal point in that space [8, 9]. KGE ranges from −∞ to 1. Model accuracy increases as the KGE value moves closer to 1. To find the best parameter combination and hence the optimal model state, SUFI-2 seeks the highest possible KGE value across all calibration locations.
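For reference, the standard (unweighted) KGE can be computed as in the Python sketch below. The weighting scheme used in this study is not specified in the text, so the `weighted_kge` helper, which takes a weighted average of per-gage KGE values, is a hypothetical illustration only.

```python
import numpy as np


def kge(simulated, observed):
    """Kling-Gupta Efficiency from correlation (r), variability ratio
    (alpha), and bias ratio (beta); a perfect simulation scores 1."""
    sim = np.asarray(simulated, dtype=float)
    obs = np.asarray(observed, dtype=float)
    r = np.corrcoef(sim, obs)[0, 1]
    alpha = sim.std() / obs.std()
    beta = sim.mean() / obs.mean()
    return float(1.0 - np.sqrt((r - 1) ** 2 + (alpha - 1) ** 2 + (beta - 1) ** 2))


def weighted_kge(kge_values, weights):
    """Hypothetical aggregation across gages: weighted mean of KGE."""
    return float(np.average(kge_values, weights=weights))


observed = np.array([1.0, 2.0, 4.0, 3.0, 5.0])
perfect = kge(observed, observed)        # identical series
doubled = kge(2.0 * observed, observed)  # right timing, wrong magnitude
combined = weighted_kge([1.0, 0.5], [1.0, 1.0])
```

The doubled series keeps a perfect correlation but has variability and bias ratios of 2, which shows how KGE penalizes magnitude errors even when the timing of flows is reproduced exactly.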
2.4. Model Verification
We performed a series of tasks to assess the relative improvement of simulation accuracy between the two model configurations (with and without MODIS LAI data assimilation). These verification tasks include the following:
- (1)
Assessments of hydrologic simulation:
We assessed the accuracy of daily streamflow simulation for a five-year period (2013–2017) and at a separate gage station not included in the calibration process (for model verification) (Figure 1b).
We then compared simulated LAI with MODIS data to identify how the calibrated models differ in their respective spatiotemporal representation of vegetation dynamics. For evaluating the spatial accuracy in LAI simulation, we considered specific day(s) in the 2016 crop growing season (June–August). We focused on crop growing season because this is the most active period in terms of vegetation growth and associated water-energy exchanges. We selected 2016 for temporal evaluation because this was a year with average precipitation, hence representative of the watershed’s general hydrologic response.
To further assess improvements in hydrologic simulation, we compared the spatial consistency of soil moisture between model simulations and Soil Moisture Active Passive (SMAP) mission level 4 estimates (Table 1). The SMAP mission provides 9-km gridded estimates of rootzone soil moisture, which is a model-assimilated product of remotely sensed surface moisture observations [50]. Briefly, we produced subbasin-level aggregated SMAP data using the same web-based tool and areal averaging procedure described in Section 2.2, took the daily average of rootzone volumetric moisture content over the 2016 crop growing season (June–August), and subsequently generated a watershed soil moisture distribution map to enable spatial comparison with model outputs. We considered the same period for assessment of LAI and soil moisture because it produces a consistent comparison across two mutually dependent processes. Furthermore, selecting summer months ensured that SMAP data used in our model assessments were not affected by snow cover and frozen soil, which are common sources of inaccuracies in remotely sensed soil moisture [59].
- (2)
Assessment of water quality simulation:
We assessed the accuracy of daily nitrate load simulation for the same five-year period and at the same location used for streamflow verification (Figure 1b). To the best of our knowledge, this is the first study to verify the effect of LAI data assimilation on water quality simulations, and to do so using long-term daily observations as the reference.
5. Summary
We applied a remote sensing-integrated watershed modeling approach assimilating MODIS LAI data across the 16,800-km² Cedar River Watershed in Iowa, United States. We compared the two models, i.e., LAI assimilation configuration (with MODIS data) and basic configuration (with the model’s default semiempirical LAI representation), over a nine-year period and concluded the following:
The basic model gave right answers for wrong reasons, with reasonably good daily streamflow simulation despite a large bias in LAI. With LAI assimilation, the accuracy of daily streamflow simulation improved throughout the nine-year period. However, this improvement was significant during medium-to-low flow conditions.
The LAI assimilation model adopted a physically realistic water balance by increasing rootzone soil moisture storage, therefore improving the model’s spatial consistency with reference estimates (from the SMAP satellite mission).
Assimilation of LAI data into our watershed model substantially improved nitrate load simulations, reproducing long-term in-situ observations at a daily timescale. Our study is the first to show such an effect.
We also addressed conceptual and technical challenges associated with LAI assimilation in hydrology and water quality modeling. Conceptually, we disentangled how overestimation or underestimation of LAI cascades through watershed processes and how the model responds to it by partitioning hydrologic fluxes and nutrient pools. From a technical standpoint, we highlighted that high-quality remotely sensed global LAI datasets are becoming increasingly available. However, widespread community adoption of these emerging data resources will not be feasible without minimizing the lack of interoperability between remotely sensed big data and complex watershed models. By making the SWAT model interoperable with a semiautomatic earth data processing tool, our study demonstrated how assimilation of LAI data in watershed models can be done efficiently regardless of watershed size and location. Our next step is to apply this efficient big data-driven modeling approach to watershed management decisions in various geophysical settings.