Watershed Modeling with Remotely Sensed Big Data: MODIS Leaf Area Index Improves Hydrology and Water Quality Predictions

: Traditional watershed modeling often overlooks the role of vegetation dynamics. There is also little quantitative evidence to suggest that increased physical realism of vegetation dynamics in process-based models improves hydrology and water quality predictions simultaneously. In this study, we applied a modiﬁed Soil and Water Assessment Tool (SWAT) to quantify the extent of improvements that the assimilation of remotely sensed Leaf Area Index (LAI) would convey to streamﬂow, soil moisture, and nitrate load simulations across a 16,860 km 2 agricultural watershed in the midwestern United States. We modiﬁed the SWAT source code to automatically override the model’s built-in semiempirical LAI with spatially distributed and temporally continuous estimates from Moderate Resolution Imaging Spectroradiometer (MODIS). Compared to a “basic” traditional model with limited spatial information, our LAI assimilation model (i) signiﬁcantly improved daily streamﬂow simulations during medium-to-low ﬂow conditions, (ii) provided realistic spatial distributions of growing season soil moisture, and (iii) substantially reproduced the long-term observed variability of daily nitrate loads. Further analysis revealed that the overestimation or underestimation of LAI imparted a proportional cascading e ﬀ ect on how the model partitions hydrologic ﬂuxes and nutrient pools. As such, assimilation of MODIS LAI data corrected the model’s LAI overestimation tendency, which led to a proportionally increased rootzone soil moisture and decreased plant nitrogen uptake. With these new ﬁndings, our study ﬁlls the existing knowledge gap regarding vegetation dynamics in watershed modeling and conﬁrms that assimilation of MODIS LAI data in watershed models can e ﬀ ectively improve both hydrology and water quality predictions.


Introduction
The big data evolution in recent years has paved the way for remote sensing-integrated watershed modeling. To limit models' perceived tendency to give the "right answers for wrong reasons" [1,2], (1) Previous studies were predominantly focused on hydrologic processes [24,25,[27][28][29]. Although studies involving water quality simulations evaluated the potential effect of improved LAI on sediment yield [30], the extent of such effects on nutrient (e.g., nitrate) loads-the most critical issue in agricultural watersheds from water quality management standpoint-is still unknown. (2) Evaluations conducted in the previous hydrologic studies were mostly limited to vertical water flux and storage simulations (e.g., evapotranspiration and soil moisture) [24,[27][28][29], with very little emphasis on the watersheds' cumulative response to downstream waters (i.e., streamflow) [25,30].
In this study, we intended to fill these knowledge gaps by assimilating Moderate Resolution Imaging Spectroradiometer LAI data (MODIS MCD15A3H) [31] into the Soil and Water Assessment Tool (SWAT) [21]. The specific objective of this study was to quantify the extent of improvements that the assimilation of MODIS LAI data would convey to streamflow, soil moisture, and nitrate load simulations at a daily timescale. Another unique contribution of our study was a new, highly efficient SWAT source code, which can perform multisource, multivariate assimilation of remotely sensed data regardless of watershed sizes and geolocations.

Methodology
We developed two contrasting model configurations to evaluate the effect assimilating MODIS LAI data on simulated streamflow, soil moisture, and nitrate load: Remote Sens. 2020, 12, 2148 3 of 17 (1) The basic model: LAI was simulated by the model based on input land use and associated biophysical parameters-a common approach in watershed modeling. This was our baseline to measure the degree of improvement in model results in the subsequent configuration. (2) The LAI assimilation model: The same setup as in the basic model, except MODIS LAI was directly inserted by replacing the simulated LAI values in each of the spatial units of the model (e.g., hydrologic response units or subbasins).
Our model testbed, the 16,860-km 2 Cedar River Watershed (CRW; Figure 1), drains a major portion of eastern Iowa in the United States ( Figure 1). This entire region has experienced increased flood frequency in the last two decades [32] as a result of climate and land use changes [33,34]. Studies have shown that this altered hydrologic dynamic is likely to continue in future years [35,36]. Furthermore, with more than 70% of the land area used for agricultural purposes (Figure 1a), CRW has long been noted as a water quality hotspot exporting large nitrate loads to the Mississippi River system [3,37]. Considering these challenges, CRW is frequently used as a testbed in watershed modeling and management studies [3,35,[37][38][39].
Remote Sens. 2020, 12, x FOR PEER REVIEW 3 of 18 (1) The basic model: LAI was simulated by the model based on input land use and associated biophysical parameters-a common approach in watershed modeling. This was our baseline to measure the degree of improvement in model results in the subsequent configuration. (2) The LAI assimilation model: The same setup as in the basic model, except MODIS LAI was directly inserted by replacing the simulated LAI values in each of the spatial units of the model (e.g., hydrologic response units or subbasins).
Our model testbed, the 16,860-km 2 Cedar River Watershed (CRW; Figure 1), drains a major portion of eastern Iowa in the United States ( Figure 1). This entire region has experienced increased flood frequency in the last two decades [32] as a result of climate and land use changes [33,34]. Studies have shown that this altered hydrologic dynamic is likely to continue in future years [35,36]. Furthermore, with more than 70% of the land area used for agricultural purposes (Figure 1a), CRW has long been noted as a water quality hotspot exporting large nitrate loads to the Mississippi River system [3,37]. Considering these challenges, CRW is frequently used as a testbed in watershed modeling and management studies [3,35,[37][38][39]. The corresponding station IDs are as follows: C1 = 05464500, C2 = 05458900, C3 = 05462000, C4 = 05458000, C5 = 05459500, V1 = 05464420. Streamflow and nitrate data can be accessed from https://waterdata.usgs.gov/nwis using respective stations IDs.
Our modeling experiments were conducted using SWAT [21], which is a process-based, semidistributed tool capable of simulating landscape water balances, floods and droughts, nonpoint source water pollution, crop yield, and best management practices across different physiographic settings (see, e.g., [5,36,[40][41][42][43]). Correspondingly, many previous studies addressing the heterogeneous water issues in Iowa watersheds applied SWAT as their primary research tool [3,35,[37][38][39]44]. SWAT has also been used as the simulation tool in many data assimilation experiments (e.g., [7,8]). Therefore, our SWAT-based analyses to understand the effect of assimilating MODIS LAI data on hydrology and water quality simulations will have immediate applications-both to CRW and other agricultural watersheds worldwide.
In the following subsections, we describe the data and method used to set up the two contrasting model configurations (with and without MODIS LAI data assimilation) and the model calibration-verification procedure.

The Basic Model
The basic model configuration was based on the CRW model originally developed by Golden et al. [3]. The geospatial inputs, primarily weather, topography, land use, soil properties, and agricultural management data, required to setup the CRW model are summarized in Table 1. In short, a topographic analysis of flow direction and flow accumulation divided the CRW into 95 subbasins, with an average drainage area of~176 km 2 . The original CRW model by Golden et al. [3] discretized the subbasins into 1860 Hydrologic Response Units (HRUs). However, to reflect the common modeling issues of limited data availability, model infidelity, and computational liability (e.g., [5,7,45]), we spatially discretized the CRW model at the subbasin level, meaning that subbasins were the smallest units of process simulation in our modeling experiments. As a result, each of the 95 subbasins functioned like a single HRU (hence, total 95 HRUs). This approach aggregated land use input data and spatially explicit information on crop types to the subbasin level, and therefore allowed us to assess the full potential of MODIS LAI data for minimizing these deficiencies in the LAI assimilation model. Subsurface drainage Tile drains were assigned on low slope (<2%) with row crops and poorly drained soils (hydrologic soil group D) [3,37,38] Agricultural management operations (a) Timing of plantation and harvest; (b) timing, type, rate, and place of fertilizer applications; (c) corn-soybean annual rotation [3,37] Precipitation Total daily precipitation from 1-km gridded DAYMET product [49] Energy-related weather forcing Minimum and maximum daily temperature from 1-km gridded DAYMET product [49]; solar radiation, wind speed and relative humidity from the historical weather generator within the SWAT source-code [21]; potential evapotranspiration (PET) estimated with built-in Penman-Monteith equation [21] Streamflow and nitrate data Six U.S. Geological Survey (USGS) gage stations; five for model calibration and one for model verification ( Figure 1b) Rootzone (100 cm) soil moisture data Soil Moisture Active Passive (SMAP) mission 3-hourly L4 global 9-km rootzone soil moisture; used for model verification [50] LAI data MODIS MCD15A3H LAI/FPAR four-day L4 global 500 m v006 [31]; assimilated in the model replacing the model's default LAI values Representation of landscape vegetation dynamics in our basic model configuration was based on a commonly followed semiempirical leaf development approach (e.g., [21][22][23]). Specifically, in our approach, the temporal variation of LAI was determined by plant/crop-specific biophysical parameters primarily related to heat units, radiation use efficiency, vapor pressure deficit, maximum canopy height, and input plantation and harvest schedules (for row crops) [21][22][23][24][25]. The spatial variation of these parameters, hence the spatial variability of LAI, was directly linked with the level of specificity in model's land use representation [7]. Therefore, the best possible spatial representation of LAI in the basic model configuration was at the subbasin level.

The LAI-Assimilation Model
The LAI-assimilation model was the same as the basic model configuration, except with the integration of MODIS LAI data across the watershed. Following a well-accepted approach of earth data processing for watershed-scale modeling and analyses [7][8][9], we constructed representative LAI Remote Sens. 2020, 12, 2148 5 of 17 time series from the original gridded MODIS product. Briefly, we applied a recently developed semiautomatic web-based tool [51] to account for the heterogeneity in size, shape, and locations of any number of subbasins. The~500-m-gridded four-day total LAI data (Table 1) [31] were first georeferenced and then assigned to respective subbasin(s) by taking an area-weighted average value from encompassing and/or intersecting MODIS grid(s). In the subsequent step, we performed a linear temporal interpolation (by averaging) [28] to transform the subbasin-level four-day total LAI values into a continuous daily LAI time series. We assimilated this spatially distributed and temporally continuous LAI data into our basic model configuration using a new SWAT source code. In each instance of simulation (days), our new source code applied the direct insertion approach (e.g., [24,25,30,52]) to replace the simulated LAI (Section 2.1) across all subbasins and all simulation time steps with the corresponding MODIS LAI data.
The new SWAT source code is an advanced version of the source code developed by Rajib et al. [7], which was initially equipped with the capacity to assimilate remotely sensed potential evapotranspiration (PET) data. In this study, we added additional features to this source code so that it was capable of assimilating both PET and LAI data simultaneously, thus allowing a holistic approach for improving SWAT's process representations. To the best of authors' knowledge, this is the first SWAT source-code with such functionalities.

Model Calibration
We followed an identical calibration protocol for both model configurations (basic and LAIassimilation). The calibration length was four years (2009-2012), with a prior two-year initialization of model state (2007)(2008). Both configurations were calibrated at daily time step, first using streamflow data at five gage stations throughout the watershed and then using nitrate load data at the watershed outlet ( Figure 1b; Table 1). It was feasible to conduct our water quality calibration at daily time step because of the abundant in-situ nutrient concentration data in CRW [53,54].
Both basic and LAI assimilation models used an identical set of 42 calibration parameters. We selected these calibration parameters from our previous CRW study [3]. These parameters represent hydrology and water quality (Tables A1 and A2, respectively) while being indirectly related to vegetation dynamics. Although SWAT is relatively well equipped with many biophysical parameters and process conceptualizations that directly influence vegetation dynamics (e.g., maximum root depth, radiation use efficiency, stomatal conductance, day/nighttime thresholds of vapor pressure deficit) [7,55], other advanced watershed models are not [23,24,56]. Therefore, we excluded such parameters from our calibration [unlike previous studies, (e.g., [8,13]) to isolate the effects of MODIS LAI assimilation into the model and to maintain focus on our study's hypothesis that assimilated LAI data can serve as a proxy for vegetation dynamics when limited (or completely absent) vegetation-related parameters and processes are available in watershed models.
We used Sequential Uncertainty Fitting version 2 (SUFI-2), which is a semiautomated calibration algorithm available inside SWAT-CUP calibration platform [57]. A weighted Kling-Gupta Efficiency (KGE) [8,58] was the objective function to measure the association between daily simulated and measured data at the stream gages. KGE decomposes Nash-Sutcliffe Efficiency and Mean Squared Error into a three-dimensional criteria space and finds out a Pareto front in terms of the shortest Euclidean distance [8,9]. KGE ranges from −∞ to 1. Model accuracy increases as KGE value moves closer to 1. To find the best parameter combination and hence the most optimal model state, SUFI-2 seeks the highest possible KGE value across all calibration locations.

Model Verification
We performed a series of tasks to assess the relative improvement of simulation accuracy between the two model configurations (with and without MODIS LAI data assimilation). These verification tasks include the following: (1) Assessments of hydrologic simulation: i.
We assessed the accuracy of daily streamflow simulation for a five-year period (2013-2017) and at a separate gage station not included in the calibration process (for model verification) (Figure 1b). ii.
We then compared simulated LAI with MODIS data to identify how the calibrated models differ in their respective spatiotemporal representation of vegetation dynamics. For evaluating the spatial accuracy in LAI simulation, we considered specific day(s) in the 2016 crop growing season (June-August). We focused on crop growing season because this is the most active period in terms of vegetation growth and associated water-energy exchanges. We selected 2016 for temporal evaluation because this was a year with average precipitation, hence representative of the watershed's general hydrologic response. iii.
To further assess improvements in hydrologic simulation, we compared the spatial consistency of soil moisture between model simulations and Soil Moisture Active Passive (SMAP) mission level 4 estimates (Table 1). SMAP mission provides 9-km gridded estimates of rootzone soil moisture, which is a model-assimilated product of remotely sensed surface moisture observations [50]. Briefly, we produced subbasin-level aggregated SMAP data using the same web-based tool and areal averaging procedure described in Section 2.2, took the daily average of rootzone volumetric moisture content over the 2016 crop growing season (June-August), and subsequently generated a watershed soil moisture distribution map to enable spatial comparison with model outputs. We considered the same period for assessment of LAI and soil moisture as it would produce a consistent comparison across two mutually dependent processes. Furthermore, selecting summer months ensured that SMAP data used in our model assessments were not affected by snow cover and frozen soil-the common sources of inaccuracies in remotely sensed soil moisture [59].
(2) Assessment of water quality simulation: We assessed the accuracy of daily nitrate load simulation for the same five-year period and at the same location used for streamflow verification (Figure 1b). To the best of our knowledge, this is the first study to verify the effect of LAI data assimilation on water quality simulations and that using long-term daily observations as the reference.

Effect of LAI Data Assimilation on Hydrologic Simulation Accuracy
The basic model configuration simulated daily streamflow reasonably well (Figure 2a), with KGE = 0.8 at the watershed outlet and KGE ≥ 0.5 across all other locations (Table A2). However, such accuracy in streamflow simulation was obtained despite a poor representation of landscape vegetation dynamics. For example, simulated LAI in the basic model on a given day of the crop growing season showed little spatial variability across the watershed, differing substantially from the spatial variability of MODIS LAI data (Figure 2b). In addition to an overestimation tendency throughout the late spring and summer crop growing seasons, the temporal variability of MODIS LAI data was largely inconsistent with the basic model (time series comparison in Figure 2b; note thẽ 30-day lag between the respective occurrences of maximum LAI). The LAI assimilation model increased streamflow simulation accuracy, with higher KGE values compared to the basic model at all locations (Table 2). We demonstrate this relative improvement in Figure 3 using the combined exceedance probability of streamflow at two outlet/near-outlet locations (i.e., based on a combined streamflow time series from the outlet calibration location, 2009-2012, and the near-outlet verification location, 2013-2017). Streamflow simulated by the LAI assimilation model was closer to gage data throughout the nine-year calibration-verification period (Figure 3a), with significantly improved accuracy across medium-to-low flow conditions (p < 0.05; Figure 3b). Whereas the LAI assimilation model produced improved streamflow in medium-to-high flow conditions as well, such improvements were not significant (p ≥ 0.05; Figure 3c).
Remote Sens. 2020, 12, x FOR PEER REVIEW 7 of 18 was closer to gage data throughout the nine-year calibration-verification period (Figure 3a), with significantly improved accuracy across medium-to-low flow conditions (p < 0.05; Figure 3b). Whereas the LAI assimilation model produced improved streamflow in medium-to-high flow conditions as well, such improvements were not significant (p ≥ 0.05; Figure 3c).  Figure 1b maps these locations across CRW's stream network. b C1 is at the watershed outlet.   MODIS LAI data altered the model's overall water balance. For example, despite the basic model's reasonable accuracy in streamflow simulations, the spatial pattern of rootzone soil moisture in the model was largely inconsistent with SMAP data (comparing Figure 4a with Figure 4c). Compared to SMAP, this indicated an unrealistically "dry" condition across the watershed. However, after assimilating MODIS LAI data, the basic model showed increased spatial consistency with SMAP data, hence an overall improved water balance (comparing Figure 4b with Figure 4c).  The spatial pattern of rootzone soil moisture ( Figure 4) afforded additional insights into how MODIS LAI data altered the model's overall water balance. For example, despite the basic model's reasonable accuracy in streamflow simulations, the spatial pattern of rootzone soil moisture in the model was largely inconsistent with SMAP data (comparing Figure 4a with Figure 4c). Compared to SMAP, this indicated an unrealistically "dry" condition across the watershed. However, after assimilating MODIS LAI data, the basic model showed increased spatial consistency with SMAP data, hence an overall improved water balance (comparing Figure 4b with Figure 4c).
Remote Sens. 2020, 12, x FOR PEER REVIEW 8 of 18 The spatial pattern of rootzone soil moisture ( Figure 4) afforded additional insights into how MODIS LAI data altered the model's overall water balance. For example, despite the basic model's reasonable accuracy in streamflow simulations, the spatial pattern of rootzone soil moisture in the model was largely inconsistent with SMAP data (comparing Figure 4a with Figure 4c). Compared to SMAP, this indicated an unrealistically "dry" condition across the watershed. However, after assimilating MODIS LAI data, the basic model showed increased spatial consistency with SMAP data, hence an overall improved water balance (comparing Figure 4b with Figure 4c).

Effect of LAI Data Assimilation on Water Quality Simulation Accuracy
The assimilation of LAI data produced considerable improvements in the model's simulated nitrate loads ( Figure 5). First, the LAI assimilation model could better reproduce the temporal variability of observed daily nitrate loads than the basic model (Figure 5a). This was also confirmed by increased KGE values (e.g., 0.20 versus 0. 74 and 0.10 versus 0.41, respectively, during calibration and verification). Second, daily nitrate loads simulated by the basic model showed noticeably large departures from gage nitrate data (1:1 line in Figure 5b). In comparison, the LAI assimilation model exhibited closer adherence to the 1:1 line, suggesting reduced bias in nitrate load simulations. Specifically, the basic model underestimated overall nitrate loads that the watershed conveyed to downstream waters. With improved LAI representation, the model reduced this bias by 47% and 17%, respectively, during the calibration (2009-2012) and verification (2013-2017) periods (Figure 5b).

Effect of LAI Data Assimilation on Water Quality Simulation Accuracy
The assimilation of LAI data produced considerable improvements in the model's simulated nitrate loads ( Figure 5). First, the LAI assimilation model could better reproduce the temporal variability of observed daily nitrate loads than the basic model (Figure 5a). This was also confirmed by increased KGE values (e.g., 0.20 versus 0. 74 and 0.10 versus 0.41, respectively, during calibration and verification). Second, daily nitrate loads simulated by the basic model showed noticeably large departures from gage nitrate data (1:1 line in Figure 5b). In comparison, the LAI assimilation model exhibited closer adherence to the 1:1 line, suggesting reduced bias in nitrate load simulations. Specifically, the basic model underestimated overall nitrate loads that the watershed conveyed to downstream waters. With improved LAI representation, the model reduced this bias by 47% and 17%, respectively, during the calibration (2009-2012) and verification (2013-2017) periods (Figure 5b).

Discussion
Our results suggest that traditional watershed modeling practices overlooking the physical realism of vegetation dynamics inadequately represent landscape water balance and nutrient loads at the daily timescale. Whereas the assimilation of remotely sensed LAI data can considerably improve hydrology and water quality simulations, three key questions remain: (1) What is the link between improved LAI data and improved model predictions? (2) Which LAI datasets are appropriate for watershed-scale modeling? and (3) What modeling factors are important to make LAI data assimilation efficient? We address these questions below.

What Is the Link between Improved LAI Data and Improved Model Predictions?
Although calibration results may indicate a high-performing model, misrepresentation of LAI drove the model state to a direction where the simulated water balance was not physically meaningful. Figure 6 explains this "pulling of model-state" [8,60] between the basic and LAI assimilation model configurations. Clearly, compared to the LAI assimilation model, higher LAI values in the basic model resulted into proportionally higher ET and lower soil moisture. The overestimation of ET was logical because higher LAI values mean lower canopy resistance (see Penman-Monteith equation in [21]), which subsequently improves the efficiency of aerodynamic processes by lowering the threshold for vapor and momentum transfers. Higher ET further indicates more water being depleted from the rootzone storage (Figure 6c), hence an unrealistic dry condition across the watershed (e.g., Figure 4). A dry rootzone explains why the basic model showed a significantly underestimated baseflow in the rivers (Figure 3a,b).

Which LAI Datasets Are Appropriate for Watershed-Scale Modeling?
Various multisource "fusion" LAI datasets have emerged in recent years, primarily combining the Advanced Very High-Resolution Radiometer (AVHRR) products with machine learning algorithms (e.g., AVH15C1, LAI3g, and GLASS) [61][62][63][64]. Some of these relatively new LAI datasets are available in daily timescale with short latency from present time (e.g., AVH15C1). Despite the potential high temporal resolution, almost all these datasets are available at coarse spatial resolutions  Higher LAI values in the basic model also led to higher plant nitrogen uptake, although nutrient input (i.e., fertilization; Table 1) was identical across the two model configurations. High nitrogen consumption by the vegetation resulted in smaller volumes of mineral nitrogen on the land surface, which caused the basic model to substantially (and inaccurately) underestimate nitrate loads in the rivers. The LAI assimilation model, despite being calibrated with the same reference datasets as the basic model, pulled the model state toward improved accuracy by decreasing ET, increasing soil moisture, and decreasing plant nitrogen uptake.

Which LAI Datasets Are Appropriate for Watershed-Scale Modeling?
Various multisource "fusion" LAI datasets have emerged in recent years, primarily combining the Advanced Very High-Resolution Radiometer (AVHRR) products with machine learning algorithms (e.g., AVH15C1, LAI3g, and GLASS) [61][62][63][64]. Some of these relatively new LAI datasets are available in daily timescale with short latency from present time (e.g., AVH15C1). Despite the potential high temporal resolution, almost all these datasets are available at coarse spatial resolutions (e.g.,~5 km for AVH15C1, LAI3g, and GLASS). Furthermore, studies that have assessed the quality of these relatively new datasets are still very limited. Whereas these emergent big data offer new opportunities for improved hydrology and water quality modeling at continental/global scales [28], the use of MODIS MCD15A3H LAI data [31] seems suitable for watershed-scale modeling given the reasonably high spatiotemporal resolution (~500 m; four-day total), continuous algorithm improvisations [14,65], and more importantly, well-documented quality assessments [14,[66][67][68]].

What Modeling Factors Are Important to Make LAI Data Assimilation Efficient?
The efficiency of any remote sensing-integrated watershed modeling approach largely depends on the interoperability between the model and remotely sensed data [69]. This interoperability is essentially a "big data problem" [7], defined in terms of the degree of automation in two fundamental tasks: (i) Processing a dataset such that it is fully compatible with a model's geodatabase architecture, and (ii) seamlessly assimilating the processed dataset in spatiotemporal simulations. Previous studies which have showed the potential of assimilating remotely sensed LAI data in watershed models (e.g., [29,30]) did not exclusively address how these two tasks can be conducted in an interoperable (hence, efficient) way. We addressed this problem by developing a new model source code to perform task (ii) and by supplementing the source code with an open access companion tool [51] to perform task (i) (see Section 2.2). Thus, we implemented an interoperable MODIS-SWAT linkage, which made our approach highly efficient and reproducible regardless of watershed sizes and geolocations. Using our approach, SWAT modelers can efficiently assimilate MODIS LAI data without going through the learning curve of big data processing and model source code programming.

Summary
We applied a remote sensing-integrated watershed modeling approach assimilating MODIS LAI data across the 16,800-km 2 Cedar River Watershed in Iowa, United States. We compared the two models, i.e., LAI assimilation configuration (with MODIS data) and basic configuration (with the model's default semiempirical LAI representation), over a nine-year period and concluded the following: 1.
The basic model gave right answers for wrong reasons, with reasonably good daily streamflow simulation despite a large bias in LAI. The accuracy of daily streamflow simulation improved throughout the nine-year period. However, this improvement was significant during medium-to-low flow conditions.

2.
The LAI assimilation model adopted a physically realistic water balance by increasing rootzone soil moisture storage, therefore improving the model's spatial consistency with reference estimates (from the SMAP satellite mission).

3.
Assimilation of LAI data into our watershed model substantially improved nitrate load simulations, reproducing long-term in-situ observations at a daily timescale. Our study is the first to show such an effect.
We also addressed conceptual and technical challenges associated with LAI assimilation in hydrology and water quality modeling. Conceptually, we disentangled how overestimation or underestimation of LAI cascades through watershed processes and how the model responds to it by partitioning hydrologic fluxes and nutrient pools. From a technical standpoint, we highlighted that high-quality remotely sensed global LAI datasets are becoming increasingly available. However, widespread community adoption of these emerging data resources will not be feasible without minimizing the lack of interoperability between remotely sensed big data and complex watershed models. By making SWAT model interoperable with a semiautomatic earth data processing tool, our study demonstrated how assimilation of LAI data in watershed models can be done efficiently regardless of watershed size and location. Our next step is to apply this efficient big data-driven modeling approach for watershed management decisions in various geophysical settings. Software Availability: The modified SWAT source-code is available upon request.
Funding: This research was partially funded by the early career faculty startup grant from the Texas A&M University, Kingsville, TX, USA.

Conflicts of Interest:
The authors declare no conflict of interest. a Each parameter, except CN2, SOL_AWC, SOL_K and GW_DELAY, was iterated such that the original value was replaced by a value from respective initial ranges. During every iteration, a value from GW_DELAY's initial range was added to its original value; original value of CN2, SOL_AWC and SOL_K was multiplied by an adjustment factor (1 + a value from the initial range). b See Neitsch et al. [21] for detail description of these parameters. c Selection of parameters and their initial ranges were based on our previous study [3]. d Based on the calibration results of Hutchinson and Christiansen [37]; the value was assigned directly in the model without further iteration.  Figure 1b) provide in-situ estimates of nitrate plus nitrite as nitrogen (NO 3 -N + NO 2 -N). These estimates mainly represented NO 3 in the stream because NO 2 was not the dominant N species. See Golden et al. [3]. b Each parameter was iterated such that the original value was replaced by a value from respective initial ranges. See Neitsch et al. [21] for detail description of these parameters. c Selection of parameters and their initial ranges were based on our previous study [3]. d These parameters are related to sediment and phosphorus. Calibration of these parameters was not possible because of the unavailability of observed/estimated datasets. The values listed above were obtained from the calibration results of Hutchinson and Christiansen [37] and Wu and Liu [38] for a much larger watershed which included our study area. These values were therefore assigned directly in our Cedar River Watershed model without further iteration.