A Web-based Tool to Estimate Pollutant Loading Using LOADEST

Collecting and analyzing water quality samples is costly and requires significantly more effort than collecting streamflow data; thus, water quality data are typically collected at a low frequency. Regression models that relate streamflow to water quality data are often used to estimate pollutant loads. A web-based tool with four modules, using LOAD ESTimator (LOADEST) as its core engine, was developed to provide user-friendly interfaces and input data collection via web access. The first module requests and receives streamflow and water quality data from the U.S. Geological Survey. The second module retrieves watershed area for computation of pollutant loads per unit area. The third module examines input datasets for potential errors before LOADEST runs, and the last module computes estimated and allowable average annual pollutant loads and provides tabular and graphical LOADEST outputs. The web-based tool was applied to two watersheds in this study, one agriculture-dominated and one urban-dominated. The annual sediment load at the urban-dominated watershed exceeded the target load; the web-based tool therefore correctly identified the watershed requiring best management practices to reduce pollutant loads.


Introduction
Total Maximum Daily Load (TMDL) is a governmental regulation designed to preserve and protect the water quality of waterbodies in the USA. Under the Clean Water Act, states and authorities must establish priority rankings and develop TMDL plans to improve the water quality of contaminated streams and rivers. TMDL planning comprises identifying pollutant sources, monitoring the watershed, and reducing pollutant sources to meet water quality targets [1][2][3]. Estimating existing pollutant loads in a watershed and defining allowable pollutant loads are therefore fundamental to TMDL planning, because together they determine the reduction needed to meet water quality targets. Ideally, water quality samples would have the same temporal resolution as streamflow data, but they are typically collected less frequently because of collection and analysis costs.
A wide range of approaches, including watershed models, are used to estimate pollutant loads. One benefit of watershed models is that they can account for the specific characteristics or conditions of a watershed using temporal and spatial data. However, this is also a disadvantage, because such models require a wide range of inputs and expertise for input preparation and model runs [1,4].
Another approach uses regression models to identify a relationship between streamflow and water quality data and to estimate water quality on days for which no samples are available. Regression models have evolved from simple linear relationships between streamflow and water quality data to models that use logarithmic transformations to estimate water quality concentrations (e.g., mg/L) or loads (e.g., kg) [5][6][7]. Furthermore, regression models have provided acceptable pollutant load estimates from biweekly or monthly water quality data supplemented with storm-chasing samples [8][9][10], although regression results should be used with caution for smaller watersheds [11]. LOAD ESTimator (LOADEST) [12] is software that estimates pollutant loads with regression models, using streamflow, water quality concentration data, and regression model coefficients. LOADEST has been widely used for various water quality parameters and sampling frequencies [1,13–17]. For instance, LOADEST was applied to monthly mercury samples [14], bimonthly suspended sediment samples [15], and monthly total nitrogen and total phosphorus samples [16].
LOADEST has eleven regression models (Equations (1)–(11)), whose coefficients are calibrated by three statistical methods: adjusted maximum likelihood estimation (AMLE), maximum likelihood estimation (MLE), and least absolute deviation (LAD) [12]. LOADEST therefore provides three pollutant load estimates, one for each method. The AMLE and MLE estimates can be used when the residuals (i.e., model errors) follow a normal distribution, and both methods accept water quality datasets with censored data. The LAD method assumes that model errors are independent and identically distributed random variables. Park and Engel [17] and Park [1] applied the LOADEST regression models to estimate annual nitrogen, phosphorus, and sediment loads from 21, 69, and 211 water quality datasets, respectively. LOADEST showed less than 10% error in annual phosphorus and sediment load estimates when the water quality datasets included approximately 30% storm samples. Furthermore, regression models 1, 3, 4, and 7 produced smaller differences from measured loads than regression models 2, 5, 6, 8, and 9.
In these equations, a0–a6 are regression coefficients, Q is streamflow, dtime is decimal time, and per is a user-defined period.
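To illustrate the log-linear form these models share, the sketch below fits LOADEST model 3, which is documented for LOADEST [12] as ln(L) = a0 + a1 lnQ + a2 dtime, with lnQ and dtime centered. It is a minimal sketch under stated assumptions, not LOADEST's calibration: it uses plain ordinary least squares instead of AMLE, MLE, or LAD; it centers lnQ and dtime on their sample means (LOADEST applies its own centering); it ignores censored data; and it applies no retransformation bias correction. The function names `fit_model3`, `predict_model3`, and `solve3` are hypothetical.

```python
import math


def solve3(a, b):
    """Solve the 3x3 system a @ x = b by Gaussian elimination with partial pivoting."""
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    n = 3
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def fit_model3(flows, dtimes, loads):
    """OLS fit of ln(L) = a0 + a1*lnQ + a2*dtime (hypothetical; mean-centered terms)."""
    ln_q = [math.log(q) for q in flows]
    q_c = sum(ln_q) / len(ln_q)          # centering value for lnQ (assumption: mean)
    t_c = sum(dtimes) / len(dtimes)      # centering value for dtime (assumption: mean)
    rows = [(1.0, lq - q_c, t - t_c) for lq, t in zip(ln_q, dtimes)]
    y = [math.log(v) for v in loads]
    # Normal equations: (X^T X) a = X^T y
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(3)] for i in range(3)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(3)]
    return solve3(xtx, xty), q_c, t_c


def predict_model3(coef, q_c, t_c, flow, dtime):
    """Back-transformed load prediction without any bias correction."""
    a0, a1, a2 = coef
    return math.exp(a0 + a1 * (math.log(flow) - q_c) + a2 * (dtime - t_c))
```

Given paired streamflow, decimal times, and measured loads, `fit_model3` returns the coefficients and centering values that `predict_model3` then uses to estimate loads on unsampled days.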
LOADEST is efficient software for estimating pollutant loads and requires only streamflow and water quality data as model inputs, but preparing these inputs and handling the input data format for software runs often requires significant effort. Thus, the objective of this study was to develop a web-based tool using LOADEST as a core engine to (1) estimate pollutant loads associated with streamflow and (2) provide streamflow and water quality data retrieval via web access.

Materials and Methods
A web-based tool, Web-based Load Calculation using LOADEST (LOADEST WEB) [18], was developed in this study to provide user-friendly interfaces and input data collection via web access (Figure 1). Three modules were developed to collect and handle input data, and one module was imported and modified from another web-based tool, the Long-Term Hydrology Impact Analysis Tool (L-THIA) [19], to retrieve watershed area (Figure 2).
Streamflow and water quality data required to run LOADEST can be entered by users or obtained by the first module from the U.S. Geological Survey (USGS) through web access (USGS Water-Quality Data for the Nation) [20]. The module provides a Google Maps interface displaying USGS stations in all U.S. states. After the user finds and selects the USGS station of interest on the map, the module requests and receives streamflow and water quality data from the USGS server using that station number. Because the USGS datasets cover many water quality parameters, the user then selects one parameter, and the module extracts its water quality data and makes the dataset available for use in LOADEST WEB without further data formatting.

The third module examines input datasets for potential errors before LOADEST runs. LOADEST requires both streamflow and water quality data to calibrate the regression model coefficients, and its models use logarithmic terms. Therefore, streamflow data must be given for every date on which water quality data are given, and streamflow cannot be negative. Such errors occasionally arise from user typographical mistakes. In addition, USGS datasets may have missing data, or negative streamflow values caused by dramatic changes in atmospheric pressure or a sudden drop in wind speed [21]. The module examines both streamflow and water quality datasets and allows the user to correct errors manually or automatically.

The Output Module computes the estimated average annual load and the allowable average annual load (i.e., target load), and presents LOADEST outputs in user-friendly tables and graphs (Figure 3).
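The extraction step performed by the first module can be illustrated with a minimal sketch that parses USGS tab-delimited RDB text (comment lines beginning with `#`, a header row, then a column-format row) and keeps the records for one parameter code, such as 80154 for suspended sediment concentration. The column names `sample_dt`, `parm_cd`, `remark_cd`, and `result_va` are an assumption based on the legacy NWIS water-quality RDB output in long (one result per row) format; the function names are hypothetical, and the web request itself is not shown.

```python
def parse_rdb(text):
    """Parse USGS RDB text into a list of dicts keyed by column name."""
    lines = [ln for ln in text.splitlines() if ln and not ln.startswith("#")]
    header = lines[0].split("\t")
    # lines[1] holds column-format codes such as '10d' or '5s'; skip it.
    return [dict(zip(header, ln.split("\t"))) for ln in lines[2:]]


def extract_parameter(records, parm_cd):
    """Keep (date, value, remark) for one parameter code.

    The remark code is retained because a '<' remark marks a censored value,
    which AMLE and MLE can handle.
    """
    out = []
    for rec in records:
        if rec.get("parm_cd") == parm_cd and rec.get("result_va"):
            out.append((rec["sample_dt"], float(rec["result_va"]), rec.get("remark_cd", "")))
    return out
```

Calling `extract_parameter(parse_rdb(text), "80154")` on a downloaded RDB file would yield the suspended sediment series alone, ready to be paired with streamflow by date.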
The Output Module computes the estimated average annual load from the IND file, a LOADEST output file containing estimated daily loads, and computes the allowable daily load from the target concentration (i.e., water quality target) and daily streamflow. It computes estimated and allowable average annual loads per unit area by dividing the estimated and allowable loads by watershed area. In addition, the module allows users to download all LOADEST input and output files.
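The load arithmetic described above can be sketched as follows, assuming streamflow in cubic feet per second (cfs) and concentration in mg/L, so that 1 cfs at 1 mg/L carries about 2.4466 kg/day (28.3168 L/s × 86,400 s/day ÷ 10⁶ mg/kg). The average annual load is taken here as the mean of calendar-year totals over complete years, and area is converted from km² to ha; these choices and the function names are assumptions, not the exact LOADEST WEB implementation.

```python
from datetime import date
from typing import Iterable, Tuple

# 1 ft^3/s = 28.316846592 L/s; x 86,400 s/day; mg -> kg (/1e6)
CFS_MGL_TO_KG_PER_DAY = 28.316846592 * 86400 / 1e6


def allowable_daily_load(flow_cfs: float, target_mg_per_l: float) -> float:
    """Load (kg/day) the stream would carry at exactly the target concentration."""
    return flow_cfs * target_mg_per_l * CFS_MGL_TO_KG_PER_DAY


def average_annual_load(daily_loads: Iterable[Tuple[date, float]]) -> float:
    """Mean of calendar-year totals (kg/year); assumes every year is complete."""
    totals = {}
    for day, load_kg in daily_loads:
        totals[day.year] = totals.get(day.year, 0.0) + load_kg
    return sum(totals.values()) / len(totals)


def per_unit_area(annual_load_kg: float, area_km2: float) -> float:
    """Convert kg/year to kg/ha/year (1 km^2 = 100 ha)."""
    return annual_load_kg / (area_km2 * 100.0)
```

The same `average_annual_load` function serves both sides of the comparison: fed with daily loads read from the IND file it gives the estimated load, and fed with `allowable_daily_load` applied to each day's streamflow it gives the allowable load.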

Demonstration of USGS Data Retrievals and Input Error Checking
One benefit of the web-based tool is that it examines input datasets for potential errors before LOADEST runs. A USGS station was selected to demonstrate use of the Input Error Examining Module. USGS station 040851385 is located in Brown County (Figure 4). For missing streamflow data, the retrieved dataset listed only the date; therefore, the dates without streamflow values needed to be removed before LOADEST runs.
The module offers two options for negative streamflow data. The first is to convert the negative value to a positive value if it is a typographical error. The second is to remove the streamflow value, that is, to exclude it from the LOADEST run. If the data come from a USGS gaging station, removing negative streamflow values is recommended instead of converting them to positive values, for the reasons given in [21].
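The checks and the two options for negative streamflow described above can be sketched as follows. This is a minimal illustration, not the module's code: streamflow is assumed to be held in a dictionary keyed by date, with `None` marking a date that was listed without a value, and the function name `check_inputs` and its `negative` option values are hypothetical.

```python
from datetime import date
from typing import Dict, List, Optional, Tuple


def check_inputs(
    flows: Dict[date, Optional[float]],
    wq_dates: List[date],
    negative: str = "remove",
) -> Tuple[Dict[date, float], List[Tuple[date, str]]]:
    """Return cleaned streamflow and a list of (date, problem) findings.

    negative="remove" drops negative values (suggested for USGS gage data);
    negative="abs" flips the sign, treating the value as a typographical error.
    """
    if negative not in ("remove", "abs"):
        raise ValueError("negative must be 'remove' or 'abs'")
    problems: List[Tuple[date, str]] = []
    cleaned: Dict[date, float] = {}
    for day in sorted(flows):
        q = flows[day]
        if q is None:
            # Date listed without a streamflow value: drop it before the run.
            problems.append((day, "missing streamflow"))
            continue
        if q < 0:
            problems.append((day, "negative streamflow"))
            if negative == "remove":
                continue
            q = -q
        cleaned[day] = q
    # Calibration needs streamflow on every date that has a water quality sample.
    for day in wq_dates:
        if day not in cleaned:
            problems.append((day, "water quality sample without streamflow"))
    return cleaned, problems
```

Reporting the problems alongside the cleaned series mirrors the choice between manual and automatic correction: a user can inspect the findings before accepting the automatic fix.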

Application for Average Annual Sediment Load Reduction
Two watersheds in Indiana were selected to demonstrate sediment load estimation using LOADEST WEB: Fall Creek near Fortville (FCF; USGS station 03351500) and Little Buck Creek near Indianapolis (LBC; USGS station 03353637) (Figure 5). The FCF watershed area was 438 km², with 74% (326 km²) agricultural land use (Table 1). The LBC watershed area was 52 km², with 90% (47 km²) urban land use (Table 1). Streamflow and sediment data ('Suspended sediment concentration, milligrams per liter', USGS water quality parameter code 80154) were retrieved from the USGS server via web access by LOADEST WEB. Sediment data from 27 February 2007 to 7 December 2009 for station 03351500 and from 6 January 2000 to 7 September 2004 for station 03353637, together with streamflow data from 1 January 2000 to 31 December 2013, were used to estimate average annual sediment loads. The water quality target was set to 80.0 mg/L [22]. Park and Engel [17] recommended regression model 3 in LOADEST for average annual sediment load estimation because it provided the most accurate and precise sediment load estimates; therefore, regression model 3 was selected. Daily sediment loads predicted by LOADEST and observed at station 03351500 had a Nash-Sutcliffe efficiency (NSE) of 0.84 and a coefficient of determination (R²) of 0.88. At station 03353637, NSE and R² were 0.51 and 0.97, respectively.

The estimated average annual sediment load in the FCF watershed was 12.6 × 10⁶ kg/year (288.6 kg/ha/year), while the target average annual sediment load (allowable load) was 17.2 × 10⁶ kg/year (391.8 kg/ha/year); therefore, the required reduction for the FCF watershed was 0.0% (Table 2). However, the estimated average annual sediment load in the LBC watershed was 2.5 × 10⁶ kg/year (487.3 kg/ha/year), while the target average annual sediment load (allowable load) was 1.8 × 10⁶ kg/year (336.9 kg/ha/year); therefore, the required reduction for the LBC watershed was 30.9% (Table 2).
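The two goodness-of-fit statistics reported for the daily sediment loads can be computed as in the following sketch. NSE is 1 - Σ(O - P)²/Σ(O - Ō)², and R² is taken here as the squared Pearson correlation between observed and predicted values; whether the study used exactly this definition of R² is an assumption, and the function names are hypothetical.

```python
def nse(observed, predicted):
    """Nash-Sutcliffe efficiency: 1 - SSE / total sum of squares about the observed mean."""
    mean_obs = sum(observed) / len(observed)
    sse = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    sst = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed, predicted):
    """Squared Pearson correlation between observed and predicted values."""
    n = len(observed)
    mo = sum(observed) / n
    mp = sum(predicted) / n
    cov = sum((o - mo) * (p - mp) for o, p in zip(observed, predicted))
    vo = sum((o - mo) ** 2 for o in observed)
    vp = sum((p - mp) ** 2 for p in predicted)
    return cov * cov / (vo * vp)


obs = [1.0, 2.0, 3.0, 4.0]
biased = [2.0 * o for o in obs]   # perfectly correlated but systematically too high
print(r_squared(obs, biased))     # 1.0
print(nse(obs, biased))           # -5.0
```

The example shows why the two statistics can diverge, as they do at station 03353637: R² measures only how well predictions co-vary with observations, while NSE also penalizes systematic bias in magnitude.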
The average annual sediment load in the FCF watershed did not exceed the water quality target; in other words, the existing sediment load in the watershed already met the target. In contrast, the average annual sediment load in the LBC watershed exceeded the water quality target; therefore, best management practices (BMPs) to reduce sediment load are required for that watershed.
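The required reductions reported for the two watersheds are consistent with a percentage computed relative to the estimated load and floored at zero; the sketch below assumes that definition, and the function name is hypothetical. Using the per-unit-area values, (487.3 - 336.9)/487.3 ≈ 30.9% for LBC, while for FCF the estimated load is below the allowable load, so no reduction is needed.

```python
def required_reduction(estimated: float, allowable: float) -> float:
    """Percent reduction of the estimated load needed to reach the allowable load."""
    if estimated <= 0:
        raise ValueError("estimated load must be positive")
    # No reduction is needed when the load already meets the target.
    return max(0.0, (estimated - allowable) / estimated * 100.0)


print(round(required_reduction(487.3, 336.9), 1))  # LBC, kg/ha/year -> 30.9
print(required_reduction(288.6, 391.8))            # FCF, kg/ha/year -> 0.0
```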

Conclusions
Typically, TMDL plans include the identification of pollutant sources, watershed monitoring, and reduction of pollutant sources to meet water quality targets. Various approaches and models are used to estimate pollutant loads, and LOADEST is one widely used approach. However, LOADEST often requires considerable effort to prepare and format model inputs.
A web-based tool using LOADEST as a core engine was developed in this study to integrate four modules and provide user-friendly interfaces and input data collection via web access. The first module prepares the LOADEST model inputs (i.e., streamflow and water quality data) from the USGS server via web access, the second module retrieves watershed area, and both modules provide a Google Maps interface. The third module examines inputs to reduce potential errors. Finally, the fourth module presents LOADEST outputs in tables and graphs. Together, these modules make LOADEST easier to use. In addition, the tool does not require installation on users' computers because it runs on a web server. In this study, the module that examines input data was demonstrated, and the web-based tool was applied to two watersheds to estimate the sediment reduction required to meet the water quality target.
Currently, the web-based tool retrieves streamflow and water quality data from USGS; however, the National Water Quality Monitoring Council [23] also provides access to water quality data from the U.S. Environmental Protection Agency Storage and Retrieval system (EPA STORET). Therefore, the web-based tool will be upgraded to provide water quality data from EPA STORET.