Exploring PCSWMM for Large Mixed Land Use Watershed by Establishing Monitoring Sites to Evaluate Stream Water Quality

Extensive hydrologic and water quality modeling within a watershed benefits from long-term flow and nutrient data sets for appropriate model calibration and validation. However, due to a lack of local water quality data, simpler water quality modeling techniques are generally adopted. In this study, monitoring sites were established at two locations to collect hydraulic data for the hydraulic calibration and validation of the model. In addition, water quality samples were collected at eight monitoring sites and analyzed in the laboratory for several parameters used in calibration, including total suspended solids (TSS), soluble phosphorus, five-day biochemical oxygen demand (BOD5), and dissolved oxygen (DO). The Personal Computer Storm Water Management Model (PCSWMM) 7.6 software was used to simulate all pollutant loads using event mean concentrations (EMCs). The performance of the model for streamflow calibration at the two USGS gauging stations was satisfactory, with Nash–Sutcliffe Efficiency (NSE) values ranging from 0.51 to 0.54 and coefficients of determination (R²) ranging from 0.71 to 0.72. The model was also validated against historical flow data, with NSE values ranging from 0.5 to 0.79 and R² values ranging from 0.6 to 0.95. The hydraulic calibration also showed acceptable results, with reasonable NSE and R² values. The water quality data recorded at the monitoring stations were then compared with the simulated water quality results. The model reasonably simulated water quality, as evaluated by visual inspection of a scatter plot. Our analysis showed that the upstream tributaries, particularly those draining agricultural areas, contributed more pollutants than the downstream tributaries. Overall, this study demonstrates that PCSWMM, which has typically been used to model urban watersheds, can also be used to model larger mixed land use watersheds with reasonable accuracy.


Introduction
Water quality problems in rivers and streams have become a critical issue worldwide over the past few decades due to contamination by sediments [1], nutrients [1][2][3], and pesticides [4] from both point and nonpoint sources of pollution [2,5]. Nonpoint source pollution from agricultural land, industrial waste, and urbanized areas carries pollutants into rivers and streams [6-8], degrading the quality of receiving waters and harming aquatic life [9]. Point sources of pollution can also significantly degrade water quality [9,10]. Stream water quality is sensitive to various anthropogenic activities [11,12] and varies with geographical location [13,14]. For instance, agricultural practices such as livestock farming and the use of artificial fertilizers can lead to eutrophication in nearby water bodies [15]. While point source pollution has been extensively regulated to minimize nutrient loadings in streams, nonpoint source pollution has emerged as a significant and challenging contributor to waterway contamination, with agriculture standing out as the largest and most complex contributor [4,6,16]. Nonpoint source pollution is affected by a number of factors, such as the physical and chemical characteristics of the watershed, amounts of pollutants, land use, soil types [17,18], basin slope [19], vegetation of the catchment [20], rainfall intensity and duration [21], and antecedent dry days [22]. Accordingly, several watershed models, such as HEC-HMS [23], SWAT [24], LSPC [25], and HSPF [26], have been used to simulate flow and nutrients in large mixed land use watersheds. One of the major challenges in hydrologic and water quality modeling is the lack of sufficient data for adequate model calibration and validation. For example, developing a robust watershed model requires calibrating both its hydrologic and water quality components, which generally relies on the availability of continuous long-term streamflow data. However, in many cases, hydrologists have to deal with a lack of data in modeling investigations. For example, the location of interest may include only one or two gauging stations with sporadic streamflow records, and continuous flow data are lacking entirely in ungauged watersheds. As a result, modeling is generally challenging in ungauged basins where streamflow measurements are not easily accessible [27]. In these circumstances, the model cannot be adequately calibrated and is exceedingly difficult to use for decision-making [27,28]. In such situations, alternative methods, such as HOBO loggers, can be employed to gather data and bridge the data gap, enhancing the accuracy of hydrological models [29]. Therefore, to overcome data scarcity, this study sought to introduce an innovative approach to collecting depth data and water quality samples, which was substantiated by calibrating and validating the model against the observed data. Most of the watershed models listed above rely on hydrologic components for model calibration and do not offer any hydraulic model calibration. Moreover, hydrologic calibration based on stage measurements requires developing a rating curve, in addition to continuous stage monitoring, to convert stage to discharge. To overcome this issue, this study used the Storm Water Management Model (SWMM), which can use flow depth data for hydraulic calibration. Although this model has performed well in simulating urban and suburban watersheds, its efficiency in large heterogeneous watersheds has not been extensively assessed [30].
The primary focus of this study was establishing monitoring stations and conducting strategic calibration and validation of the hydrologic, hydraulic, and water quality components of the model. The goal was to assess the effectiveness of the PCSWMM model in large watersheds. Due to the unavailability of continuous data, the model was developed using stage data recorded in the field. In addition, water quality samples were collected and analyzed in the laboratory for a number of parameters used for water quality calibration in PCSWMM. Since most earlier studies using PCSWMM have been limited to smaller catchments and mainly conducted in urbanized watersheds, we investigated the model's performance in larger catchments where hydrologic data were unavailable and compared the model's predicted water quality against observed water quality.

Theoretical Description
Personal Computer Storm Water Management Model (PCSWMM)
The Storm Water Management Model was developed by the United States Environmental Protection Agency (US EPA) in 1971. It is a rainfall-runoff model that simulates the quantity and quality of runoff for both single-event and continuous simulations. The model can be applied to both urban and rural environments [31,32].
In 1984, a commercially available improved version called PCSWMM, with a Geographic Information System (GIS) interface, was developed to support a diverse range of applications [33]. It offers a flexible and comprehensive solution for analyzing rainfall-runoff dynamics in both one-dimensional and two-dimensional scenarios [33]. PCSWMM enables the simulation of water movement through channels and overland flow [32,34]. PCSWMM has also been reported to be capable of modeling natural watersheds [34,35] and performing comprehensive hydrological analysis of catchments [30] by incorporating precipitation, runoff, and pollutant hydrographs, along with crucial processes such as evapotranspiration, infiltration, and groundwater percolation. SWMM represents pollutant buildup using power, saturation, or exponential functions and pollutant wash-off using event mean concentration, exponential, or rating curve functions [31].
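For reference, the buildup and wash-off functions named above take the following general forms in the SWMM 5 documentation. They are reproduced here as an illustration and were not derived from this study. B is the buildup mass per unit normalizer (area or curb length), t the antecedent dry time, q the runoff rate per unit area, Q the runoff flow rate, and W the wash-off load rate. The coefficients C1 to C3 are specific to each land use and pollutant, and the EMC form holds up to a units conversion factor, with C1 playing the role of the event mean concentration.

```latex
\begin{aligned}
\text{Power buildup:} \quad & B = \min\!\left(C_1,\; C_2\, t^{C_3}\right) \\
\text{Exponential buildup:} \quad & B = C_1\left(1 - e^{-C_2 t}\right) \\
\text{Saturation buildup:} \quad & B = \frac{C_1\, t}{C_2 + t} \\
\text{Exponential wash-off:} \quad & W = C_1\, q^{C_2}\, B \\
\text{Rating curve wash-off:} \quad & W = C_1\, Q^{C_2} \\
\text{EMC wash-off:} \quad & W = C_1\, Q
\end{aligned}
```

Only the EMC form is free of a buildup term B. This is why choosing it removes the need for buildup parameters.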

Study Area
The study was conducted in the Mill Creek watershed (Figure 1) of the Mahoning River basin in Northeastern Ohio. The 78.3-square-mile watershed begins just south of the city of Columbiana in the northeastern part of Columbiana County and extends north through Boardman Township and Mill Creek Park before draining into the Mahoning River at Youngstown, Ohio. The watershed is dominated by developed land, which accounts for 51.7% of the total watershed area, while water bodies and wetlands cover 5.52%, as determined from the National Land Cover Database (NLCD). Agricultural land use comprises 5.2% of the northern sub-watershed, 40% of the mid-western sub-watershed, and 51.9% of the southern sub-watershed [36]. Elevation in the watershed ranges from a maximum of 1286 feet to a minimum of 826 feet. The average annual precipitation in the watershed is 35.1 inches. Mill Creek is the major waterway in the watershed, along with seven tributaries: Bears Den Run, Ax Factory Run, Andersons Run, Cranberry Run, Indian Run, Sawmill Run, and Turkey Run. A series of three lakes, Newport Lake, Lake Cohasset, and Lake Glacier (from upstream to downstream), has significant water-holding capacity and therefore considerably affects the hydrology and water quality of Mill Creek. Potential sources of contamination in the Mill Creek watershed include animal waste, agricultural land, combined sewer overflows, failing septic systems, and urban runoff [36]. These sources cause several water quality issues, including bacterial contamination, algal blooms, turbidity, and fish kills. Because Mill Creek Park supports various recreational activities, including fishing and boating, water contamination can be detrimental to human health.

Watershed Model Configuration with Input Data
The PCSWMM model was employed to simulate the entire hydrologic process. Streamflow modeling requires several inputs, such as land use, soil data, a digital elevation model (DEM), meteorological data (Table 1), and stream cross-sections where available. To represent the geographical features of the sites precisely, we obtained high-resolution DEMs with a resolution of 10 m from the USGS National Elevation Dataset (NED). These raster datasets provided detailed topographic information, from which slope gradient, stream networks, and slope length were derived. The DEM datasets were used to delineate the watershed and divide it into 36 sub-basins using the PCSWMM automated watershed delineation tool. To capture the existing land use characteristics of the watershed, the study used data from the National Land Cover Database (NLCD). Since soil plays an important role in hydrological processes, high-resolution soil data from the Soil Survey Geographic Database (SSURGO) were processed in ArcGIS Pro to compute the curve number and percentage imperviousness, which strongly affect the determination of infiltration abstractions. The curve number infiltration model is widely accepted for computing infiltration abstractions, and its accuracy depends on soil properties and land use types (SCS 1964; SCS 1972). Climate data, including precipitation, were obtained from the National Oceanic and Atmospheric Administration (NOAA) station USW00014852, and streamflow data from two USGS gauging stations were incorporated into the model. Furthermore, two HOBO loggers were positioned to record daily depth data, enabling multi-site calibration and validation of the model. Water quality samples were also collected from the eight designated stations (Figure 1) using the grab sampling method and sent to the laboratory for analysis, and the results were used for water quality calibration and validation.
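The curve number approach referenced above is commonly written, in its classic event form, as the SCS runoff equation. The Python sketch below evaluates that equation for a single storm total in inches. It assumes the conventional initial abstraction ratio of 0.2 and illustrates the method only: SWMM's curve number infiltration option applies the concept continuously through time, and its internal implementation differs in detail. The function name and example values are hypothetical.

```python
def scs_runoff_depth(precip_in: float, curve_number: float, ia_ratio: float = 0.2) -> float:
    """Direct runoff depth (in) from a storm total using the SCS curve number equation.

    S  = 1000 / CN - 10              potential maximum retention (in)
    Ia = ia_ratio * S                initial abstraction (in); 0.2 by convention
    Q  = (P - Ia)^2 / (P - Ia + S)   for P > Ia, otherwise 0
    """
    if not 0 < curve_number <= 100:
        raise ValueError("curve number must be in (0, 100]")
    retention = 1000.0 / curve_number - 10.0
    initial_abstraction = ia_ratio * retention
    if precip_in <= initial_abstraction:
        return 0.0  # all rainfall is abstracted before runoff begins
    excess = precip_in - initial_abstraction
    return excess ** 2 / (excess + retention)


if __name__ == "__main__":
    # Hypothetical 3-inch storm evaluated for a few illustrative curve numbers.
    for cn in (60, 80, 98):
        print(cn, round(scs_runoff_depth(3.0, cn), 2))
```

Higher curve numbers (less pervious or wetter soils) yield more runoff from the same storm. This is consistent with the emphasis placed above on SSURGO-derived curve numbers.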

Hydraulic Model Configuration
Mill Creek and its downstream tributaries were represented using several cross-sections determined from the geometric characteristics of the stream network across the entire watershed. The hydraulic model comprises a 20.9-mile main branch (Mill Creek) and seven tributaries, namely Bears Den Run, Ax Factory Run, Andersons Run, Cranberry Run, Indian Run, Sawmill Run, and Turkey Run. The data used to build the hydraulic model include cross-sectional information for river segments at various locations, as well as measured depth data and observed USGS flows for model calibration. Stream water depth was recorded by deploying HOBO loggers at two locations in the watershed. The cross-sectional data were extracted from the 10 m DEM obtained from the USGS National Elevation Dataset (NED) in raster format. The river cross-sections were further verified against FEMA HEC-2 flood-forecast studies, HEC-RAS modeling, and site surveys. Channel slope was cross-validated against the watershed delineation in ArcGIS 9.2. A Manning's roughness coefficient of 0.03 was adopted for the channel based on previous research studies (Table 2). The hydrodynamic model also required specification of both the downstream initial condition and the boundary condition. Because observed downstream water level data were unavailable, we deployed HOBO loggers to collect depth data and used them as the downstream boundary condition. These observed depth data were also used for the hydraulic calibration and validation of the model.
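The sketch below illustrates how the adopted Manning's roughness (n = 0.03) links a measured flow depth to discharge. It applies Manning's equation in US customary units to a trapezoidal channel. The bottom width, side slope, and bed slope in the example are hypothetical placeholders, not surveyed Mill Creek values. PCSWMM's dynamic wave routing solves the full unsteady flow equations rather than this uniform-flow relation, so the sketch only conveys the shape of the depth-discharge relationship.

```python
import math


def trapezoid_geometry(depth_ft, bottom_width_ft, side_slope_h_per_v):
    """Flow area (ft^2) and wetted perimeter (ft) of a trapezoidal section."""
    area = (bottom_width_ft + side_slope_h_per_v * depth_ft) * depth_ft
    perimeter = bottom_width_ft + 2.0 * depth_ft * math.sqrt(1.0 + side_slope_h_per_v ** 2)
    return area, perimeter


def manning_discharge_cfs(depth_ft, bottom_width_ft, side_slope_h_per_v, bed_slope, n=0.03):
    """Uniform-flow discharge (cfs) from Manning's equation in US units (k = 1.49)."""
    area, perimeter = trapezoid_geometry(depth_ft, bottom_width_ft, side_slope_h_per_v)
    hydraulic_radius = area / perimeter
    return (1.49 / n) * area * hydraulic_radius ** (2.0 / 3.0) * math.sqrt(bed_slope)


if __name__ == "__main__":
    # Hypothetical channel: 20 ft bottom width, 2H:1V side slopes, 0.2% bed slope.
    for depth in (0.5, 1.0, 2.0, 3.0):
        q = manning_discharge_cfs(depth, bottom_width_ft=20.0, side_slope_h_per_v=2.0, bed_slope=0.002)
        print(f"depth {depth:.1f} ft -> Q = {q:.1f} cfs")
```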

Hydrologic Model Calibration and Validation
Since the most recent data available from USGS covered late 1999 to mid-2000, the PCSWMM model was set up for the period from 1999 to 2000 and run on a daily time scale after an initial 6-month warm-up period. Six months of daily observed flow data (1/1/2000 to 6/10/2000) from two USGS gauging stations within the watershed were used for model calibration. The PCSWMM Sensitivity-based Radio Tuning Calibration (SRTC) tool was used to identify the most sensitive parameters for the hydrologic simulation [32]. A manual calibration was then performed after the automatic calibration to fine-tune the model parameters. During calibration, measured parameters were assumed to be free from error, whereas inferred parameters were adjusted as suggested by [45]. Accordingly, inferred parameters such as depression storage, percentage imperviousness, infiltration parameters, channel and catchment roughness, and pervious area were adjusted during calibration, as presented in Table 2.
The optimized model parameters were then tested for validation against 20 years of historical daily observed streamflow data (1952 to 1972) at the USGS site. As noted earlier, the hydrologic investigation was conducted for a historical period because recent data were lacking. The hydrologic model calibration (Figure 2) and validation (Figure 3) were accomplished using historical data. The model was also calibrated and validated for the recent period using hydraulic depth data recorded from May to December 2023, as shown in Figures 4 and 5, respectively.

Hydraulic and Water Quality Monitoring
This study involved the monitoring and sampling of water quality in the Mill Creek watershed. Considering different land use patterns, potential sources of contamination, and natural features, eight monitoring sites were selected (Figure 1) after consultation with stakeholders, including representatives from CT Consultant and the Austintown, Boardman, and Canfield (ABC) Stormwater District in Ohio. These stations were placed to cover the entire watershed. To ensure quality assurance and consistency, specific protocols were followed during the collection of water samples. Water samples were collected at the designated sites using the grab sampling method and promptly sent to the laboratory for water quality analysis to prevent degradation of the samples during transport. The samples were analyzed in the laboratory under controlled conditions. This analytical procedure provided thorough insight into natural water quality conditions and covered a wide range of variables, from straightforward assessments of physical properties such as temperature, pH, and turbidity to more intricate analyses of chemical constituents such as soluble phosphorus, dissolved oxygen, five-day biochemical oxygen demand (BOD5), and total suspended solids.

Lab Analysis
Water temperature, conductivity, and pH were measured at each sampling site with a YSI Pro Plus meter. In the laboratory, water samples were analyzed for the additional parameters described below.

Biochemical Oxygen Demand
Biochemical Oxygen Demand (BOD) was determined following Standard Method 5210 (5-day test) [46]. Samples were analyzed within a maximum of 24 h after collection, preferably within 6 h of sampling. When analysis was delayed, samples were stored at 4 °C until examination, and any holding time exceeding 6 h was documented. Each site sample was prepared at multiple dilutions, typically three to four, and seeded with approximately 3 mL of a standard seed solution (Poly Seed). The bottles were then filled to capacity with dilution water containing phosphate buffer, magnesium sulfate, calcium chloride, and ferric chloride solutions, as stipulated in the Standard Method. Dissolved oxygen (DO) was measured using a YSI 5100 instrument. To prevent air bubbles, each bottle was inverted after measurement, securely stoppered, water-sealed, and capped. All samples, including unseeded blanks, seeded blanks, and the 2% glucose-glutamic acid (GGS) standard solution, were incubated for 5 days at 20 °C. After incubation, DO was measured again and BOD5 was calculated.
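The final calculation step can be made concrete. For seeded dilutions, Standard Methods 5210B expresses BOD5 as [(D1 - D2) - S·Vs]/P. Here D1 and D2 are the initial and 5-day DO (mg/L), S is the oxygen uptake of the seed per mL of seed added, Vs is the seed volume in the test bottle (mL), and P is the decimal volumetric fraction of sample in the bottle. The Python sketch below applies that formula. It screens dilutions with the commonly cited validity checks (DO depletion of at least 2.0 mg/L and residual DO of at least 1.0 mg/L) and then averages the valid dilutions. The 300 mL bottle volume, the function names, and all numbers in the example are hypothetical and are not data from this study.

```python
from statistics import mean


def seed_uptake_per_ml(do_initial, do_final, seed_ml):
    """S: DO depletion (mg/L) in the seed control divided by the mL of seed it contained."""
    return (do_initial - do_final) / seed_ml


def is_valid_dilution(d1, d2, min_depletion=2.0, min_residual=1.0):
    """Commonly cited checks: >= 2.0 mg/L DO depletion and >= 1.0 mg/L residual DO."""
    return (d1 - d2) >= min_depletion and d2 >= min_residual


def seeded_bod5(d1, d2, seed_uptake, seed_ml, sample_ml, bottle_ml=300.0):
    """BOD5 (mg/L) = [(D1 - D2) - S * Vs] / P, with P = sample_ml / bottle_ml."""
    p = sample_ml / bottle_ml
    return ((d1 - d2) - seed_uptake * seed_ml) / p


def site_bod5(dilutions, seed_uptake, seed_ml, bottle_ml=300.0):
    """Average BOD5 over valid dilutions; dilutions = [(sample_ml, D1, D2), ...]."""
    values = [
        seeded_bod5(d1, d2, seed_uptake, seed_ml, sample_ml, bottle_ml)
        for sample_ml, d1, d2 in dilutions
        if is_valid_dilution(d1, d2)
    ]
    if not values:
        raise ValueError("no dilution met the depletion and residual DO criteria")
    return mean(values)


if __name__ == "__main__":
    # Hypothetical seed control: 15 mL seed, DO fell from 8.6 to 4.1 mg/L.
    s = seed_uptake_per_ml(8.6, 4.1, 15.0)
    # Hypothetical dilutions of one site sample, each bottle with 3 mL seed added.
    dilutions = [(15.0, 8.5, 6.7), (30.0, 8.4, 4.9), (60.0, 8.3, 2.9)]
    print(f"S = {s:.2f} mg/L per mL; BOD5 = {site_bod5(dilutions, s, seed_ml=3.0):.1f} mg/L")
```

In this example the 15 mL dilution is excluded because it depletes less than 2.0 mg/L of DO, so only the two remaining dilutions are averaged.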

Total Suspended Solids
We also determined total suspended solids and total dissolved solids in duplicate following Standard Method 2540 [46]. To ensure the accuracy and precision of the method, standard solutions were periodically analyzed alongside the samples. Samples were analyzed within 7 days. For total solids (TS), subsamples from each site were measured and transferred into pre-cleaned, dried, and weighed porcelain crucibles. These samples were then oven-dried at 105 °C, cooled in a desiccator, and reweighed to determine the total solids content using Equation (2) in [46]. This process followed standardized procedures to accurately quantify the total solids content of the water samples.
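The gravimetric calculation behind these solids measurements can be shown directly. Standard Method 2540 computes solids in mg/L as (A - B) × 1000 / sample volume (mL), where A is the mass of the dried residue plus dish or filter (mg) and B is the tare mass (mg). We assume this is the relationship the cited equation refers to. The sketch below applies it to duplicate subsamples and reports the mean and the relative percent difference (RPD), a common duplicate-precision check. The function names and masses are hypothetical.

```python
def solids_mg_per_l(residue_plus_tare_mg, tare_mg, sample_volume_ml):
    """Gravimetric solids (mg/L) = (A - B) * 1000 / sample volume (mL)."""
    if sample_volume_ml <= 0:
        raise ValueError("sample volume must be positive")
    return (residue_plus_tare_mg - tare_mg) * 1000.0 / sample_volume_ml


def duplicate_summary(value_1, value_2):
    """Mean of duplicate results and their relative percent difference (RPD, %)."""
    avg = (value_1 + value_2) / 2.0
    rpd = abs(value_1 - value_2) / avg * 100.0 if avg else 0.0
    return avg, rpd


if __name__ == "__main__":
    # Hypothetical duplicate 50 mL subsamples dried at 105 °C in weighed crucibles.
    ts_a = solids_mg_per_l(50012.5, 50000.0, 50.0)
    ts_b = solids_mg_per_l(49812.0, 49800.0, 50.0)
    mean_ts, rpd = duplicate_summary(ts_a, ts_b)
    print(f"TS = {mean_ts:.1f} mg/L (RPD {rpd:.1f}%)")
```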

Soluble Reactive Phosphorus
Orthophosphate, or soluble reactive phosphorus (SRP), was quantified using Standard Method 4500-P [46], specifically the Ascorbic Acid Method (4500-P E), which is consistent with EPA Method 365.2. The procedure requires completion within 48 h of sample collection. Prior to analysis, samples were filtered through a 0.45 µm filter to remove particulate matter. Subsamples of each water sample were then taken in duplicate. A combined reagent, comprising 100 mL of 5N H2SO4, 10 mL of potassium antimony tartrate solution, 15 mL of ammonium molybdate solution, and 60 mL of ascorbic acid, was freshly prepared just before analysis and remained effective for 4 h. Calibration standards were prepared from a stock standard solution (RICCA Chemical) to generate a calibration curve extending up to 1 mg/L PO4-P. Additionally, a spiked sample was analyzed for every 20 samples to verify method accuracy and reliability.
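The spiked-sample check described above is typically summarized as a percent recovery: the spiked result minus the unspiked result, divided by the concentration added, times 100. The sketch below computes it under stated assumptions. The 80 to 120% acceptance window is a hypothetical placeholder, since laboratories set their own limits. The example concentrations are invented for illustration, and the dilution caused by the spike volume is assumed to be negligible.

```python
def spike_recovery_percent(spiked_mg_l, unspiked_mg_l, added_mg_l):
    """Percent recovery of a matrix spike (dilution by the spike volume is ignored)."""
    if added_mg_l <= 0:
        raise ValueError("spike concentration must be positive")
    return (spiked_mg_l - unspiked_mg_l) / added_mg_l * 100.0


def recovery_ok(recovery_pct, low=80.0, high=120.0):
    """Hypothetical acceptance window; replace with the laboratory's own limits."""
    return low <= recovery_pct <= high


if __name__ == "__main__":
    # Hypothetical sample at 0.12 mg/L PO4-P, spiked with 0.50 mg/L, measured at 0.61 mg/L.
    rec = spike_recovery_percent(0.61, 0.12, 0.50)
    print(f"recovery = {rec:.1f}%, within limits: {recovery_ok(rec)}")
```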
For analysis, 4 mL of combined reagent was added to each 25 mL of sample, including all standards, spiked samples, a blank, and the water samples. After the samples stood for 10 to 15 min to develop the blue color, absorbance was measured at 880 nm using a spectrophotometer (GENESYS 10S VIS). The blank was used to zero the spectrophotometer. Absorbance readings were recorded no sooner than 10 min and no later than 30 min after the addition of the combined reagent. A standard curve was constructed with concentration on the x-axis and absorbance on the y-axis, and the resulting regression equation was used to determine the concentration of soluble reactive phosphorus in the samples.
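Because the standard curve places concentration on the x-axis and absorbance on the y-axis, the fitted line has the form absorbance = m·C + b. A sample concentration is then recovered as C = (absorbance - b)/m. The sketch below fits that line by ordinary least squares using only the standard library and then inverts it. Readings above the highest standard (1 mg/L PO4-P in this study) are flagged rather than extrapolated. The standard concentrations and absorbances in the example are hypothetical.

```python
from statistics import mean


def fit_standard_curve(conc_mg_l, absorbance):
    """Least-squares line absorbance = slope * conc + intercept; also returns r^2."""
    x_bar, y_bar = mean(conc_mg_l), mean(absorbance)
    sxx = sum((x - x_bar) ** 2 for x in conc_mg_l)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(conc_mg_l, absorbance))
    syy = sum((y - y_bar) ** 2 for y in absorbance)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


def concentration_from_absorbance(a, slope, intercept, max_standard_mg_l):
    """Invert the standard curve; refuse to extrapolate above the top standard,
    and report readings below the blank as 0 mg/L."""
    conc = (a - intercept) / slope
    if conc > max_standard_mg_l:
        raise ValueError("reading above calibration range; dilute and re-run")
    return max(conc, 0.0)


if __name__ == "__main__":
    # Hypothetical calibration standards (mg/L PO4-P) and absorbances at 880 nm.
    standards = [0.0, 0.1, 0.25, 0.5, 1.0]
    absorb = [0.002, 0.083, 0.204, 0.405, 0.801]
    m, b, r2 = fit_standard_curve(standards, absorb)
    print(f"slope={m:.4f}, intercept={b:.4f}, r2={r2:.5f}")
    print(f"sample: {concentration_from_absorbance(0.160, m, b, max(standards)):.3f} mg/L")
```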

Water Quality Calibration
Water quality data were collected sporadically from 2017 to 2018, and an average of six observations were recorded at each designated monitoring station from 2022 to 2023. For nutrient simulation, we performed manual calibration in PCSWMM using the available observed nutrient data. PCSWMM can simulate pollutant delivery using buildup and wash-off equations, and parameterizing these equations for different land uses is important for using the full potential of the software [38]. Simplified data-based model representations using event mean concentrations have proved efficient in achieving reasonable calibration and validation while limiting the computational burden [47,48]. Therefore, the EMC wash-off function was chosen for the water quality simulations.
Numerous techniques exist for estimating stormwater quality and predicting pollutant loads using the buildup and wash-off parameters associated with specific land use patterns [49]. The suitability of each approach depends on the modeling objective and the available input data, which describe the accumulation of pollutants on the ground surface during dry conditions and their wash-off during wet conditions [50]. PCSWMM simulates the buildup process using exponential, power, or saturation functions, whereas the wash-off process is approximated using event mean concentration, exponential, or rating curve equations [31]. The buildup and wash-off equations contain multiple parameters that are challenging to calibrate, because different parameter values are required for different pollutants and study areas. However, average EMC values for different pollutants and land uses are available. For efficient modeling, a simpler data-driven technique based on the event mean concentration was therefore adopted [47,48]. This method assumes that pollutant concentrations in runoff remain constant throughout an event [50] and eliminates the need for buildup parameters [31]. When an average EMC is applied across the simulation of multiple events, it can provide reasonably accurate estimates of total pollutant loads, similar to those obtained with process-based buildup and wash-off approaches [50]. Therefore, instead of the buildup function [31], the EMC wash-off function was selected for water quality simulations in SWMM, so the simulations require only the EMC values unique to each land cover type as parameter inputs. The pollutants examined were total suspended solids (TSS), soluble phosphorus, dissolved oxygen (DO), and biochemical oxygen demand (BOD5).
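Under the constant-concentration assumption, the load from each land use is its EMC multiplied by the runoff volume it generates, summed over land uses. The Python sketch below illustrates that bookkeeping in US units for one hypothetical subcatchment. It is not PCSWMM's internal computation. For simplicity it assumes a single uniform runoff depth for all land uses. The EMC values and areas are placeholders, not the literature values compiled in Table 3.

```python
# Unit conversions: 1 acre-inch of water = 102,790.15 L; 1 lb = 453,592.37 mg.
LITERS_PER_ACRE_INCH = 102_790.15
MG_PER_LB = 453_592.37


def emc_event_load_lb(emc_mg_l, runoff_depth_in, area_ac):
    """Event pollutant load (lb) = EMC x runoff volume, with unit conversions."""
    runoff_liters = runoff_depth_in * area_ac * LITERS_PER_ACRE_INCH
    return emc_mg_l * runoff_liters / MG_PER_LB


def subcatchment_load_lb(land_use_areas_ac, emc_by_land_use, runoff_depth_in):
    """Sum land-use loads, assuming (for illustration) one runoff depth for all land uses."""
    return sum(
        emc_event_load_lb(emc_by_land_use[use], runoff_depth_in, area)
        for use, area in land_use_areas_ac.items()
    )


def area_weighted_emc(land_use_areas_ac, emc_by_land_use):
    """Composite EMC (mg/L), consistent with the uniform-runoff assumption above."""
    total_area = sum(land_use_areas_ac.values())
    return sum(emc_by_land_use[u] * a for u, a in land_use_areas_ac.items()) / total_area


if __name__ == "__main__":
    # Placeholder TSS EMCs (mg/L) and areas (acres); NOT values from this study.
    tss_emc = {"residential": 60.0, "agricultural": 110.0, "forested": 40.0}
    areas = {"residential": 300.0, "agricultural": 500.0, "forested": 200.0}
    load = subcatchment_load_lb(areas, tss_emc, runoff_depth_in=0.5)
    print(f"composite EMC = {area_weighted_emc(areas, tss_emc):.1f} mg/L, TSS load = {load:.0f} lb")
```

Because the EMC is constant within an event, the load scales linearly with runoff volume. Errors in the simulated hydrology therefore pass directly into the simulated pollutant loads.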
After the water quality simulation was completed, the model's predicted water quality at different times was compared with observed data. This comparison aimed to determine the model's efficacy in assessing the water quality status of tributaries and to provide a basis for comparative evaluations.

Results and Discussion
In the absence of local guideline values for EMCs, a comprehensive review of the relevant literature was conducted to establish potential values. The values obtained from this review are shown in Table 3.

Hydraulic and Hydrologic Model Calibration
The PCSWMM model demonstrated satisfactory performance in both the calibration and validation phases, as evaluated using statistical measures: Nash–Sutcliffe Efficiency (NSE) values of 0.54 and 0.51 and R² values of 0.72 and 0.71, as reported in Table 4. The model's performance was also inspected visually by comparing the simulated and observed time series, with R² values of 0.5 and 0.9 and NSE values of 0.48 to 0.89, as illustrated in Figures 2 and 3. Since USGS discontinued the streamflow record in early 2000, we used historical flow data from 1970 to examine how the model performed for different rainfall events. Mill Creek is situated in the Youngstown area, which was developed in the mid-1900s, and development activity slowed after the collapse of the steel industry. The effect of land use change on the hydrological modeling was therefore considered nominal compared with the effect of the lack of spatially distributed rainfall data across the Mill Creek watershed. There is only a single rain gauge station near the watershed, and it is roughly 27 miles from the watershed boundary. Therefore, the event-based model was developed for periods of significant rainfall, and the hydrological model was validated with historical flow data. For calibration, NSE values ranged from 0.51 to 0.53 and R² values ranged from 0.71 to 0.72. Subsequent validation of the model with historical streamflow data yielded NSE values ranging from 0.43 to 0.89 and R² values ranging from 0.51 to 0.97, as tabulated in Table 5. In addition, two locations were selected for deploying HOBO loggers, covering a major part of the watershed, including the watershed outlet downstream, as shown in Figure 1. The hydraulic model calibration using recently collected data from May to August 2023 was satisfactory at the two monitoring stations, as depicted in Figure 4. One reason for the model's good performance at one of these stations was its close proximity to the rain gauge station. The hydraulic model was subsequently validated using recently collected data from September to December 2023, as illustrated in Figure 5. The variable cross-sections of Mill Creek and its tributaries made it challenging to obtain cross-sections detailed enough to calibrate the hydraulic model adequately. Although several cross-sections of the stream were taken, the difficulties associated with cross-sectional data collection influenced the calibration outcomes, underscoring the need for careful attention to cross-section data in such hydraulic modeling efforts.
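For reference, the two goodness-of-fit statistics reported above are conventionally defined as NSE = 1 - Σ(O - S)² / Σ(O - Ō)² and R² = the squared Pearson correlation between observed (O) and simulated (S) values. The short Python sketch below computes both from paired series. It is a generic illustration, not the computation performed by PCSWMM, and the function names and example flows are hypothetical. NSE penalizes systematic bias, whereas R² measures only linear association, so a biased simulation can still score a high R².

```python
from statistics import mean


def nash_sutcliffe(observed, simulated):
    """NSE = 1 - sum((O - S)^2) / sum((O - mean(O))^2)."""
    if len(observed) != len(simulated) or len(observed) < 2:
        raise ValueError("need two equal-length series with at least two values")
    o_bar = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    o_bar, s_bar = mean(observed), mean(simulated)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(observed, simulated))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


if __name__ == "__main__":
    # Hypothetical daily flows (cfs); the simulation is biased high by 20%.
    obs = [12.0, 15.0, 40.0, 85.0, 30.0, 18.0, 14.0]
    sim = [1.2 * q for q in obs]
    print(f"NSE = {nash_sutcliffe(obs, sim):.2f}, R2 = {r_squared(obs, sim):.2f}")
```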

Water Quality Calibration
During the water quality assessment, the event mean concentration (EMC) wash-off parameter was applied for the whole simulation period. EMC values were chosen based on documented values from past studies, which are listed in Table 3, and the ranges of EMCs used in previous studies are reported in Figure 6. The performance of the water quality calibration and validation was assessed by visually inspecting graphs of observed and simulated pollutant data at the eight monitoring stations, and Figure 7 presents the water quality calibration at a single monitoring site. We collected water quality samples intermittently, mostly after storm events. As a result, the data were sporadic and clustered, and only limited data were available for model calibration. Water quality data from 2022 to 2023 were selected for calibration, while data from 2017 and 2018 were used for validation. The nutrient calibration exhibited satisfactory results in the downstream part of the watershed for BOD5, DO, TSS, and soluble phosphorus (Figure 7). However, the water quality calibration at the headwater tributaries showed less satisfactory outcomes, mainly because of the limited data. Additionally, only a small portion of these data were included in the simulated data sets. The water quality calibration results in the upstream part of the watershed were considerably affected, primarily because the rain gauge was located outside the watershed boundary. Notably, water quality simulation depends on the performance of the hydrologic model, and the hydrologic and hydraulic model performance at those stations was largely affected by the remotely located rain gauge station. The distance between the rain gauge and the farthest point in the watershed was approximately 27 miles. However, the gauge was closer to the downstream part of the watershed, approximately 10.9 miles away, which is consistent with the notably better calibration results downstream compared with those upstream. This outcome suggests that upstream tributaries may be more sensitive to isolated rainfall events. Similarly, short-duration, high-intensity storms usually affect smaller areas. Larger watershed models are more easily calibrated to long-duration, high-volume events, because the impact of peak intensities is somewhat attenuated and such events are more likely to be widespread by nature.
For the simulation of pollutants, a detailed sub-catchment discretization is crucial. In stormwater modeling, catchments are often categorized according to land use classifications, which generally include residential, agricultural, commercial, industrial, and forested areas. Discretizing the model at a fine land use resolution is critical to adequately capture pollutant loading from the various land uses, because different land use categories generate different EMC values [50,63]. While using EMCs to model stormwater quality is a common approach, modeling pollutant loads with EMCs involves considerable uncertainty, especially when local data for calibration are insufficient [64]. The technique relies on concentration values from the literature and does not require substantial monitoring or parameter calibration. Because the watershed is large, a comprehensive discretization based on land cover is essential to divide the catchment into enough sub-catchments to represent its various land use characteristics [64]. This generally demands more effort than a typical, less detailed discretization. We therefore discretized the study area into three land use categories, namely residential, agricultural, and forested, and assigned each a different EMC value from past studies, as shown in Figure 6. Modeling stormwater pollutant loads with these literature-derived EMCs produced significantly fluctuating loads at some locations in the watershed (Figures 7 and 8). We attribute this disparity to the absence of local EMC data for the source areas. Simulations using alternative EMC values (Table 3) produced variable soluble phosphorus loads at some monitoring locations (Figure 7). The site surrounded by forest exhibited the highest TSS concentration, likely because of erosion and sediment runoff during heavy rainfall. During the water quality investigation, samples from all the tributaries were collected and analyzed for BOD5, TSS, DO, and soluble phosphorus. The highest soluble phosphorus concentration (3474 µg/L) was observed upstream near the first monitoring location, which is surrounded by agricultural land; it is likely due to cattle manure or commercial fertilizer. Conversely, the lowest concentration (9.44 µg/L) was found downstream near monitoring station 14, where the land use consists of open space and low-intensity developed area. The dissolved oxygen concentration was consistent across all monitoring points along the stream, as shown in Figure 10.
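Under an EMC wash-off assumption, runoff from each land use carries a constant pollutant concentration, so an event load is the EMC multiplied by the runoff volume. The Python sketch below illustrates that arithmetic for a sub-catchment split into land uses and also computes the volume-weighted (composite) EMC. The `LandUseArea` class, the function names, and the example numbers are hypothetical illustrations; they are not the EMC values in Table 3 and not PCSWMM internals.

```python
from dataclasses import dataclass

# 1 acre-inch = 43,560 ft^2 * (1/12) ft = 3,630 ft^3, and 1 ft^3 = 28.316846592 L.
LITERS_PER_ACRE_INCH = 3630 * 28.316846592


@dataclass
class LandUseArea:
    """Hypothetical land use patch within one sub-catchment."""
    name: str
    area_acres: float
    runoff_in: float      # event runoff depth over this patch (inches)
    emc_mg_per_l: float   # event mean concentration (mg/L), e.g., from literature


def event_loads_kg(patches):
    """Event load per land use: EMC (mg/L) x runoff volume (L), converted to kg."""
    loads = {}
    for p in patches:
        volume_l = p.area_acres * p.runoff_in * LITERS_PER_ACRE_INCH
        loads[p.name] = p.emc_mg_per_l * volume_l / 1e6  # mg -> kg
    return loads


def volume_weighted_emc(patches):
    """Composite EMC (mg/L): total pollutant mass divided by total runoff volume.

    The unit-conversion constant cancels, so acre-inches are used directly.
    """
    total_volume = sum(p.area_acres * p.runoff_in for p in patches)
    total_mass = sum(p.emc_mg_per_l * p.area_acres * p.runoff_in for p in patches)
    return total_mass / total_volume


if __name__ == "__main__":
    # Illustrative numbers only (hypothetical TSS EMCs and runoff depths).
    patches = [
        LandUseArea("agricultural", 120.0, 0.8, 150.0),
        LandUseArea("residential", 60.0, 1.1, 60.0),
        LandUseArea("forested", 200.0, 0.3, 40.0),
    ]
    for name, kg in event_loads_kg(patches).items():
        print(f"{name}: {kg:.1f} kg")
    print(f"composite EMC: {volume_weighted_emc(patches):.1f} mg/L")
```

Because the load scales linearly with runoff volume, any error in simulated runoff propagates directly into the pollutant load, and a literature EMC that does not match local conditions shifts every event load for that land use by the same factor.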
To conduct a comprehensive water quality analysis, we compared simulated and observed concentrations of TSS, BOD5, DO, and soluble phosphorus, particularly for high-volume rainfall events (P > 0.5 inches). We restricted the analysis to events with more than 0.5 inches of rainfall for two reasons: (i) the hydrologic model performed adequately during high precipitation, because in these periods the entire watershed contributed to runoff and the single rain gauge near the watershed also represented the rainfall pattern to some extent; and (ii) the water quality samples were taken in the streams after rainfall events, which were generally expected to exceed 0.5 inches.
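The P > 0.5 inch screening described above can be expressed as a simple filter over paired samples. The sketch below assumes each record stores the station, the sample date, the rainfall depth of the storm that preceded sampling, and the observed and simulated concentrations; the `Sample` record, its field names, and the example values are hypothetical.

```python
from collections import defaultdict
from dataclasses import dataclass
from datetime import date

RAIN_THRESHOLD_IN = 0.5  # screening threshold used in the analysis (P > 0.5 inches)


@dataclass
class Sample:
    """Hypothetical paired record of an observed and a simulated concentration."""
    station: int
    sampled_on: date
    event_rain_in: float  # rainfall depth of the preceding storm (inches)
    observed: float       # observed concentration
    simulated: float      # simulated concentration at the sampling time


def high_volume_pairs(samples, threshold=RAIN_THRESHOLD_IN):
    """Group (observed, simulated) pairs by station, keeping only storms with P > threshold."""
    pairs = defaultdict(list)
    for s in samples:
        if s.event_rain_in > threshold:  # strictly greater, matching P > 0.5 in
            pairs[s.station].append((s.observed, s.simulated))
    return dict(pairs)


if __name__ == "__main__":
    samples = [
        Sample(2, date(2022, 5, 3), 0.3, 41.0, 35.0),   # dropped: small storm
        Sample(2, date(2022, 6, 14), 0.9, 88.0, 79.0),
        Sample(15, date(2023, 4, 20), 1.2, 22.0, 25.0),
    ]
    print(high_volume_pairs(samples))
```

Keeping the threshold as a named parameter makes it easy to check how sensitive the observed-versus-simulated comparison is to the choice of 0.5 inches.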
Overall, the model performed well in replicating mean concentrations across stations, with some discrepancies at a few locations. For TSS, the box plot analysis showed that the mean simulated and observed concentrations fell within a similar range (Figure 9). Simulated and observed DO concentrations were consistently in the same range, between 8 and 10 mg/L (Figure 10). Simulated BOD5 concentrations closely resembled the observed BOD5 concentrations (Figure 11). Observed BOD5 ranged from a minimum of 0.44 mg/L to a maximum of 10.18 mg/L; the maximum is high for a freshwater body.
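The comparison of mean concentrations above can also be summarized numerically per station. The sketch below computes the mean and median of the observed and simulated values (the median is the center line of a box plot) and the relative difference of the simulated mean from the observed mean. The function name and the input layout (a mapping from station to (observed, simulated) pairs) are assumptions for illustration, not part of PCSWMM.

```python
from statistics import mean, median


def station_summary(pairs_by_station):
    """Per-station summary of observed vs. simulated concentrations.

    pairs_by_station: {station_id: [(observed, simulated), ...]}
    """
    summary = {}
    for station, pairs in pairs_by_station.items():
        obs = [o for o, _ in pairs]
        sim = [s for _, s in pairs]
        obs_mean, sim_mean = mean(obs), mean(sim)
        # Positive values mean the model overestimates the station mean.
        rel_diff = float("nan") if obs_mean == 0 else 100.0 * (sim_mean - obs_mean) / obs_mean
        summary[station] = {
            "obs_mean": obs_mean,
            "sim_mean": sim_mean,
            "obs_median": median(obs),
            "sim_median": median(sim),
            "rel_diff_pct": rel_diff,
        }
    return summary


if __name__ == "__main__":
    # Hypothetical DO pairs (mg/L) for two stations.
    do_pairs = {10: [(9.1, 8.8), (8.6, 9.0)], 15: [(7.9, 8.3), (8.2, 8.4)]}
    for station, row in station_summary(do_pairs).items():
        print(station, {k: round(v, 2) for k, v in row.items()})
```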
Soluble phosphorus concentrations were not simulated as well as the other water quality parameters at some stations (Figure 12), as calibrating PCSWMM for soluble phosphorus proved challenging. The comparison between modeled and observed data for Sites 2-15 indicates a notable degree of similarity in pollutant concentrations, suggesting that the model reliably represents water quality dynamics across multiple sites. Using the land use interface within PCSWMM, the model can help identify critical sites within the stream network: by analyzing pollutant levels across tributaries, it can distinguish highly polluted from less polluted areas, providing valuable insights for prioritizing remediation efforts and resource allocation. In summary, the model reasonably captured mean concentration trends, as illustrated in Figures 9 through 12.
The highest TSS concentrations were observed in the upstream part of the watershed, whereas the lowest TSS concentrations were consistently observed downstream. This trend was consistent in both the simulated and observed data and can be attributed mainly to the distinct land use surrounding these sites. Site 2, located upstream, is surrounded by agricultural farmland, where runoff and erosion from soil disturbance and agricultural activities contribute to higher TSS levels (Figure 9). Conversely, Site 15 is predominantly urbanized, and its impervious surfaces and stormwater runoff result in lower TSS concentrations than in agricultural areas (Figure 9). The agreement between observed and simulated data underscores the ability of the modeling approach to capture the influence of land use on TSS concentrations across the watershed.
Based on the observed data, Sites 2, 10, and 14 exhibited high dissolved oxygen (DO) levels, while Site 15 showed lower DO levels (Figure 10). The model simulation detected a similar trend except at Site 2. Several factors may explain this discrepancy. While the observed data suggested high DO at Site 2, the site is surrounded by agricultural farmland with buffer land and may receive increased nutrient runoff and organic matter decomposition, which would lower DO levels, as the simulated outcome suggests. Site 14 is surrounded predominantly by housing development and impervious surfaces. Sites 10 and 14 both received less agricultural runoff and exhibited higher DO in both the simulated and observed data. Site 15, located near the urbanized area, may suffer from oxygen depletion due to organic pollution from sewage and runoff, resulting in lower DO levels in both the simulated and observed data.
Both the observed and simulated results consistently showed the lowest BOD5 levels at Sites 5, 7, 8, and 10, whereas Sites 14 and 15 consistently displayed higher BOD5 levels (Figure 11). Site 2 had mean BOD5 levels comparable to Sites 5, 7, 8, and 10 in both the observed and simulated data, and lower levels than Sites 14 and 15, especially in the observed data. Soluble phosphorus at Site 2 was high in both the observed records and the simulated outcomes (Figure 12). The observed and simulated soluble phosphorus trends were consistent except at Site 5, which receives nutrients from mixed land use with roughly equal proportions of forest, developed areas, and pasture upstream. By and large, the simulated trends were consistent with the observed data.
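The site-to-site patterns above are judged by visual comparison. One way to put a number on whether the simulated ranking of sites matches the observed ranking is a rank correlation between per-site observed and simulated means; the sketch below implements Spearman's rank correlation (average ranks for ties) in plain Python. This statistic is not used in the study, and the function names and example site values are hypothetical.

```python
from statistics import mean


def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over a run of tied values.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1  # average of positions i..j, converted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def spearman_rho(xs, ys):
    """Spearman's rho as the Pearson correlation of the average ranks."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equally long sequences with at least two values")
    rx, ry = average_ranks(xs), average_ranks(ys)
    mx, my = mean(rx), mean(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    ssx = sum((a - mx) ** 2 for a in rx)
    ssy = sum((b - my) ** 2 for b in ry)
    return cov / (ssx * ssy) ** 0.5


if __name__ == "__main__":
    # Hypothetical per-site mean soluble phosphorus (µg/L), same site order in both lists.
    observed = [410.0, 95.0, 60.0, 55.0, 48.0, 30.0, 35.0]
    simulated = [380.0, 40.0, 70.0, 58.0, 50.0, 28.0, 33.0]
    print(f"Spearman rho = {spearman_rho(observed, simulated):.2f}")
```

A value near 1 means the model orders the sites the same way the observations do, even if absolute concentrations differ; a single outlying site, such as Site 5 for soluble phosphorus, lowers the value without necessarily reversing the overall pattern.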

Conclusions and Recommendations
This comprehensive study was conducted in the Mill Creek watershed, which is part of the larger Mahoning River Basin. Our primary objective was to calibrate and validate a hydraulic and hydrologic model using PCSWMM. Calibration required observed data, such as streamflow and water depth, against which to verify the simulated data. Because continuous data were lacking, this study investigated a new approach to collecting primary field data, including precise water level measurements with HOBO loggers and a systematic assessment of the creek's cross-sections, which was crucial for fine-tuning the model's hydraulic parameters. Additionally, we relied on streamflow data from two strategically placed USGS gauging stations for the hydrologic calibration of the model. For further validation, the model was run for historical events from 1952 to 1972, and it performed very well during high-volume rain events. The model did not perform well during low rainfall events, potentially because the rain gauge is located outside the watershed. The nearest monitoring station was 10.9 miles from the rain gauge, while the farthest was 23.3 miles away. Spatially distributed rain gauges inside the watershed are critical for proper calibration and validation of the model.
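The streamflow calibration and validation performance is reported with the Nash-Sutcliffe Efficiency (NSE) and the coefficient of determination (R²). Using their standard definitions, NSE = 1 - Σ(O_i - S_i)^2 / Σ(O_i - Ō)^2 and R² is the squared Pearson correlation between observed (O) and simulated (S) flows. The sketch below computes both in plain Python; the function names and example flows are illustrative and not part of PCSWMM.

```python
from statistics import mean


def nse(observed, simulated):
    """Nash-Sutcliffe Efficiency: 1 is perfect; 0 is no better than the observed mean."""
    o_bar = mean(observed)
    sse = sum((o - s) ** 2 for o, s in zip(observed, simulated))
    sst = sum((o - o_bar) ** 2 for o in observed)
    return 1.0 - sse / sst


def r_squared(observed, simulated):
    """Coefficient of determination as the squared Pearson correlation."""
    o_bar, s_bar = mean(observed), mean(simulated)
    cov = sum((o - o_bar) * (s - s_bar) for o, s in zip(observed, simulated))
    var_o = sum((o - o_bar) ** 2 for o in observed)
    var_s = sum((s - s_bar) ** 2 for s in simulated)
    return cov ** 2 / (var_o * var_s)


if __name__ == "__main__":
    # Hypothetical daily flows (cfs): right shape, but a constant +10 cfs bias.
    obs = [12.0, 30.0, 85.0, 40.0, 18.0]
    sim = [22.0, 40.0, 95.0, 50.0, 28.0]
    print(f"NSE = {nse(obs, sim):.2f}, R2 = {r_squared(obs, sim):.2f}")
```

R² measures only linear association, so a systematic bias lowers NSE but leaves R² unchanged; this is why R² can exceed NSE for the same simulated series.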
For the calibration process, we used the SRTC (Sensitivity-based Radio Tuning Calibration) tool in PCSWMM, which adjusts model parameters to account for their uncertainty by running the model at the high and low ends of each parameter's uncertainty range. The most sensitive parameter was the sub-catchment percent imperviousness, followed by Manning's roughness for pervious and impervious surfaces. Manning's roughness for the open channels did not have a great influence on the calibration of the model.
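The idea of running the model at the low and high ends of each uncertain parameter can be illustrated with a one-at-a-time sensitivity screen. The sketch below is a generic illustration, not the SRTC algorithm: it varies one parameter at a time between hypothetical bounds while holding the others at base values, and ranks parameters by the resulting spread in a model output. The toy model, its coefficients, the bounds, and the parameter names are invented for illustration and are not a SWMM runoff calculation, so the printed ranking reflects only the toy coefficients.

```python
from typing import Callable, Dict, List, Tuple

Params = Dict[str, float]


def oat_sensitivity(
    model: Callable[[Params], float],
    base: Params,
    bounds: Dict[str, Tuple[float, float]],
) -> List[Tuple[str, float]]:
    """Rank parameters by |output(high) - output(low)|, varying one parameter at a time."""
    spreads = {}
    for name, (low, high) in bounds.items():
        outputs = []
        for value in (low, high):
            params = dict(base)  # copy so the other parameters stay at their base values
            params[name] = value
            outputs.append(model(params))
        spreads[name] = abs(outputs[1] - outputs[0])
    return sorted(spreads.items(), key=lambda item: item[1], reverse=True)


def toy_peak_flow(p: Params) -> float:
    """Hypothetical stand-in for a model run; NOT a SWMM calculation."""
    return 2.0 * p["pct_imperv"] - 20.0 * p["n_perv"] - 40.0 * p["n_imperv"] - 1.0 * p["n_channel"]


if __name__ == "__main__":
    base = {"pct_imperv": 25.0, "n_perv": 0.25, "n_imperv": 0.015, "n_channel": 0.045}
    bounds = {
        "pct_imperv": (10.0, 40.0),
        "n_perv": (0.10, 0.40),
        "n_imperv": (0.011, 0.024),
        "n_channel": (0.030, 0.060),
    }
    for name, spread in oat_sensitivity(toy_peak_flow, base, bounds):
        print(f"{name}: {spread:.3f}")
```

A one-at-a-time screen ignores interactions between parameters, so it is best read as a first indication of which parameters deserve careful calibration.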
Similarly, for the water quality calibration, various sites in the watershed were monitored to record several water quality parameters over multiple events. The sites were selected through comprehensive engagement with stakeholders, who provided data and input on current land use practices and agricultural trends. Water quality samples were collected, analyzed, and used to calibrate the model for TSS, BOD5, DO, and soluble phosphorus at various monitoring locations in the watershed. In this study, the water quality calibration and validation were not as good as the hydrologic and hydraulic calibration, for several reasons. Modeling stormwater pollutant loads with literature-derived EMCs produced significantly fluctuating loads, both at the watershed level and for each land cover type. The simulated loads were at times underestimated relative to the measured values, possibly because the literature EMC values did not reflect the unique regional conditions of the watershed. Estimating pollutant loads with EMCs is likely to involve substantial errors, particularly when insufficient local data are available for calibration and validation. Nevertheless, EMC-based modeling remains the prevailing method among practitioners and is regarded as feasible in the absence of other data sources. Our study also identified sites with poor water quality at various locations along the stream. The monitoring sites were selected to cover the entire watershed, from upstream to downstream, at key tributary inputs. The stream quality assessment revealed that the uppermost portion of the watershed, which is surrounded by agricultural land, had higher levels of soluble phosphorus, whereas concentrations were notably lower in the mostly urbanized downstream part of the watershed. In contrast, the study found elevated TSS concentrations in the immediate vicinity of the downstream part of the watershed, which is covered by urban land use. The DO concentration remained within the same range across the whole length of the watershed.
Further investigation is required to accurately define typical pollutant levels under specific local weather conditions and to better understand the mechanisms that govern pollutant movement across interconnected land cover types. To improve the model's calibration accuracy under such conditions, it is crucial to gather more local data, especially rainfall and streamflow data, which will ultimately increase confidence in the model.
This study investigated the use of the PCSWMM model in a large mixed land use watershed, although the model is typically used for urban stormwater modeling. The model captured the spatial and temporal variability of the stream's water quality with acceptable accuracy. Hydraulic modeling in PCSWMM, particularly for the headwater streams, was quite challenging. The model performed satisfactorily with respect to hydrology and water quality in the downstream part of the watershed, whereas the discrepancies in the upstream modeling results may be attributed to the lack of rain gauges within reasonable proximity. The farthest point of the watershed was approximately 24 miles from the rain gauge, so a single gauge could not capture the spatial variability of rainfall. It is therefore advisable to install spatially distributed rain gauges inside the watershed for appropriate modeling during flow and water quality data collection periods.
This model is expected to benefit stakeholders, particularly the ABC Stormwater District and local watershed groups. It could be used to identify critical pollution sources and to guide appropriate actions, such as implementing effective best management practices. The study also concludes that PCSWMM can be applied to large mixed land use watersheds with reasonable accuracy if spatially distributed precipitation and hydrologic data are available.

Figure 1. Map of the Mill Creek watershed showing the water quality sampling stations, HOBO logger locations, and USGS gauge stations used for PCSWMM model development.

Figure 4. PCSWMM hydraulic model calibration: (a) station 1 at the outlet of the watershed, (b) station 2 located at the East Golf hike trail.

Figure 7. Water quality calibration at monitoring station 14 for the period from 2022 to 2023: (a) TSS, (b) BOD5, (c) DO, (d) soluble phosphorus.

Figure 9. (a) Observed concentration of TSS in the stream for the period from 2022 to 2023, (b) simulated concentration of TSS by the model for the period from 2022 to 2023 when the precipitation depth was greater than 0.5 inches.

Hydrology 2024, 11, x FOR PEER REVIEW 19 of 25

Figure 10. (a) Observed concentration of DO in the stream for the period from 2022 to 2023, (b) simulated concentration of DO by the model for the period from 2022 to 2023 when the precipitation depth was greater than 0.5 inches.

Figure 11. (a) Observed concentration of BOD5 in the stream for the period from 2022 to 2023, (b) simulated concentration of BOD5 by the model for the period from 2022 to 2023 when the precipitation depth was greater than 0.5 inches.

Figure 12. (a) Observed concentration of soluble phosphorus in the stream for the period from 2022 to 2023, (b) simulated concentration of soluble phosphorus by the model for the period from 2022 to 2023 when the precipitation depth was greater than 0.5 inches.

Table 1. Data used for the study, with their sources and information.

Table 2. Model calibration parameters from different literature sources.

Table 3. EMC values from different literature sources.

Table 4. (a) Model calibration performance on the daily streamflow data at two USGS gauging stations within the watershed; (b) model event-based calibration performance on the daily streamflow data at the outlet of the watershed.

Table 5. Historical validation of the daily streamflow data at the USGS gauging stations of the watershed.