Support Vector Regression Models of Stormwater Quality for a Mixed Urban Land Use

Abstract: The present study is an attempt to model the stormwater quality of a stream located in Pune, India. The city is divided into twenty-three basins (named A to W) by the Pune Municipal Corporation. The selected stream lies in the haphazardly expanded peri-urban G basin. The G basin has constructed stormwater drains that discharge into the selected open stream. Runoff over the region picks up non-point source pollutants, which are also carried into the stream. The study becomes more complex because the stream is misused to dump trash, garbage and roadside litter, which adds to the stormwater pollution. Experimental investigations covered eleven distinct locations on a naturally occurring stream in the G basin. Stormwater samples were collected for twenty-two storm events during the monsoon seasons of four years, 2018–2021, both during and after rainfall. The physicochemical characteristics were analyzed for twelve water quality parameters: pH, Conductivity, Turbidity, Total Solids (TS), Total Suspended Solids (TSS), Total Dissolved Solids (TDS), Bio-chemical Oxygen Demand (BOD5), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO), Phosphate, Ammonia and Nitrate. The Water Quality Index (WQI) ranged from 46.9 to 153.9 and from 41.20 to 87.70 for samples collected during and immediately after the rainfall, respectively. Principal Component Analysis was used to extract the most significant stormwater quality parameters. To understand the complex non-linear relationship of rainfall characteristics with the significant stormwater pollutant parameters, a Support Vector Regression (SVR) model with the Radial Basis Kernel Function (RBF) was developed. The Support Vector Machine is a powerful supervised algorithm that works well on small but complex datasets with the help of kernel tricks. The accuracy of the model was evaluated based on the normalized root-mean-square error (NRMSE), the coefficient of determination (R²) and the ratio of performance to the interquartile range (RPIQ). The SVR model showed the best performance for the parameter TS, with NRMSE (0.17), R² (0.82) and RPIQ (2.91). A unit increase or decrease in the coefficients of the rainfall characteristics indicates the weighted deviation in the values of the pollutant parameters. The non-linear Support Vector Regression models confirmed that both antecedent dry days and rainfall are correlated with the significant stormwater quality parameters. The conclusions drawn can give decision-makers effective information for adopting an appropriate treatment train of varied source control measures (SCM) to treat and mitigate runoff in an open stream. This holistic approach serves the stakeholders' objective of managing stormwater efficiently. The research can be further extended by selecting a multi-criteria decision-making tool to adopt the best SCM and its multiple potential combinations.


Introduction and Background
The world population reached 8 billion in November 2022 [1]. The lack of fresh water is the foremost concern today. The strain on water systems will rise further by 2050, when the world population will be between 9.4 and 10.2 billion, an increase of 22% to 34% [2]. Rising water demand is a result of a rising population, a growing economy, and ...
... rainfall conditions [26]. Rainfall is unevenly distributed within the district due to geographical conditions. During the summer, the southwest monsoon winds bring most of the rain, accounting for approximately 87% of total rainfall. The city's annual rainfall, which is estimated to be 722 mm, falls between June and September, with July being the wettest month of the year. The city is in a hot, semi-arid region that borders a tropical wet and dry climate, with an average temperature ranging from 19 °C (66 °F) to 33 °C (91 °F). Mula, Mutha and Mula-Mutha are three rivers that flow through the Pune Municipal Corporation area [26]. The area of Pune is split up into 23 basins, named A to W, as shown in the map in Figure 1. Each of these basins has a network of one or more naturally occurring streams that transport the stormwater into the Mula and Mutha rivers. The slopes of these networks are generally sufficient to carry reasonable stormwater runoff. These natural streams and their tributaries serve as the main drainage routes.
Out of these 23 basins, the G basin, highlighted in Figure 1, is a peri-urban area located on the expanded peripheral boundaries of the city. The G basin has a mix of land uses, including residential, commercial, developing and urban-rural areas.
The World Urbanization Prospects indicate that the Pune urban cluster will have a population of 8.1 million by 2030 [27]. Urbanization rates are rising significantly, putting acute pressure on the already stretched infrastructure. This unplanned and uncontrolled expansion typically consists of randomly placed land uses such as residential, commercial, agricultural, industrial, recreational and urban poor localities [28]. This gives rise to a mixed urban fabric with heavy pollution loads. Urbanization has compounded the impervious areas by around 70%, as shown in Figure 2 and discussed in Section 3.1, leaving very little space for green cover or sustainable stormwater drainage services. This surge in the impervious area has also raised the runoff to an enormous magnitude [29]. Rapid urbanization has substantially altered the nature of the city's drainage patterns over time [29]. Over this period, paved areas have increased significantly while open pervious spaces have disappeared. The increase in land demand, rapid urbanization, encroachment and the expansion of concrete roads have adversely affected the existing natural streams, reducing their widths in several places. The urban growth along the streams has not been planned and executed scientifically. As a result, new areas have emerged that are vulnerable to flooding even during periods of moderate rainfall [30]. This is particularly true for the fringe areas newly included in the Pune Municipal Corporation. Before they entered the boundaries of the Corporation, these areas had a rural character with no control over developmental activities. The uncontrolled development of "urban poor localities" generally happened near these nallas (the naturally occurring open streams). The drainage paths have become susceptible to the build-up of various kinds of solid waste and wastewater. For this reason, the carrying capacity has decreased, the silt load has increased and maintenance has become more challenging [28].
The natural drains cannot remove this amount of stormwater from the city's vastly expanded settlements without the aid of an engineered stormwater system. The Pune Municipal Corporation is making efforts to line and widen these drains wherever possible to accommodate this growing volume, although the ground reality is alarming [31]. Therefore, there is a need to analyze and model the pollutants entering the natural water bodies.
In the present study, considering the mixed land use, the focus is on the open channels (streams or nallas) into which all such stormwater drains discharge, together with the surface runoff that follows the natural slopes into these streams. There are four main streams (natural drains) flowing through the selected G basin. Of these, the longest stream (about 4 km long), located in the central part of the G basin and ultimately meeting the Mula river, was selected. A site survey was conducted to select the sampling stations, with the help of a drainage map provided by the Pune Municipal Corporation. During the survey, a total of eleven sampling stations were identified at points where several small open streams or constructed stormwater drains join the selected stream. The sampling stations were numbered from 11 to 1 from the peripheral region towards the river. The outfall sampling point, where this main stream meets the Mula river, was also designated, as shown in Figure 3.
Hydrology 2023, 10, x FOR PEER REVIEW 5 of 17

Field and Laboratory Data Collection
Manual grab sampling was carried out twice, during the rainfall and after the rainfall, at eleven sampling locations in the natural drain in an urbanized area, along with the outfall location. The samples were collected for twenty-two storm events during the monsoon seasons of four years, 2018–2021. A rain logger with a data acquisition system was installed in the G basin to acquire daily precipitation data for all twenty-two storm events. To ensure the continuity of the flow in the stream, the depth and velocity were measured at all the stations except stations 3, 6 and 10, which were inaccessible for taking readings with depth and velocity meters. The samples were obtained in containers that had been cleaned beforehand with 10% HNO3, rinsed with tap water and distilled water, completely dried, and then sealed in the lab. The collected samples were brought to the Environmental Engineering Laboratory of Symbiosis International University. Sampling followed the standard procedures and protocols specified in the "Caltrans Stormwater Monitoring Protocol Guidance Manual" [32]. The quality assurance and quality control procedures include the suggested practices for sampling, preservation, storage, transport, laboratory testing, field blanks and laboratory blanks. Water samples were analyzed for the following water quality parameters: pH, Conductivity, Turbidity, Total Solids (TS), Total Suspended Solids (TSS), Total Dissolved Solids (TDS), Bio-chemical Oxygen Demand (BOD5), Chemical Oxygen Demand (COD), Dissolved Oxygen (DO), Phosphate, Ammonia and Nitrate.

Data Pre-Processing and Analysis Techniques
Univariate statistical analysis was performed on the data collected from the twenty-two storm events. Missing values were imputed by the predictive mean method and outliers were removed using the box detection technique. The trend analysis of the spatial variation of pollutants was carried out from the farthest point towards the outfall. Furthermore, to understand the behavioral pattern, the water quality index (WQI) was calculated at all the station points [33,34]. The water quality index was determined using the standard method for drinking water quality. The calculated values were compared to the BIS standard and recommendations. The "weighted arithmetic index method" of Brown et al. (1970) [35] was used, as shown in the following equations.

where $W_i$ = the relative weight, $w_i$ = the weight of each parameter, and $n$ = the number of parameters; $q_i$ = the quality rating, $c_i$ = the concentration of each parameter in each water sample in mg/L, and $s_i$ = the Indian drinking water standard for each parameter in mg/L, according to the guidelines of IS 10500:2012 [36].
For computing the WQI, $SI_i$ = the sub-index of the $i$th parameter, $q_i$ = the rating based on the concentration of the $i$th parameter, and $n$ = the number of parameters.
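The definitions above match the usual form of the weighted arithmetic index. The following is a minimal Python sketch, assuming the relative weight is W_i = w_i / sum(w), the quality rating is q_i = 100 * c_i / s_i, the sub-index is SI_i = W_i * q_i, and the WQI is the sum of the sub-indices. The equations themselves are not reproduced in this text, so these forms are an assumption. The function name, sample values, limits and weights are illustrative and are not the study's data. Parameters such as pH and DO are often rated against an ideal value; that adjustment is omitted here.

```python
def wqi_weighted_arithmetic(concentrations, standards, weights):
    """Weighted arithmetic WQI for one water sample (assumed standard form)."""
    total_weight = sum(weights.values())
    wqi = 0.0
    for name, concentration in concentrations.items():
        relative_weight = weights[name] / total_weight          # W_i = w_i / sum(w)
        quality_rating = 100.0 * concentration / standards[name]  # q_i = 100 * c_i / s_i
        wqi += relative_weight * quality_rating                  # SI_i = W_i * q_i
    return wqi


# Hypothetical sample, limits and weights, for illustration only.
sample = {"TDS": 600.0, "Turbidity": 10.0, "BOD": 6.0}
limits = {"TDS": 500.0, "Turbidity": 5.0, "BOD": 5.0}
weights = {"TDS": 4.0, "Turbidity": 3.0, "BOD": 3.0}

print(round(wqi_weighted_arithmetic(sample, limits, weights), 1))
```

A parameter that exceeds its limit contributes a rating above 100, so a sample in which every parameter exceeds its standard yields a WQI above 100.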
The WQI values are classified into five levels of water quality, with the corresponding grade and probable use, as given in Table 1 below [34,35]. For calculating the WQI, six parameters, namely pH, Conductivity, Turbidity, BOD, TDS and DO, were considered.
Principal component analysis (PCA) was conducted to find the most influential pollutant parameters. PCA reduces a set of raw data to a few principal components that retain most of the variance within the original data. RStudio software (version 1.1.383) was used for undertaking the multivariate data analysis methods [37].
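Component retention in PCA is often decided by Kaiser's criterion, which keeps components whose eigenvalue exceeds one. The sketch below assumes the PCA is run on standardized variables, so the eigenvalues of the correlation matrix sum to the number of parameters. The eigenvalues listed are hypothetical and chosen only to illustrate the arithmetic; they are not the study's values, and `kaiser_retained` is an illustrative name.

```python
def kaiser_retained(eigenvalues):
    """Return retained eigenvalues, their % variance shares and the cumulative %."""
    total = sum(eigenvalues)
    ranked = sorted(eigenvalues, reverse=True)
    retained = [ev for ev in ranked if ev > 1.0]       # Kaiser's criterion
    shares = [100.0 * ev / total for ev in retained]   # % of total variance
    return retained, shares, sum(shares)


# Hypothetical eigenvalues of a 12 x 12 correlation matrix (they sum to 12).
eigenvalues = [4.2, 2.4, 1.6, 1.1, 0.8, 0.6, 0.5, 0.3, 0.2, 0.15, 0.1, 0.05]

retained, shares, cumulative = kaiser_retained(eigenvalues)
print(len(retained), [round(s, 1) for s in shares], round(cumulative, 1))
```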
In the present study, multivariate linear regression analysis was carried out considering the stormwater quality parameter as the dependent variable, and "Rainfall" and "Antecedent Dry days (ADD)" as the independent variables.
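A linear model with two predictors can be fitted by solving the normal equations. The sketch below assumes a model of the form parameter = b0 + b1 * rainfall + b2 * ADD and uses Cramer's rule on the 3 x 3 system. The data are synthetic, generated from known coefficients so that the fit can be checked; they are not measurements from the study, and all function and variable names are illustrative.

```python
def det3(m):
    """Determinant of a 3 x 3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def fit_linear(rainfall, dry_days, response):
    """Least-squares fit of response = b0 + b1*rainfall + b2*dry_days."""
    rows = [(1.0, r, d) for r, d in zip(rainfall, dry_days)]
    # Normal equations: (X^T X) b = X^T y
    xtx = [[sum(row[i] * row[j] for row in rows) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(rows, response)) for i in range(3)]
    denom = det3(xtx)
    coefs = []
    for k in range(3):
        replaced = [row[:] for row in xtx]
        for i in range(3):
            replaced[i][k] = xty[i]      # Cramer's rule: swap in column k
        coefs.append(det3(replaced) / denom)
    return coefs


# Synthetic data built from b0 = 100, b1 = 5, b2 = 20 (illustration only).
rainfall = [5.0, 12.0, 20.0, 35.0, 8.0]   # mm
dry_days = [1.0, 3.0, 2.0, 7.0, 4.0]      # antecedent dry days
response = [100.0 + 5.0 * r + 20.0 * d for r, d in zip(rainfall, dry_days)]

b0, b1, b2 = fit_linear(rainfall, dry_days, response)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```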
Several non-linear methods are available for regression, such as "artificial neural networks, kernel discriminant analysis, kernel partial least squares, and support vector machines" [38–40]. The support vector machine (SVM) is a prominent model that represents an advanced form of machine learning and is well recognized for its capability in regression and classification [41,42]. N. Sapankevych noted that "Using kernel techniques as part of a time series prediction results in a more accurate estimation of the data, even when the data series is nonlinear, non-stationary, and not characterized" [43]. SVM has the capacity to generalize due to the implementation of structural risk minimization for objective functions [44]. Support Vector Regression initially considers a basic function in which M = the number of independent variables. To evaluate $\beta$ and $\beta_0$, an objective function is minimized, and the solution for $\hat{f}$ is obtained considering $\beta_0$ to be absorbed in the kernel function. The $N \times N$ matrix $HH^{T}$ comprises inner products between pairs of observations $i, i'$; i.e., the calculation of an inner product kernel $\{HH^{T}\}_{i,i'} = K(x_i, x_{i'})$, and $I$ is the identity matrix. The 'W' coefficients can then be extracted, where $i$ = position of the observation.
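The displayed equations for this formulation are not reproduced in the text. The following LaTeX sketch shows the standard kernel-regression form that the surrounding definitions appear to follow. It is an assumption, not a verbatim restoration: the symbol $J$ for the objective, the loss $V$ and the penalty $\lambda$ are notation introduced here, and the closed form for $\hat{\alpha}$ holds for a squared-error loss. The expression for the 'W' coefficients is not recoverable from the text and is left out.

```latex
\begin{align}
  f(x) &= \sum_{m=1}^{M} \beta_m h_m(x) + \beta_0 \\
  J(\beta, \beta_0) &= \sum_{i=1}^{N} V\bigl(y_i - f(x_i)\bigr)
      + \frac{\lambda}{2} \sum_{m=1}^{M} \beta_m^{2} \\
  \hat{f}(x) &= \sum_{i=1}^{N} \hat{\alpha}_i \, K(x, x_i),
      \qquad \hat{\alpha} = \bigl(HH^{T} + \lambda I\bigr)^{-1} y \\
  K(x_i, x_{i'}) &= \exp\bigl(-\gamma \,\lVert x_i - x_{i'} \rVert^{2}\bigr)
\end{align}
```

The last line is the radial basis kernel, with $\gamma$ controlling how quickly the influence of an observation decays with distance.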
RStudio software was used to implement Support Vector Regression on the data. The Radial Basis Kernel Function (RBF) was found to be best suited. Kernel functions other than RBF gave a similar Normalized Root Mean Square Error; hence, RBF was used as the kernel function, since it is provided by default in the RStudio software.
The radial basis kernel function was used for all the parameters.
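The role of the RBF kernel can be illustrated with a small kernel regression. Note that the sketch below is not the ε-insensitive SVR fitted in RStudio: it swaps in a squared-error kernel fit (kernel ridge regression), which has the closed-form solution alpha = (K + lambda * I)^(-1) y and uses the same kernel. The data rows, the gamma and lambda values, and all function names are hypothetical.

```python
import math


def rbf_kernel(u, v, gamma):
    """K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def solve_linear(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [b[i]] for i, row in enumerate(a)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        tail = sum(m[i][j] * x[j] for j in range(i + 1, n))
        x[i] = (m[i][n] - tail) / m[i][i]
    return x


def fit_kernel_ridge(x_train, y_train, gamma, lam):
    """Return alpha solving (K + lam * I) alpha = y."""
    n = len(x_train)
    k = [[rbf_kernel(x_train[i], x_train[j], gamma) + (lam if i == j else 0.0)
          for j in range(n)] for i in range(n)]
    return solve_linear(k, y_train)


def predict(x_train, alpha, gamma, x_new):
    """f(x) = sum_i alpha_i * K(x, x_i)."""
    return sum(a * rbf_kernel(x_new, xi, gamma) for a, xi in zip(alpha, x_train))


# Hypothetical rows: (rainfall in mm, antecedent dry days) -> pollutant value.
x_train = [(5.0, 1.0), (12.0, 3.0), (20.0, 2.0), (35.0, 7.0)]
y_train = [310.0, 420.0, 390.0, 610.0]

alpha = fit_kernel_ridge(x_train, y_train, gamma=0.01, lam=1e-6)
fitted = [predict(x_train, alpha, 0.01, xi) for xi in x_train]
print([round(v, 2) for v in fitted])
```

With a very small penalty the fit nearly interpolates the training points. A larger penalty, like a smaller cost C in SVR, trades training accuracy for a smoother function.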
In SVM with an RBF kernel, the models were tuned using two hyperparameters, C and gamma (γ). The C hyperparameter contains the cost and epsilon (ε) parameters and lies in (0, ∞). A C value closer to zero implies a smaller penalty for any misfit of the training data and, in turn, reduces the training accuracy. The gamma (γ) parameter of the RBF kernel controls the distance of influence of a single training point. The Normalized Root Mean Square Error (NRMSE), the coefficient of determination (R²) and the ratio of performance to the interquartile range (RPIQ) [45] were used to evaluate the predictive performance of the model, from Equation (13) onwards. In general, a large R² and a small NRMSE mean that the prediction is good. Furthermore, larger RPIQ values mean that the model predicts well.
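The three evaluation metrics can be sketched as follows. The exact definitions used in the study are not reproduced in this text, so the code assumes common forms: NRMSE as the RMSE divided by the observed range, R² as one minus the ratio of residual to total sum of squares, and RPIQ as the observed interquartile range divided by the RMSE. The quartile method, the function name and the numbers are also assumptions made for illustration.

```python
import math
import statistics


def evaluate(observed, predicted):
    """Return NRMSE, R2 and RPIQ under the assumed definitions."""
    n = len(observed)
    residual_ss = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    rmse = math.sqrt(residual_ss / n)
    mean_obs = sum(observed) / n
    total_ss = sum((o - mean_obs) ** 2 for o in observed)
    # Quartiles of the observed values (inclusive interpolation method).
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return {
        "NRMSE": rmse / (max(observed) - min(observed)),  # normalized by range
        "R2": 1.0 - residual_ss / total_ss,
        "RPIQ": (q3 - q1) / rmse,
    }


# Hypothetical observed and predicted values, for illustration only.
observed = [10.0, 20.0, 30.0, 40.0, 50.0]
predicted = [12.0, 18.0, 33.0, 37.0, 50.0]

metrics = evaluate(observed, predicted)
print({name: round(value, 3) for name, value in metrics.items()})
```

Other normalizations of the RMSE (by the mean or the standard deviation of the observations) are also in use, so reported NRMSE values are only comparable when the same convention is applied.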

Urbanization Trend of G-Basin over Two Decades
The trend of change in the land cover of the G basin over the past two decades was studied. Landsat satellite imagery for the years 2005, 2010, 2015 and 2020 was classified by the maximum likelihood classification method using ArcMap 10.6.1 to obtain the land use maps. In this study, the imagery was classified into urban areas (mixed development), vegetation cover, barren land and waterbody, as shown in Figure 2.
The graph in Figure 4 shows the patterns of change in land cover type from 2005 to 2020. It shows a burgeoning increase in urban land over this period, reducing barren land to almost half of its area and indicating the onset of urban sprawl in the G basin. Urban areas increased up to 70% in the year 2020, with a reduction of 10% in vegetation cover and a reduction of up to 40% in barren land. The urbanization trend also shows a marginal decrease in water bodies, for which there are many possible reasons, including human encroachment and climate change.

Stormwater Quality Analysis of Urban Surface Runoff
As discussed in Section 2.3, the predictive mean method was used to determine the 9% of values that were missing. These gaps were mainly due to the inaccessibility of the stream site caused by the heavy growth of weeds, as shown in Figure 5f. Table 1 provides the mean concentration and standard deviation (SD) of the quality parameters for the various station points on the stream. According to the Bureau of Indian Standards classification of surface water sources, the quality of the water is below class E [46,47]. The parameters had deteriorated beyond class E and resemble sewage or industrial effluents. Therefore, this stormwater certainly demands treatment before discharge.

For further analysis, the spatial behavioral trends were observed station-wise from the farthest station, number 11, towards the outfall. The data were also analyzed to compare pollutant concentrations during rainfall and after rainfall at each station. The concentrations of turbidity, TSS, TDS and COD show an increasing trend from the peripheral stations towards the discharge point at the outfall. It was also observed that the concentrations of TDS, TSS, DO, BOD, COD and phosphates were higher for the samples collected during the rainfall. The higher concentrations of BOD and COD may be organic in nature, possibly because of the erosion of roadsides by surface runoff. The presence of animal excreta, human faeces due to open defecation and leakage in sewage drainage lines were also identified as additional sources. There were no curbs on the roads in several places, and channeling was also missing at most places. Thus, the grass cover, loose soil and dust present on the roadside were picked up by the surface runoff. The increase in phosphate concentration is likely to originate mainly from decomposed plant materials such as leaves, grass clippings and eroded soil [8].
The variation in the spatial distribution of impervious regions significantly affects pollution concentrations [6,48]. The time and velocity of travel of surface runoff towards these naturally occurring open channels certainly affect the pollution concentrations at these sampling locations. The concentration ranges of ammonia and nitrate were observed to be higher for the samples collected after the rainfall. The nitrogenous concentrations can be attributed to the leaching of fertilizers from lawns, parks and agricultural lands present in the study area, located slightly away from the open channel [8].
The standard deviation is observed to be higher for the solids parameters, turbidity, BOD, COD and phosphate, which indicates large variations in stormwater quality. This makes urban stormwater quality control more challenging to understand. The variations in the quality parameters at all the stations are attributed to the mixed land usage, the haphazard distribution of impervious and pervious areas and the lack of appropriate management practices with a primary focus on quality. These variations are also supported by field observations such as improper solid waste management practices near the roadside drains as well as in the streams. The variation in stormwater quality can also be attributed to the dumping of paints, chemicals and other liquid wastes in these stormwater drains and open streams, and to the encroachment of these streams by construction activities. Images of these field observations of the study area are shown in Figure 5.
The WQI values were calculated as shown in Equation (1) and, as shown in Figure 6, ranged from 46.9 to 153.9 for the samples collected during rainfall and from 41.20 to 87.70 for the samples collected after rainfall. The values of the physicochemical parameters were substantially higher for samples taken during the storm event than for those taken after it. This trend was observed at all the sampling stations. Accordingly, the WQI values during the storm event were greater than those for the samples collected after the storm event. This also reflects the first-flush effect during the storm event, which diminishes after the rainfall. In both cases, the WQI values exhibited an increasing trend towards the outfall. Moreover, the majority of the WQI values fall in the poor to very poor water quality levels, as referred to in Table 2. This analysis strongly justifies the need for a specific degree of source control treatment before the stormwater merges into the Mula river.

Significant Stormwater Quality Parameters
There were considerable ambiguities in the attempts to formulate the process of pollution generation, its transmission and dispersal. This implies that the urban form has an impact on the properties of primary stormwater pollutants, which suggests that the effectiveness of structural measures cannot be universal but needs to be addressed locally [8]. To explore this association among different pollutant parameters for mixed land use in the current study of the peri-urban area, Principal Component Analysis (PCA) was used, as discussed in Section 2.3.

Out of the twelve parameters, the components with the highest total variance were considered the most significant. The number of significant principal components was determined using Kaiser's criterion. As the scree plot in Figure 7 shows, only the four components with eigenvalues greater than one were retained. These components accounted for 77.7% of the total variance, of which the first component (PC1) and the second component (PC2) accounted for 34.60% and 20.42%, respectively. They exhibited a strong positive relationship with TS and dissolved oxygen, respectively. The third component (PC3) accounted for 13% of the total variance and exhibited a positive association with TSS, Phosphate and Nitrate. The fourth component (PC4) accounted for only 9% of the total variance and showed a positive relationship with turbidity. Thus, the most significant stormwater quality parameters are TS, DO, TSS, Phosphate, Nitrate and Turbidity. Biplots were used to determine the type of relationship among the parameters. Each vector represents an individual parameter, and the angle between two vectors indicates their correlation: the smaller the angle, the stronger the positive correlation. Vectors are negatively correlated when they point in opposite directions, and vectors that are perpendicular to each other are uncorrelated.
From the biplot of the first two components in Figure 8, which together account for 55% of the total variance, a positive correlation was observed between Ammonia and TDS. From this, it can be concluded that most of these nutrients were in the dissolved state. A positive correlation was also observed among Phosphate, Turbidity and BOD, which implies that these nutrients and the organic load are particle bound. Similarly, the biplot for PC1 and PC3 in Figure 9 showed that DO, TDS and pH were strongly correlated. The positive correlation among turbidity, COD and BOD highlights that the pollutant load is particle bound, as together they were negatively correlated with TS.
Stormwater management begins at the point where raindrops strike the ground surface. Firstly, non-structural measures include source control, in which practices that remove pollutants before they come into contact with rainfall are incorporated. These include regular sweeping and washing of roads, cleaning of open streams before the monsoon and maintenance of the stormwater drainage facilities. Secondly, structural measures include the adoption of one or more source control measures/low impact development practices, which reduce the volume and pollution of stormwater. The analysis of the significant stormwater quality parameters, their inter-relations and their nature will help decision makers to develop strategies for source control and to prevent the build-up of these pollutants. Furthermore, it is vital to understand whether any correlation exists between these stormwater quality parameters and rainfall characteristics, as discussed in the next section.
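The component-retention rule and the biplot reading described in this section can be illustrated with a short sketch. It assumes the PCA was run on standardized parameters, so that the total variance equals the number of parameters (twelve) and an eigenvalue greater than one corresponds to a variance share above 1/12, roughly 8.3%. The function names are hypothetical, and the cosine rule for biplot angles is only an approximation that holds when the plotted components capture most of the variance of the two parameters being compared.

```python
import math

# Variance shares (percent) reported in the text for the retained components.
reported_share = {"PC1": 34.60, "PC2": 20.42, "PC3": 13.0, "PC4": 9.0}
N_PARAMETERS = 12  # twelve water quality parameters, assumed standardized


def implied_eigenvalue(share_percent, n_vars=N_PARAMETERS):
    """Eigenvalue implied by a variance share when total variance equals n_vars."""
    return share_percent / 100.0 * n_vars


def kaiser_retained(shares, n_vars=N_PARAMETERS):
    """Names of the components whose implied eigenvalue exceeds one."""
    return [pc for pc, s in shares.items() if implied_eigenvalue(s, n_vars) > 1.0]


def correlation_from_angle(angle_degrees):
    """Approximate correlation read from the angle between two biplot vectors."""
    return math.cos(math.radians(angle_degrees))


# Implied eigenvalues for the four reported components.
for pc, share in reported_share.items():
    print(pc, round(implied_eigenvalue(share), 2))

print("retained:", kaiser_retained(reported_share))

# Small angle -> strong positive, right angle -> none, opposite -> negative.
for angle in (0, 90, 180):
    print(angle, round(correlation_from_angle(angle), 3))
```

Under this assumption, PC4 with 9% of the variance sits only slightly above the retention threshold, which is consistent with it being the last component kept.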

Relationship of Stormwater Quality Parameters with Rainfall Characteristics
In the literature, many studies have examined the influence of rainfall characteristics on pollutant concentrations in stormwater. Yongwei Gong and Xiaoying Liang [49] analyzed the temporal distribution of rainfall, rainfall depth and rainfall duration under different dry days to understand the dual effect of rainfall characteristics and surface flooding on TSS at the outfall of the catchment. Arora et al. (2013) carried out a regression analysis of varied sub-watersheds [8]. Rainfall and antecedent dry days were the two independent variables subjected to linear regression analysis, with the concentrations of BOD, COD, TSS, TDS, TKN, TP, oil and grease, total and faecal coliforms and heavy metals acting as the dependent variables [8]. It was observed that the coefficient of determination for each model was not significantly higher than 50% for any of the parameters. This suggests that the behaviour of the data was not linear. The previous studies considered sampling at a single point on upland surfaces, where the linear models worked aptly. In this study, the observed non-linearity may be attributed to the in-stream behaviour of pollutants across various sampling points for sub-watersheds of varied mixed land use.

For this non-linear regression, the machine learning algorithm Support Vector Regression (SVR) was considered, as discussed in Section 2.3. Unlike many other machine learning regression algorithms, it performs well on a small set of observations. One of the key features of SVR is that it aims to minimize the generalization error bound, rather than the observed training error, to achieve generalized performance. This generalization error bound is a combination of the training error and a regularization term that controls the complexity of the hypothesis space. In classical support vector regression, it is challenging to define a proper value for the parameter ε in advance. A newer algorithm, "ν support vector regression" (ν-SVR), partially solves this issue: "ε itself is a variable in the optimization process" and is regulated by another parameter "ν ∈ (0, 1)". "ν is the upper bound on the fraction of error points or the lower bound on the fraction of points inside the ε-insensitive tube. Thus, a right ε can be automatically found by choosing the appropriate ν, which adapts the accuracy level to the data at hand. This makes ν a more suitable parameter than the one used in ε-SVR".
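The role of ε described above can be made concrete with a minimal sketch of the ε-insensitive loss and of the "fraction of error points" that ν bounds. This does not fit an SVR model; the observed and predicted values are hypothetical numbers chosen for illustration, and the function names are not taken from the study.

```python
def epsilon_insensitive_loss(observed, predicted, epsilon):
    """Per-sample loss: zero inside the tube, linear in the excess outside it."""
    return [max(0.0, abs(o - p) - epsilon) for o, p in zip(observed, predicted)]


def fraction_of_error_points(observed, predicted, epsilon):
    """Share of samples whose residual falls outside the epsilon tube."""
    losses = epsilon_insensitive_loss(observed, predicted, epsilon)
    return sum(1 for loss in losses if loss > 0.0) / len(losses)


# Hypothetical observed and predicted values, for illustration only.
observed = [10.0, 12.0, 15.0, 20.0, 25.0]
predicted = [10.5, 11.0, 15.2, 23.0, 24.9]

# A wider tube leaves fewer samples outside it.
for eps in (0.3, 0.6, 2.0, 5.0):
    print(eps, fraction_of_error_points(observed, predicted, eps))
```

In ε-SVR the tube width is fixed in advance and the fraction of error points follows from it. ν-SVR works in the opposite direction: the bound ν on that fraction is chosen, and the optimization selects a matching ε.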
After fitting the SVR model for each parameter, the hyperparameters were tuned to optimize the kernel function. The complete data for each parameter was divided into four different combinations of training and testing sizes, viz. 70-30%, 80-20%, 90-10% and 95-5%. For each parameter, the combination with the least difference between training and testing error was taken as the best fit, as shown in Table 3, since this split is expected to give the most reliable prediction. Table 4 shows the results of the support vector regression models, from which the coefficients of rainfall and antecedent dry days were derived. The coefficients shown in Table 4 are in the form given in Equation (16). They suggest that a unit increase or decrease in the value of Rainfall (mm) and ADD (days) results in a weighted (α_i, for the i-th observation) increase or decrease in the value of the stormwater quality parameter. The non-linear SVR models confirmed that both antecedent dry days and rainfall are correlated with stormwater quality. Table 4 also reports the NRMSE, R² and RPIQ values used to evaluate the accuracy of the models. The SVR model showed the best performance for TS, with NRMSE (0.17), R² (0.82) and RPIQ (2.91). Except for turbidity, with NRMSE (0.85), R² (0.39) and RPIQ (0.79), all the other parameters were fitted well by the SVR model with the radial basis function kernel. This suggests that turbidity may also depend on other rainfall characteristics, apart from rainfall and ADD.
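The three accuracy measures and the split-selection rule described above can be sketched as follows. Several choices here are assumptions, since the text does not state them: NRMSE is normalized by the observed range, R² is computed as one minus the ratio of residual to total sum of squares, and the interquartile range uses the inclusive quartile method. The sample values and the per-split errors are hypothetical.

```python
import math
import statistics


def rmse(observed, predicted):
    """Root-mean-square error between observed and predicted values."""
    return math.sqrt(sum((o - p) ** 2 for o, p in zip(observed, predicted)) / len(observed))


def nrmse(observed, predicted):
    """RMSE divided by the observed range (assumed normalization)."""
    return rmse(observed, predicted) / (max(observed) - min(observed))


def r_squared(observed, predicted):
    """Coefficient of determination as 1 - SS_res / SS_tot."""
    mean_obs = statistics.fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def rpiq(observed, predicted):
    """Ratio of performance to interquartile range: IQR of observations / RMSE."""
    q1, _, q3 = statistics.quantiles(observed, n=4, method="inclusive")
    return (q3 - q1) / rmse(observed, predicted)


def best_split(errors_by_split):
    """Split label with the smallest gap between training and testing error."""
    return min(errors_by_split, key=lambda k: abs(errors_by_split[k][0] - errors_by_split[k][1]))


# Hypothetical values, for illustration only.
observed = [2.0, 4.0, 6.0, 8.0]
predicted = [3.0, 4.0, 5.0, 8.0]
print(nrmse(observed, predicted), r_squared(observed, predicted), rpiq(observed, predicted))

# Hypothetical (training error, testing error) pairs for the four splits.
errors = {"70-30": (0.20, 0.35), "80-20": (0.22, 0.25), "90-10": (0.18, 0.30), "95-5": (0.15, 0.40)}
print(best_split(errors))
```

A lower NRMSE and a higher R² and RPIQ indicate a better fit, which matches the ranking of TS above turbidity in the reported results.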
Given the two variables ADD and Rainfall, this modelling was performed to predict the concentration levels of the pollutant parameters that fit well within the model. It is vital to first assess runoff characteristics and meticulously analyse the actual situation with respect to rainfall and catchment characteristics before initiating structural control strategies. During storm events, hydraulic and physical processes remove larger solids and associated pollutants, while biological and chemical processes treat finer solids and dissolved pollutants [50]. These holistic solutions depend on many factors, including the availability of appropriate space in the peri-urban area, physical site conditions and regulatory requirements. This study will aid in designing a treatment train approach of source control measures to control the pollutants at the source, followed by a stormwater treatment facility to minimize the volume and pollution loads entering the open stream.
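The prediction step, in which a pollutant concentration is estimated from rainfall and ADD, can be sketched with the standard kernel expansion of a fitted SVR model, f(x) = b + Σ α_i K(x_i, x), using the RBF kernel K(x_i, x) = exp(−γ‖x_i − x‖²). That this matches the form of Equation (16) is an assumption. The support vectors, weights α_i, intercept b and γ below are hypothetical placeholders, not values from Table 4.

```python
import math


def rbf_kernel(x, z, gamma):
    """Radial basis function kernel between two input vectors."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, z))
    return math.exp(-gamma * sq_dist)


def svr_predict(x, support_vectors, weights, intercept, gamma):
    """Kernel expansion: intercept plus weighted kernel similarity to each support vector."""
    return intercept + sum(
        alpha * rbf_kernel(sv, x, gamma) for sv, alpha in zip(support_vectors, weights)
    )


# Hypothetical fitted model; inputs are (rainfall in mm, ADD in days).
support_vectors = [(5.0, 2.0), (20.0, 7.0), (35.0, 1.0)]
weights = [3.0, -1.5, 2.0]  # alpha_i, one per support vector
intercept = 10.0            # b
gamma = 0.01                # RBF width parameter

# Predicted value of one quality parameter for a new storm event.
print(svr_predict((18.0, 5.0), support_vectors, weights, intercept, gamma))
```

Each weight α_i scales the contribution of one training observation, and that contribution fades as the new event moves away from the observation in the rainfall and ADD plane. This is why the response to a unit change in rainfall or ADD is a weighted, non-linear one rather than a fixed slope.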

Conclusions
It is observed from the experimental investigations that the stormwater quality parameters have deteriorated below class E, which is equivalent to sewage or industrial effluents.
The values of the physicochemical parameters were significantly higher for samples taken during the storm event. This pattern was observed across all sampling stations. Water Quality Index (WQI) values for the samples collected during the storm event were higher than those for samples collected after the storm event. The WQI values show an increasing trend from the peripheral region towards the outfall location in the river. Most of the WQI values fall in the poor to very poor water quality levels, with a few of them above the unfit mark.
Principal component analysis (PCA) identified TS, DO, TSS, Phosphate, Nitrate and Turbidity as the most significant stormwater quality parameters. The PCA biplots showed a positive correlation among various parameters. An SVR model with a radial basis function (RBF) kernel was developed to understand the non-linear, complex relationship of rainfall characteristics with these stormwater pollutant parameters. The accuracy of each model was evaluated using the normalized root-mean-square error, R² and RPIQ. The unit increase or decrease in the coefficients of rainfall characteristics displays the weighted deviation in the values of pollutant parameters.
This study demonstrates that assessing the stormwater characteristics and meticulously considering all the existing conditions is crucial before embarking on expensive source control strategies. Location-specific analysis can guide pollutant reduction efforts more accurately and help to achieve sustainable solutions for such scenarios in developing cities. Overall, given the cost of treatment in developing countries, source control strategies should be the main focus of management practices, rather than stormwater runoff treatment. An integrated land use planning and design strategy to mitigate the impacts of land use on the environment is increasingly being promoted as an effective method of reducing runoff and pollutant loadings into streams. The conclusions drawn can provide effective information to decision-makers to employ an appropriate treatment train approach of varied source control measures to treat and mitigate runoff in an open stream, rather than an end-of-pipe approach. This will holistically serve the stakeholders' objectives to manage stormwater efficiently. The research can be further extended by selecting a multi-criteria decision-making tool to adopt the best SCM and its multiple potential combinations.

Data Availability Statement:
The data that support the findings of this study are available from the corresponding author, upon reasonable request.