Water Quality Analysis of Drinking Water Resource Lake Sapanca and Suggestions for the Solution of the Pollution Problem in the Context of Sustainable Environment Approach

Abstract: Lake Sapanca is the drinking water source of the Sakarya province of Turkey. Intensive urbanization in the region is the main obstacle to implementing appropriate physical planning and measures to adapt to rapid change. Monitoring water quality parameters is essential to the planning and management of lakes. An Artificial Neural Network (ANN), a mathematical model loosely inspired by the functioning of the human brain, was employed to estimate the Lake's Dissolved Oxygen (DO) concentration. pH, Magnesium (Mg), Temperature (Temp), Chemical Oxygen Demand (COD), Orthophosphate (o-PO₄), Nitrite Nitrogen (NO₂-N), and Nitrate Nitrogen (NO₃-N) were used as independent parameters. The ANN model gives better results than traditional multiple linear regression (MLR) analysis. The developed model can be used to forecast and fill in missing data in the future and to support decision-making for pollution reduction through sustainable environmental management. The eutrophication threat to Lake Sapanca has been revealed. The main objective is to create a scientific basis that draws attention to the rapid urbanization problem through the outputs of the ANN and eutrophication models. Protecting the water budget of Lake Sapanca emerges as the primary solution, in terms of ecological sustainability, for eliminating the existing pollution.


Introduction
Rapid urbanization, driven by a fast-growing population and migration dynamics, causes problems for the environment and natural resources that are difficult to solve [1]. Uncontrolled urbanization affects many ecological issues and raises concerns about the future of cities and the quality of life [2]. Increasing construction and unplanned development, in parallel with population growth, make it impossible to preserve rural ecological areas and establish sustainable cities [3]. Green space planning is one of the well-known planning methods for conserving natural resources, and it is crucial to ecological sustainability [4].
The main objective of this study, which is to draw attention to the rapid urbanization problem, has three interrelated components. The first is to create an Artificial Neural Network (ANN) model that is specific to the research area and gives successful results; the ANN model can be used for water quality forecasting and can provide predictions about the future pollution situation. The second is to reveal the trophic status in order to draw attention to eutrophication, the most critical problem faced by the Lake. The third and final aim is to use the scientific data resulting from the first two objectives to reveal the pollution threat facing the Lake, thus drawing attention to rapid urbanization, which is the leading cause of the Lake's current water quality degradation. This research is vital in addressing the drinking water supply of two important cities. Unlike previous ANN studies [5,6], it does not solely present a successful ANN modeling process; it also reveals the irreversible alteration of the Lake's trophic status due to the deterioration of water quality over the years. The primary purpose here is to show the negative effect of rapid urbanization, which is the most critical obstacle to sustainability. The urban planning process is a heavy burden on the public budget. Presenting scientifically based research findings can help convince government officials to act against rapid urbanization.
Although there have been various studies, theses, and articles on the Lake Sapanca Basin, few have addressed spatial change together with the basin's current land use. The Lake Sapanca Basin was chosen as the research area because rapid urbanization has deteriorated the basin's natural environment. The adverse effects of rapid urbanization are demonstrated comprehensively by analyzing the water quality of the streams feeding the Lake and the Lake's eutrophic situation.

Study Area
Lake Sapanca lies in the Marmara region, in the northwestern part of Turkey. It is located between two cities: the eastern end of the Lake is within the provincial border of Sakarya, while the western part is in Kocaeli. It is a freshwater lake of tectonic origin that meets both cities' drinking water demands and extends 16 km in the east-west direction and 5 km in the north-south direction. It has a mean depth of 30 m and a surface area of 46 km². The Lake's volume is about 1.3 billion m³. The Lake basin drainage area is 250 km², and the maximum depth is 54 m [7]. Southern mountains and northern hills surround the Lake. The principal streams feeding the Lake are shown in Figure 1.

Inflow Streams
The inflow streams are Balıkhane, Karaçay, Kuruçay, Mahmudiye, İstanbul, Keçi, Sarp, and Arifiye, and the outlet stream is Çark. The deterioration of the water quality of these streams and of the Lake is the leading issue that has to be resolved. The streams pollute Lake Sapanca by carrying nutrients [8]. Nutrients are vital for living things. However, when they are excessive, they provoke the growth of algae. Chemicals contained in heavily used cleaning products also promote algal growth. When this growth becomes extreme, oxygen in the water is depleted; this nutrient-driven process, marked by algal blooms, is called eutrophication [9].

Source of Pollution
Lake Sapanca is a critical Palearctic lake of tectonic origin and a source of biodiversity; its formation dates back to ancient times, making it a relatively old lake [10]. Today, many negative factors, such as the discharge of wastewater from industrial, domestic, and agricultural activities into surface waters and the resulting heavy pollution, restrict the living areas of its species. Recent studies have inferred that Lake Sapanca has lost its oligotrophic character and that the trophic state has changed from mesotrophic to eutrophic [8,11,12].

The Impact of Urbanization
According to the official statistics of the Turkish Statistical Institute (TUİK), population growth of over twenty percent in the last ten years at the research site is evidence of rapid urbanization in the Lake Sapanca basin [13]. The population of Sapanca, a district of Sakarya, was 37,652 in 2010 and 43,018 in 2020. The population of Sakarya province was 872,872 in 2010 and 1,042,649 in 2020. Over the same period, the population of Kocaeli province rose from 1,560,138 in 2010 to 1,997,258 in 2020.
The research is intended to reveal the negative trend in the historical water quality status of Lake Sapanca due to rapid urbanization. Developing a practical method to characterize problems in the water body can help us understand the impact of pollution. There are both urban and rural pollution sources in the Lake Sapanca Basin. Heavy industry and unplanned residential settlements, with up to 70,000 settlers, are the primary pollution sources in the basin. Agricultural and residential areas comprise about 35% of the Lake Sapanca Basin. Intensive and unplanned urbanization in the region is a considerable obstacle to preserving the basin's water quality [14].
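The growth rates implied by the population figures above can be checked with simple arithmetic. The following is a minimal sketch; the function name `percent_growth` and the choice to combine the two provinces are illustrative assumptions, not part of the original analysis.

```python
def percent_growth(start, end):
    """Relative change from start to end, expressed in percent."""
    return (end - start) / start * 100.0

# Population figures quoted in the text: (2010, 2020)
populations = {
    "Sapanca district": (37_652, 43_018),
    "Sakarya province": (872_872, 1_042_649),
    "Kocaeli province": (1_560_138, 1_997_258),
}

growth = {name: percent_growth(a, b) for name, (a, b) in populations.items()}

# Assumption: the two provinces sharing the Lake are summed for a combined rate.
combined = percent_growth(872_872 + 1_560_138, 1_042_649 + 1_997_258)

for name, g in growth.items():
    print(f"{name}: {g:.1f}%")
print(f"Sakarya + Kocaeli combined: {combined:.1f}%")
```

With these figures, the ten-year growth is roughly 14% for Sapanca district, 19% for Sakarya province, 28% for Kocaeli province, and about 25% for the two provinces combined.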

The Problem of Lake Eutrophication
Domestic waste has a high impact on the current pollution status of the Lake. Altundag et al. [11] revealed that nitrate and reactive phosphate are the potentially limiting nutrients for algae production in the Lake. Proper waste management strategies and infrastructure rehabilitation are vital for the region. The number of fish species living in the Lake has decreased significantly [15]. According to Arslan et al. [12] and Köker et al. [16], a negative trend has been observed in the Lake's trophic status.

The Need for Environmental Management
The main problem of Lake Sapanca is rapid urbanization combined with the lack of environmental management. Unfortunately, there is no specific management plan. The presence of critical industrial facilities and the region's high population density make the Lake a vital water body that has to be monitored continuously [17].
Dozens of water companies in the Lake Sapanca Basin draw water from the streams. Industry also uses water from the basin, as do the Kocaeli and Sakarya municipalities [18]. The Lake faces pollution and salinity due to the extensive use of freshwater and uncontrolled pollution [19].

Experimental and Data Acquisition
The data investigated in this research were obtained with the help of the State Hydraulic Works (DSI) through the monitoring program for the Lake Sapanca streams [20]. The measured parameters were Dissolved Oxygen (DO), pH, Magnesium (Mg), Temperature (Temp), Chemical Oxygen Demand (COD), Orthophosphate (o-PO₄), Nitrite Nitrogen (NO₂-N), and Nitrate Nitrogen (NO₃-N). Measurements were conducted in January, April, July, and October between 1998 and 2018. In accordance with the standards, samples were transported in a cold box (4 °C) and analyzed within 24 h [21-23]. The test methods were: instrumental for pH, Temp, and DO; titrimetric for Mg; potassium dichromate oxidation for COD; the stannous chloride method for o-PO₄; and ion chromatography for NO₂-N and NO₃-N. Sampling points and coordinates are shown in Table 1.

Data-Driven Analysis
Multiple linear regression (MLR) and Artificial Neural Network (ANN) models were applied to estimate DO concentration for the streams linked to Lake Sapanca. Monthly average values were used for the analysis. With four sampling months per year over the 21 years from 1998 to 2018, the sample size of the data (n) is 84. Both MLR and ANN analyses were applied to all nine streams: the eight inflows and the one outlet. A bivariate (Pearson) correlation test was conducted to determine significance (p-value). The MLR model was developed as the traditional statistical approach. Finally, an ANN model was created using IBM SPSS Statistics for Windows, Version 21.0 (Armonk, NY, USA). Through its automatic architecture selection option, the software can keep iterating, so the analysis continues until the best network architecture is detected.
Two ANN methods are commonly used in the SPSS neural network module: the Multi-Layer Perceptron (MLP) and the Radial Basis Function (RBF) network [24].
MLP uses nonlinear network parameters, RBF works with linear parameters, and both methods can be used in the same application areas. In this research, the MLP network performed better, so MLP models were generated for all data sets.
The hyperbolic tangent function (tanh) was used as the activation function in the hidden layer, and the identity function was used in the output layer. The hyperbolic tangent function maps its input to a value between −1 and 1. The identity function leaves the variable unchanged. Batch learning was selected as the type of training; it is well suited to small data sets. In batch learning, the synaptic weights are updated only after a pass through the entire data set, and it is a frequently used method that minimizes the total error. Gradient descent was selected as the optimization algorithm to minimize the error, starting from random initial weights.
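The training scheme described above (one hidden layer with tanh activation, an identity output, and batch gradient descent from random initial weights) can be sketched in a few lines. This is an illustrative sketch only, not the SPSS implementation: the function names, learning rate, number of hidden neurons, number of epochs, and toy data are all assumptions.

```python
import math
import random

def train_mlp(X, y, n_hidden=3, lr=0.05, epochs=2000, seed=0):
    """Batch gradient descent for a one-hidden-layer MLP
    (tanh hidden units, identity output). Returns weights and MSE history."""
    rng = random.Random(seed)
    n_in, n = len(X[0]), len(X)
    # Random starting weights; the last entry of each row is the bias.
    W1 = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(n_hidden)]
    W2 = [rng.uniform(-0.5, 0.5) for _ in range(n_hidden + 1)]
    history = []
    for _ in range(epochs):
        gW1 = [[0.0] * (n_in + 1) for _ in range(n_hidden)]
        gW2 = [0.0] * (n_hidden + 1)
        sse = 0.0
        for xi, yi in zip(X, y):
            h = [math.tanh(sum(w * v for w, v in zip(row, xi)) + row[-1]) for row in W1]
            out = sum(w * v for w, v in zip(W2, h)) + W2[-1]  # identity output
            err = out - yi
            sse += err * err
            for j in range(n_hidden):
                gW2[j] += err * h[j]
                delta = err * W2[j] * (1.0 - h[j] ** 2)  # derivative of tanh
                for k in range(n_in):
                    gW1[j][k] += delta * xi[k]
                gW1[j][-1] += delta
            gW2[-1] += err
        # Batch learning: weights change once, after the full pass over the data.
        for j in range(n_hidden):
            for k in range(n_in + 1):
                W1[j][k] -= lr * gW1[j][k] / n
        for j in range(n_hidden + 1):
            W2[j] -= lr * gW2[j] / n
        history.append(sse / n)
    return W1, W2, history

def predict(W1, W2, xi):
    """Forward pass for a single input vector."""
    h = [math.tanh(sum(w * v for w, v in zip(row, xi)) + row[-1]) for row in W1]
    return sum(w * v for w, v in zip(W2, h)) + W2[-1]

# Hypothetical toy data (two scaled inputs, one target); not measured values.
X = [[-1.0, -1.0], [-1.0, 1.0], [1.0, -1.0], [1.0, 1.0], [0.0, 0.0], [0.5, -0.5]]
y = [0.5 * a - 0.3 * b for a, b in X]
W1, W2, history = train_mlp(X, y)
print(f"MSE first epoch: {history[0]:.4f}, last epoch: {history[-1]:.4f}")
```

Gradient descent of this kind converges to a minimum of the error surface that depends on the random starting weights, which is one reason to compare several candidate architectures and runs.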
The best ANN architecture was determined on the basis of the training error. The ANN architecture has one hidden layer, and the number of neurons changes for each stream's data. A typical ANN architecture in this study is shown in Figure 2.
This study points out the benefits of the ANN approach over traditional methods and conventional statistical techniques. The results show that ANN is a reliable approach for accurately predicting the concentration fluctuation of quality parameters for drinking water provided from Lake Sapanca.

Temporal Trends in Eutrophication Status
The research revealed the eutrophication threat for Lake Sapanca and the eight streams that feed it. The Vollenweider model was developed from long-term spatial and temporal studies on several hundred lakes. The Vollenweider diagram has been implemented in many studies to show the trophic level of lakes [7,25,26]. Vollenweider [27] defines permissible and excessive loads through empirical equations. The work of Vollenweider [27] is a seminal study by a significant scientist. Hundreds of similar studies were done later and are available in the literature, but the core study is this work published in 1976 (see Equations (1)-(3)).
Rw stands for the hydraulic residence time (HRT); the mean depth of Lake Sapanca (Z) is 30 m; V stands for the lake volume, which equals 1,300,000,000 m³.
The hydraulic residence time (Rw) is obtained as the ratio of the lake volume to the inflow, Rw = V/Q, where Q is the inflow to the reservoir. The discharges (Q) and total phosphorus (TP) concentrations of each stream were used to calculate the phosphorus loads. The total phosphorus loads and discharges of the streams draining into Lake Sapanca were calculated. The Vollenweider diagram was plotted to show the trophic status of Lake Sapanca (see Figure 3).
There is an obvious risk of eutrophication for Lake Sapanca. The uncertainty of the phosphorus loading in recent years is depicted in Figure 3. The results show continuous fluctuation in the trophic status of Lake Sapanca.
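The quantities that feed the Vollenweider diagram can be illustrated with a short calculation. The sketch below assumes the standard definition of residence time as volume divided by inflow and computes an areal phosphorus load from per-stream discharge and TP concentration. The function names and the stream values are hypothetical; only the lake volume and surface area are taken from the text.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 3600

def residence_time_years(volume_m3, inflow_m3_per_s):
    """Hydraulic residence time Rw = V / Q, in years."""
    return volume_m3 / (inflow_m3_per_s * SECONDS_PER_YEAR)

def areal_tp_load(streams, lake_area_m2):
    """Areal total phosphorus load in g P per m2 per year.

    streams: list of (Q in m3/s, TP in mg/L) pairs, one per inflow stream.
    Since 1 mg/L equals 1 g/m3, Q * TP is a mass flux in g/s.
    """
    grams_per_year = sum(q * tp for q, tp in streams) * SECONDS_PER_YEAR
    return grams_per_year / lake_area_m2

LAKE_VOLUME_M3 = 1.3e9   # from the text
LAKE_AREA_M2 = 46e6      # 46 km2, from the text

# Hypothetical inflow data for three streams: (discharge m3/s, TP mg/L)
example_streams = [(0.8, 0.05), (0.5, 0.12), (1.1, 0.03)]
total_inflow = sum(q for q, _ in example_streams)

print(f"Rw = {residence_time_years(LAKE_VOLUME_M3, total_inflow):.1f} years")
print(f"TP load = {areal_tp_load(example_streams, LAKE_AREA_M2):.3f} g P/m2/yr")
```

In a Vollenweider-type diagram, the areal load is plotted against a hydraulic term derived from the mean depth and the residence time, and the position of the point relative to the permissible and excessive loading curves indicates the trophic state.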

Traditional Statistical Analysis
Multiple linear regression is an analysis performed to reveal the relationship between one dependent variable and a set of associated independent variables. In multiple linear regression, each independent variable has a different degree of effect on the dependent variable. The null hypothesis is that an independent variable does not correlate with the dependent variable. The p-value expresses the level of significance; to reject the null hypothesis, the maximum acceptable p-value is 0.05. A result is regarded as statistically significant if the p-value is in the range of 0.01 to 0.05 [28].
First, the data were checked for statistical significance. Two-tailed p-values were calculated through the bivariate (Pearson) correlation test (see Table 2). Since the significance (p-value) level is at most 0.05, this model is defined as statistically significant. Results are given at the 95 percent confidence level.
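To illustrate how a multiple linear regression assigns a separate coefficient to each independent variable, the following sketch fits an ordinary least squares model by solving the normal equations. It is a minimal, dependency-free illustration rather than the SPSS procedure used in the study; the function name `fit_mlr` and the sample values are hypothetical.

```python
def fit_mlr(X, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y.

    X: list of rows of predictor values; y: list of responses.
    Returns [intercept, b1, b2, ...].
    """
    rows = [[1.0] + list(r) for r in X]  # prepend the intercept column
    p = len(rows[0])
    # Augmented matrix [X'X | X'y]
    A = [
        [sum(r[i] * r[j] for r in rows) for j in range(p)]
        + [sum(r[i] * t for r, t in zip(rows, y))]
        for i in range(p)
    ]
    # Gauss-Jordan elimination with partial pivoting
    for c in range(p):
        piv = max(range(c, p), key=lambda r: abs(A[r][c]))
        A[c], A[piv] = A[piv], A[c]
        for r in range(p):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [a - f * b for a, b in zip(A[r], A[c])]
    return [A[i][p] / A[i][i] for i in range(p)]

# Hypothetical data generated from y = 1 + 2*x1 - 3*x2 (not measured values)
X_demo = [[0, 0], [1, 0], [0, 1], [1, 1], [2, 1]]
y_demo = [1, 3, -2, 0, 2]
coefficients = fit_mlr(X_demo, y_demo)
print([round(b, 6) for b in coefficients])
```

Each fitted coefficient describes the change in the dependent variable per unit change in one predictor with the others held fixed, which is the sense in which the degree of effect differs between variables.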
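The Pearson correlation coefficient behind such a test can be computed directly. The sketch below also derives the t statistic from which a two-tailed p-value is obtained (using a t distribution with n − 2 degrees of freedom, which is left to a statistics package here). The function names and the sample values are illustrative assumptions.

```python
import math

def pearson_r(x, y):
    """Pearson product-moment correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def t_statistic(r, n):
    """Test statistic for H0: no correlation; compare against t with n - 2 df."""
    return r * math.sqrt((n - 2) / (1.0 - r * r))

# Hypothetical paired observations (not measured values)
x_demo = [1, 2, 3, 4, 5]
y_demo = [2, 4, 5, 4, 5]
r = pearson_r(x_demo, y_demo)
t = t_statistic(r, len(x_demo))
print(f"r = {r:.4f}, t = {t:.4f}")
```

The null hypothesis of no correlation is rejected at the 0.05 level when the two-tailed p-value associated with this t statistic does not exceed 0.05.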

Artificial Neural Network (ANN) Analysis
This study aims to create a successful ANN model using historical data, even when the temporal data contain gaps caused by critical events such as pandemics, extreme meteorological events, and human errors.
The most important difference between this study and previous studies is the use of the ANN architecture that gives the best results for the characteristics of the parameters measured in each stream.
Similar previous studies used a single optimized ANN architecture or manually selected neuron numbers to define the ANN architecture [5,6]. Here, the ANN analysis was run on the data of each stream, parameters such as the coefficient of determination and the mean squared error were compared, and the ANN architecture with the best performance was determined for each stream's data.

Comparison of Artificial Neural Network (ANN) and Multiple Linear Regression (MLR) Analysis Results
The mean squared error (MSE) and the coefficient of determination (R²) are two key parameters for evaluating the success of data-based statistical models.
The MLR and ANN models were compared according to R² and MSE (see Equations (4) and (5)).
DO_pre. is the average value for the predicted DO, DO_M is the mean value for the measured DO, and N stands for the number of observations.
The results show that the ANN model is more reliable than the MLR model for predicting the prospective concentration of dissolved oxygen (DO) in the streams of Lake Sapanca (see Tables 3 and 4).
A portion of the total input data was used as training data, and the remainder was used to test the model (see Table 4).
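A minimal sketch of how the two performance measures can be computed is given below. It assumes the common textbook definitions of MSE and R², which may differ in detail from Equations (4) and (5); the function names and the DO values are hypothetical.

```python
def mse(measured, predicted):
    """Mean squared error between measured and predicted values."""
    n = len(measured)
    return sum((m - p) ** 2 for m, p in zip(measured, predicted)) / n

def r_squared(measured, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_m = sum(measured) / len(measured)
    ss_res = sum((m - p) ** 2 for m, p in zip(measured, predicted))
    ss_tot = sum((m - mean_m) ** 2 for m in measured)
    return 1.0 - ss_res / ss_tot

# Hypothetical DO values in mg/L (not measured data)
do_measured = [8.0, 9.0, 10.0, 11.0]
do_predicted = [8.5, 9.0, 9.5, 11.0]

print(f"MSE = {mse(do_measured, do_predicted):.3f}")
print(f"R2  = {r_squared(do_measured, do_predicted):.3f}")
```

A lower MSE and an R² closer to 1 indicate a better fit, which is the basis on which two models fitted to the same data can be compared.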

Discussion
DO is a critical parameter in determining water quality and expresses the pollution level of the receiving water, which is why it was chosen as the modeled parameter. It has been concluded that an artificial neural network (ANN) can establish a surface water pollution model and predict future pollution problems.
Multivariate statistical techniques such as multiple linear regression (MLR) and artificial neural networks (ANN) assist in the analysis of complex data for a deeper understanding of the study region's water quality and ecological status [29]. These techniques allow the identification of potential sources that affect water supplies and provide a valuable tool for reliable water resource management against the pollution caused by rapid urbanization.
At the same time, determining the trophic condition of the receiving environment is essential because the most significant pollution threat to water bodies used as drinking water sources is organic pollution from domestic waste. The Lake's trophic state is approaching an irreversible eutrophic period if the necessary precautions are not taken.
These research findings should be taken into account when planning how to keep using the lake water for drinking purposes in a sustainable way.
Currently, streams that feed Lake Sapanca, which is among the world's rare sources of drinking water, resemble a dump [7,30]. There is much waste in the stream water, from plastic bottles to metal waste, from used pesticide cans to household garbage, which reduces the water quality in the streams [31].
In addition to the Kocaeli and Sakarya municipalities, 27 water companies buy water from streams in Sapanca [15]. There is also a possibility of illegal use. There is no management plan for the water budget of Lake Sapanca. Preserving the water budget of Lake Sapanca is extremely important in terms of ecological sustainability balance.
Lake Sapanca is one of the rare potable lakes in the world. It is a vital wetland where tens of thousands of migratory birds stay every year [32]. It has been determined that, due to the hunting of the 80 bird species native to the Lake, the plants consumed by these species are overgrowing the Lake [33].
Rapid urbanization and industrialization have recently come to be considered among the main threats to surface water quality in the region. Therefore, thorough management strategies for preventing and solving pollution issues are necessary. However, water management in the area is divided among multiple authorities.
If Lake Sapanca's water is to be saved, it is necessary to examine what has been done for other lakes around the world that are used for drinking water and have similar water quality problems. The literature shows that similar solution methods have been applied in different regions with positive results. Several successful examples from around the world have led to improved water quality conditions [34-38].
Cantor [34] states that transporting water resources for urban development has not only allowed cities to develop, but this circulation of water has also transformed the rural areas from which the water is sourced. A similar situation also happened to the Lake Sapanca basin. Sapanca district has also been affected by the rapid urbanization in neighboring cities, Kocaeli and Sakarya. On the other hand, Cantor [34] also states that California's state constitution prohibits water waste. In other words, the water law in California prevents the waste of resources, unlike Sapanca. The water law for the protection of drinking water sources is mandatory and must be enacted immediately. The way to prevent rapid urbanization in the Lake Sapanca basin is to enact the water law. All stakeholders should come together, and the water law should be passed and put into practice with all the Parliament's deputies without delay.
Considering that Lake Sapanca is a shared resource used to meet the two cities' water needs, the non-priority rule is valid, as suggested by Meshel [35]. As a general principle of international traditional water law, fair and reasonable use is undisputed, but the important thing is to make this situation controllable by local laws.
According to Tarlock [36], water law in the United States of America (USA) encompasses a unique form of property law that grants a usufruct in water resources: the right to use water rather than ownership of the water source. The effects of climate change on water resources have made adaptation necessary, and developed countries have initiated efforts to adapt existing water laws accordingly. Craig [37] states that water law in the USA has become a tool for adaptation to climate change.
If water resources management on a basin basis becomes permanent with the water law, rapid urbanization will be prevented as water resources will cease to be an endless resource for cities. In developed countries, water resources are managed on a basin level. As Benidickson [38] points out, the Canadian Water Act's primary purpose is to expand water resources management to the watershed level. Basin-based water management will contribute to protecting water resources in our country as in many developed countries.
Smithers et al. [39] and Çomaklı et al. [40] argue that creating buffer zones around the Lake is also a required solution to prevent pollution. Empty spaces around Lake Sapanca should be densely afforested. With the water law to be enacted, the water budget should be preserved. Water in excess of the Lake's annual capacity should not be drawn from it. Sapanca water should only be used as drinking water. If the Lake is left free of human impact, it will renew itself in time.
The pollution situation revealed through measurement and data-driven statistics showed that the primary source of pollution in Lake Sapanca is rapid urbanization. The basin population has increased by twenty percent in the last decade [13]. The way to eliminate this situation is to protect the basin through the water law. With the principle of basin-based water resources management, water resource utilization can be limited, and the basin ceases to be attractive for rapid urbanization. This study's results point to the root of the issue and provide a scientific foundation for governors. Beyond developing ANN models as was done in previous studies [5,6], this study is critical in addressing the drinking water supply of two major cities and does not strictly attempt to present an effective ANN modeling process. The novelty of this research is that it not only presents the pollution issue effectively, as was done in previous studies [5,7,8], but also recommends a permanent management solution for Lake Sapanca, based on sustainable solutions in the literature.

Conclusions
Spatial and temporal water quality data are essential for water resources management. Modeling the future conditions of the water body can be accomplished using a large quantity of available temporal data. Lake Sapanca is the most significant freshwater resource located in Adapazarı. The population in the basin of the Lake has increased significantly as a result of illegal settlement and rapid urbanization. Various pollution parameters have been studied to determine the pollution status of the Lake. This study should interest researchers who seek solutions to the adverse effects of unplanned urbanization on water quality.
The purpose of using the neural network model was to reveal the relationship between the leading parameter, dissolved oxygen (DO), and all other pollution parameters, namely pH, Magnesium (Mg), Temperature (Temp), Chemical Oxygen Demand (COD), Orthophosphate (o-PO₄), Nitrite Nitrogen (NO₂-N), and Nitrate Nitrogen (NO₃-N), and to conduct a simulation through this nonlinear model. ANN was used as a useful modeling tool for hydrological systems. The MLR and ANN analyses show that ANN provides higher accuracy than the traditional statistical technique of MLR.
Furthermore, the evolution of the trophic status of the Lake has been observed. Annual total phosphorus (TP) loads were calculated. The Vollenweider diagram indicates the Lake's trophic level. The findings reveal a significant eutrophication risk for Lake Sapanca.
Regarding ecological sustainability, management efforts such as erosion control by afforestation and the implementation of buffer strips reduce total nutrient emissions. Improving treatment capacity in the Lake Sapanca basin mitigates the effects of rapid urbanization in the region. Hence, rehabilitating the infrastructure linked to treatment plants reduces the pollution in Lake Sapanca. Defining the features of a planning approach and determining the planning process will provide a landscape "planning-design" approach focused on the region's potential green infrastructure and the protection of natural resources. As a sustainable environment requires, the abuse of Lake Sapanca's resources should be prevented as soon as possible.
Although studies on the water quality of the streams that feed the Lake in the Lake Sapanca Basin have been done in previous years, these analyses have not been carried out regularly until now. The study has successfully modeled the dissolved oxygen parameter, a vital parameter for the Lake's habitat, using spatial and temporal data covering an extended period of 21 years. In addition, the problem of eutrophication, which has recently become a significant threat to the Lake due to excessive nutrient concentration, has been presented graphically in this research. These are pollution problems that did not exist in past years and are emerging today. The basin of this important Lake, which provides drinking water to two neighboring cities, Kocaeli and Sakarya, is on the eve of a pollution problem, caused by rapid urbanization, that will be very difficult to reverse. This study's findings point out the source of the problem and provide a scientific basis for decision-making bodies such as municipalities. In this sense, our study differs from previous studies and provides a unique scientific contribution specifically for Lake Sapanca and for lake basins that experience similar problems worldwide.